Google’s Spanner – Holy Grail of DBs or not?

Google just made its internal DB called Spanner available to the public via its cloud offerings a couple of days ago, and it’s already being touted as somewhat of a game changer. But is it really?


CAP Theorem

So basically there is this term CAP, often referred to as the CAP Theorem, which is an acronym for Consistency, Availability and Partition Tolerance. Consistency refers to the idea that all data in every node and cluster should have the same value at a given point in time. Availability refers to 100% uptime for both reads and writes. And partition tolerance refers to whether the database continues to function correctly if communication between servers is interrupted for some reason. Now, the CAP Theorem says you can have only two of the three, and must sacrifice the third. Basically, you can have CA, CP or AP, but not all three simultaneously.

It’s always been about A

Now, the person who originally coined the CAP Theorem was Eric Brewer, now of Google. He wrote an article just yesterday, on Valentine’s Day (a true romantic), where he claims that it’s always been about A. That is, 100% availability has always been the most important of the trinity. You can live with outdated data, as long as some data, even if it’s not the most recent, returns successfully.

How Google Beat Time

In a truly distributed database, where you have data centers strewn across the world, having real-time or near real-time consistency has been an issue. The reason Spanner is making waves these last few days is basically the claim that Google has been able to somehow bend time. How have they done that? Basically, by developing an advanced and sophisticated timekeeping mechanism. It uses GPS receivers and atomic clocks to keep its own track of time rather than depending on NTP. Google calls this TrueTime. A key factor in achieving this hyper-accuracy is the fact that Spanner runs on Google’s private network. Google not only has a global footprint like no other company, but also runs and controls its own WAN.
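The core idea behind TrueTime can be sketched in a few lines: instead of a single timestamp, the clock exposes an uncertainty interval, and a transaction "commit waits" until its timestamp is guaranteed to be in the past everywhere. Everything below (the `TrueTime` class, the `epsilon_ms` bound, the `commit` helper) is a toy illustration of the concept, not Google’s implementation or API.

```python
import time

class TrueTime:
    """Toy stand-in for Spanner's TrueTime: now() returns an interval
    [earliest, latest] guaranteed to contain the true physical time.
    epsilon_ms models the clock uncertainty bound."""

    def __init__(self, epsilon_ms=7.0):
        self.epsilon = epsilon_ms / 1000.0

    def now(self):
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)  # (earliest, latest)

def commit(tt):
    """Assign a commit timestamp, then 'commit wait': block until the
    timestamp is definitely in the past, so any transaction that starts
    afterwards, anywhere, sees a strictly larger timestamp."""
    _, latest = tt.now()
    commit_ts = latest                 # pessimistic upper bound
    while tt.now()[0] <= commit_ts:    # wait until earliest > commit_ts
        time.sleep(0.001)
    return commit_ts

tt = TrueTime(epsilon_ms=7.0)
ts = commit(tt)  # returns after roughly 2 * epsilon of waiting
```

The smaller the uncertainty bound, the shorter the commit wait, which is why dedicated GPS and atomic clock hardware (rather than NTP) matters so much here.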

RDBMS vs. NoSQL vs. Spanner

Typically, relational databases (RDBMS) like SQL Server, Oracle, MySQL, etc. scale up. That is, you can throw more RAM and processing power at them. The problem is that at some point you hit a limit. NoSQL databases get around this by scaling out, i.e. adding more servers or nodes. The problem then becomes synchronization and consistency. So NoSQL databases like Cassandra have specialized replication algorithms where nodes send each other updates to keep data fresh and synchronized. Well, Spanner basically brings the relational qualities of an RDBMS together with the distributed architecture of a NoSQL database. In Brewer’s own words:

Spanner is Google’s highly available, global SQL database. It manages replicated data at great scale, both in terms of size of data and volume of transactions. It assigns globally consistent real-time timestamps to every datum written to it, and clients can do globally consistent reads across the entire database without locking.

But is it really? Is it?

Strictly speaking, no, you cannot have 100% availability. What Spanner claims you can have, though, is near-100% availability with near consistency, while operating over a wide area network. But that “near” may be just good enough. Google claims that Spanner offers five 9s (99.999%) availability, meaning fewer than 1 failed request in 10^5. That is good enough for a lot of businesses.
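Five nines can be turned into a concrete downtime budget with simple arithmetic (the helper below is just illustration, not anything from Google’s SLA):

```python
# Downtime budget implied by a given availability level.
def downtime_per_year_minutes(availability):
    minutes_per_year = 365.25 * 24 * 60
    return (1 - availability) * minutes_per_year

for nines in (3, 4, 5):
    a = 1 - 10 ** -nines
    print(f"{a:.5f} -> {downtime_per_year_minutes(a):.2f} min/year")
# five nines (99.999%) works out to roughly 5.26 minutes of downtime a year
```

A budget of about five minutes of downtime a year is why “near” 100% is, for most businesses, indistinguishable from 100%.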

Is Spanner the DB Holy Grail?

I think that remains to be seen. What will make a difference is this: now that near-CAP is possible, do companies really need it? If you are a multinational running global operations, are you going to be OK with other NoSQL choices like MongoDB and Cassandra, or even locally scaled-up RDBMS that are cut up by regions and business units? Do businesses really need all three tenets of CAP, or is it just a cool bit of technology?

Further Readings
Inside Cloud Spanner and the CAP Theorem
Why Google’s Spanner Database Won’t Do As Well As Its Clone
Google Launches Cloud Spanner — A NewSQL Database For Enterprises
CAP Confusion: Problems with ‘partition tolerance’

Microservices and Asynchronous Conversations

I have been interested in microservices architecture for a while, and recently came across a wonderful article on the subject. Given that I spend most of my work day at Starbucks hiding from my kids and single-handedly trying to bankrupt Starbucks with their unlimited coffee offer for gold members (Oh Behave!), this article hit close to home.

It presents a very interesting real-world analogy of an asynchronous conversation happening between services, in this case a customer, a cashier and a barista. Each of these could represent a service talking to the others in an asynchronous fashion. In a synchronous approach, all three parties would be tied up until the transaction was complete. This would definitely lead to longer wait times, reduced profits and an overall inefficient system.

Monolithic Synchronous Approach

Here are a few of the problems from an architectural perspective with this approach:

  • The customer and cashier services are both tied up for the entire duration of the complete transaction. The cashier service is unable to do anything else until the current customer’s request is completed.
  • The barista service is not being fully utilized and is waiting around doing nothing while customer and cashier services are busy doing business that has nothing to do with the barista service. Meanwhile, there may be other customers waiting in line. Note that adding more instances of the barista service would not help.
Microservices Asynchronous Approach

In comparison, imagine a microservices based architecture, where each of the services is communicating via an asynchronous messaging bus. Clearly, this is a more efficient system where each service is only tied up doing what it does best. It receives a request, carries out its function, and places a response back on the bus. Here are the clear advantages of this approach.

  • The customer service is not tied up for the entire duration of the transaction.
  • The cashier service is also not tied up for the entire duration of the transaction, and further can service the next customer request.
  • The barista is not tied up waiting around while the cashier and the customer chat it up, and is able to work on the next task as soon as it comes in.
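The three bullets above can be sketched with a simple queue standing in for the messaging bus. The service names, timings and drink orders here are invented for illustration; the point is that the cashier hands work off and is immediately free for the next customer.

```python
import asyncio

async def cashier(orders, bar_queue):
    # Rings up each order quickly, drops it on the bus, and is
    # immediately free for the next customer -- never waits on the barista.
    for order in orders:
        await asyncio.sleep(0.01)      # time to take the order
        await bar_queue.put(order)
    await bar_queue.put(None)          # signal: no more orders

async def barista(bar_queue, served):
    # Pulls work off the bus at its own pace, independent of the cashier.
    while True:
        order = await bar_queue.get()
        if order is None:
            break
        await asyncio.sleep(0.02)      # time to make the drink
        served.append(order)

async def main():
    bar_queue = asyncio.Queue()
    served = []
    orders = ["latte", "espresso", "mocha"]
    await asyncio.gather(cashier(orders, bar_queue),
                         barista(bar_queue, served))
    return served

print(asyncio.run(main()))  # → ['latte', 'espresso', 'mocha']
```

Note that the cashier finishes taking all three orders long before the barista finishes making them, exactly the decoupling the bullets describe.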

Nothing we have discussed so far is specific to monolithic vs. microservices. So let’s see how the two might look in comparison if a lot of orders started coming in. Both architectures can handle this, but the way they do it is different.

A monolithic architecture can scale horizontally, which would mean we could deploy additional instances of our cafe application comprising the cashier and barista libraries. A load balancer would then assign incoming requests to one of the many instances of our application. In most monolithic applications, you would have to deploy the entire application, i.e. for each additional barista, there would also be an additional cashier. In the case of a cafe, this would obviously not be great for business, as you would have to pay the salary of an extra cashier. In an enterprise solution, this may mean deploying and maintaining additional resources and other overhead. Also, maybe a cashier’s average task length is 2 minutes, whereas it takes a barista 4 minutes to make a drink. What if a customer orders 10 drinks? The cashier still takes only 2 minutes to take the order, but the barista will now take 20 minutes to make all the drinks.

Independent Scaling in Microservices

This is where a microservices architecture shines in comparison. Since the cashier and barista services are developed and deployed individually, you can scale each service out independently of the others. So you can have 10 instances of the barista service running, while perhaps only two of the cashier service.
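Using the earlier numbers (2 minutes per order, 4 minutes per drink), the instance counts needed to keep the two services in balance can be worked out directly. This is back-of-the-envelope queueing arithmetic with assumed arrival rates, not output from any real autoscaler:

```python
import math

def instances_needed(arrivals_per_min, minutes_per_task):
    """Minimum replicas so a service keeps up with incoming work."""
    return math.ceil(arrivals_per_min * minutes_per_task)

orders_per_min = 1.0    # assumed: one order arrives each minute
drinks_per_order = 1.5  # assumed average drinks per order

cashiers = instances_needed(orders_per_min, 2)                     # -> 2
baristas = instances_needed(orders_per_min * drinks_per_order, 4)  # -> 6
print(cashiers, baristas)
```

In a monolith, matching the six baristas would force you to deploy six cashiers as well; with independently scaled services you pay for exactly the capacity each service needs.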

Decentralized Design Choices

The last point in favour of microservices in this case is that since microservices are independently built, deployed and scaled, it is also possible to design the cashier and barista services differently. The cashier service may use a traditional relational database such as SQL Server or Oracle to store customer and order data, whereas the barista service may benefit more from a NoSQL key-value type database where each drink ID is mapped to its ingredients and recipe. This flexibility in the design of each service is precisely the point of a microservices architecture.


The software architecture world seems to be moving towards the microservices style of architecture for some of the reasons explored in this article. While it is definitely worth considering, there certainly are applications where a traditional 3-tier monolithic application may be just as worthy a choice, if not preferable. One instance would be where the out-of-process cost of interservice communication in a microservices architecture is too high to bear, favouring the traditional in-process function calls between interdependent libraries inside a monolithic application.

Further Readings
Starbucks Does Not Use Two-Phase Commit
The Art of Scale: Microservices, The Scale Cube and Load Balancing

Watson Analytics – Getting Started

Watson Analytics is a separate effort by IBM to bring Watson’s cognitive capabilities to analytics. It basically exposes Tableau-like functionality, except it lives in the cloud and uses IBM Watson on the back end.

There are three subscription levels: Free, Plus and Professional. You can sign up for a free account that gives you 30 days of access to all of the Plus and Professional level features. Once you have done that, you can import up to four of the available sample data sets. From there, you can follow one of the use cases available on the IBM Watson Analytics website.


Once you get the hang of things, and have some of your own data, you can go ahead and upload new data either from a flat CSV file or from one of the several other data input sources available via Watson Analytics, including Twitter.


All of the functionality of Watson Analytics is distributed between the three Ds (Data, Discover and Display) that you can reach by clicking on the tabs available on the home page.


You would use the Data tab to import Watson Analytics sample data, upload your own data, or pull in data from various feeds. This is also where you would review and refine the imported data.

After you are done massaging and refining the data, you would move on over to the Discover tab, and this is where most of the magic happens. This is where the Watson cognitive services part of the analytics starts to shine. For starters, you will be presented with a bunch of queries that Watson Analytics thinks would be important to you.


But what you would likely rather do is type in some specific questions based on your data. For example, you may type in a question like “How are Monthly charges related to Churn?” Immediately you are presented with some relevant queries. The image next to each presented query specifies what kind of visualization you are going to get.


You can then click on one of the presented queries, or even build your own, to start the visualization.


These visualizations are active and responsive. As you learn more, you can drill down further by filtering, or create new visualizations. For example, we wanted to see the breakdown of InternetService and Contract, and filter the results by the Churn column, identifying the rows where the customers were lost. This led to the following visualization, which leads us to conclude that customers who had Fiber Optic and were on month-to-month contracts made up the largest segment of the customers lost.


Now we may want to look at churn as a whole, so we can remove that filter for lost customers and include the Churn column in its entirety. This way we end up with a view of lost and retained customers, split by Internet Service, with each segment showing customers on month-to-month, one-year and two-year contracts.


You may want to read this article on IBM’s site detailing tips on asking questions and building queries for Watson Analytics.

Finally, you can move on to the last tab, Display, to create dashboards using your findings.

To learn more, you can follow one of the use cases available on the IBM Watson Analytics’ website.



Microsoft Bot Framework – Building a Santa Bot

Ok, I’ll be the first to admit, I don’t like the idea of a Santa bot. I love the holidays and Christmas time, but my idea of Santa is a big burly old man, with a beard white as snow that one could lose a cookie or two inside. So when I was first approached by my colleague Sean with the idea of developing a Santa bot, I wasn’t so sure. But then it did mean I would get to spend some time playing with IBM Watson and the Microsoft Bot Framework. What came out of it was a series of articles that my colleague wrote here and here, and a fairly hip Santa Bot that you can chat with on Skype here.


The Santa bot is a fully conversational bot I built using the Microsoft Bot Framework. It understands natural language and was deployed as an Azure app service accessible via various channels including Skype, Facebook Messenger, and even SMS text messages (Twilio). In the coming weeks, I may write a how-to as well as showcase some IBM Watson stuff. So stay tuned.


An Introduction to SAP Hybris Development

As of the time of this writing, Hybris keeps their development knowledge locked down in a wiki available only to partners. And even if you have access to it, great resource though it is, it is easy to sometimes get lost in the breadth of information available to you.

After passing my SAP Hybris core and commerce developer certifications, I was inclined to jot down all my knowledge in an easy to follow overview of the Hybris development experience. You can find the complete article here.



Tutorial: SQL Server Integration Services

Here is another tutorial I wrote a while back on SQL Server Integration Services. It is a step-by-step tutorial on creating and deploying your first SSIS package that will leave you with a good understanding of not just a simple how-to, but also how to go about achieving more complex workflows in SSIS. Check out the complete article here.



Developing with Commerce Server 2009

In 2010, I prepared a 3 day intensive training bootcamp targeted at developers using Microsoft Commerce Server. I presented these bootcamps in Stockholm, Sweden (I thought Ottawa was the coldest civilized city you could go to in February until I went to Stockholm in February), Dallas, Texas at the Microsoft Office and Seattle, Washington at the Microsoft Head Office. Following the bootcamps, I prepared a three part overview of the development experience around Microsoft Commerce Server. You can find the first, second and third part of this series linked here.
