Thursday, 15 October 2009

Herding Cats

The almost universal response to the problem of saving data across a grid is not to solve the obvious problem - speed up the transactions! Instead, developers either build a new type of datastore - BigTable/Hadoop, SimpleDB, CouchDB - or 'roll their own'. New data approaches will require a lot of investment (new skills, new integration approaches) and waste the existing investment in management information and analytic engines that feed off databases. (The reaction against this attitude is well documented: the-rdbms-is-doomed-yada-yada-yada. Roll your own is high risk and hard to maintain.

What I couldn't understand when I started looking at this area was - why doesn't someone fix the darn problem for databases as they are now? I suspect the answer is that it's the old "herding cats" problem... Imagine a poor developer with the business brief:

There is a room containing an arbitrary number of cats with numbers on their backs. You will be given a list of lines with a cat number, a word to write on the cats' message pad and a door number. For each line on your list,
  • catch the correct cat
  • cross out the word currently on kitty's message pad, and write the requested word
  • catch another cat, write a copy of the word on his message pad,
  • herd the cats through the door indicated on the list, making sure they go through in the same order as on the list
  • if a cat falls asleep or runs off, catch a spare cat and use it.
You get the idea - herding cats in a distributed system, in the face of failure or lost messages etc. is tough.

Distributed transaction systems, within their design parameters, have handled the problem of providing the ACID features in the face of
  • failures and ...
  • the impossibility of two machines knowing the same thing at the same time (... whatever that is in a distributed environment).
Now cloud computing has added :
  • Demand for more performance - we're all so impatient now
  • An increase in maximum size of system
  • Scalability (the requirement that the application performance should scale linearly with the number of machines)
  • The wide range of the design space - be efficient with one or two machines up to 1,000s or 10,000s
  • More design options - do we fix a problem at the database, in the network, in a container ... etc.
This extra set of requirements is what has made this design problem so difficult to design for. In the next blog, I will start talking more about the solution we have built in CloudTran.

No comments:

Post a Comment