Wednesday, 3 November 2010

The exciting thing about the cloud - for application architects - is the jump in scale: some applications now serve millions of online users in fractions of a second with information refined from huge amounts of data.

This category of application combines big data, heavy processing and a high transaction rate. Beyond a certain scale, the cheapest and best solution is an in-memory architecture (see RAMClouds). That explains why every industry player is betting on some sort of “in-memory database/data grid” product - the industry trends point inexorably to this solution for high-value web sites.

What's exciting is that the scale, reach and low cost of this jump will make a whole new raft of applications commercially and technically viable.

The Problem

However, the problem with cloud architectures for mission-critical applications revolves around ACID transactions - specifically, the lack of them. Brilliant engineers have tried to use existing techniques such as distributed transactions (e.g., XA) to provide scalable applications with ACID support. This effort has failed because distributed transactions are too slow and unreliable.

The CloudTran Approach

The CloudTran solution is to unbundle transaction management from the data stores (i.e., databases, document stores).

This echoes “Unbundling Transaction Services in the Cloud”, which proposed scaling cloud databases by unbundling transaction management from the data storage functions.

The first change resulting from our unbundling is a central transaction coordinator that sits between the in-memory data grid and the data stores. The coordinator can handle changes from any number of nodes in the grid and can send data to any number of stores. So the new layering is:

  • client (e.g. servlet, REST)
  • in-memory data + (sharded) processing (also interacting with messaging)
  • Transaction Coordinator
  • data stores - databases, Hadoop etc.
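
As a rough illustration of this layering, here is a minimal Python sketch of a coordinator that collects changes from several grid nodes and writes them out to several stores. Every name in it (DataStore, TransactionCoordinator, record, commit, the routing function) is hypothetical and chosen for the example; it is not CloudTran's API, and it leaves out failure handling, isolation and recovery entirely.

```python
from collections import defaultdict


class DataStore:
    """Stand-in for a durable store (database, document store, ...)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def persist(self, changes):
        self.rows.update(changes)


class TransactionCoordinator:
    """Hypothetical coordinator between the grid nodes and the stores."""

    def __init__(self, stores, route):
        self.stores = stores                # store name -> DataStore
        self.route = route                  # function: key -> store name
        self.pending = defaultdict(dict)    # txn id -> {key: value}
        self.nodes = defaultdict(set)       # txn id -> contributing node ids

    def record(self, txn_id, node_id, changes):
        # Any number of grid nodes may contribute changes to one transaction.
        self.pending[txn_id].update(changes)
        self.nodes[txn_id].add(node_id)

    def commit(self, txn_id):
        # Group the transaction's changes by target store, then write them out.
        # A real coordinator would also have to cope with a store failing here.
        by_store = defaultdict(dict)
        for key, value in self.pending.pop(txn_id, {}).items():
            by_store[self.route(key)][key] = value
        self.nodes.pop(txn_id, None)
        for store_name, changes in by_store.items():
            self.stores[store_name].persist(changes)
        return sorted(by_store)


# Usage: two grid nodes take part in one transaction that spans two stores.
stores = {"db_a": DataStore("db_a"), "db_b": DataStore("db_b")}
coordinator = TransactionCoordinator(
    stores, route=lambda key: "db_a" if key[0] == "customer" else "db_b"
)
coordinator.record("t1", "node-1", {("customer", 1): {"name": "Ann"}})
coordinator.record("t1", "node-2", {("order", 7): {"customer_id": 1}})
touched = coordinator.commit("t1")
```

The point of the sketch is only the shape of the data flow: grid nodes never talk to the stores directly, and the coordinator is the single place where a transaction's changes are gathered and fanned out.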

The second change made by CloudTran is to distribute transaction management - particularly constraint and isolation handling - to achieve maximum performance. The client, the “ORM” (object-relational mapper), the in-memory nodes, and the transaction coordinator all handle different aspects of transaction management. This allows even an entry-level configuration to handle thousands of update transactions per second.

The path to the data stores is supremely important for durability of course, and, secondarily, for links into data warehousing and other BI that feed off the database. With CloudTran, however, the performance requirements of the data stores change, which opens up opportunities:

  • data affinity and foreign-key constraints are handled in the grid, so tables can, for example, be sent to one physical database each, which may avoid complicated sharding
  • latency to the data stores is no longer important, which means the application front-end can be in the cloud and the durable data in the data center, reducing security concerns
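
The first point can be made concrete with a small Python sketch, under stated assumptions: the table names, the foreign-key map and the Grid class below are invented for illustration and say nothing about how CloudTran implements this. The idea shown is that the foreign-key check runs against in-memory data, so the destination of a row can depend on nothing but its table name.

```python
# Hypothetical mapping: each table goes to exactly one physical database.
TABLE_TO_DATABASE = {"customer": "db_customers", "order": "db_orders"}

# Hypothetical foreign keys: (child table, column) -> parent table.
FOREIGN_KEYS = {("order", "customer_id"): "customer"}


class Grid:
    """In-memory tables: table name -> {primary key: row dict}."""

    def __init__(self):
        self.tables = {table: {} for table in TABLE_TO_DATABASE}

    def insert(self, table, key, row):
        # Enforce foreign keys against in-memory data, not the database.
        for (child, column), parent in FOREIGN_KEYS.items():
            if child == table and row.get(column) not in self.tables[parent]:
                raise ValueError(
                    f"{table}.{column}={row.get(column)!r} has no parent in {parent}"
                )
        self.tables[table][key] = row
        # The row's destination depends only on its table name.
        return TABLE_TO_DATABASE[table]


grid = Grid()
customer_target = grid.insert("customer", 1, {"name": "Ann"})
order_target = grid.insert("order", 7, {"customer_id": 1})
```

Because the parent and child rows may end up in different physical databases, neither database could enforce this constraint on its own; in this sketch the grid is the only place that sees both tables.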

Being Upfront

Successful companies in the future will need faster web sites that calculate more refined intelligence from deeper analysis of personal preferences and social trends. To achieve this, live data will need to move to the front - alongside the processing of services and events - rather than being stored in a separate database tier.

The upside of the move is increased competitiveness and the ability to serve a global customer base. The challenge is the uncertainty as new tools are adopted, and risks in strategy and execution. CloudTran's strong, scalable transactionality linking to standard databases gives architects and developers a familiar reference point.