Tuesday 27 October 2009

The Joy of Scalability

The joy of scalability is a many faceted thing ... such as, losing confidence in the trusty tools in your software library.


I stumbled over this recently on Java's DelayQueue, part of the concurrent library. When I saw the compareTo() method on our Transaction Control Blocks being called a lot by DelayQueue it made me suspicious that this would be a performance bottleneck.


We use the DelayQueues of transactions to time out or retry internal operations, so it's crucial that the take() method observes timeout order. However, we also need a fast remove() method based on the object, because normally our transactions won't time out - and we must get them off the DelayQueue as fast as possible. Because items typically won't have the same timeout, the insertions into the queue will be somewhat random. In summary, we need :

  • fast adds (random)
  • fast removes (random)
  • takes in timeout order, blocking

Our target for the transaction manager node is 10,000 transactions a second. If the average transaction is in the delay queue for 1 second then our DelayQueue will be 10,000 items long. And if the primary transaction manager fails over to the secondary and that takes a while, then the DelayQueue could easily have 50,000 items in it.


To figure out how scalable DelayQueue is, I tried reading the code ... but it's complicated enough to need a test.

Our tester is single-threaded and loads up the queue with, say, 10,000 entries and then starts removing 10% (1000) and adding 10% of the entries using remove() and put(). The timings on my Dell Latitude E6500 (T9600 @ 2.8GHz) to do one operation were:


Queue size

10,000

50,000

100,000

Add

300 nanos (.3 micros)

360 nanos

400 (or 900) nanos

Remove

33.5 micros

155 micros

340 micros


These numbers suggest that add is pretty close to stable as the queue size increases, but remove looks like it is doing a linear search for the exact object (it doesn't do any 'compareTo()' calls to try to zero in on the object to remove).


The bottom line for us is that this is good enough this year. At steady state, removing from the delay queue will only account for about 8.5 microseconds per second on a quad-core i7 machine (each core is about the same power as the T9600 core). However, if we start getting 25 or 50 cores on a machine, then the DelayQueue will start to become a bottleneck.


So I think DelayQueue can go back in the trusted tool box for now - a tribute to the developers of the Java Concurrent library.

No comments:

Post a Comment