Wednesday 11 November 2009

One line of trace

We're now doing about 2,000 tiny (price-tick) transactions per second on our pair of quad-core boxes. The CPU is still only at 60% utilisation, so there's still a bottleneck or two.

Say we eventually get up to 3,000 transactions per second at peak; at that rate, each transaction will be using around 300 microseconds of CPU.

Now it turns out that we have left in one line of console trace output saying "I have written transaction X", which takes roughly 60 microseconds on a single core - about 15 microseconds of aggregate horse-power on our quad-core box.

So the point is: one line of console trace will use about 5% of available CPU when we reach our target ... is it really worth it? I sense another configuration option crawling out of the woodwork ...
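The configuration option hinted at here could be as simple as a guard flag around the trace call, so the console write (and even the string concatenation feeding it) is skipped entirely when tracing is off. This is a hypothetical sketch - the class, method, and property names are ours, not CloudTran's:

```java
// Hypothetical sketch: a trace guard so the per-transaction console line
// can be switched off by configuration instead of always costing ~60 µs.
final class TxTrace {
    // In a real system this would come from a config file; here we read
    // a (made-up) system property, defaulting to off.
    private static final boolean TRACE_ENABLED =
            Boolean.getBoolean("cloudtran.trace.transactions");

    static boolean enabled() {
        return TRACE_ENABLED;
    }

    static void traceWritten(long txId) {
        if (TRACE_ENABLED) {   // cheap branch when disabled: no string built
            System.out.println("I have written transaction " + txId);
        }
    }
}
```

The point of putting the check inside the method (rather than at every call site) is that the disabled path costs a single branch per transaction instead of 15 microseconds of aggregate CPU.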

Tuesday 3 November 2009

Thousands of transactions per second

We've recently been performance testing CloudTran, and we're now doing over 2,000 transactions per second. This is on a single-CPU, quad-core box running the transaction buffer while being bombarded from other machines over the network. The test is "CatchingTicks", where the ticks are price changes on a market. Admittedly, those 2,000 transactions are among the smallest we can do, but the CPU runs them at less than 50% utilisation, so we can expect much higher throughput in the future.


We spent ages stuck at around 900 transactions per second, looking at every part of the system ... until we realised we were running the wrong variant of the test. We assumed we were using the performance-test mode; in fact, we were using the "isolation" test, which checks that transactions working on the same record/row observe isolation correctly. So we've inadvertently tested isolation to death. And it wasn't all wasted time - when you look at code with suspicious eyes and a particular problem in mind, you notice angles you'd missed. Performance testing is certainly interesting for tech-heads. We've moved from fairly simple synchronisation techniques to Java's concurrency library and Atomic variables with non-blocking synchronisation.
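To illustrate the kind of non-blocking move described above (this is our own example, not CloudTran's code): a "check then update" that would normally need a synchronized block becomes a compare-and-set retry loop on an Atomic variable, so readers and non-winning writers never block.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: replacing a synchronized "check then update" with a
// non-blocking CAS retry loop from java.util.concurrent.atomic.
final class HighWaterMark {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Record a value; returns true if it became the new maximum. */
    boolean record(long value) {
        long current;
        do {
            current = max.get();
            if (value <= current) {
                return false;            // nothing to do - no lock taken
            }
        } while (!max.compareAndSet(current, value));  // retry on contention
        return true;
    }

    long get() {
        return max.get();
    }
}
```

Under contention, a losing thread simply re-reads and retries instead of parking on a monitor, which is exactly why these techniques help at thousands of operations per second.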


To do functional debugging, we use heavy tracing of anything that moves; for performance, we use 'OperationTimer' objects, timing the start and end of each operation. Analysing these logs is exciting - you have to be very careful that you're measuring what you think you are. Our life has been made much easier by coordinated nanosecond timers, so that all the bombarder machines calculate their time relative to the TxB machine. Then, instead of looking at all the logs, we take a one-second slice of the OpTimer output (about 10,000 lines) and sort it by log time. The important thing is that we can then trace an individual transaction's path between the various boxes quite easily - the sorted trace is in global time order.
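The OperationTimer idea could be sketched like this - timestamps taken with System.nanoTime() and shifted by a per-machine offset onto the TxB machine's timeline, so lines from different boxes sort into one global order. The class shape and the assumption of a startup clock-sync handshake are ours, not CloudTran's:

```java
// Hypothetical sketch of an OperationTimer: start/end timestamps expressed
// in a shared "global" nanosecond timeline so logs from different machines
// can be merged and sorted into a single per-transaction trace.
final class OperationTimer {
    // Offset from this machine's clock to the TxB machine's clock,
    // calibrated once at startup (assumes some clock-sync exchange exists).
    private static volatile long offsetNanos = 0L;

    private final String operation;
    private final long startGlobal;

    OperationTimer(String operation) {
        this.operation = operation;
        this.startGlobal = System.nanoTime() + offsetNanos;
    }

    /** Emit one sortable log line: global end time, operation, elapsed. */
    String stop() {
        long endGlobal = System.nanoTime() + offsetNanos;
        return endGlobal + " " + operation + " took "
                + (endGlobal - startGlobal) + "ns";
    }

    static void calibrate(long newOffsetNanos) {
        offsetNanos = newOffsetNanos;
    }
}
```

Because every line starts with the global timestamp, a plain lexical-or-numeric sort on the merged one-second slice reproduces the true interleaving across boxes.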


Unfortunately, we're now bumping up against our own Heisenberg uncertainty principle - merely taking the OpTimer measurements reduces speed by a factor of 3, which almost certainly changes the nature of the events being measured ... timing is everything! So I suspect we'll end up using something like Dynatrace in the near future.


Another point to watch out for is the server VM's HotSpot optimisation - it affects the performance of the GigaSpaces GSCs. We normally run 50,000-100,000 transactions through the system before taking any timings - which suggests the server configuration of HotSpot sets the bar very high before it fully optimises the hot paths.
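The warm-up discipline described above can be sketched as a simple harness - run the workload enough times for the JIT to compile the hot paths, discard those results, then measure. The helper and its parameters are our own illustration; the warm-up volume mirrors the figures mentioned above:

```java
// Minimal warm-up sketch: exercise the workload enough times for HotSpot's
// server compiler to JIT the hot paths before any measurements are taken.
final class Warmup {
    /** Returns average nanoseconds per run, measured only after warm-up. */
    static long timeAfterWarmup(Runnable workload, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            workload.run();              // results discarded: JIT warm-up only
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            workload.run();
        }
        return (System.nanoTime() - start) / measuredRuns;
    }
}
```

With a warm-up count in the tens of thousands, the first measured run is hitting compiled code rather than the interpreter, which is the difference that makes the pre-warm-up numbers misleading.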


Testing continues as we use the results to continually improve performance. And for now, it's back to the logs.