We've recently been performance testing CloudTran and we're now doing over 2,000 transactions per second. This is on a single-CPU, quad-core box running the transaction buffer and being bombarded from other machines over the network.
We spent ages stuck at around 900 transactions per second, looking at every part of the system ... until we realised we were using the wrong variant of the test. We assumed we were using the performance test mode of the test; in fact, we were using the "isolation" test, to check that transactions working on the same record/row observed isolation correctly. So we've tested isolation to death, inadvertently. And it wasn't all wasted time - when you look at code with suspicious eyes and a particular problem in mind, you realise there are angles you missed. Performance testing is certainly interesting for tech-heads. We've moved from fairly simple synchronisation techniques to Java's concurrency library and Atomic variables with the non-blocking synchronisation.
To do functional debugging, we use heavy tracing of anything that moves, but for performance we use 'OperationTimer' objects - timing the start and end of an operation. Analysing these logs is exciting - you have to be very careful that you're measuring what you think you are. Our life has been made much easier by having coordinated nanosecond timers, so that all the bombarder machines calculate their time relative to the TxB machine. Then, instead of looking at all the logs, we take a one-second slice of the OpTimer output (about 10,000 lines) and then sort on the time of the log. The important thing is we can then trace an individual transaction's path between the various boxes quite easily - the sorted trace is in global time order.
Unfortunately, we're now bumping up against Heisenberg's uncertainty principle - taking just the OpTimer measurements reduces speed by a factor of 3, which almost certainly changes the nature of the events ... timing is everything! So I suspect we'll end up using something like Dynatrace in the near future.
Another point to watch out for is the server's HotSpot optimisation - this affects the performance of the GigaSpaces GSCs. We normally run 50-100,000 transactions through the system before taking the timings - this indicates the bar for doing performance tuning is set very high on the server configuration of HotSpot.
Testing continues as we use the results to continually improve perfromance. And for now it's back to the logs.
No comments:
Post a Comment