CTO interview: Budgeting in nanoseconds


Andrew Phillips has worked at broker LMAX for 17 years and is currently the company’s chief technology officer (CTO). His budget is measured in nanoseconds.

Even though the end-to-end latency LMAX guarantees for high-frequency trades is 50 microseconds, the maximum time any application is allowed to run is eight nanoseconds. When the budget is measured in nanoseconds, even the smallest potential delay has a negative impact.

“Our customers are the big trading houses and they want something that runs very quickly and very deterministically,” Phillips says.

“We’ll test various Twinax cables and use direct connect rather than using optical fibre, because there is a small but measurable number of nanoseconds taken to turn an electrical signal into light, transmit it down a fibre, and then turn it back into an electrical signal.”

A different approach to software performance

When the company began building its platform in 2010, Phillips says it was common practice to use a waterfall methodology and develop the software using the C++ programming language. However, LMAX’s exchange is built using an agile methodology and is programmed in Java.

“Using agile techniques and Java, with its immensely rich testing ecosystem, was considered pretty weird,” he says.

Unlike C++, where application code is compiled ahead of time into a machine code program that then runs directly on the processor, Java uses runtime compilation. This means the Java code is compiled “on the fly” as the program runs.

Java also offers built-in memory management called “garbage collection”, which can impact a program’s performance, as Phillips explains: “We have quite a lot of awkward questions from potential customers about latency spikes and garbage collection.

“We started with the standard JDK in 2013. What we realised was that Java had a heritage – it was designed for set top boxes and was happiest when you gave it about 4GB of memory.”

According to Phillips, going beyond 4GB of memory meant that garbage collection times became unstable from a latency perspective. “We didn’t want to trade away the expressiveness or speed of writing in Java, or the testing ecosystem, which has really underpinned a lot of our success.”
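The kind of latency spike Phillips describes can be surfaced with a simple probe. The sketch below (hypothetical, not LMAX code) timestamps a tight loop that churns short-lived objects; any inter-iteration gap far above the typical loop cost is a candidate garbage-collection pause, and it is these gaps that become unpredictable as the heap grows.

```java
import java.util.ArrayList;
import java.util.List;

public class GcPauseProbe {
    static long worstGapNanos() {
        List<byte[]> retained = new ArrayList<>();
        long worst = 0;
        long last = System.nanoTime();
        for (int i = 0; i < 200_000; i++) {
            retained.add(new byte[1024]);      // ~1KiB of fresh allocation per pass
            if (retained.size() > 50_000) {
                retained.clear();              // release it, giving the collector work
            }
            long now = System.nanoTime();
            worst = Math.max(worst, now - last);
            last = now;
        }
        return worst;
    }

    public static void main(String[] args) {
        // A gap orders of magnitude above the typical iteration time
        // is most likely a stop-the-world collection pause.
        System.out.println("worst inter-iteration gap: " + worstGapNanos() + " ns");
    }
}
```

The exact numbers depend on the collector and heap size in use, which is precisely the non-determinism a low-latency exchange has to engineer away.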

The company has been using the Java platform from Azul Systems to get around the memory limitations of the standard Java environment. “At the time, if I had a server with 64GB of memory, avoiding garbage collection was essential,” adds Phillips.

Azul, he says, only garbage collects as a last resort. “This is great for us because we were driving our exchange latency, at that time, down from a millisecond towards where we are now, which is 50 microseconds.”

And within that small, 50-microsecond window, an awful lot happens. “50 microseconds is the time it takes from an order being submitted at the edge of our network, to being processed, matched and then the acknowledgement going back out,” Phillips adds.

“Short of being a professional compiler developer, I defy even extremely good programmers to do as well as a compiler in terms of optimisation”

Andrew Phillips, LMAX

Within this 50-microsecond window, the Java code has just eight nanoseconds to run, as the majority of the latency is incurred as the transaction passes over the network infrastructure to the server. Phillips believes Java, as a programming language, is better at optimising code than someone hand-coding for high performance.

“I come from a background of C, C++ and Fortran, and you tend to escape to assembler language to make things run quicker. [Java] is a bit counterintuitive,” he says.

According to Phillips, modern microprocessors are so fantastically complicated that if a developer chooses to write something in C or C++, the code is optimised only for the “target” processor architecture that the developer specified in the C or C++ compiler settings.

“One of the advantages of running Java is that it is optimised for the processor that you’re running on,” he says. “That can be quite important. Short of being a professional compiler developer, I defy even extremely good programmers to do as well as a compiler in terms of optimisation.”

Typically, a C++ programmer would use the C++ compiler on their development machine to compile the application, with the server processor as the target architecture. The optimisation is then effectively fixed: the application is only optimised for that particular server processor.

But, as Phillips points out, the development and testing environments may be several generations of processor architecture behind the production servers. The staging environments, where code is moved before going into production, are also likely to run older generations of server processors.

Java is able to optimise code at runtime, and so take advantage of any code acceleration features available to it on the target hardware the code is running on.
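This runtime optimisation can be observed directly. In the hypothetical sketch below, the same bytecode is compiled for whatever CPU it lands on; running it with the standard HotSpot diagnostic flag `-XX:+PrintCompilation` shows `sum()` being compiled once the loop becomes hot, and on a processor with SIMD support the JIT is free to auto-vectorise the reduction.

```java
public class SumLoop {
    static long sum(int[] data) {
        long total = 0;
        for (int v : data) {
            total += v;                 // simple reduction the JIT can vectorise
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 7;
        }
        long total = 0;
        for (int run = 0; run < 1_000; run++) {   // repeat so the method gets hot
            total = sum(data);
        }
        System.out.println("sum = " + total);
    }
}
```

The point Phillips makes is that the machine code emitted for `sum()` differs between, say, an older development box and a current production server, with no recompilation by the developer.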

“Being a bit of a disbeliever, I was quite sceptical that Java could do this,” Phillips says. “I was converted after I had a coding competition between a skilled Java programmer and myself, writing in C and assembly language. I just couldn’t beat the speed of the Java program.”

When asked about the biggest challenge he faces, Phillips says: “The biggest thing that slows me down in Java is being able to access a lot of memory with very low deterministic latencies. This is one of our key engineering challenges. The biggest problem I have right now is probably memory latency.”

Low latency technology breakthroughs

Going forward, Phillips says he is impressed by the reduction in hardware latency that CXL technology promises. Compute Express Link (CXL) enables direct, cache-coherent memory connectivity between different pieces of hardware, such as processors, accelerators and memory devices.

“CXL has enormous potential to completely change what we do, because the boundaries between the memory bus, the peripheral bus and the network layer start to blur into one,” he says.

However, while CXL was touted as a technology that would change computing hardware architecture within a matter of months, it has yet to gain traction. Drawing an analogy between CXL and fusion power, Phillips adds: “It’s always 10 years. The idea of doing remote procedure calls over a CXL fabric is very attractive.”

For Phillips, CXL offers a way to get around the overhead of traditional network protocols. “Everything runs on UDP/IP or TCP/IP, which are [networking] protocols designed back in the 1960s and early 70s, when a dial-up modem [connectivity] was state-of-the-art.”

While he acknowledges the “fantastic engineering effort” that has allowed these protocols to evolve to where they are now with 25 Gigabit Ethernet, Phillips says: “It would just be nice and make things a lot quicker if we didn’t have the overhead of IP encapsulation.”
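The encapsulation overhead Phillips refers to is easy to put a number on. The back-of-envelope sketch below uses the fixed header sizes from the Ethernet II, IPv4 and UDP formats; the 64-byte payload is an illustrative guess at a small order message, not an LMAX figure.

```java
public class EncapsulationOverhead {
    static final int ETHERNET_HEADER = 14;  // dst MAC + src MAC + EtherType
    static final int ETHERNET_FCS = 4;      // frame check sequence
    static final int IPV4_HEADER = 20;      // minimum IPv4 header, no options
    static final int UDP_HEADER = 8;        // source/dest port, length, checksum

    static int overheadBytes() {
        return ETHERNET_HEADER + ETHERNET_FCS + IPV4_HEADER + UDP_HEADER;
    }

    public static void main(String[] args) {
        int payload = 64;                   // hypothetical small order message
        int overhead = overheadBytes();
        System.out.println(overhead + " bytes of headers around a " + payload
                + "-byte payload, i.e. "
                + (100 * overhead / (overhead + payload)) + "% of the frame");
    }
}
```

For a message that small, roughly 40% of every frame on the wire is protocol framing rather than order data, which is the overhead a CXL fabric could in principle avoid.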

This continued work of exploring the art of the possible helps LMAX process trading transactions with as little latency as the laws of physics will allow, gives the company a buffer, and enables it to handle unexpectedly high throughput.

For instance, recalling the huge volatility in the cryptocurrency market last year, which saw some crypto exchanges go down, Phillips says: “We didn’t go down. In fact, we saw a massive spike in volume as people were offloading risks on our exchange to each other.”

Even though the trading volume was nowhere near the maximum LMAX could handle, he says the company was able to handle trading for the total crypto market and – according to Phillips – there was plenty of headroom.


