More than a decade ago, European researchers invented highly accurate network time distribution protocols for CERN. They are now adopted by industries including financial services.
“Look at my fingers,” says John Fischer, vice-president of Advanced Research at Orolia, holding his hands about 30 centimetres apart. “That is a nanosecond. That’s how far light travels in a billionth of a second. That gives you an idea of what it means to say that I need highly accurate network time distribution – accurate to within a nanosecond.”
Accuracy in the context of time distribution refers to the degree to which the clocks on the connected computers and sensors agree on what time it is. Some of the Big Science projects now need even higher accuracies – picoseconds, which are trillionths of a second. Think of the Large Hadron Collider at CERN in Switzerland, one of the most famous Big Science projects in the world.
Some of the particles measured at the Large Hadron Collider exist for about a nanosecond, and they need to be measured by several sensors during their very short lifetimes. To create the particles, events need to be triggered with very precise timing across the different devices. Then to study those particles, measurements from different sensors have to be correlated along a highly precise timeline.
CERN is where the Higgs boson was discovered in 2012, and CERN is where the World Wide Web began in the 1980s, with the invention of HTTP and HTML, originally developed to help physicists share scientific articles. A lesser-known breakthrough occurred around 2010. Because CERN required highly accurate network timing protocols, it inspired innovation that can now be used in other scientific projects, in military and space applications, and in finance.
“We think of time as just being an instant,” says Fischer. “But with distributed processing we have to think about synchronising time over a distance, so that all nodes agree on what time it is. This whole concept of measuring with highly accurate time distribution started with the Large Hadron Collider in CERN.”
Part of that technology developed for CERN became an open standard. This open standard, which provided accuracy to within a few nanoseconds, was called White Rabbit. Spanish startup Seven Solutions worked under a grant from the Spanish government to design the White Rabbit switch for CERN. The developers at Seven Solutions went on to make proprietary improvements to the White Rabbit protocol and brought the new technology to industry. Seven Solutions was recently bought by Orolia, a US-based company that markets positioning, navigation and timing solutions – or PNT solutions, as they are called in the business.
Datacentres and financial trading
At Orolia, Fischer applies PNT primarily to government, military and aerospace, but also to industry in general. Two examples of where highly accurate network time protocols are now needed in industry are datacentres and financial trading.
The emergence of datacentres has driven some of the requirements for highly accurate network time distribution. Some 10 years ago, everybody started to move applications and data to the cloud. Huge datacentres consist of thousands of computers that need to be synchronised. Processing power has been increasing exponentially since the 1960s, so a lot more can happen in a computer within a nanosecond.
Supercomputers regularly perform in the petaflops range, and one petaflop is a million floating point operations per nanosecond. Even an average server in an average datacentre performs thousands of operations per nanosecond. When computers run distributed algorithms, they need to work in lock step, which requires a high degree of synchronisation.
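The petaflop figure is simple unit arithmetic. A minimal sketch in Python, where the function name and constants are illustrative and not taken from any vendor tool:

```python
# Convert a sustained floating-point rate (operations per second)
# into operations per nanosecond.
NANOSECONDS_PER_SECOND = 1_000_000_000

def ops_per_nanosecond(flops: float) -> float:
    """Return how many operations complete in one nanosecond."""
    return flops / NANOSECONDS_PER_SECOND

ONE_PETAFLOP = 1e15  # 10^15 floating point operations per second

print(ops_per_nanosecond(ONE_PETAFLOP))  # 1000000.0
```

At that rate, a clock disagreement of a single nanosecond between two machines spans about a million operations.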
Financial trading systems require a high degree of synchronisation to ensure fairness. Since a single trade can change the price of a stock, which then affects subsequent trades, it’s essential to know the exact order in which trades occurred. The same holds for inter-bank trading and currency exchanges.
Achieving nanosecond accuracy
“In the early days of the internet, engineers came up with the network time protocol [NTP], which was accurate to a millisecond, which is a thousandth of a second,” says Fischer. “That’s what our PCs used to update their clocks over a fixed-line network. We’ve improved on that since.”
“Doing time distribution over a network became really attractive around the year 2000,” says Fischer. “Engineers came up with a new timing protocol. This was IEEE 1588, or what we call precision time protocol – PTP. The idea is that you send these packets back and forth over an Ethernet network and you measure the time delay. With that, you could get down into microsecond level precision, so a millionth of a second.”
But even that wasn’t enough. Around 15 years ago, scientists needed more accurate time distribution protocols. One way to get more accurate time is to run coaxial cables to connect all nodes. But this is not practical in places like CERN, where the networks extend for kilometres and thousands of nodes need to be connected.
During that period, a new concept came out of the telecom community called Synchronous Ethernet. It carried not only time information, but also the frequency, which made it more accurate. Then the researchers at CERN, including the people from Seven Solutions, perfected this idea, with loopbacks to do automatic calibration. They were able to get down to nanosecond accuracy for the Large Hadron Collider at CERN.
White Rabbit was based on Synchronous Ethernet – and Seven Solutions’ proprietary tweaks improved on White Rabbit.
Achieving sub-nanosecond accuracy
Fischer raised his hands again, this time placing them about three centimetres apart. “That’s what 100 picoseconds looks like. That’s how far light travels in 100 picoseconds.”
It wasn’t long before CERN found out it couldn’t do certain experiments if the computers and sensors weren’t synchronised to a sub-nanosecond level. The engineers working for CERN eventually achieved accuracy to within a hundred picoseconds, which is 0.1 nanoseconds.
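Fischer's hand gestures can be checked with a few lines of arithmetic. The sketch below assumes light travelling in a vacuum (signals in fibre or copper travel more slowly), and the function name is illustrative:

```python
# Distance light covers in a given time interval, in centimetres.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum value, exact by definition

def light_distance_cm(seconds: float) -> float:
    """Return the distance light travels in `seconds`, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S * seconds * 100

print(round(light_distance_cm(1e-9), 1))     # one nanosecond: about 30 cm
print(round(light_distance_cm(100e-12), 1))  # 100 picoseconds: about 3 cm
```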
Seven Solutions also made improvements. Fischer says that to his knowledge, Seven Solutions, now a part of Orolia, has achieved the most accurate network time distribution in the world.
“There are lots of ways of distributing time,” says Fischer. “When you have long distances and lots of different things that want to know the time, a network is the most efficient way. It’s impractical to run wires over long distances and connect all of the different nodes that need to be synchronised.”
But there are some challenges to synchronising time over a network. If you’re trying to send time information over a wire, the time it takes the information to reach the destination is predictable. If you send it over a network, the data goes into packets, and is then passed through switches and routers, which cause delays. If the delay is deterministic, an algorithm can easily compensate. But networks are rarely deterministic to the degree required. Bandwidth, throughput and latency vary based on how much traffic is on the network at the time.
“Most of the time distribution protocols that work on a network use a kind of network packet interchange,” says Javier Diaz, who helped CERN improve White Rabbit as part of his research work for the University of Granada. Diaz eventually joined Seven Solutions and went on to become its CEO.
“One node sends a packet to another, which then sends the packet back,” says Diaz. “The protocol measures the sending and reception times. If everything goes well, you can just say half of the total time in going back and forth is the propagation time and you can use that value to synchronise the two nodes. This is typically the approach used in standard time distribution protocols.”
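The exchange Diaz describes can be sketched in a few lines of Python. This is a simplified illustration, not the White Rabbit or PTP implementation: the timestamp names t1 to t4 and the function name are assumptions, and the calculation assumes the path takes the same time in both directions.

```python
def estimate_delay_and_offset(t1: int, t2: int, t3: int, t4: int):
    """Estimate one-way delay and clock offset from a two-way exchange.

    t1: node A sends the packet      (read from A's clock)
    t2: node B receives the packet   (read from B's clock)
    t3: node B sends the reply       (read from B's clock)
    t4: node A receives the reply    (read from A's clock)
    All values are in nanoseconds. Assumes equal delay in both directions.
    """
    # Total time on the wire: elapsed time at A minus the time B held the packet.
    round_trip = (t4 - t1) - (t3 - t2)
    delay = round_trip / 2          # "half of the total time"
    offset = (t2 - t1) - delay      # how far B's clock is ahead of A's
    return delay, offset

# Made-up example: the true one-way delay is 500 ns and B's clock runs
# 1,200 ns ahead of A's. B waits 1,000 ns before replying.
delay, offset = estimate_delay_and_offset(t1=0, t2=1700, t3=2700, t4=2000)
print(delay, offset)  # 500.0 1200.0
```

Node B can then subtract the estimated offset from its clock to agree with node A.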
“In the past, people would improve on the standard protocols using ad-hoc solutions, based on cables. If you run a coaxial cable to send special signals among the nodes, you need to calibrate to the length of the cable. This was time consuming, it wasn’t scalable, and it was prone to error. The better approach was to define new standards.
“To improve on the existing standards, we needed to first solve some problems,” he says. “First is that the propagation path may not be equal in both directions. The asymmetry may be a source of error. Another problem is that two different devices might have different oscillators. In theory, they are operating on the same frequency. But in practice, there is a small shift in frequencies. This small shift might introduce a nanosecond bias, which is not a huge problem for most applications. But it won’t allow you to achieve sub-nanosecond accuracy.
“Once we solved the asymmetry problem and the problem of having slightly different frequencies, we needed to measure the propagation time with high accuracy,” says Diaz. “To do this, we provided a highly accurate time stamp in the packet. Previously, you could get six to eight nanoseconds of accuracy, but this is not good enough. In the standard protocols there is some processing above the network cards, which introduced processing delay, and further inaccuracy. Another problem was that standard physical layers offer ‘best effort’, so the propagation delay was not always the same.
“You need to put the time stamping as close as possible to the network card, so there is no processing delay – and you need to modify the physical layer so that it is deterministic,” he concludes.
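The two error sources Diaz names, path asymmetry and mismatched oscillators, can be put into rough numbers. The sketch below is illustrative only: the function names and the example figures are assumptions, not measurements from CERN or Seven Solutions.

```python
def offset_error_from_asymmetry(forward_delay_ns: float,
                                reverse_delay_ns: float) -> float:
    """Error in the estimated clock offset when a two-way exchange assumes
    equal delays but the two directions actually differ.
    The error is half the difference between the two delays."""
    return (forward_delay_ns - reverse_delay_ns) / 2

def drift_ns(frequency_offset_ppm: float, interval_s: float) -> float:
    """Time error accumulated between two oscillators whose frequencies
    differ by `frequency_offset_ppm` parts per million over `interval_s`."""
    # 1 ppm over 1 second is 1 microsecond, which is 1,000 nanoseconds.
    return frequency_offset_ppm * interval_s * 1000

# Hypothetical link: 500 ns one way, 498 ns the other way.
print(offset_error_from_asymmetry(500, 498))  # 1.0 (nanosecond of bias)

# Hypothetical oscillators 1 ppm apart, resynchronised every millisecond.
print(round(drift_ns(1, 0.001), 3))           # about 1 ns of drift
```

Either effect on its own is enough to rule out sub-nanosecond accuracy, which is why both have to be measured and compensated.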
How highly accurate network time distribution protocols are used today
With the growing sophistication of scientific research comes a growing need for highly accurate timing across computers and sensors on a network. Particle accelerators need to push particles to nearly the speed of light. Then they need to perform measurements from distributed sensors.
To get the particles to such a high speed, different devices need to be triggered to perform actions on the particles with very precise timing. To measure the particles, distributed sensors need to timestamp their measurements using highly synchronised clocks.
Astronomers also need highly accurate timing. The most powerful telescopes in the world use distributed antennas. A hundred antenna dishes may be spread out over a kilometre. They need to be moved with very precise timing to point towards radio signals from distant galaxies. Then the measurements from the dishes need to be correlated, again with very high precision.
Science is important, and it will continue to drive innovation. But in the end, the biggest use of this innovation may wind up being finance. Who gets what information when – and the order in which trades are placed – makes all the difference in the world.
Trading applications are often powered by supercomputers, so things happen very fast in finance these days. They need synchronisation more than ever, which makes financial services a big market for the latest timing protocols.