Monitoring Sync Performance in Complex Packet Networks – The Standards Perspective

Delivery of accurate timing and frequency synchronization is a fundamental prerequisite for the operation and performance of many applications in Telecoms, Data Centers, industrial automation, power utilities, multimedia and more. Transporting an accurate timing signal over a packet-based network, however, is not without its challenges. For example, GNSS vulnerability or the use of heterogeneous transport technologies can acutely affect the delivery of time synchronization. In addition, different parts of the network – such as the radio and the transport – are often administered by different operators, making it complicated to assess end-to-end performance; for example, erroneous configurations or changes made by one operator can inadvertently introduce significant timing asymmetries to the detriment of other parts of the network. In short, it is becoming ever more critical to have network-wide monitoring of synchronization to ensure applications and critical services function optimally.

Visibility of the network’s behaviour and performance has always been of major importance to Network Operators, as it gives them the ability to minimize service impact, or at the very least to localize faults faster in the event of sync issues. In an ideal world they would monitor synchronization at all their sites with local test equipment containing a GNSS receiver to act as a reference, but simple economics dictates that this is feasible at only a limited number of sites.

So, what is the alternative? One approach under discussion is to use various network elements, such as routers, switches and Boundary Clocks, to collect performance monitoring data. These devices can perform a range of relative time measurements, either against a backup reference or using local clocks. Some Network Operators and network administrators already do this by means of proprietary network management systems, but this results in every vendor collecting data to its own specific set of parameters. And let’s not forget that networks can have many points to monitor, ranging from thousands to hundreds of thousands, and in the case of Data Centers potentially millions.

Standardization bodies such as the IEEE and ITU-T have been working over the last few years to provide a standard methodology for collecting data from the network to ensure it is homogeneous and consistent. In particular, IEEE 1588-2019 Annex J – which is now being adopted by the ITU-T, where it is referred to as Annex F of G.8275 – provides a list of performance monitoring parameters that should be collected. These include basic statistics – minimum, maximum, average and standard deviation – calculated over the measured delays and time offsets between a transmitter and receiver.
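As a rough sketch of what such a summary looks like in practice, the following Python snippet computes the four basic statistics over a window of measured delays. This is an illustration only – the function and field names are my own and are not taken from the standard:

```python
import statistics

def summarize_delays(delays_ns):
    """Summarize one window of measured delays (in nanoseconds) into the
    basic statistics called for by Annex J/F-style performance monitoring:
    minimum, maximum, average and standard deviation."""
    return {
        "min": min(delays_ns),
        "max": max(delays_ns),
        "avg": statistics.mean(delays_ns),
        "stddev": statistics.pstdev(delays_ns),
    }

# Delays measured between a transmitter and receiver over one window
window = [512, 498, 530, 501, 499, 525]
summary = summarize_delays(window)
```

In a real deployment each monitored node would produce one such summary per measurement window, so only a handful of numbers per window need to be reported rather than every raw sample.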

Both Annex J and Annex F call for the sync data to be summarized in measurement windows of 15 minutes and/or 3 minutes, with the summary data stored for the previous 24-hour and/or 1-hour period respectively. This data makes it possible to detect changes in the network, monitor trends, and eventually determine the root cause of issues in the network. One of the main issues with collecting data this way is that it is not always a direct measure of Time Error in the network; instead, performance and the location of an issue must be inferred from the data. Clearly there is a need to filter the data and to perform an analysis, but this may need to be carried out over a large number of nodes, which is not always practical.
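The windowing and retention scheme above can be sketched as a small bounded history: samples accumulate during the current window, and when the window closes they are rolled up into a summary, with 24 hours of 15-minute summaries (96 windows) retained. Again, the class and method names here are hypothetical, not from the standards:

```python
from collections import deque
import statistics

WINDOW_SECONDS = 15 * 60          # one 15-minute measurement window
HISTORY_WINDOWS = 24 * 60 // 15   # 24 hours of summaries = 96 windows

class WindowedMonitor:
    """Illustrative sketch of Annex J/F-style windowed collection:
    accumulate raw samples, then close the window into a summary kept
    in a bounded 24-hour history (oldest summaries age out)."""

    def __init__(self):
        self.samples = []
        self.history = deque(maxlen=HISTORY_WINDOWS)

    def record(self, time_error_ns):
        """Add one raw measurement to the current window."""
        self.samples.append(time_error_ns)

    def close_window(self):
        """Summarize and clear the current window; return the summary,
        or None if no samples were recorded."""
        if not self.samples:
            return None
        summary = {
            "min": min(self.samples),
            "max": max(self.samples),
            "avg": statistics.mean(self.samples),
            "stddev": statistics.pstdev(self.samples),
        }
        self.history.append(summary)
        self.samples = []
        return summary
```

The bounded `deque` means each node's storage cost is fixed regardless of how long it runs, which is what makes this kind of collection practical across thousands of monitoring points.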

To understand how to make use of performance monitoring data, Calnex performed some exploratory tests on a small network with routers and a base station. Using a Paragon-X to inject disturbance into the network (by adding asymmetry or noise) resulted in a change in the time error at the output of the base station. Interestingly, by examining the performance monitoring data it was possible to trace back to the exact point in the network where the issues were occurring, and with careful evaluation of the data it was also possible to detect the actual change in asymmetry. In other words, this data made it possible to identify – in real time – the point at which the network was impacting the performance of the end application.

In addition to the PTP parameters defined in Annex F of ITU-T G.8275, there are other important parameters, such as the identity of the Grandmaster Clock, that can be collected to provide a detailed picture of the network’s behaviour and topology. Recently, the ITU-T has also standardized parameters for monitoring Synchronous Ethernet, so that synchronization distribution at the physical layer can be monitored too; this is covered in the revised ITU-T G.781 with a new Annex B.

In conclusion, an effective solution for monitoring the performance of a network and its synchronization is key to a robust and resilient network. It is something that Network Operators are asking for more and more, and the Standards are fundamental to getting a consistent analysis from the collected data, along with an effective way to filter and analyze potentially hundreds of thousands of clocks to prevent network problems.