In the context of AI networking and cluster interconnects, the distinction between an open system and a certified system is foundational to how AI infrastructures are built, scaled, and supported.
An open system uses standardised, interoperable technologies that allow components (servers, NICs, switches, and software stacks) from multiple vendors to coexist and communicate through open protocols and interfaces.
A certified system is a closed, vendor-qualified stack where every hardware and software component is tested, validated, and supported as one integrated system by a single vendor or consortium.
In an open AI networking system, every layer of the infrastructure, from the hardware to the network stack, is designed for interoperability. The entire system is built on open standards such as Ethernet, RoCEv2 and, increasingly, the Ultra Ethernet specifications, allowing components from different vendors to coexist and communicate without proprietary constraints.
Customer teams can modify system components such as firmware and drivers to optimise end-system behaviour for their use cases. The same teams are responsible for testing, tuning, and validating end-to-end performance. Open systems therefore favour faster innovation and freedom of choice, not least because customers avoid being locked into a single vendor's roadmap.
The advantage of open systems lies in their interoperability and customisability. This gives engineering and research teams the flexibility and choice to adapt their systems to their own needs. The ecosystem evolves collectively, driven by continuous community input and shared progress across the industry.
Yet this same openness comes with trade-offs. No single vendor guarantees the end-to-end performance or compatibility of the entire stack. The responsibility for validation, integration, and ongoing tuning falls entirely on the integrator or operator, and it takes deep technical skill to ensure that a multi-vendor environment behaves as intended at the required scale. When something fails, resolution comes not through a single support line but through engineering effort and cross-vendor coordination.
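As a rough illustration of where that integration work starts, the Python sketch below assumes a hypothetical inventory in which each NIC and switch lists the open transports it advertises (the component names and capability sets are invented for the example). It computes the transports that every component shares and flags any component lacking a transport the operator wants to deploy.

```python
# Hypothetical multi-vendor inventory: each component lists the open
# transports it advertises. Names and capabilities are illustrative only.
COMPONENTS: dict[str, set[str]] = {
    "nic_vendor_a": {"Ethernet", "RoCEv2", "UET"},
    "nic_vendor_b": {"Ethernet", "RoCEv2"},
    "switch_vendor_c": {"Ethernet", "RoCEv2", "UET"},
}


def common_transports(components: dict[str, set[str]]) -> set[str]:
    """Return the transports advertised by every component in the fabric."""
    if not components:
        return set()
    # set.intersection returns a new set, so the inventory is never modified.
    return set.intersection(*components.values())


def missing_support(components: dict[str, set[str]], transport: str) -> list[str]:
    """List, in sorted order, the components that do not advertise `transport`."""
    return sorted(name for name, caps in components.items() if transport not in caps)


if __name__ == "__main__":
    print(sorted(common_transports(COMPONENTS)))  # ['Ethernet', 'RoCEv2']
    print(missing_support(COMPONENTS, "UET"))     # ['nic_vendor_b']
```

Advertised support is only a starting point: a shared protocol list says nothing about how the combination performs under load, which is precisely the validation the operator still has to own.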
By contrast, a certified system offers a controlled, vendor-qualified environment in which every layer of the hardware and software stack has been tested and validated under defined workloads to work together as one. The vendor owns the integration effort and ensures that every NIC, switch, driver, and firmware revision has been qualified as part of a cohesive, end-to-end platform.
For operators, this means predictability: the system is guaranteed to perform within specified parameters, backed by SLAs and vendor support. Deployments are smoother, maintenance cycles are more stable, and troubleshooting is faster because responsibility for the entire stack sits with one provider.
The trade-off is reduced flexibility. Certified systems tend to be closed, proprietary ecosystems. They evolve on the vendor's schedule, and any customisation beyond the approved configurations can void certification or support. Innovation in this model follows the vendor's roadmap rather than being driven directly by end-user demand.
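To make the certification boundary concrete, here is a minimal Python sketch, assuming a hypothetical qualification matrix in which each entry is an exact combination of NIC model, NIC firmware, driver, and switch OS (all values are placeholders, not real product revisions). A deployment counts as certified only if it matches a qualified entry exactly; otherwise the sketch reports which fields differ from the nearest qualified combination.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StackConfig:
    """One deployed interconnect combination (hypothetical fields)."""
    nic_model: str
    nic_firmware: str
    driver_version: str
    switch_os: str


# Hypothetical vendor qualification matrix: only these exact combinations
# are assumed to be tested and supported. A tuple keeps the order stable.
QUALIFIED: tuple[StackConfig, ...] = (
    StackConfig("nic-x", "fw-1.2", "drv-5.4", "swos-3.1"),
    StackConfig("nic-x", "fw-1.3", "drv-5.5", "swos-3.2"),
)


def is_certified(config: StackConfig) -> bool:
    """True only if the deployment matches a qualified entry exactly."""
    return config in QUALIFIED


def deviations(config: StackConfig) -> dict[str, tuple[str, str]]:
    """Fields that differ from the closest qualified entry, as (deployed, qualified).

    'Closest' simply means the entry with the fewest differing fields;
    ties go to the entry listed first in QUALIFIED.
    """
    names = [f.name for f in fields(StackConfig)]

    def diff(ref: StackConfig) -> dict[str, tuple[str, str]]:
        return {
            n: (getattr(config, n), getattr(ref, n))
            for n in names
            if getattr(config, n) != getattr(ref, n)
        }

    return min((diff(ref) for ref in QUALIFIED), key=len)


if __name__ == "__main__":
    # The operator has kept an older driver on an otherwise qualified stack.
    deployed = StackConfig("nic-x", "fw-1.3", "drv-5.4", "swos-3.2")
    print(is_certified(deployed))  # False
    print(deviations(deployed))    # {'driver_version': ('drv-5.4', 'drv-5.5')}
```

The exact-match check mirrors the strictness of certification: even a single driver change takes the stack outside the qualified set, which is why customisation and certification pull in opposite directions.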
Certified systems are ideal when reliability, determinism, and a single point of accountability matter more than architectural freedom, but they are less suited to organisations that want to experiment, optimise, or integrate rapidly evolving technologies from multiple sources. They are also a less attractive option for hyperscalers, who optimise down to fine detail and depend on rapid iteration cycles.
Open systems deliver flexibility and innovation speed, but demand expertise and ownership. Certified systems deliver predictability and peace of mind for deployment and performance but constrain openness and further optimisation.
Here is a comparison of how different aspects are handled between open systems and certified systems in AI clusters:
| Aspect | Open system | Certified system |
|---|---|---|
| Environment | Uses open standards allowing equipment from different vendors to interoperate | Controlled, fixed combinations of hardware, firmware, and drivers, qualified together |
| Interconnect | Standard Ethernet, RoCEv2, UET | Vendor-qualified InfiniBand or RoCE stacks |
| Performance guarantee | None; validated by the user | Guaranteed by vendor certification and SLAs |
| Control | Full, customisable | Fixed configuration |
| Ecosystem | Multi-vendor, heterogeneous | Closed, cohesive |
| Evolution | Driven by open communities (UEC, OCP, SONiC, etc.); faster adoption of new standards | Driven by the vendor's roadmap and cost |
| Debug & validation | Operator-led / integrator-led | Vendor-led |
| Example vendors | Broadcom, Arista, Meta, Intel (open UEC) | NVIDIA, HPE, Lenovo AI stacks |
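Open systems
Pros:
- Interoperable, multi-vendor components built on open standards
- Freedom to customise firmware, drivers, and configuration for specific workloads
- No lock-in to a single vendor's roadmap
- Community-driven evolution and faster adoption of new standards
Cons:
- No end-to-end performance or compatibility guarantee
- Validation, integration, and tuning fall on the operator or integrator
- Requires deep in-house expertise, especially at scale
- Troubleshooting relies on engineering effort and cross-vendor coordination
Certified systems
Pros:
- Predictable performance within specified parameters, backed by SLAs
- Smoother deployments and more stable maintenance cycles
- A single point of accountability for the whole stack
- Faster troubleshooting through one vendor's support
Cons:
- Reduced flexibility in closed, proprietary ecosystems
- Evolution follows the vendor's schedule and roadmap
- Customisation beyond approved configurations can void certification or support
- Less suited to experimentation and rapid, multi-source iteration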
As AI clusters scale beyond anything traditional data centres were built for, their interconnects have become critical in determining performance, reliability, and innovation pace. The debate between open systems and certified systems is still shaping how clusters are built and who may ultimately control their future.
As things stand, the two approaches represent distinct philosophies of control. Open systems favour adaptability and innovation velocity, while certified systems prioritise assurance and consistency. The choice depends less on the technology itself and more on organisational priorities: ownership versus dependency, and whether the organisation wants to build the system or simply run it.
These two philosophies are not destined to remain separate. As Ethernet evolves to meet AI’s scale, the systems that will win are those that combine the freedom of open innovation with the discipline of certified, guaranteed reliability. The AI infrastructure landscape is already moving in this direction. Emerging standards like Ultra Ethernet bridge the two worlds, aiming to deliver open systems that behave with the predictability and assurance of certified environments.
Meanwhile, vendors are beginning to open their once-closed stacks, allowing controlled interoperability within certification frameworks. Hyperscalers, for their part, are pushing for open ecosystems that meet data-centre-grade performance and reliability expectations.
The future lies in open systems that act like certified ones: flexible, collaborative, and standards-based, yet consistent and trustworthy at scale. This is exactly where Calnex solutions excel!