Service Providers often keep their platform components in service over a long lifecycle. Two things matter most: the ability to perform upgrades in a supported fashion, and a clear understanding of how and when the software provider releases its software. The challenge with Open Source software is finding the proper support organization, one with long enough support cycles (4-5 years at least) that thoroughly plans and tests every release with backwards compatibility in mind.
With this purpose, the installation tool needs to encompass the update and upgrade use cases. One of the best tools for managing the OpenStack lifecycle is the one that uses OpenStack itself (project TripleO). The tool is used in all stages from day 0 to day 2, and is being improved to extend the APIs that control installation, upgrade and rollback. It allows the controlled addition of compute or storage nodes, as well as updating and decommissioning them. Although the road to perfect manageability is long, we’re helping the Upstream projects (like OPNFV) learn from Telco operators and continuously improve the tools for a seamless experience. At the same time, Service Providers can leverage additional management tools like CloudForms and Satellite for a scalable combination of policies, configuration management, auditing and general management of their NFV Infrastructure.
One of the most recent improvements has been the addition of bare metal tools and out-of-band management to OpenStack (Ironic and its growing list of supported hardware interfaces, like IPMI). This allows physical servers to be treated as another pool of resources that can be enabled or disabled on demand. The potential energy savings (green computing) may well justify the investment in an OpenStack-based NFVI, thanks to the elasticity that such a solution offers.
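To make the idea of enabling and disabling servers on demand more concrete, here is a minimal Python sketch of the planning step. It is only an illustration under stated assumptions: the `Node` class, the `plan_power_actions` function and the action labels are hypothetical names, and the code does not call Ironic. In a real deployment the resulting plan would be handed to whatever bare metal power interface is in use.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a bare metal compute node."""
    name: str
    powered_on: bool
    running_instances: int


def plan_power_actions(nodes, needed_active):
    """Return {node_name: action} so that the pool moves toward
    `needed_active` powered-on nodes.

    Assumption: a node is only safe to power off when it hosts no instances.
    """
    actions = {}
    active = [n for n in nodes if n.powered_on]
    standby = [n for n in nodes if not n.powered_on]

    if len(active) < needed_active:
        # Not enough capacity: wake up as many standby nodes as needed.
        for node in standby[:needed_active - len(active)]:
            actions[node.name] = "power on"
    elif len(active) > needed_active:
        # Spare capacity: power off empty nodes only, up to the surplus.
        surplus = len(active) - needed_active
        empty = [n for n in active if n.running_instances == 0]
        for node in empty[:surplus]:
            actions[node.name] = "power off"
    return actions
```

Busy nodes are never selected for power-off in this sketch, so the pool may stay above the target until workloads are migrated or finish.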
Operators should analyze the core components of their VNFs and their interaction with the virtualized infrastructure. Features like host-level live kernel patching with kpatch may not work if the VNFs are not supported guests. This means ensuring the VNFs use the latest QEMU drivers, with a modern and supported kernel adapted to the recent x86 virtualization extensions (VT-x, VT-d). The drivers at the host level are equally important, as they may limit the operations available to update or modify platform settings.
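On the host side, one quick sanity check is whether the CPU advertises hardware virtualization at all. The sketch below is an illustration, not a complete validation: it assumes a Linux x86 host, the function name `virtualization_flags` is hypothetical, and it only looks for the `vmx` (Intel VT-x) and `svm` (AMD-V) CPU flags. VT-d is an IOMMU feature and does not appear in the CPU flags, so it has to be verified by other means.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization CPU flags found in the text of
    /proc/cpuinfo: 'vmx' for Intel VT-x, 'svm' for AMD-V."""
    wanted = {"vmx", "svm"}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Each logical CPU has its own "flags" line; merge them all.
        if key.strip() == "flags":
            found |= wanted & set(value.split())
    return found
```

On a Linux host it could be called as `virtualization_flags(open("/proc/cpuinfo").read())`; an empty result suggests the extension is missing or disabled in the firmware.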
Finally, one piece of advice when choosing the components of the NFVI control plane: ideally, the control plane should offer a protection level equal to or better than that of the systems it controls. Most vendors have chosen a simple clustering technology (keepalived) that is popular in the Upstream OpenStack project, as it fits most Enterprise needs. At Red Hat, being experts in mission-critical environments, we chose Pacemaker (although keepalived is also supported), because its advanced quorum and fencing capabilities significantly increase the uptime of the control plane elements. Pacemaker is an open source project and can be found at http://clusterlabs.org/. The resulting architecture of an OpenStack control plane with Pacemaker underneath permits automatic fault monitoring and recovery of the foundational services that compose the NFVI layer.
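As a rough illustration of why quorum matters for control plane sizing, the following Python sketch shows the basic majority rule that quorum-based clusters rely on. It is a simplified model with hypothetical function names, not Pacemaker's implementation, and it assumes one vote per controller node.

```python
def has_quorum(total_votes, votes_present):
    """A partition has quorum only when it holds a strict majority of votes."""
    return votes_present > total_votes // 2


def tolerated_failures(total_votes):
    """How many voting nodes can be lost while the rest still hold a majority."""
    return (total_votes - 1) // 2
```

Under this rule a three-node control plane survives the loss of one node, while a two-node cluster cannot lose any node and keep a majority, which is one reason an odd number of controllers is the common choice.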