Service Function Chaining explained (Guest Post)

This post is courtesy of Elaheh T. Jahromi, PhD from Concordia and R&D intern at Ericsson Montreal. You can reach her on Linkedin). Thank you!!

Service Function Chaining (SFC – see RFC 7498) is a defined ordered set of network service functions (such as NAT, Firewalls, Load Balancer,…) to realize a high level/end-to-end service. In other words, SFC identifies which service functions a flow of packets should go through from its source to destination.what-is-side-chaining

In conventional networks, Network Services are “statically deployed/chained” in the network. “Statically deployed/chained” stems from the fact that they are hardly coupled to dedicated/proprietary hardware and underlying network topology, thus resulting in rigidity and limitations for network service providers

The substantial reliance of NFs to their underlying HW, brings about huge CAPEX for deployments, as well as OPEX, in view of the fact that configuration of the NFs requires labor-intensive manual effort. On top of that, due to time-consuming process of adding/removing NFs, these NFs should be over-provisioned for peak hours that leads to under-utilization of expensive resources in off-peak times.

NFV and SDN are complementary technologies, emerged to deal with above-mentioned issues.

NFV aims at decoupling the network functions from the underlying hardware, and eventually defining the NFs as stand-alone piece of software entitled as Virtual Network Functionalities (VNF), that could be rapidly deployed and run on top of standard and general hardware, anywhere in the network.

SDN realizes the dynamic SFC, by decoupling the control and data plane in network. Using SDN, the routing devices in the network are programmed on the fly to enable a specific function chain for a specific traffic, in a way that eventually the traffic is steered through different pre-deployed NFs. For example if Service A requires NF X,Y and Z, an SDN controller will populate the forwarding tables of routing devices in a way that the packets belonging to the service A traverse first to service X, then Y and Z. A concrete example of SDN and dynamic SFC usage is illustrated in [1] which offers differentiated QoS for privilege traffics while regular traffics are steered through a best effort service. The following illustration from the Cisco Blog shows how their vSwitch implements SFC:

service_chain-550x193

OpenDayLight Project, is an open source SDN project hosted by The Linux Foundation. This project was announced on 2013 and followed by another collaborative project entitled as OPNFV (Open Platform for NFV) in September 2014. In short, OPNFV is an open source project to realize VNF deployment, orchestration, management and chaining using a set of upstream open projects. As an example OPNFV leverages Open Stack as an IaaS solution for supporting the required infrastructure for VNFs to be deployed and run. ODL is also integrated in OPNFV to enable the dynamic chaining of VNFs.

 

[1] Callegati, Franco, et al. “Dynamic chaining of Virtual Network Functions in cloud-based edge networks.” Network Softwarization (NetSoft), 2015 1st IEEE Conference on. IEEE, 2015.

Advertisements

Boosting the NFV datapath with RHEL OpenStack Platform

Instead of explaning SR-IOV and DPDK by myself, please visit Nir Yechiel’s article on http://redhatstackblog.redhat.com/ and you’ll understand the multiple combinations of technologies around fast processing of network traffic in virtualized environments… Enjoy!

The Network Way - Nir Yechiel's blog

A post I wrote for the Red Hat Stack blog, trying to clarify what we are doing with RHEL OpenStack Platform to accelerate the datapath for NFV applications.

Read the full post here: Boosting the NFV datapath with RHEL OpenStack Platform

View original post

On my way to MWC!

I’m very happy to announce that next week I’ll be at the Mobile World Congress (#MWC2016) in home, sweet home Barcelona.mwc2016_logo

At Red Hat, we’ve prepared many exciting demos and announcements that will appear in our Telecommunications blog 4k

I’ll be presenting a Video on Demand transcoding solution demonstration (from Vantrix) that covers the challenges of 4K (UHD) Video delivery, and how OpenStack can help.

In this blog I’ll write about my daily discoveries, but if you want a quick chat, you can come visit me at the Red Hat booth in the MWC,  2G230 Hall2

Furthermore, I’ll be writing about Mobile Edge Computing and how our open source solutions can be used to create the foundational infrastructure for the edge datacenter in 5G, IoT and C-RAN architectures.

And now a quick word of advice to the first-time travelers to Barcelona:

  • Instead of a regular coffee, order a Cortado (in spanish) / Tallat (in catalan). It’s our version of the italian Macchiato, but better
  • Although the most typical morning drink is Cacaolat, a local chocolate milk invented almost a century agot. You can try the Granja Viader near Ramblas/Liceu, the bar that created that drink that’s been running since 1870 . You can also order, in any bar, a freshly backed Donut (not your typical dunkin donuts, try one!) or a Bocata de Pernil Iberic (a long-bread sandwich of spanish cured ham) with the omnipresent Pa Amb Tomaquet (sounds like “pamtoomakat”) which is a baguette cut in half with tomato spread, oli, and salt or garlic on it… hmmm… delicious and crunchy!
  • Make sure you go visit Plaça Espanya. From there you can go shopping/dinner inside Las Arenas (a former Plaza de Toros, now converted to a mall)  and watch the amazing Magic Fountain (every day at 8 and 9pm I think) on your way to the big museum on the top of the hill.
  • And for god’s sake don’t eat anywhere in las Ramblas. Go to the Gotic, east side of Ramblas (fine cuisine), avoid Raval, west side (more ethnic food), and remember: if there are pictures of the food at the door, you won’t eat well.
  • If you stay over the weekend, go to see the views of the city from the Park Guell where its iconic dragon will meet you with his mouth wide open!

park-guell_dragon-600x436

 

Quick Reality Check: Latency

Nowadays, in any NFV-related improvement to existing technology, improving Latency is always involved. Either via reducing it or making it more deterministic (like real-time).

intelgrafikIn order to reduce latency, one first needs to realize how terribly slow are some things when compared to a CPU speed. As you know, CPU is basically a ultra-fast robot that executes simple operations (math, bit tranformation, push/pop from memory, etc) and takes its commands from code registers and the data from different memory locations: some very fast but small (L1 cache) and others slow but very big (RAM and Hard Drives via system buses like SATA or PCIe).

Motherboard_diagram.svg.pngNow let’s focus about the Memory: when the CPU needs to fetch some data to manipulate, it remains idle until the data comes. This time is then spent to execute other pieces of work. When the data has been fetched (from RAM for instance), then the code resumes execution. If the data exists in L1, is a cache hit, but other than that, it’s a miss and the CPU will have to wait until the data is fetched from L2, L3, RAM, etc.

So in order to understand why it’s important to minimize the amount of data that has to be fetched from “distant” parts, like a PCIe device or a SATA hard-drive, have a look at the time it takes to access those parts, including network devices. The numbers come from https://gist.github.com/jboner/2841832. Referring to nanoseconds doesn’t make sense to our little brain, but here you’ll see how bad this is as soon as you use  a more human scale, where a L1 cache hit takes 1 second, and slower operations take up to years.

If L1 access is a second, then:

L1 cache reference : 0:00:01
Branch mispredict : 0:00:10
L2 cache reference : 0:00:14
Mutex lock/unlock : 0:00:50
Main memory reference : 0:03:20
Compress 1K bytes with Zippy : 1:40:00
Send 1K bytes over 1 Gbps network : 5:33:20
Read 4K randomly from SSD : 3 days, 11:20:00
Read 1 MB sequentially from memory : 5 days, 18:53:20
Round trip within same datacenter : 11 days, 13:46:40
Read 1 MB sequentially from SSD : 23 days, 3:33:20
Disk seek : 231 days, 11:33:20
Read 1 MB sequentially from disk : 462 days, 23:06:40
Send packet CA->Netherlands->CA : 3472 days, 5:20:00

In a future post, I’ll compare how DPDK or SR-IOV allow VNFs to have much faster access to network resources, and process data packets without even touching the RAM (all is done in L1 to L3 cache). Using those solutions, the latency for network processing can be reduced to near-baremetal level, in nanoseconds, instead of being stuck with milliseconds-like performance of regular software switches in virtual environments.

(Credit for Images:

https://en.wikipedia.org/wiki/Central_processing_unit

http://www.tomshardware.com/reviews/dual-xeon-duo,664-3.html

https://en.wikipedia.org/wiki/Motherboard )

In NFV, Automation comes before Orchestration

Yesterday, I had to reject positioning a MANO vendor in a PoC due to the misunderstanding from the customer’s side. They thought MANO was about automation, as if it was a synonym for Orchestration. I believe they are not the only ones that may have some confusion here. After all, the list of software solutions that exist on this domain is too extensive, and they often offer aspects of both concepts: Openstack, Cloudband, Cloudify, Cloudforms, Ansible, Puppet, Tacker, OpenMANO, etc.

One may want to analyize the set of software tools available “to do stuff” and the level of abstraction that every software offers. Let me use a simple analogy, so you’ll understand what I mean by level of abstractions. Suppose you want to build a Caterham Superlight, a sports car that you can either buy assembled as any other car, with waranty, from a manufacturer, or you can build it yourself, as a Kit Car.

In NFV, most telcos prefer to build their solutions integrating pieces from different vendors, so they don’t pick a fully-packaged solution (see what happened to HPE at Telefonica UNICA). If you choose to build a Caterham on your own, you can choose different approaches:

  1. Pick your favourite toolbox, open the cardboard boxes, and start assembling the car piece by piece using soldering machines, screwdrivers, nuts and bolts.
  2. Get a robotic arm, which requires a special program customized with the information about the car components. The arm will do some of the work for you with the tools from your toolbox, or other advanced tools designed to fit the robot’s hand. You still need to bring him the components and he’ll assemble them and move the parts around.
  3. Design a full-fledged factory line with a moving processing line surrounded by multiple robotics arms sit around the line, and they use tools to assembly the parts as they pass by. The parts must be put in pre-defined areas so the robots can pick them up. The production line has enough sensors to show if everything is OK, but when not, it must offer the option to the Operator to intervene, troubleshoot and repair the components.

 

Now that you understand these three approaches, now consider a NFV project composed by infrastructure, cloud software, management tools, automation, service orchestration, policy engines, OSS/BSS, VNFs and their managers, etc.

  1. The tool level includes the Cloud Operating System in the NFVi layer. It’s the set of buttons and APIs that execute actions and give a return code. Like a hammer on a nail, it can hit it right on, miss it entirely, or bend it. They provide feedback about the executed action, but that’s it. It’s not the hammer responsibility to remove the nail if it was bent or if the hammer missed the nail. That’s you, the SysAdmin, the hammer user.
  2. Next comes the automation level, including the VIM and the VNF-Managers, as well as internal components that can be not so visible like scripts/recipes in Shellscript, Puppet or Ansible. It’s about taking a well-known process and making it automated so a machine can do it for us. It cannot innovate and rarely can take decisions on its own. Basic “if-this-then-that” rules can be implemented, and that’s often referred in IT as “orchestration” although in NFV it’s too basic to deserve that title. You should only automate those actions that you have previously mastered doing manually.
  3. This leads us to the orchestration level includes Policy Definition and Enforcement, descriptors for Service Definition and function chaining, all part of the top MANO components (NFV-O), This is like the manufacturing floor plant above, and the keyword here are policies. A policy it’s a written rule that sets an ordering criteria (where do we put the parts), a set of levels and KPIs that indicates if everything is OK or not, and a predefined set of actions that need to take place when those levels are not reached, the most basic being a “Halt the production line when something is wrong”. Pick your orchestration in function of the automation tools that are more productive for you, and make sure you can define the orchestration rules in a language that suits your business needs.

When a process has reached the level of maturity of the orchestration level, you can then consider further Quality Improvement techniques that will collect statistical data, Using historical data, you can apply many methods to detect bottlenecks, identify sources of failure and improve the productivity of your systems. Best-practices such as ITIl, eTOM or ZOOM, are the red tape required to allow the controlled change in procedures/tools in a way that never disrupts the business and keeps improving over time.

If you are curious about those steps, you can read more about six-sigma and CMMI and you’ll see the equivalence by yourself.

In conclusion, in an NFV project (as well as any other IT architecture) , do not put the cart before the horse. You should only automate those actions that you have previously mastered doing manually. Pick your tools carefully, master them, before engaging in any Automation project or purchasing Orchestration solutions that have nothing to orchestrate yet.

Comparing Openstack vs AWS to Tesla vs Edison

After reading this article I realized the current situation between AWS (public cloud) vs Private cloud (IBM/Cisco/HP/Dell-EMC) is very similar to the Electricity landscape in mid-1800’s ending in the War of the Currents in late 1800’s (AC vs DC)

Let me explain my point in this analogy: before electricity was discovery, manufacturing used natural power (water, air and animals) to create movement that was transferred inside the factory to help process goods using belts and chains. When electricity appeared, an Electrical generator was used as a power source, copper cables transferred electrical power to machines, and increased productivity and the density of workers, with reduced maintenance for the machinery.

So eventually every factory had its own coal-powered electrical generator (i.e. x86 servers nowadays), and line workers used their machines (i.e. computers) that required the energy that came via the copper wires (i.e. internet or VPN access).

Then the Power Grid was invented, and the industry wanted to shutdown their small electrical generators and leverage the bigger power grid. Outsourcing servers to AWS (public cloud) is like connecting the factory to the power grid (public electricity), but there were no standards for that in late 1800s, hence the Edison (DC) vs Tesla (AC) war.

In this case, Amazon is Edison, the greedy inventor, with lock-in in his veins (DC was patented). Tesla is Openstack, a generous inventor with open APIs and specs to build your own special sauce on top of a good-enough standard (60Hz, 110V, but easily changeable to 50Hz, 220V)

We’re seeing the struggle of Industries outsourcing their power generators and connecting it to the grid (i.e. the cloud). But there is the Monopoly way (Edison), and the Open way (Tesla)

Let OpenStack be the Tesla here, it will help the ‘Tech Giants’ to fight the good battle (Interconnected Private clouds using Openstack) and provide better services to the Industry.

Marcos Garcia, an OpenStack enthusiast