Nowadays, nearly every NFV-related improvement to existing technology involves latency in some way: either reducing it outright, or making it more deterministic (as in real-time systems).
In order to reduce latency, one first needs to realize how terribly slow some things are compared to CPU speed. As you know, a CPU is basically an ultra-fast robot that executes simple operations (math, bit transformations, pushing/popping data to and from memory, etc.). It fetches its instructions and data from different memory locations: some very fast but small (the L1 cache), others slow but very big (RAM, and hard drives reached over system buses like SATA or PCIe).
Now let’s focus on memory: when the CPU needs data that isn’t immediately at hand, it stalls until the data arrives (modern out-of-order cores try to fill that time with other independent work, but they can only hide so much). Once the data has been fetched (from RAM, for instance), execution resumes. If the data is already in L1, that’s a cache hit; otherwise it’s a miss, and the CPU has to wait for the data to come from L2, L3, RAM, and so on, each level progressively slower.
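You can observe the cost of cache misses even from a high-level language. The sketch below (a hypothetical demonstration, not a precise benchmark) sums the same data twice: once walking it in order, once in a shuffled order. The random walk defeats the hardware prefetcher and generates far more cache misses, so on typical machines it comes out measurably slower, even though both loops do exactly the same amount of arithmetic.

```python
import random
import timeit

# Build a large array and two visiting orders: sequential and shuffled.
N = 1_000_000
data = list(range(N))
seq_order = list(range(N))
rand_order = seq_order[:]
random.shuffle(rand_order)

def walk(order):
    """Sum `data` in the given visiting order; same work either way."""
    total = 0
    for i in order:
        total += data[i]
    return total

# Time both traversals. Only the memory access pattern differs.
seq_time = timeit.timeit(lambda: walk(seq_order), number=3)
rand_time = timeit.timeit(lambda: walk(rand_order), number=3)
print(f"sequential: {seq_time:.3f}s  random: {rand_time:.3f}s")
```

The exact gap depends on the CPU, cache sizes, and interpreter overhead, but the direction is consistent: touching memory in a predictable, sequential pattern is what caches and prefetchers are built for.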
So, to understand why it’s important to minimize the amount of data that has to be fetched from “distant” parts, like a PCIe device or a SATA hard drive, have a look at the time it takes to access those parts, network devices included. The numbers come from https://gist.github.com/jboner/2841832. Nanoseconds mean little to our little brains, but you’ll see how bad this is as soon as you switch to a more human scale, where an L1 cache hit takes 1 second and the slowest operations take years.
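The rescaling is simple arithmetic, and can be sketched as follows. The latency values are taken from the gist linked above (an illustrative subset, not the full list); the scale factor is chosen so that an L1 cache hit (0.5 ns) maps to exactly 1 second of "human time".

```python
# Latency numbers (in nanoseconds) from https://gist.github.com/jboner/2841832.
LATENCIES_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Round trip CA -> Netherlands -> CA": 150_000_000,
}

# Seconds of "human time" per nanosecond: 0.5 ns (L1 hit) -> 1 second.
SCALE = 1 / 0.5

def humanize(seconds):
    """Render a duration in the largest sensible unit."""
    for unit, size in [("years", 365 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)]:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"

for name, ns in LATENCIES_NS.items():
    print(f"{name}: {humanize(ns * SCALE)}")
```

On this scale an L2 hit takes 14 seconds, a main-memory reference a few minutes, a disk seek the better part of a year, and a transatlantic network round trip close to a decade. That is the gap the rest of this discussion is about.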