Packet capture in the next-gen network
I love capturing packets with my laptop – all the tools are set up the way that I need them. However, I know that it’s no longer a tool that I can use in a next-gen network. New data centers are being designed with non-blocking L2 ECMP architectures like Leaf-Spine and Clos (see the Leaf-Spine video from Brad Hedlund and the IEEE ANTS presentation PDF by Malathi Veeraraghavan of the University of Virginia), where it’s difficult to answer the #1 question for sniffing: where do I plug in to capture the packets I need?
What makes this more difficult is the speed of these links. It’s increasingly common to find 10G to servers and 40G in the fabric between the switches. There’s no way my laptop can keep up with those speeds. What we need is a way to have visibility baked into the network. This post will explore a few emerging methods.
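To put rough numbers on that speed gap, here is a back-of-the-envelope sketch in Python. It assumes standard Ethernet framing overhead (8 bytes of preamble plus 12 bytes of inter-frame gap per frame) and a fully loaded link; the function name is only for illustration.

```python
# Rough line-rate arithmetic: how many packets per second a capture
# device would have to absorb on a fully loaded Ethernet link.
# Assumption: 20 bytes of per-frame overhead on the wire
# (8 bytes preamble/SFD + 12 bytes inter-frame gap).
PREAMBLE_AND_GAP_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second for a given link speed and frame size."""
    wire_bits = (frame_bytes + PREAMBLE_AND_GAP_BYTES) * 8
    return link_bps / wire_bits


for name, bps in [("1G", 1e9), ("10G", 10e9), ("40G", 40e9)]:
    small = packets_per_second(bps, 64)     # minimum-size frames
    large = packets_per_second(bps, 1518)   # full-size untagged frames
    print(f"{name}: {small / 1e6:.2f} Mpps at 64 B, "
          f"{large / 1e6:.2f} Mpps at 1518 B, "
          f"{bps / 8 / 1e9:.3f} GB/s to store")
```

At 10G with minimum-size frames this works out to roughly 14.88 million packets per second, and a saturated 40G link means about 5 GB of data every second, which is well beyond what a laptop NIC and disk can sustain.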
Infrastructure-based capture: Network Packet Brokers (NPB)
NPBs add flexibility and scalability to the high fidelity that comes from taps. They provide taps with the intelligence of switches to create a “dark network” that captures packets somewhere in the live network, then sends them to a different analysis point. The trick is that once packets are copied off the wire, they are dead traffic that can’t be re-inserted into the network, so the NPB fabric handles the forwarding to the packet capture appliance. Some NPBs even do some limited packet capture on the box.
The downside of NPB as a packet capture methodology is the cost (power, space, money). This is additional hardware that has to be installed at key points to maintain visibility. Installing a tap at every inter-switch link can quickly get expensive. Some of my customers rely on high fidelity capture, so they find budget to develop a real Visibility Architecture, but usually the number of capture points is minimized to keep the cost down.
In the current market, the leaders in this space (Anue/Ixia, etc.) are facing pressure from standard switches. Arista’s DANZ feature set does basic NPB in an 80/20 fashion (80% of the common features, 20% of the price). There are also market-specific products, like Datacom’s TradeView for stock-trading networks. I’m also seeing cPackets being deployed at a few customers who like the stats gathering on every port, even if there’s not an active packet capture.
Another product that fits into this category is a virtual tap. This is a VM that sits on a VM host (physical server) and acts like a tap for other VM guests on the same host. While there have been a few products like this over the past decade, the current feature leader is Phantom Tap from NetOptics (now owned by Ixia).
Smart network: OpenFlow
One of the big questions in networking is how to reduce the complexity of maintaining networks. One fat finger can break STP and create a switch loop, or drop a subnet into a routing black hole, which undermines the whole argument for more boxes to create a resilient non-blocking architecture. One proposed solution that is gaining some adopters is OpenFlow, in which a Controller tells all of the switches how to forward traffic. It’s like adding the intelligence of routing to L2, except that it’s centrally compiled and pushed out, rather than being built box-by-box via a routing protocol.
OpenFlow has the ability to create copies of traffic, essentially meaning that any given switch could act as a remote packet capture source. This is like a flexible and scaled form of RSPAN, in which a remote switch sends a copy of its packets to a different switch, then eventually to a packet capture appliance. The problem with RSPAN is that it requires configuration on every switch in the captured packet path, so it’s effort-intensive, prone to error, and has a high cost of failure (creating duplicate packets in the “wrong” place in the network). OpenFlow addresses this by centrally auto-configuring the switches along the captured packet path.
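As a rough illustration of that central auto-configuration, the sketch below models flow entries as plain Python data rather than using any real controller API. The `FlowEntry` class, the `build_mirror_flows` function, the action names, and the idea of marking mirrored copies with a dedicated VLAN (much like an RSPAN VLAN) are all assumptions made for the example, not how any particular controller does it.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    """Hypothetical, simplified stand-in for an OpenFlow flow entry."""
    switch: str
    match: dict
    actions: list


def build_mirror_flows(match, normal_out_port, mirror_path, mirror_vlan):
    """Compute the flow entries needed to copy matching traffic to a capture box.

    mirror_path is a list of (switch, out_port) hops, starting at the switch
    where traffic is copied and ending at the port facing the capture appliance.
    """
    (src_switch, src_mirror_port), *transit = mirror_path

    # Source switch: forward normally AND emit a second copy toward the capture
    # point. The copy is VLAN-tagged only if it has to cross other switches.
    src_actions = [("output", normal_out_port)]
    if transit:
        src_actions.append(("push_vlan", mirror_vlan))
    src_actions.append(("output", src_mirror_port))
    flows = [FlowEntry(src_switch, dict(match), src_actions)]

    # Transit switches: anything carrying the mirror VLAN is steered hop by hop.
    for index, (switch, out_port) in enumerate(transit):
        is_last_hop = index == len(transit) - 1
        actions = [("output", out_port)]
        if is_last_hop:
            # Strip the tag so the appliance sees the packet as it was copied.
            actions.insert(0, ("pop_vlan",))
        flows.append(FlowEntry(switch, {"vlan_vid": mirror_vlan}, actions))
    return flows


flows = build_mirror_flows(
    match={"ipv4_dst": "10.0.0.5"},
    normal_out_port=1,
    mirror_path=[("leaf1", 48), ("spine1", 3), ("leaf4", 10)],
    mirror_vlan=3000,
)
for flow in flows:
    print(flow.switch, flow.match, flow.actions)
```

The point of the sketch is that one function call produces consistent rules for every switch on the path, which is exactly the part that is manual and error-prone with RSPAN.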
The downside with using OpenFlow for packet capture is that there’s a loss of fidelity. You’re essentially using SPAN (which is not a faithful representation of the actual data flow; see the article SPAN versus TAP) instead of taps, and then sending that SPAN across the network. You’ll lose some of the packets you capture, but you’ll be able to capture from just about everywhere you’ve got a switch that supports OpenFlow. If that’s a trade-off you can live with, AND your network architecture is moving in the direction of OpenFlow, then here are some tools for you.
BigSwitch was one of the early big names in OpenFlow, and they also have a product to do packet capture called BigTap. However, there’s an emerging open source OpenFlow controller called OpenDaylight (ODL), and now it has its own packet capture app called SampleTap. There’s a quote from Greg Ferro, who co-runs an awesome podcast on networking, that packet capture is the “Hello World” of software-defined networking (SDN) like OpenFlow.
If you think this is just vaporware, the largest implementation I’m aware of is Microsoft’s DEMON project. They had a great presentation about it at SharkFest ’12. (There’s also speculation that they use Arista and BigSwitch.)
Smart data: Overlays / Virtual Switches
An alternative to OpenFlow – which requires switches that support it – is to use an overlay network like VXLAN or NVGRE. These protocols are commonly found in highly virtualized environments, where the East-West traffic (between servers in the same data center) is encapsulated in another header. The idea is that virtual server moves, adds, and changes (MAC) can require reconfiguration of the network, so the virtualization orchestration layer hides those changes from the network, but exposes them to the virtual switches.
Packet capture in an overlay network can be difficult for two reasons. First, there’s not a guarantee of where any given server is located, and the server location can change dynamically. Second, packets captured in a traditional manner will include the overlay header (VM host IPs, not VM guest IPs), which makes it difficult to use capture filters, and difficult to do analysis if the tools don’t support the overlay.
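The second problem is easier to see with bytes in hand. The sketch below uses only the Python standard library to build a minimal VXLAN-encapsulated frame and then dig the guest addresses out from under the host addresses. It assumes untagged Ethernet, IPv4 inside and out, and the IANA-assigned VXLAN UDP port 4789; the helper names and the sample addresses are made up for the example, and checksums are left at zero.

```python
import ipaddress
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned port; deployments may use another
VXLAN_HDR_LEN = 8       # flags(1) + reserved(3) + VNI(3) + reserved(1)


def parse_eth_ipv4(frame: bytes, offset: int = 0):
    """Return (src, dst, protocol, l4_offset) for an untagged Ethernet/IPv4 frame."""
    ethertype = struct.unpack_from("!H", frame, offset + 12)[0]
    if ethertype != 0x0800:
        raise ValueError("not an untagged IPv4 frame")
    ip = offset + 14
    header_len = (frame[ip] & 0x0F) * 4
    src = str(ipaddress.IPv4Address(frame[ip + 12:ip + 16]))
    dst = str(ipaddress.IPv4Address(frame[ip + 16:ip + 20]))
    return src, dst, frame[ip + 9], ip + header_len


def inner_addrs(frame: bytes):
    """Return host and guest addresses if the frame is VXLAN, else None."""
    host_src, host_dst, proto, l4 = parse_eth_ipv4(frame)
    if proto != 17:                                   # not UDP
        return None
    if struct.unpack_from("!H", frame, l4 + 2)[0] != VXLAN_UDP_PORT:
        return None
    vxlan = l4 + 8                                    # skip the UDP header
    if not frame[vxlan] & 0x08:                       # "VNI present" flag
        return None
    vni = int.from_bytes(frame[vxlan + 4:vxlan + 7], "big")
    guest_src, guest_dst, _, _ = parse_eth_ipv4(frame, vxlan + VXLAN_HDR_LEN)
    return {"host": (host_src, host_dst), "vni": vni,
            "guest": (guest_src, guest_dst)}


def build_eth_ipv4(src: str, dst: str, proto: int, payload: bytes) -> bytes:
    """Build a bare-bones test frame (zeroed MACs and checksums)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0, 64,
                     proto, 0, ipaddress.IPv4Address(src).packed,
                     ipaddress.IPv4Address(dst).packed)
    return eth + ip + payload


# A guest-to-guest TCP packet, wrapped host-to-host in VXLAN with VNI 5001.
inner = build_eth_ipv4("192.168.1.10", "192.168.1.20", 6, b"")
vxlan_hdr = struct.pack("!B3s3sB", 0x08, bytes(3), (5001).to_bytes(3, "big"), 0)
udp_hdr = struct.pack("!HHHH", 49152, VXLAN_UDP_PORT,
                      8 + len(vxlan_hdr) + len(inner), 0)
outer = build_eth_ipv4("10.0.0.1", "10.0.0.2", 17, udp_hdr + vxlan_hdr + inner)

print(inner_addrs(outer))
```

A capture filter written against the guest addresses (for example `host 192.168.1.10`) would not match the outer frame, because the only IP addresses at the usual offsets are those of the VM hosts.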
The solution being proposed by VMware, one of the largest vendors in this space, is to use ERSPAN. ERSPAN was invented by Cisco over a decade ago, and it’s basically SPAN encapsulated in GRE. The switch with the SPAN port wraps each captured packet in a GRE header, and forwards it to a remote packet capture appliance, where the GRE header is removed and the original captured packet is stored or analyzed. With large-scale VMware products like NSX, ERSPAN is supported for every virtual port on every virtual switch. This is definitely not high-fidelity – lossy SPAN combined with lossy transport – but it is very convenient for casual on-demand packet capture.
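What the capture appliance does on receipt can be sketched in a few lines of Python. This is a simplified illustration, not a complete decoder: it assumes the input starts at the GRE header (outer Ethernet and IP already removed), that the GRE protocol type is 0x88BE, and that a GRE sequence number signals the common Type II layout with an 8-byte ERSPAN header in front of the original frame. The function name is hypothetical.

```python
import struct

# GRE flag bits; each one adds an optional 4-byte field to the GRE header.
GRE_CHECKSUM = 0x8000
GRE_KEY = 0x2000
GRE_SEQUENCE = 0x1000

ERSPAN_PROTO = 0x88BE        # GRE protocol type used by ERSPAN Type I/II
ERSPAN_II_HEADER_LEN = 8     # assumption: Type II layout


def strip_erspan(gre_packet: bytes) -> bytes:
    """Remove the GRE (and ERSPAN Type II) headers, returning the mirrored frame."""
    flags, proto = struct.unpack_from("!HH", gre_packet, 0)
    if proto != ERSPAN_PROTO:
        raise ValueError(f"unexpected GRE protocol type 0x{proto:04x}")

    offset = 4  # base GRE header
    for bit in (GRE_CHECKSUM, GRE_KEY, GRE_SEQUENCE):
        if flags & bit:
            offset += 4

    # Assumption: a sequence number implies an ERSPAN header follows;
    # without it, the mirrored frame starts right after GRE.
    if flags & GRE_SEQUENCE:
        offset += ERSPAN_II_HEADER_LEN
    return gre_packet[offset:]


# Round trip with a stand-in "captured frame" and a zeroed ERSPAN header.
original_frame = bytes(range(64))
gre_header = struct.pack("!HHI", GRE_SEQUENCE, ERSPAN_PROTO, 7)  # seq = 7
wrapped = gre_header + bytes(ERSPAN_II_HEADER_LEN) + original_frame

print(strip_erspan(wrapped) == original_frame)
```

Because the transport is ordinary routed IP, anything that drops or reorders the GRE packets on the way also distorts the capture, which is part of why the result is convenient rather than high-fidelity.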
The future: service chaining / NFV
As the forthcoming changes in network architecture driven by SDN solidify into best practice, the most likely method of inserting middleboxes will be service chaining. This concept takes packet steering to a new level, where packets for a particular service (network-available application) are sent through several hops along the way, like a firewall, IPS, or load balancer. Since most of these additional hops will be implemented as VMs, the emerging term to refer to them is Network Functions Virtualization (NFV).
NFV will be both wonderful and awful for packet capture. When we think about packet capture today, it provides insight into the network infrastructure not only at the capture point, but also along the packet path, reflected in artifacts like TCP retransmissions. Using NFV for packet capture means that the network will redirect the packets through a capture point, whether a physical tap or SPAN. That’s a dramatic change from classic packet capture, because it’s no longer a purely passive activity, watching the packets as they pass by. There will likely be problems that are “solved” by applying NFV packet capture, since the packet path will change to go through the capture point, potentially bypassing a bad network segment. However, there’s ultimate flexibility in examining the upper layer protocols. What we lose in L1-L3, we gain in L3-L7. That enables us to shift our focus from “the network is broken” into the realm of “how can we make the network work better?”
I still maintain that you can have my packet capture laptop when you pry it from my cold dead hands. However, it sounds like I’ll still be using it happily to diagnose network problems across the planet without having to get out of my chair.
Author Profile - Jim MacLeod has been doing packet capture since 1996. He worked his way up the OSI layers with switches, routers, firewalls, VPN, IDS, and application gateways, before working his way back down with network management and monitoring. Now he’s a product manager for protocol analysis appliances, trying to keep alive the discipline of packet capture in an environment increasingly populated with virtualization, automation, and point-and-click applications.
Editor's Note from Oldcommguy - Jim is a super Geek and a longtime friend and associate of mine who lives and breathes data capture and analysis. He believes in doing data capture the correct way and always knowing the limits of methods and tools.