In an earlier article, we discussed data access technology for “out-of-band” network monitoring and compared SPAN ports with taps. The key point of that article was that if a customer is looking for a truly non-intrusive, high-fidelity and scalable data access technology, then SPAN (or mirroring) ports have too many inherent limitations and tapping is the only game in town.
This is particularly important if the motivation for data access is to satisfy legal requirements, such that the captured traffic may ultimately become evidence presented in a court of law (as with CALEA and Lawful Intercept). Moreover, we recommended tapping as the preferred access methodology if maintaining synchronicity between packets is critical (e.g., in VoIP or IPTV monitoring).
On the other hand, there are many more monitoring applications where a SPAN port would be more than adequate. These include application performance monitoring, application discovery, web user experience, web analytics and SOA performance monitoring, especially those where the monitoring appliance is only “sampling” traffic.
Our experience with customers is that the breakdown is about 50/50: half of them prefer tapping (especially true of telecom customers) and the rest prefer SPAN ports. From the customer standpoint, the advantage of a SPAN port is simplicity and flexibility. Also, since the SPAN port is a standard feature integrated into most high-end LAN switches, no one should argue with free (although one could argue that a SPAN port is never free, since it consumes a revenue port and could potentially degrade switch performance).
In this article, we will review some state-of-the-art tap products that are designed to mitigate the simplicity and flexibility issues. Specifically, we will discuss the mission-critical InteropNet (for the upcoming Interop New York show), where an array of “Aggregation Taps” is used. We will also introduce a new class of data access and aggregation switch that is complementary to tapping (it performs “Tap Aggregation”), and discuss how both the “Aggregation Tap” and the “Tap Aggregation” (or “Tap Aggregator”) switch are implemented for SpyNet, the “out-of-band” monitoring network for InteropNet (which facilitates on-the-spot troubleshooting, forensics and application monitoring).
In the process, we will also introduce the concept of filtering and multi-rule mapping (which can be integrated into the tap or the switch, by either manufacturer) and how this capability is important to prevent oversubscription and to enable load sharing (e.g., between 10-Gig network and multiple 1-Gig monitoring tools).
Finally, we will discuss a real-life example of how these products are used by a telco customer for IPTV performance monitoring and subscriber authentication troubleshooting.
Editors’ Note - Our intention here is not to promote products from specific vendors, although we will use them as examples of implementation. Our goal is to provide both available and desired feature information and to discuss the resulting tradeoffs from the customer perspective.
Optical Tap Technology
In a future article, we will discuss in more detail the physics of passive tapping, especially for Gigabit copper links. For now, it suffices to say that tapping an optical fiber is very simple and straightforward. The key component of an optical tap is an optical splitter, which is the same device as an optical coupler.
The picture below shows the fabrication technique for a “Fused Biconical Taper Coupler”, which according to many patent applications consists of the “means for holding the optical fibers under tension but with no pulling force in contact along a portion of a length of the fibers resulting in a contact portion; a heat source; and means for fusing the optical fibers including means for brushing the heat source across the contact portion and means for moving the heat source in ever decreasing steps gradually closer to the fibers and over an amplitude.”
In other words, it is pure black magic.
In any case, once the optical component is fabricated and packaged, it can be used as a coupler/combiner (left) or a splitter (right).
Therefore, tapping an optical fiber (using an optical splitter) is a very reliable process. For all practical purposes, no additional failure point is introduced by tapping, since there is not even a break in the fiber (the same fiber runs from one end of the tap to the other). In other words, an optical tap should be as safe and as non-intrusive as an optical patch cord.
However, one of the primary objections to deploying an optical tap is that it removes precious light from a production link. This is a problem especially if one wants to connect multiple monitoring tools, since one would need to deploy multiple taps. Moreover, each time we tap a bidirectional link, we create two independent optical outputs, requiring two potentially expensive Gigabit interfaces in the monitoring tools (whereas a SPAN port would conveniently aggregate them into a single data stream).
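To put a rough number on the “precious light” objection, the ideal loss of a splitter follows directly from its split ratio. The sketch below is illustrative only: the split ratios are assumed values rather than figures from any product discussed here, and a real splitter adds excess loss and connector loss on top of the ideal figure.

```python
import math


def ideal_split_loss_db(fraction: float) -> float:
    """Ideal loss in dB on an output that receives `fraction` of the input light."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -10.0 * math.log10(fraction)


def chained_network_loss_db(network_pct: int, taps_in_series: int) -> float:
    """Ideal loss on the production path after several identical taps in series."""
    return taps_in_series * ideal_split_loss_db(network_pct / 100)


# Hypothetical split ratios (percent of light kept on the production link).
for network_pct in (50, 70, 90):
    monitor_pct = 100 - network_pct
    print(
        f"{network_pct}/{monitor_pct} split: "
        f"network {ideal_split_loss_db(network_pct / 100):.2f} dB, "
        f"monitor {ideal_split_loss_db(monitor_pct / 100):.2f} dB, "
        f"two taps in series {chained_network_loss_db(network_pct, 2):.2f} dB"
    )
```

Because losses in dB add, every extra tap placed in series for another tool eats further into the optical budget of the production link, which is the motivation for regenerating the copy electronically instead.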
To address this concern, many tap manufacturers have introduced second-generation “regenerative” and/or “aggregation” taps, illustrated below (using the Network Critical product as an example).
The picture on the left shows the front panel of the “Combi-Tap”. The two LC optical connectors (left) are for the production network and the two RJ45’s (right) are for the monitoring tools.
An exact copy of the production traffic is made using the aforementioned optical splitter. The customer can leave the bidirectional traffic in two separate streams, in which case the tap simply acts as a media converter as well (converting Gigabit optical to Gigabit copper). Alternatively, the customer can choose to pre-aggregate (i.e., combine) the two directions into one stream (same as a SPAN port) and replicate the combined traffic on the second copper port (making a carbon copy), so that two tools can be supported simultaneously (i.e., it is now both an aggregation tap and a regenerative tap).
InteropNet and SpyNet
In a previous article, we wrote extensively about the history and the implementation of the Interop SpyNet, which is an out-of-band monitoring infrastructure for the mission-critical temporary show network. The following is the network diagram for this year’s show (to be held in October in New York).
Once again we have a dual core consisting of two geographically diverse WAN links into the Internet, two redundant border routers, two firewalls and two LAN switches. From the core switch, we support a number of remote PED’s (20 to 28 are for the show-floor and the rest are for the off-show-floor including classrooms and sales/registration desks).
The tradition of InteropNet is that all critical links are tapped and instrumented (with monitored traffic never run through the production network), so that troubleshooting can be done even when the network is under severe attack and compromised.
The following shows three “Aggregation Tap” chassis (for a total of 12 taps) deployed between two patch panels, allowing all distribution links to be monitored non-intrusively. The tap is configured such that bidirectional traffic is pre-aggregated (before feeding to the aggregation switch). The second RJ45 for each tap module is configured to receive a “carbon copy” of the aggregated traffic (for each link). This second port is called a “Walk-Up” port allowing Interop engineers to walk up and perform troubleshooting on the spot (especially important at setup time when not all equipment is up and running yet and we have to attack one link at a time).
To further aggregate the outputs of the “Aggregation Tap” and to support the monitoring tools, we deploy a data access switch (which is the orange box from Gigamon Systems shown below sitting on top of the router and the firewall).
The yellow cables on the left of the data access switch are the same yellow cables coming from the taps, and the two modules on the right are the GigaTAP-Sx internal tap modules (which perform basically the same function as the Network Critical taps). The internal taps are used to tap the four optical links both in front of and behind the two firewalls.
All captured traffic is distributed to monitoring tools attached to a second data access switch (not shown), which is connected to the first one by a 10-Gig stacking link such that the two switches behave like one.
This setup provides the Interop engineers with great flexibility, allowing them to customize monitoring traffic to fit the needs of each monitoring tool. For example, one tool might receive aggregated traffic in front of the firewall (aggregating from both primary and secondary links) and one tool might receive aggregated traffic behind the firewall. Similarly, one tool might be monitoring off-show-floor traffic and one might be monitoring everything. The Gigamon data access switch can easily accommodate all of these customizations.
The tradeoff between using an “Aggregation Tap” (such as the Network Critical tap product) and using a conventional non-aggregation tap (such as the GigaTAP internal tap module) followed by the Gigamon data access switch to perform “Tap Aggregation” comes down to cost, form factor and performance.
Since a single Gigamon switch can only accommodate three GigaTAP modules, it can only tap six links. In order to tap sixteen links (four core links and twelve distribution links), we will need at least three chassis. By contrast, by pre-aggregating the tapped outputs and performing the media conversion (from optical to copper), the Network Critical tap product allows tapping of up to twenty links with a single Gigamon chassis (since it can have up to 20 ports in one chassis).
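The chassis count above is simple ceiling arithmetic, sketched below. The figure of two links per internal tap module is inferred from the text (three modules tap six links), and the sketch ignores the ports that must remain free for monitoring tools, so treat it as a back-of-the-envelope check rather than a sizing tool.

```python
import math

LINKS_PER_TAP_MODULE = 2           # inferred: three modules tap six links
TAP_MODULES_PER_CHASSIS = 3        # stated in the text
AGGREGATED_PORTS_PER_CHASSIS = 20  # stated in the text


def chassis_needed(links: int, links_per_chassis: int) -> int:
    """Smallest number of chassis that can terminate `links` tapped links."""
    return math.ceil(links / links_per_chassis)


links_to_tap = 4 + 12  # four core links plus twelve distribution links

with_internal_taps = chassis_needed(
    links_to_tap, LINKS_PER_TAP_MODULE * TAP_MODULES_PER_CHASSIS
)
with_aggregation_taps = chassis_needed(links_to_tap, AGGREGATED_PORTS_PER_CHASSIS)

print(f"internal taps only: {with_internal_taps} chassis")
print(f"external aggregation taps: {with_aggregation_taps} chassis")
```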
On the other hand, an “Aggregation Tap” has a potential shortcoming: because it pre-aggregates two “Gigabit” data streams, if both sides of the bidirectional link exceed 50% utilization (more generally, whenever the two directions together exceed the capacity of the single Gigabit monitor port), there is the possibility of oversubscribing the Gigabit interface and dropping packets before they even get to the aggregation switch. By not pre-aggregating (i.e., using conventional taps or the GigaTAP internal taps), we can keep the two traffic streams separate until the Gigamon switch has a chance to filter and map the traffic (more on that later).
At InteropNet, we use a combination of the two methodologies. To save port count, we deploy the “Aggregation Tap” where we are confident that we will never exceed 50% utilization (at the distribution links). Where we feel we might have a high-bandwidth situation due to occasional DDoS attacks (at the core), we deploy non-aggregating taps and rely on the aggregation switch to perform subsequent “Tap Aggregation”.
With a little bit of planning and the right mix of products to choose from, customers can easily have the best of both worlds.
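The oversubscription condition can be checked with a few lines of arithmetic. This is a minimal sketch that assumes sustained average rates and a 1000 Mbps monitor port; it deliberately ignores buffering and short bursts, which in practice can cause drops below the average-rate threshold.

```python
def aggregated_load_mbps(a_to_b_mbps: float, b_to_a_mbps: float) -> float:
    """Load offered to the single monitor port after pre-aggregation."""
    return a_to_b_mbps + b_to_a_mbps


def is_oversubscribed(
    a_to_b_mbps: float, b_to_a_mbps: float, monitor_port_mbps: float = 1000.0
) -> bool:
    """True when both directions together exceed the monitor port capacity."""
    return aggregated_load_mbps(a_to_b_mbps, b_to_a_mbps) > monitor_port_mbps


# Hypothetical per-direction rates on a Gigabit link, in Mbps.
for a_to_b, b_to_a in [(400, 400), (600, 600), (800, 300)]:
    state = "oversubscribed" if is_oversubscribed(a_to_b, b_to_a) else "ok"
    print(f"{a_to_b} + {b_to_a} Mbps -> {state}")
```

Note the last case: one direction is well under 50%, yet the aggregate still exceeds the port. “Both sides above 50%” guarantees oversubscription, but the sum of the two directions is what actually matters.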
Filtering versus Mapping
At least one tap manufacturer has taken the design of “Aggregation Tap” to the next logical step by introducing a “Filtered Aggregation Tap”. That is, in addition to aggregation and regeneration (i.e., multicasting), the tap electronics can also perform packet filtering.
It is important for customers to recognize that there are two variations to filtering; one is Pre-Filter (filter before aggregation) and one is Post-Filter (filter after aggregation). Both have potential shortcomings.
The diagram above shows a connectivity scenario where packets are flowing from left to right (this happens to be for the Gigamon data access switch, but for discussion purposes it applies equally to an “Aggregation Tap”, which would have only two inputs and two outputs).
Ports on the left are called Network Ports (ingress) and ports on the right are Tool Ports (egress). Obviously, Network Ports are to be connected to the network (which can be SPAN ports, external taps or internal taps such as the GigaTAP, or optical splitter in the case of an “Aggregation Tap”). Similarly, Tool Ports are to be connected to monitoring tools.
Packet filtering can be implemented either at the ingress or the egress. Filters that are implemented on the ingress side are called Pre-Filters since filtering is done before any connectivity operations, i.e., before aggregation (Many-to-Any) and multicasting (Any-to-Many). Similarly, filters that are implemented on the egress side are called Post-Filters since filtering is done only after aggregation.
To prevent oversubscription, one would prefer Pre-Filters since they cut down on incoming traffic before aggregation. However, the problem with Pre-Filters is that we only get one bite at the apple, i.e., once traffic is filtered, it is not possible for a second tool to monitor the same traffic either unfiltered or filtered in a different manner.
On the other hand, the advantage of the Post-Filter is obvious. Although it won’t help in preventing oversubscription, it is very useful as a way of customizing traffic for multiple attached tools (filtering for one tool does not affect its neighbors).
For the Gigamon switch (or any other data access switch in that product category), there is (and there should be) a third way of customizing traffic, called “Mapping”. It can be thought of as a “multi-rule” Pre-Filter, and it is available (and should be available) for both 1-Gig and 10-Gig ingress ports.
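The difference between the two filter positions can be sketched in a few lines. This is a conceptual model only, with hypothetical packets and tool names; real taps and switches do this in hardware, and nothing here reflects a vendor API.

```python
from typing import Callable, Dict, List

Packet = Dict[str, int]
Rule = Callable[[Packet], bool]


def pre_filter_path(
    ingress: List[Packet], rule: Rule, tools: List[str]
) -> Dict[str, List[Packet]]:
    """Filter once at ingress, then multicast the survivors to every tool."""
    kept = [p for p in ingress if rule(p)]
    return {tool: list(kept) for tool in tools}


def post_filter_path(
    ingress: List[Packet], rules: Dict[str, Rule]
) -> Dict[str, List[Packet]]:
    """Aggregate everything first, then filter separately at each tool port."""
    aggregated = list(ingress)  # the full load crosses the aggregation stage
    return {
        tool: [p for p in aggregated if rule(p)] for tool, rule in rules.items()
    }


# Hypothetical traffic: one packet on each of three VLANs.
traffic = [{"vlan": 10}, {"vlan": 20}, {"vlan": 30}]

pre = pre_filter_path(traffic, lambda p: p["vlan"] == 10, ["tool_a", "tool_b"])
post = post_filter_path(
    traffic,
    {"tool_a": lambda p: p["vlan"] == 10, "tool_b": lambda p: p["vlan"] != 10},
)

print(pre)   # both tools are limited to VLAN 10
print(post)  # each tool gets its own view of the same aggregate
```

In the pre-filter case the aggregation stage carries one packet but neither tool can ever see VLAN 20 or 30. In the post-filter case each tool gets its own view, but all three packets had to pass through aggregation first.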
The above diagram shows a typical example where the ingress ports are 10-Gig which can receive traffic from the SPAN port of a 10-Gig core switch or from a 10-Gig passive tap. Using these multi-rule Pre-Filters, 10-Gig traffic can be “mapped” to multiple load-sharing 1-Gig monitoring tools, with each tool analyzing a specific VLAN range, port number or IP subnet according to the specific filter rule, thereby providing the ability to perform comprehensive monitoring at 10-Gig line-rate without oversubscribing any single 1-Gig tool (what customers often refer to as “Reverse Aggregation”).
So basically “Mapping” is a combination of “multicasting” and “pre-filtering”. Unlike a conventional single-rule Pre-Filter (which gets one bite at the proverbial apple), here it is as if we first make a “backup” copy of the incoming traffic before we perform filtering, so that subsequent filtering can be performed on the original ingress traffic (for as many times as allowed by the switch manufacturer, which is 120 times for the Gigamon switch).
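A minimal sketch of the mapping idea follows. The rule fields, tool names and VLAN ranges are hypothetical, and the “every matching rule receives a copy” behavior is an assumption made for illustration; how a particular switch resolves overlapping rules is vendor-specific.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Packet:
    vlan: int
    dst_port: int


@dataclass(frozen=True)
class MapRule:
    tool_port: str
    matches: Callable[[Packet], bool]


def apply_map(ingress: List[Packet], rules: List[MapRule]) -> Dict[str, List[Packet]]:
    """Evaluate every rule against the original, unfiltered ingress traffic."""
    out: Dict[str, List[Packet]] = {rule.tool_port: [] for rule in rules}
    for packet in ingress:
        for rule in rules:
            if rule.matches(packet):
                out[rule.tool_port].append(packet)
    return out


# Hypothetical rules spreading one 10-Gig ingress over three 1-Gig tools.
rules = [
    MapRule("tool_1", lambda p: 1 <= p.vlan <= 99),
    MapRule("tool_2", lambda p: 100 <= p.vlan <= 199),
    MapRule("tool_3", lambda p: p.dst_port == 80),
]
ingress = [Packet(10, 80), Packet(150, 443), Packet(300, 25)]

mapped = apply_map(ingress, rules)
for tool_port, packets in mapped.items():
    print(tool_port, packets)
```

Because each rule is applied to the original traffic, the first packet reaches both tool_1 and tool_3, something a single-rule Pre-Filter could not provide. The third packet matches no rule and is simply not forwarded.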
In summary, “Mapping” is yet another way to differentiate between “Aggregation Tap” and “Tap Aggregation”. Again, customers can choose between the two solutions (or a combination of both) depending on cost and performance.
IPTV Monitoring & Authentication Troubleshooting
The following shows a simplified network diagram for a typical telco customer. In addition to providing conventional telephone services, the customer also provides ADSL and IPTV to gain additional revenue. The redundant links between the data center (NOC) and the regional central office (CO) are 10 Gigabit links. Since an “Aggregation Tap” is not (yet) available for 10G, the customer is currently using conventional taps and the data access switch to aggregate and, more importantly, to filter the 10G traffic in order to support a number of “revenue-critical” monitoring tools.
Each monitoring tool serves a particular purpose, including lawful intercept, performance monitoring, security, malware detection, deep packet inspection and authentication verification. Therefore, filtering/mapping becomes very important. For example, to focus on RADIUS authentication traffic, the customer can filter on port 1645/1812, and to focus on RADIUS accounting, the customer can filter on port 1646/1813. Similarly, to focus on multicast traffic, the customer can filter on a particular range of IP addresses corresponding to the originating media servers.
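The filter criteria above can be expressed as a small classifier. This is a sketch of the matching logic only: the port numbers come from the text, while the media server subnet (192.0.2.0/24, a documentation range) and the tool labels are placeholders, since the customer's actual addressing is not given.

```python
import ipaddress

RADIUS_AUTH_PORTS = frozenset({1645, 1812})  # legacy and current auth ports
RADIUS_ACCT_PORTS = frozenset({1646, 1813})  # legacy and current accounting ports
MEDIA_SERVER_NET = ipaddress.ip_network("192.0.2.0/24")  # hypothetical subnet


def classify(src_ip: str, src_port: int, dst_port: int) -> str:
    """Return the (hypothetical) monitoring tool class for one packet."""
    ports = {src_port, dst_port}
    if ports & RADIUS_AUTH_PORTS:
        return "radius-auth"
    if ports & RADIUS_ACCT_PORTS:
        return "radius-acct"
    if ipaddress.ip_address(src_ip) in MEDIA_SERVER_NET:
        return "iptv-media"
    return "other"


print(classify("10.1.1.5", 40000, 1812))   # authentication request
print(classify("10.1.1.5", 40001, 1813))   # accounting request
print(classify("192.0.2.17", 5000, 5004))  # stream from a media server
```

Matching on either source or destination port captures both requests and responses, which matters when troubleshooting failed authentications.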
Times have changed.
To ensure security and compliance, network monitoring has become mission critical for our customers. In particular, for both technical and political reasons, customers prefer that monitoring be done non-intrusively, i.e., out-of-band. This requires either SPAN ports or the deployment of passive taps, both of which are useful depending on the application.
A SPAN port is free, simple and flexible. But taps are truly non-intrusive, more accurate and scalable, and in the case of “Lawful Monitoring”, i.e., monitoring as a result of a legal requirement, they are most likely the only acceptable solution. Unfortunately, tapping is not as simple or flexible as using SPAN ports.
In this article, we discussed a new generation of “Aggregation Tap” as well as a revolutionary class of “Tap Aggregation” data access switch, both designed to restore simplicity and flexibility to the monitoring of customer networks.