What is Network Metadata?
The human view in Network Visualization!
Network Metadata is human-readable data that describes your network traffic. It is generated and consumed by network traffic monitoring systems to analyse and report on network and user activity. This type of continuous monitoring is concerned with users, the applications they use and the data they access. It is generally not used for monitoring the health of the network fabric and attached devices.
The graphic below depicts some of the available technologies for continuous network traffic monitoring and how they relate to each other in terms of the information provided and the cost and complexity of implementation.
Network Metadata is used to fill the gap between the “not-enough-detail” SNMP switch port counters and “too-much-complexity, too expensive” full packet capture systems.
SNMP counters give a very basic packet and byte count of network traffic into and out of any managed device, such as a router or a firewall. Inexpensive and simple to deploy, they offer a good overview of bandwidth pinch points, but they rarely help with troubleshooting user activity, security issues or forensics, as they do not capture enough detail. SolarWinds Orion NPM and Nagios are well-known SNMP monitoring systems.
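To make the "packet and byte count" idea concrete, here is a minimal sketch of what an SNMP monitoring system does with those counters: it polls an interface's octet counter twice and converts the delta into an average bandwidth figure. The counter values and polling interval below are invented for illustration; a real system would read `ifInOctets`/`ifOutOctets` over SNMP.

```python
# Sketch: turning two successive SNMP interface counter readings
# into an average bandwidth figure. Values are illustrative only.

def average_bps(octets_start: int, octets_end: int, interval_s: float) -> float:
    """Average bits per second between two counter samples."""
    return (octets_end - octets_start) * 8 / interval_s

# Two samples of a switch port's inbound octet counter, 60 seconds apart.
sample_t0 = 1_200_000_000
sample_t1 = 1_200_090_000  # 90,000 octets received in the interval

print(average_bps(sample_t0, sample_t1, 60))  # 12000.0 bits/s
```

Note how little this tells you: a bits-per-second number per port, with no indication of which user, application or file generated the traffic.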
Output of SNMP Counters
Various standards-based and proprietary flow technologies, such as Cisco's NetFlow, IPFIX, sFlow and others, start to add more detail. Typically, the flow data is generated on one device, say a network switch, and consumed by various third-party collectors. This is a very simple flow.
At the other end of the monitoring spectrum are powerful full packet capture and storage systems that capture every network packet to disk, allowing full sessions to be reassembled and replayed and giving very detailed insight into activity on the network. Systems such as IBM QRadar Network Forensics give comprehensive visibility, but are expensive and complex to deploy and use. This category of tools can be used for debugging very complex application performance or timing issues, or for security use cases where the complete contents of a file need to be analysed.
Wireshark is an excellent network traffic analysis tool that provides great detail and is free of charge. However, it is not included on our main graphic, as it is not designed for continuous monitoring but is more relevant for single-point investigations.
Network metadata is used in various products to bridge the gap between the flow technologies and the full packet capture systems. Network metadata is a relatively new term and does not have an agreed-upon definition; the amount of detail in network metadata can vary. Essentially, metadata is data extracted via Deep Packet Inspection (DPI) of packet contents, extracting and retaining key application-specific attributes; for example, a file name if the application is SMB, or a URI if the application is HTTP.
The richer the metadata, the more generally useful it is. The rich detail can help with a wide variety of use cases, including monitoring users, security, applications and forensics, while sidestepping the significant processing and storage requirements of full packet capture systems.
From network flows to metadata
A basic network flow describes a connection between two systems and is uniquely identified by a tuple that includes source IP address, destination IP address, source port, destination port, network protocol, and start and end time. To each flow some counters are added: packets and bytes sent and received. This gives a high-level view of who is talking to whom and how much traffic is being exchanged. This flow format, with the addition of some other values such as switch ports and TOS values, forms the basis of the NetFlow v5 PDU format, the most widely used NetFlow protocol. The information contained in this type of flow gives an excellent overview of how the network is being used, highlighting bandwidth pinch points and showing which systems are consuming bandwidth disproportionately. There are many free, open source products that can consume NetFlow records and generate reports and graphs.
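The flow tuple and counters described above can be sketched as a simple data structure. The field names below are illustrative, not the NetFlow v5 wire format:

```python
from dataclasses import dataclass

# Sketch: a basic flow record keyed by the classic tuple, with
# packet/byte counters attached. Field names are illustrative.

@dataclass(frozen=True)
class FlowKey:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "tcp", "udp"

@dataclass
class FlowRecord:
    key: FlowKey
    start: float   # epoch seconds
    end: float
    packets: int
    bytes: int

flow = FlowRecord(
    key=FlowKey("10.0.0.5", "192.0.2.10", 49321, 443, "tcp"),
    start=1_700_000_000.0, end=1_700_000_030.0,
    packets=120, bytes=95_000,
)
print(flow.key.dst_port)  # 443
```

Everything in this record comes from packet headers; nothing here required looking at the payload.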
Digging a little deeper, application fingerprinting
All of the information contained in the flow record is extracted from the packet header; there is no need to look at the packet payload. However, the information available from header-based flow reports is limited. There are typically many thousands of flows per minute on any network, making it difficult to get a right-level view of the applications in use. The administrator can guess that all traffic on port 80 is HTTP, for example, but has no proof of this.
By looking at the packet payload, it is possible to perform a technique known as Application Identification, Application Recognition or Application Fingerprinting. This technique looks for known patterns in the packet payload (an HTTP 1.0 GET, an SMTP HELO, etc.), sometimes examining both client messages and server responses, and then determines which application protocol generated the message. Thus, the simple network flow gains a new piece of information: various flows can be identified as web traffic, NFS traffic and so on. Commercial systems that add application identification to flow data include Cisco NBAR and IBM QFlow.
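A toy sketch of the pattern-matching idea: match the first bytes of a flow's payload against a table of known protocol signatures. The signature table below is illustrative, not a production ruleset; real engines match far more patterns and often correlate client and server messages.

```python
# Sketch: payload-based application fingerprinting via known
# protocol signatures. Signatures here are illustrative only.

SIGNATURES = [
    (b"GET ",  "HTTP"),
    (b"POST ", "HTTP"),
    (b"HELO ", "SMTP"),
    (b"EHLO ", "SMTP"),
    (b"SSH-",  "SSH"),
]

def identify_application(payload: bytes) -> str:
    """Return the application name for a payload, or 'unknown'."""
    for prefix, app in SIGNATURES:
        if payload.startswith(prefix):
            return app
    return "unknown"

print(identify_application(b"GET /index.html HTTP/1.0\r\n"))  # HTTP
print(identify_application(b"SSH-2.0-OpenSSH_8.9\r\n"))       # SSH
```

Note that the verdict is independent of the port number: HTTP on port 8080 is still identified as HTTP.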
Adding the application information to the flow gives a much clearer view of how applications are running on the network. A standard flow report that shows tens of thousands of records for a network can be reduced to a few hundred records by pivoting to display applications instead of port numbers.
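The pivot described above is essentially an aggregation: collapse many per-flow records into one byte count per application. A minimal sketch, with invented flow data:

```python
from collections import Counter

# Sketch: pivoting many flow records into per-application byte
# totals. The flow data is invented for illustration.

flows = [
    {"app": "HTTP", "bytes": 52_000},
    {"app": "HTTP", "bytes": 18_000},
    {"app": "SMB",  "bytes": 410_000},
    {"app": "DNS",  "bytes": 1_200},
]

per_app = Counter()
for f in flows:
    per_app[f["app"]] += f["bytes"]

for app, total in per_app.most_common():
    print(app, total)
# SMB 410000
# HTTP 70000
# DNS 1200
```

Four records become three; at real network scale, tens of thousands of flows collapse to a few hundred application rows.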
So what are the applications doing?
From an understanding of what applications are running on the network, the next question frequently is ‘what are those applications doing?’. If a client is downloading 5GB of data from a file share, the network administrator may want to know what files are being accessed, what SQL queries are running or what DNS requests are being made.
Getting this information requires a more sophisticated DPI engine that, in real time, identifies applications, follows the application protocols, performs TCP reassembly and extracts data of interest. Essentially, the DPI engine is 'extracting complexity': it performs a huge data reduction, making the output and reports much easier to read and interpret.
For example, when a user accesses a file on a Microsoft Windows file share, NetFlow-like data would only include details such as the client and server IP addresses, the source and destination port numbers and the amount of data transferred. Because the packet contents are also analysed, metadata adds the file and folder name accessed and the action performed (or the website and page, the SQL query, and so on).
Here we have the bigger picture, along with the detail needed to really understand what has happened.
Metadata does not include the contents of the file or detail on the changes to the file. Such detail would be provided by full packet capture and storage, where all network packets (both headers and contents) are captured and stored; this is the main difference between full packet capture and storage, and metadata.
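The contrast between a flow-only record and a DPI-enriched metadata record can be sketched as two dictionaries. The field names and the SMB example values below are invented for illustration:

```python
# Sketch: the same file-share event as flow-only data and as
# DPI-enriched metadata. Fields and values are illustrative.

flow_only = {
    "src_ip": "10.0.0.5", "dst_ip": "10.0.0.20",
    "src_port": 49500, "dst_port": 445,
    "bytes": 5_000_000_000,
}

# DPI of the SMB payload adds application-level detail, but not the
# file's contents -- that would require full packet capture.
metadata = dict(flow_only)
metadata.update({
    "application": "SMB",
    "filename": r"\\fileserver\projects\budget.xlsx",
    "action": "read",
})

print(metadata["application"], metadata["action"])  # SMB read
```

The metadata record answers "which file, and what was done to it"; only full packet capture could answer "what was in the file".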
Adding User Context
The final piece of information that can be added to network metadata is user data. By interrogating an authentication system, such as a Windows Active Directory domain, analysis systems can determine which user was logged onto a client system when a particular transaction was initiated. This allows the system to display metadata in a new way and add critical context, by associating the responsible user with the network event. The user name attached to every network flow or event is also a critical pivot point when troubleshooting or performing investigations in a DHCP environment, where IP addresses are constantly changing but user names remain the same.
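The join described above amounts to looking up which account was logged on at the flow's client IP at the time of the event. A minimal sketch, with invented login events; a real system would pull these from e.g. Active Directory security logs:

```python
# Sketch: attaching a user name to a flow in a DHCP environment by
# matching the client IP and timestamp against login sessions.
# Login events are invented for illustration.

logins = [
    # (ip, user, login_time, logout_time) in epoch seconds
    ("10.0.0.5", "alice", 1000, 2000),
    ("10.0.0.5", "bob",   2100, 4000),  # same IP, later lease/user
]

def user_for(ip: str, ts: float) -> str:
    """Return the user logged on at this IP at this time."""
    for login_ip, user, start, end in logins:
        if login_ip == ip and start <= ts <= end:
            return user
    return "unknown"

print(user_for("10.0.0.5", 1500))  # alice
print(user_for("10.0.0.5", 3000))  # bob
```

The same IP resolves to different users at different times, which is exactly why the user name, not the address, is the stable pivot point.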
Metadata is a useful option for obtaining visibility anywhere across the network. Flow-enabled devices are not required; the only requirement is a traffic source, usually a network tap, SPAN or port mirror. The rich detail and drill-down that metadata provides is not only ideal for quickly troubleshooting network performance issues, as it eliminates guesswork, but is also critical for continuously monitoring network security and user activity.
Finally, the data reduction that results automatically from metadata extraction makes the output very readable and easy to interpret, making it suitable and very affordable for organizations of all sizes.
Morgan Doyle - CTO, NetFort
As founder and CTO of NetFort, Morgan interacts with customers, developers and partners to further develop the product roadmap, with a special interest in integration with SIEM and network control systems. Previously he held the position of Director of Software Development, where he led the development of the LANGuardian Network Visibility product for 10 years.
Prior to joining NetFort, Morgan worked as a Kernel Developer for DEC/Compaq/HP High Performance Computing, clustering and fast interconnect groups, and also with Copperfasten Technologies on their fast, secure website update and monitoring toolkits.
Morgan is a graduate of Trinity College Dublin and holds BA, BAI and M.Sc. Degrees in Engineering.