Cisco ASA Behaviour with Packet Losses and Overtaking - Using NetData visibility (by Bob Brownell)

A question posted to ask.wireshark.org by wdurand in September asked why reading a file across a WAN from a NetApp file server was slower than the equivalent write operation:

https://ask.wireshark.org/questions/55972/slow-writes-even-slower-reads-spanning-wan-to-netapp

The network path included a Cisco ASA, and an explanation for the slow transfer requires an understanding of ASA behaviour. We present here our analysis of similar ASA behaviour, drawing on a pair of concurrent captures taken from both ends of a network path that traversed an ASA, from an Oracle database server to a client workstation. System behaviour is illustrated with NetData charts.

[Chart 1: NetData data-sequence chart, server-side capture]

This data-sequence chart describes a 34-millisecond period of abnormal network behaviour, with many retransmissions, selective acks and duplicate selective acks (D-SACKs), as seen by the server. Each server packet is marked by a short vertical strip plotted against the time-of-day scale at the bottom and the sequence scale on the left. The end of each packet strip is marked by a horizontal tick running left from the top of the strip.

On the left of the chart is a rapid burst of more than 40 data packets. Black strips are normal data packets. Green strips are packets that might have been overtaken after being recorded by the sniffer; NetData deduced this from selective acks and subsequent acknowledgements that arrived much sooner than any retransmission could have been acknowledged.
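NetData's exact rule is not spelled out here, but the timing argument can be sketched. The Python below is a minimal sketch, assuming we know the RTT and, for a segment that a selective ack reported missing, the time of its first retransmission (if any) and the time the acknowledgement covering it was recorded. Because a retransmission cannot be acknowledged sooner than one RTT after it leaves the sender, an earlier acknowledgement means the original arrived late, i.e. it was overtaken. `Segment` and `classify_gap` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A segment reported missing by a selective ack (hypothetical model)."""
    seq: int                        # first sequence number of the segment
    retransmitted: Optional[float]  # time of its first retransmission, if any (s)
    acked: float                    # time the ack covering it was recorded (s)


def classify_gap(seg: Segment, rtt: float) -> str:
    """Return "overtaken" or "lost" for a segment inside a sequence gap.

    Assumption: a retransmission cannot be acknowledged sooner than one RTT
    after it is sent, so an earlier covering ack must have been triggered
    by the original copy arriving late.
    """
    if seg.retransmitted is None:
        # Never retransmitted, yet eventually acknowledged: the original arrived.
        return "overtaken"
    if seg.acked < seg.retransmitted + rtt:
        # Acknowledged too soon for the retransmission to be the cause.
        return "overtaken"
    return "lost"
```

Comparing against the minimum observed RTT, rather than a typical value, keeps the "overtaken" verdict conservative when RTT varies.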

One short packet at the end of the green packets is marked by a red strip to indicate that it was probably lost after being recorded by the sniffer. It was retransmitted twice and was acknowledged one round-trip time (RTT) after the second retransmission. All the other retransmissions (circled in red) were unnecessary, as is confirmed by their corresponding D-SACKs (circled in orange).
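The D-SACK confirmation relies on RFC 2883, under which a receiver reports a duplicate segment in the first SACK block of an ack. A sender, or an analyser, can recognise a D-SACK with the two tests sketched below. The sketch treats sequence numbers as plain integers and ignores 32-bit wraparound; `is_dsack` is a hypothetical name.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (left edge, right edge) of a SACK block; right edge exclusive


def is_dsack(ack: int, sack_blocks: List[Block]) -> bool:
    """Return True if the first SACK block reports duplicate data (RFC 2883).

    Simplification: sequence numbers are plain integers, so wraparound
    of the 32-bit sequence space is ignored.
    """
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    # Test 1: the first block lies below the cumulative ack, i.e. the
    # receiver already had this data when the duplicate arrived.
    if left < ack:
        return True
    # Test 2: the first block is contained within the second block.
    if len(sack_blocks) > 1:
        left2, right2 = sack_blocks[1]
        if left2 <= left and right <= right2:
            return True
    return False
```

A retransmission whose byte range is later reported in a D-SACK reached the receiver twice, which is how the retransmissions circled in red can be shown to be unnecessary.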

The gaps in the horizontal bands of selective-ack information (pale blue for acknowledged data and cross-hatched grey for missing data) present a mystery: they imply that acks carrying selective-ack information alternated with acks carrying none. The mystery is explained by the next chart, which displays the traffic as seen by the client:

[Chart 2: NetData data-sequence chart, client-side capture]

Packet strips appear further to the right by an amount that reflects the transit time of the packets. The near-vertical stack of packets in the original large burst now has a decided slope that reflects the speed of the intervening WAN. This is all understandable, but the enigma of this chart is that it gives no hint of a sequence gap, whether caused by packet loss or by packet overtaking. There are numerous retransmissions, all without an apparent cause. D-SACKs correspond to the retransmissions, but there is no selective ack.

An explanation emerges from the packet IP identifiers. Packet 11183 was forwarded with little delay, but packet 11185 was held until the gap left by the loss of packet 11184 was filled by its retransmission, 11254. When the ASA encounters a sequence gap, it buffers all the subsequent packets until the gap is filled, and then releases an unbroken sequence of packets for transmission onward to their ultimate destination. In other words, the ASA hides the occurrence of packet loss and overtaking, but relays tell-tale bursts of redundant retransmissions.
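The buffering behaviour described above can be modelled with a simple reorder buffer. This is a minimal sketch of the behaviour as observed in the captures, not Cisco's implementation; it assumes segments are identified by (sequence number, length) and never partially overlap, and `release_in_order` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (first sequence number, payload length)


def release_in_order(arrivals: List[Segment], next_seq: int) -> List[List[Segment]]:
    """Simulate a middlebox that holds segments behind a sequence gap.

    For each arriving segment, return the segments released onward at that
    moment. Segments beyond a gap are buffered until the gap is filled, then
    released as one unbroken run. Assumes segments never partially overlap.
    """
    held: Dict[int, int] = {}  # seq -> length, waiting behind a gap
    releases: List[List[Segment]] = []
    for seq, length in arrivals:
        out: List[Segment] = []
        if seq + length <= next_seq:
            # Entirely old data (a redundant retransmission): relay it as-is.
            out.append((seq, length))
        else:
            held[seq] = length
            # Drain every held segment that now continues the in-order stream.
            while next_seq in held:
                ln = held.pop(next_seq)
                out.append((next_seq, ln))
                next_seq += ln
        releases.append(out)
    return releases
```

With arrivals `(0,100), (200,100), (300,100), (100,100), (200,100)`, nothing is released while 100 is missing; its arrival releases 100, 200 and 300 together, and the final copy of 200 is relayed as a redundant retransmission. Downstream, the data stream shows no gap, only unexplained retransmissions, which resembles the client-side chart.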

Another view of ASA behaviour with this traffic is provided by NetData’s packet-timing chart. To produce these charts, NetData analysed the two sequences of capture files, one from each end of the network path, and performed a linear-regression analysis on the pairs of timestamps of matching packets in the two captures. NetData calculated not only the difference in sniffer-clock settings but also the difference in clock speeds, and used those parameters to convert all the timestamps in the second capture to their corresponding times in the first capture.
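NetData's fitting procedure is its own; a minimal sketch of the idea is an ordinary least-squares fit of t2 = offset + rate × t1 over the matched timestamps, where `rate` captures the difference in clock speeds and `offset` the difference in clock settings. One assumption is worth stating: every pair includes a network transit time, so the fit is only close to unbiased when packets in both directions are included and their mean transit times are similar; with one direction only, the offset absorbs the mean transit time. `fit_clock` and `to_capture1_time` are hypothetical names.

```python
from statistics import mean
from typing import List, Tuple


def fit_clock(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of t2 = offset + rate * t1 over matched packets.

    Each pair is (t1, t2): the times the same packet was recorded in
    capture 1 and capture 2. Returns (offset, rate).
    """
    xs = [t1 for t1, _ in pairs]
    ys = [t2 for _, t2 in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    rate = sxy / sxx              # relative clock speed
    offset = my - rate * mx       # clock-setting difference
    return offset, rate


def to_capture1_time(t2: float, offset: float, rate: float) -> float:
    """Convert a capture-2 timestamp onto the capture-1 timescale."""
    return (t2 - offset) / rate
```

After conversion, the transit time of a packet sent from the first sniffer's end is simply `to_capture1_time(t2, offset, rate) - t1`, which is what the grey lines on the packet-timing chart represent.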

[Chart 3: NetData packet-timing chart]

Four horizontal ‘Socket’ bands on this chart carry markers for different types of packets, as identified in the legend box. The top two bands carry server packets, and the bottom two bands carry client packets. In each pair of adjacent bands, the upper band carries packets seen at the server, and the band underneath carries the same packets plotted at the times seen by the client. Grey lines join the markers of the same packet, and the lengths of these lines indicate the network transit times of the packets. The packets with large transit times were those held in the ASA until a sequence gap had been filled.

Markers in red boxes indicate packets that were recorded in only one capture file because they were lost in the network. Those in green boxes were also recorded only once, but because they were injected by the ASA rather than lost. We see that all the selective acks came only from the ASA, and all the D-SACKs and plain acks came only from the client.
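Telling lost packets from injected ones is a set comparison once packets in the two captures have been matched. A minimal sketch, applied to one direction of traffic at a time; the matching key is an assumption (for example IP identifier plus payload length, because the ASA's TCP sequence-number randomisation, if enabled, rewrites sequence numbers), and `unmatched` is a hypothetical name.

```python
from typing import Dict, Hashable, Set


def unmatched(sender_side: Set[Hashable], receiver_side: Set[Hashable]) -> Dict[str, Set[Hashable]]:
    """Classify packets of ONE direction that appear in only one capture.

    sender_side:   packet keys recorded by the sniffer at the sending end.
    receiver_side: packet keys recorded by the sniffer at the receiving end.
    """
    return {
        # Recorded leaving the sender but never seen at the receiver.
        "lost_in_network": sender_side - receiver_side,
        # Seen at the receiver but never sent by the sender: generated in the path.
        "injected_in_path": receiver_side - sender_side,
    }
```

For the acks travelling from client to server, the client capture is the sender side, so selective acks that appear only in the server capture fall into `injected_in_path`, corresponding to the green boxes.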

The yellow bar plotted on the lower band represents the apparent server time to respond to a database request to fetch the next 500 rows in a result set, and the blue bar indicates time to transfer those rows. The very short yellow bar on the upper band indicates that the server needed less than a millisecond to prepare the response.

Relevance to Slow Writes, Slower Reads

The slower reads reported in the question, which were measured in another network, can be explained by that network’s ASA hiding evidence of packet overtaking from the capture taken at the client. The large numbers of redundant retransmissions seen by the client can be credibly explained only by the ASA reacting to a sequence gap and sending selective acks back to the NetApp file server. The sequence gap probably arose because at least one packet was overtaken between the server and the ASA.

Further evidence of an alternative path between the ASA and the NetApp server is provided by a rare sequence of three data packets from the server that contained selective-ack information:

[Chart 4: NetData data-sequence chart of SMB2 Read requests]

The data packets in the burst depicted on the left side of this sequence chart are SMB2 Read requests. In view of the selective-ack information depicted on the right (time-shifted closer to the data packets for easier comparison), NetData has correctly deduced that the third packet must have overtaken the second packet after being recorded by the sniffer.

Although the 200-MB file-write operation was faster, its analysis (presented in a later article) shows that it too was affected by a network event in which a long burst of packets was overtaken by another packet, somewhere between the ASA and the server.

The next step is to look for alternative paths between the NetApp server and the ASA, where one packet can overtake another. This severe performance problem may be caused simply by a double-headed network card operating in a load-sharing mode, whether in a router, a firewall, or the file server itself. The problem is more likely to stem from a configuration issue or software weakness than from any physical fault.
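How load sharing can let one packet overtake another is easy to model. The sketch below assumes per-packet round-robin across paths with different one-way delays (per-flow hashing, the more common setting, normally keeps a connection on one path); times are integer microseconds, and the delays in the example are hypothetical.

```python
from typing import List


def arrival_order(n_packets: int, interval_us: int, path_delays_us: List[int]) -> List[int]:
    """Return packet indices in the order they arrive at the far end.

    Packets leave every interval_us microseconds and are spread
    round-robin over paths with the given one-way delays.
    """
    arrivals = []
    for i in range(n_packets):
        delay = path_delays_us[i % len(path_delays_us)]
        arrivals.append((i * interval_us + delay, i))
    # Sort by arrival time; on a tie the earlier-sent packet comes first.
    arrivals.sort()
    return [i for _, i in arrivals]
```

With 100-microsecond spacing and delays of 500 and 200 microseconds, `arrival_order(4, 100, [500, 200])` gives `[1, 0, 3, 2]`: each packet on the faster path overtakes its predecessor. With equal delays the original order is preserved.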

Although the sequences of retransmissions always correspond to the start of an original burst, we can’t infer that all the original packets were overtaken. When the ASA sees a sequence gap and decides to issue a selective ack, it can’t acknowledge any packets still in flight between the ASA and the client, in case they are lost after leaving the ASA. The packets still in flight usually include all the earlier packets in the same burst, and sometimes the last packet of the preceding burst. The selective ack issued by the ASA therefore tends to overstate the sequence gap by a large factor and prompts the retransmission of far more packets than are necessary.
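The overstatement can be expressed as simple arithmetic. A minimal sketch, assuming the ASA reuses the client's latest cumulative ack as the ack number of its selective ack and selectively acknowledges only what it holds beyond the gap; how much of the apparent hole a sender actually resends depends on its loss-recovery algorithm. The function name and example byte counts are hypothetical.

```python
def overstatement(client_ack: int, gap_start: int, gap_end: int) -> float:
    """Ratio of the hole the sender perceives to the bytes truly missing.

    client_ack: highest cumulative ack received from the client (assumed
        to be the ack number the ASA places in its selective ack).
    [gap_start, gap_end): bytes actually missing at the ASA.
    Bytes in [client_ack, gap_start) have passed the ASA but are still in
    flight to the client, so the ASA cannot acknowledge them and the
    sender sees all of [client_ack, gap_end) as missing.
    """
    if not client_ack <= gap_start < gap_end:
        raise ValueError("expected client_ack <= gap_start < gap_end")
    true_gap = gap_end - gap_start
    apparent_gap = gap_end - client_ack
    return apparent_gap / true_gap
```

For example, with 20,000 bytes in flight beyond the client's ack and a single 1,000-byte segment missing, `overstatement(0, 20000, 21000)` is 21.0: the sender sees a hole 21 times larger than the real one.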

If the network has bandwidth to spare, the redundant retransmissions are not a problem, but in low-speed networks we have seen a few packet losses generate such large numbers of retransmissions that response times increase, retransmission timeouts cannot be recalculated, connections close and applications eventually fail. One wonders what advantage the Cisco designers saw in this behaviour to compensate for the potentially severe downside.
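One plausible mechanism behind timeouts that cannot be recalculated is Karn's rule, as codified in RFC 6298: an RTT sample must not be taken from a retransmitted segment, because its ack could belong to either copy (unless the TCP timestamp option removes the ambiguity). When most segments are being retransmitted, the estimator is starved of valid samples and a backed-off timeout stays high. The sketch below follows the RFC 6298 update rules (alpha = 1/8, beta = 1/4, K = 4) but, for brevity, omits the recommended 1-second minimum and any maximum RTO. `RtoEstimator` is a hypothetical name.

```python
from typing import Optional


class RtoEstimator:
    """RFC 6298 retransmission-timeout estimator with Karn's rule."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, initial_rto: float = 1.0, granularity: float = 0.001):
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto = initial_rto
        self.g = granularity  # clock granularity G, seconds

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> bool:
        """Feed one RTT measurement; return True if it was used."""
        if was_retransmitted:
            # Karn's rule: ambiguous sample, so the RTO is left unchanged.
            return False
        if self.srtt is None:
            # First valid measurement.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt_sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        self.rto = self.srtt + max(self.g, self.K * self.rttvar)
        return True

    def on_timeout(self) -> None:
        """Exponential backoff; only a later valid sample brings RTO down."""
        self.rto *= 2
```

After a timeout doubles the RTO, acks for retransmitted segments are ignored, so the inflated value persists until an ack arrives for a segment that was sent only once.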

Author: Bob Brownell has more than 45 years' experience in communications and IT, initially designing and building networks, computer-controlled systems and packet-switching systems in Australia and Europe. He is a founder and director of Measure IT Pty Ltd, a firm that specialises in diagnostic analysis of IT systems, and over the last 20 years has been developing NetData, a tool that analyses captured network traffic and visualises system behaviour. It characterises virtually all transactions, including those that fail and requests without responses, with a broad range of application decoders that includes all the major database protocols. NetData has been licensed by IBM, major Australian banks and government departments to diagnose the most complex IT performance problems around the world.

Bob holds bachelor degrees in science and engineering, and a PhD, from the University of Tasmania.

Contact: If you have an intractable performance problem or would like to extend your Wireshark skills, learn more about NetData, or discuss trial versions or licensing, please send an email to bob@netdata-pro.com.
