When we reach the point in an investigation where we are about to break out Wireshark, the complexity of the packet analysis can seem quite daunting. And yet covering a few key points can dramatically cut the time needed to analyze any diagnostic data.
Let's start with a seemingly obvious point: do you understand the problem? It sounds like a stupid question, but I am amazed by how much time an IT team will spend investigating a problem that they barely understand.
Take the example of the bank that had a tiger team of seven investigating a "network performance problem" for four months. Staff in an Indian processing center were complaining that they couldn't meet business targets because the system was slow. The same system was used by UK workers and it performed fine for them, so it must be a network problem, right? I arranged for us to call a user at the processing center.
The lady at the center explained that during the latter stages of processing a loan application the system intermittently threw a script error, and so she needed to start the process again. This made the system slow to use!
What had the tiger team been doing for the last four months? Crawling all over the network, Citrix servers, application servers, databases, etc.
|Scenario 1|Scenario 2|Scenario 3|
|---|---|---|
|Start Word|Open Windows Explorer|Open an Inbox item|
|Choose File -> Open|Navigate to the shared folder|Double-click on an attached Word doc|
|Navigate to the shared folder|Double-click on a Word doc|Hang for 30 seconds|
|Double-click on a Word doc|Hang for 30 seconds|Document opens in Word|
|Hang for 30 seconds|Document opens in Word||
|Document opens in Word|||
The situation is often a little more subtle.
Consider a case where the reported problem is that it intermittently takes 30 seconds to open a Word document. That's better, but ideally we need to know more.
The table above shows three scenarios. A 30-second delay in Scenario 1 would most likely be an SMB, TCP, network or server problem. A 30-second delay in Scenario 2 adds the possibility of a delay in the starting of Word. Scenario 3 involves the transfer of data from the email server to the local drive. What's more, if the email server were, say, a Microsoft Exchange server, the Word document could be transferred using an RPC protocol rather than SMB.
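To make that reasoning concrete, here is a minimal Python sketch that derives a list of suspects from the steps a user performs before the hang. The `Scenario` class, the `candidate_causes` function and the keyword rules are all hypothetical, invented purely for illustration. Listing Word start-up as a suspect for Scenario 3 is also my assumption, made on the grounds that Word is not yet running when the attachment is opened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A user scenario: the steps performed before the 30-second hang."""
    name: str
    steps_before_hang: tuple


def candidate_causes(scenario: Scenario) -> list:
    """Return the areas worth investigating for a given scenario.

    The keyword rules below are illustrative assumptions, not a
    definitive diagnostic method.
    """
    steps = " | ".join(scenario.steps_before_hang).lower()
    causes = []
    if "shared folder" in steps:
        # The document is fetched from a file share.
        causes += ["SMB", "TCP", "network", "server"]
    if "inbox" in steps:
        # The document is an email attachment.
        causes += [
            "transfer from email server to local drive",
            "RPC (if the email server is Microsoft Exchange)",
        ]
    if "start word" not in steps:
        # Word was not already running, so its start-up time is in play.
        causes.append("Word start-up")
    return causes


SCENARIOS = [
    Scenario("Scenario 1", ("Start Word", "Choose File -> Open",
                            "Navigate to the shared folder",
                            "Double-click on a Word doc")),
    Scenario("Scenario 2", ("Open Windows Explorer",
                            "Navigate to the shared folder",
                            "Double-click on a Word doc")),
    Scenario("Scenario 3", ("Open an Inbox item",
                            "Double-click on an attached Word doc")),
]

for s in SCENARIOS:
    print(s.name, "->", ", ".join(candidate_causes(s)))
```

The point is not the code itself but what it shows: the same reported symptom leads to a different list of suspects depending on exactly what the user did.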
There's a simple rule we can apply that's straight out of the RPR playbook.
Make sure you understand a symptom to a level of detail at which you could repeat the keystrokes and mouse movements that would cause the problem to occur.
This doesn't mean that you need access to the application. Even if you have access, it doesn't mean that you could recreate the problem. It just means that you need to understand the problem to that level of detail.
If you don't understand the problem to this level, forget about breaking out Wireshark; you'll just be wasting your time.
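One way to hold yourself to that rule is to write the symptom down in a structured form before capturing anything. The sketch below is only an illustration. The `SymptomReport` class, its fields and the `ready_for_wireshark` check are hypothetical names of my own and are not taken from the RPR manual.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SymptomReport:
    """A symptom as reported by the user (hypothetical structure)."""
    summary: str
    # The keystrokes and mouse actions, in order.
    steps: List[str] = field(default_factory=list)
    # Index into `steps` of the action after which the problem appears.
    failing_step: Optional[int] = None
    # What the user actually sees when it goes wrong.
    observed: str = ""


def ready_for_wireshark(report: SymptomReport) -> bool:
    """True only if the symptom is described down to the user's actions."""
    has_steps = len(report.steps) > 0
    has_failing_step = (
        report.failing_step is not None
        and 0 <= report.failing_step < len(report.steps)
    )
    has_observation = bool(report.observed.strip())
    return has_steps and has_failing_step and has_observation


vague = SymptomReport(summary="Network performance problem")

detailed = SymptomReport(
    summary="Intermittent 30-second delay opening a Word document",
    steps=[
        "Open Windows Explorer",
        "Navigate to the shared folder",
        "Double-click on a Word doc",
    ],
    failing_step=2,
    observed="Hang for 30 seconds before the document opens in Word",
)

print(ready_for_wireshark(vague))
print(ready_for_wireshark(detailed))
```

A report like `vague` is the kind that kept the tiger team busy for four months. A report like `detailed` tells you where to capture and what to look for in the trace.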
PS: You can get instant access to the RPR manual via the Network Trace Analysis Guide section of the TribeLab site.
Paul is currently leading the TribeLab project to explore new ways to help IT support people troubleshoot performance and stability problems.