I get a lot of pleasure from ensuring that things work as advertised.
Whether I am local, remote, troubleshooting or teaching, I always have a mental list of things I like to check. The list changes on the fly depending on the topology, equipment and location, but I always have a rough idea of what I want to look for.
Some of these tests may seem pretty straightforward, like a throughput test, measuring how long a failover takes to work itself out, or hunting for errors. Other tests I propose, like routing and stability audits, get me those sarcastic “Why” and “Huh” stares. In this article, let’s cover the basics of what I call a routing review or audit. I usually hear comments like “Why bother, everything is obviously working” or “We aren’t getting any complaints about that”.
Let me walk you through how this typically unfolds. After reviewing the network diagram, or creating one with post-it notes, I sit down with the client to determine how many hops it should take to get from one host to another and which path the packets should take. As long as ICMP isn’t blocked, a simple traceroute from a client computer will do. If you use Windows, you can run pathping instead, since it adds per-hop latency and packet loss statistics to its results.
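The comparison between the path agreed with the client and the path traceroute actually reports can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `audit_path` is a hypothetical helper name, the hop lists are typed in by hand from the diagram and from the traceroute output, and the addresses come from the 192.0.2.0/24 documentation range rather than from any real network.

```python
from collections import Counter


def audit_path(expected, observed):
    """Compare the designed path with the path traceroute reported.

    Both arguments are ordered lists of hop addresses (strings).
    Returns a list of findings; an empty list means the paths match.
    """
    findings = []

    # Extra or missing hops show up as a hop-count mismatch.
    if len(observed) != len(expected):
        findings.append(
            f"hop count: expected {len(expected)}, observed {len(observed)}"
        )

    # The same address answering at more than one hop suggests a loop.
    for addr, count in Counter(observed).items():
        if count > 1:
            findings.append(f"possible loop: {addr} seen {count} times")

    # Report only the first hop where the two paths diverge.
    for position, (want, got) in enumerate(zip(expected, observed), start=1):
        if want != got:
            findings.append(f"hop {position}: expected {want}, observed {got}")
            break

    return findings


# Hypothetical example: the design calls for three hops, the trace shows four.
expected = ["192.0.2.1", "192.0.2.5", "192.0.2.9"]
observed = ["192.0.2.1", "192.0.2.13", "192.0.2.1", "192.0.2.9"]

for finding in audit_path(expected, observed):
    print(finding)
```

A finding here is a prompt to go and investigate, not a verdict. A mismatch may simply mean the diagram is out of date.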
Routing is usually taken for granted, in the sense that if you are getting there, it must obviously be working. I am not trying to prove whether it’s working or not; I’m trying to determine how WELL it is working. Let’s face it, in the past 10 years or so things don’t break the way they did in the ’90s, but they sure slow down.
In the past, I have uncovered routing loops, multiple routes and extra hops. The important thing to keep in mind is that when you go through this exercise and discover something odd, you should step back, perform multiple tests and truly understand why it is happening. Create a plan for your proposed change and a backup plan. Lastly, don’t forget to test to ensure your changes had the intended impact.
Here’s an example of a traceroute from a layer 3 Cisco switch on a network that was being built:
  1 10.16.30.252 0 msec
    10.16.30.254 8 msec
    10.16.30.243 0 msec
  2 220.127.116.11 16 msec 8 msec
    10.16.30.252 4 msec
The results from this layer 3 switch identify three different routes. Fortunately this was the intended design, but we all questioned why .254 was consistently 8 ms or more while the other two addresses were 0 ms. Then the client started wondering why .254 is the second route, what determines the order in which the routes are used, what the cost of those routes is, what protocols are being used, etc.
The point here is that the client is now aware of the behavior, can research it, and can make changes to get things looking the way they should.
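To make this kind of observation repeatable, the listing above can be grouped by hop with a short script. This is a rough sketch, assuming the output is shaped like the listing in this article (an optional hop number, an address, then one or more "N msec" values). `parse_trace` and `multi_responder_hops` are hypothetical helper names, and output from other platforms or software versions may need a different parser.

```python
import re

# Matches each round-trip time, e.g. the "8" in "8 msec".
PROBE = re.compile(r"(\d+) msec")

# The traceroute listing from this article, copied as-is.
SAMPLE = """
1 10.16.30.252 0 msec
10.16.30.254 8 msec
10.16.30.243 0 msec
2 220.127.116.11 16 msec 8 msec
10.16.30.252 4 msec
"""


def parse_trace(text):
    """Group responders by hop: {hop: [(address, [rtt_ms, ...]), ...]}."""
    hops = {}
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].isdigit():  # a bare integer starts a new hop
            current = int(tokens[0])
            tokens = tokens[1:]
        if current is None or not tokens:
            continue
        address = tokens[0]
        rtts = [int(ms) for ms in PROBE.findall(" ".join(tokens[1:]))]
        hops.setdefault(current, []).append((address, rtts))
    return hops


def multi_responder_hops(hops):
    """Return {hop: [(address, worst_rtt_ms), ...]} for hops with 2+ responders."""
    return {
        hop: [(addr, max(rtts, default=None)) for addr, rtts in responders]
        for hop, responders in hops.items()
        if len(responders) > 1
    }


for hop, responders in multi_responder_hops(parse_trace(SAMPLE)).items():
    for address, worst in responders:
        print(f"hop {hop}: {address} worst RTT {worst} ms")
```

Running the same script against several traces taken at different times makes it easier to tell whether a slow responder such as .254 is consistently slow or just had a bad moment.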
You should occasionally verify and validate your routes. This also helps build what I call 'tribal knowledge', which is invaluable when troubleshooting or working on network upgrades.
Don’t forget that there is a tremendous amount of value in verifying everything is working as designed.
Just because it works doesn’t mean it works well.