Army DOIM on NetScout Systems
“The nGenius Solution gave us the proof we needed to remove the onus from the network team and put it back on the application team ... We've fixed two big problems in a very short time. Now we're cooking up new things to do with it every day!” -- Senior Network Engineer, Army Contractor
Customer Profile - The stated mission of the US Army Directorate of Information Management (DOIM) is to provide secure, reliable, responsive service in information resources and information technology to multi-component forces, fully supporting all Army bases and installations.
Vendor Profile - NetScout Systems, Inc. (NASDAQ: NTCT) has been an industry pacesetter for advanced network and service assurance solutions for over a decade, and counts the world's largest enterprises, government agencies, and service providers among its customers. Enterprise and government IT organizations deploy NetScout's nGenius(R) Performance Management System to increase service levels to their users by reducing or preventing service disruptions. Service providers depend on NetScout's proven IP performance management technology and expertise to protect the quality of their customers' experience with IP-based services. NetScout is headquartered in Westford, Massachusetts and has offices worldwide.
Army DOIM Chooses nGenius Solution to Solve Long-Standing Performance Issues and to Protect against Future Difficulties
Problem
Lack of visibility into the network and essential applications hindered this Army DOIM's ability to resolve several long-standing problems and users were getting frustrated.
Challenge
The network team, using some home-grown and third-party tools to actively monitor the DNS servers, saw rolling outages during the day that lasted for several minutes, but couldn't isolate root cause.
Solution
The nGenius Solution provides granular visibility into the applications running on the network, which enables fast diagnosis of root cause and easy escalation to the proper groups for resolution.
Result
Within two weeks of installation, the network management team identified a firewall as the culprit in the DNS server problem and diagnosed two issues affecting SAP: a server configuration error and a load balancing problem.
Introduction
This U.S. Arsenal is a leader in the research, development, engineering and production support for advanced weapons systems. Large numbers of new weapons, munitions and auxiliary equipment have been produced and fielded by the more than 3,000 engineers and scientists, providing U.S. forces with state-of-the-art capabilities for increased effectiveness on the battlefield.
Over the past few years, the network team worked hard to improve the network design, implementing a traditional hierarchical approach with core, distribution and access tiers, but they were forced to use conventional portable protocol analyzer devices to troubleshoot network issues. Shortly after connecting with NetScout at the Army LandWarNet conference, this Army DOIM decided to use a more instrumented, “eye-on-the-wire” approach to network troubleshooting. Because they are supporting a large campus network of several dozen buildings and nearly 6,000 users across a 6,500-acre military installation, they purchased multiple Gigabit Ethernet probes to instrument the entire core and distribution layers. The payoff was immediate: within a month of implementation, network operations resolved two long-standing performance issues and diagnosed a third.
Isolating a Long-Standing Issue with DNS
For nearly two years users had been experiencing long connect times and actual failures when attempting to connect to external applications and web sites that required DNS lookups. Using third-party tools, the network team saw rolling outages during the day that lasted for several minutes at a time, but couldn't isolate the root cause. There was as much as a 5% failure rate on the Active Directory DNS server. They knew they needed strong ammunition - real proof - to bring to higher command and their service providers before the necessary action would be taken to fix the issue.
Once network operations installed the nGenius Solution, the first thing they did was investigate the DNS issue. They quickly discovered a significant imbalance in request / response rates between the local DNS servers and Tier II DNS servers (See Figure 1). In fact, at peak times, there were nearly two requests for every response received. The team launched a packet capture from the probe and saw an excessive number of no server responses, which were causing the multiple requests to be issued.
The network team opened a ticket with the service provider. After a few days of investigation, the service provider determined that it was not actually a problem with the DNS server. More digging revealed that a firewall along the path was configured to perform an IDS function called "Deep Inspection". The firewall was identifying traffic from Picatinny's DNS servers as “malicious” and was configured to drop traffic matching that signature. As a result, DNS requests from the DNS servers that matched the signature ended up in the bit bucket. Once the setting was changed, request-response traffic went back to a one-to-one ratio (See Figure 1).
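The request/response imbalance that exposed this problem can be expressed as a simple ratio check. The sketch below is illustrative only: it is not nGenius functionality, the `DnsInterval` type, the function names, and the 1.5 threshold are hypothetical, and the sample counts are invented to mirror the "nearly two requests for every response" pattern described above.

```python
from dataclasses import dataclass


@dataclass
class DnsInterval:
    """Request and response counts for one monitoring interval (hypothetical shape)."""
    label: str
    requests: int
    responses: int


def request_response_ratio(interval: DnsInterval) -> float:
    """Return requests per response; a healthy resolver path stays close to 1.0."""
    if interval.responses == 0:
        # No responses at all: infinite imbalance if anything was asked.
        return float("inf") if interval.requests else 1.0
    return interval.requests / interval.responses


def flag_imbalanced(intervals, threshold=1.5):
    """Return labels of intervals whose ratio exceeds the (assumed) threshold."""
    return [i.label for i in intervals if request_response_ratio(i) > threshold]


# Invented sample data, not measurements from the case study.
samples = [
    DnsInterval("off-peak", requests=1000, responses=990),
    DnsInterval("peak", requests=1900, responses=1000),
    DnsInterval("after-fix", requests=1200, responses=1200),
]

for s in samples:
    print(f"{s.label}: {request_response_ratio(s):.2f} requests per response")
print("imbalanced intervals:", flag_imbalanced(samples))
```

A ratio well above 1.0 means clients are retrying because answers are not coming back, which is the symptom the team then confirmed with a packet capture.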
Diagnosing Multiple SAP Performance Issues
Users of an SAP application were reporting poor performance and occasional application failures of the type “page cannot be displayed”. The network team confirmed the slow performance by reviewing the Application Response Time graphs within nGenius Performance Manager and also determined that the SAP server was the most heavily utilized server within Picatinny's DMZ, with more than four times the users of any other server.
Additional investigation also revealed that there were 170,000 failed responses to a backend SAP Server from the DMZ (See Figure 2) and an enormous number of Server to Client (SC) resets (See Figure 3). By taking a packet capture, the team was able to determine that SAP was using the wrong web server configuration file.
The team then used nGenius Performance Manager to generate an ad-hoc report and emailed it to the application group, who quickly fixed the configuration file, thereby resolving the performance issues.
The nGenius Solution also helped uncover a load balancer configuration issue that was slowing SAP performance. Now that the issue has been diagnosed, the SAP application support team is close to resolving the problem.


