Managing IT Assets As Business Assets (by Zeus Kerravala & Vanessa Alvarez)

Author Profile - Zeus Kerravala manages Yankee Group's infrastructure research and consulting. His areas of expertise involve working with customers to solve their business issues through the deployment of infrastructure technology solutions, including switching, routing, network management, voice solutions and VPNs. Before joining Yankee Group, Kerravala was a senior engineer and technical project manager for Greenwich Technology Partners, a leading network infrastructure and engineering consulting firm.
Vanessa Alvarez is an analyst in Yankee Group's Enterprise Research group. She conducts research and advises enterprises and vendors on best practices and go-to-market strategies for the deployment of enterprise IP telephony, VoIP and unified communications. Alvarez specifically focuses on the impact of IP communications and examines the market drivers, inhibitors and economic justification for the deployment of next-generation IP communications within the Anywhere Enterprise™.
Executive Summary
IT executives have different mandates today than they did 5 years ago. With the emergence of enterprise mobility, service-oriented architecture (SOA), virtualization and convergence, the underlying infrastructure that supports corporate applications and services has become much more complex. IT organizations are being asked to deliver more applications to more locations while controlling operational expenses (see Exhibit 1). To accomplish this, IT needs to radically change the way it manages the infrastructure that supports the company for the following reasons:
- The introduction of new devices and applications on the network makes the network more complex and the end-user experience becomes increasingly important.
- Networks are being required to be more “on-demand” and available anytime, anywhere.
- Today, IT departments spend 90% of the time required to fix a problem just trying to isolate the problem. Being able to rapidly identify where a problem is would cut mean time to repair (MTTR) by as much as an order of magnitude.
- Existing network and systems management (NSM) approaches look at the world from the bottom up and are not sufficient to effectively manage these new, complex networks.
- The concept of IT management needs to be shifted to a top-down, end-user experience management model that can tie IT to business initiatives and be more aware of the user experience as opposed to just infrastructure.
I. Introduction
The turning of the new millennium created an opportunity for a marked shift in IT departments. With Y2K behind them, they once again redoubled their efforts on innovation. Innovation came in areas such as more flexible ERP implementations, RFID inventory systems, software-as-a-service (SaaS) deployments, SOA and mobile e-mail, and has led to dramatic increases in productivity for business users. Moreover, Moore’s Law has become a reality and tremendous gains have also been made in the areas of performance, capacity and availability in each of the infrastructure silos (e.g., storage, servers, networks) supporting these new services. For example, today’s switches and routers are nearly limitless in forwarding capacity and are built with resilient and redundant architectures that are nearly bulletproof.
These advances put IT on a path where it might finally be delivering the always-on, always-available utility model that has been the holy grail for IT for so many years. Regrettably, that vision has not materialized. What keeps holding it back? The complexity of the infrastructure has grown exponentially.
A primary factor contributing to the complexity is the incredible rise in the number of networked applications running in an enterprise. In large enterprises that number is counted in the hundreds, most of which the business units would consider mission-critical. Additionally, a majority of these applications are now in centralized data centers being accessed from all over the globe. Further entangling things is the boom in endpoint devices that produce and consume these applications, including video-conferencing systems. IT doesn’t have a good understanding of how these individual devices interact with each other, with the infrastructure and with applications. Once a fine balance is established, any subtle configuration shift has the potential to set everything off kilter. Therefore, it’s no wonder that the availability and performance levels of applications have not kept up with that of the infrastructure silos. The issue at hand is that applications transcend these silos; and no matter how you fine-tune the silos, it’s not enough to temper the day-to-day application availability and performance issues that plague the enterprise.
The negative ramifications of not tackling these issues are costing enterprises millions of dollars in lost revenue and productivity annually. Exhibit 2 shows that companies estimate workers lose approximately 14% of their productivity because of poor performance and availability of applications. Companies spend millions every year trying to make workers more productive. But if they would focus on ensuring that what they have works optimally, end users would enjoy a double-digit improvement in productivity.
This is a difficult quandary for IT, because the more innovation it delivers related to individual employee and business line productivity, the higher the buy-in from end users and the higher the expectations are on the quality of the services delivered. So even though IT is doing a good job, it still finds itself in the awkward position of often letting its end users down. For example, Yankee Group estimates that 75% of the time, end users notice IT issues before the IT department. Ideally, the IT department should always know of a problem before the user community.
A model where IT is reactive to user needs is hard to justify in an environment where IT is at minimum spending 2% of a company’s top-line revenue (up to 12% in the financial services sector), and 80% of that spending is tied to operations such as keeping the lights on. In other words, IT’s business users are justified in not being fully satisfied with the services they are getting from what is considered a support organization.
The state of affairs will get tougher on IT before it gets easier. Many of the new initiatives today, such as SOA and virtualization, add to the complexity that IT already cannot manage properly. It’s hard enough for IT to understand the interactions of all the IT assets today. But what happens in a SOA world with mash-ups, where services are dynamically divided up across the entire infrastructure? Supporting a dynamic IT infrastructure just got even harder. IT must evolve the way it manages its operations. This is the only way to get control of everything.
II. Network and Systems Management Software Is Stuck in the Past
The innovation made for and implemented by IT has been astounding. The tall order ahead for IT is how to manage this myriad of convoluted systems, networks and applications, along with concepts like VoIP that further blur the traditional lines of distinction within IT. Inexplicably, innovation in the management sector has not kept pace with all the other advances mentioned. The management software vendors, in particular the Big Three (HP, IBM and CA), have been anything but leading edge and as an oligopoly have managed to protect their cash cows while holding IT hostage.
The problem is that the fundamental design concepts of network and systems management software are obsolete today. The schema was developed in the late 1980s in the early days of networking to manage devices that were inherently unreliable. What happened was that equipment vendors started out creating device-specific management solutions that showed steady green lights when their products were fine and flashing red lights when things were wrong. These same vendors then had to build element management systems that did a good job of putting all their devices under one umbrella (e.g., CiscoWorks, which manages, in theory, all Cisco products). And then one or more of the Big Three came in to aggregate all these element management solutions under a bigger tent. That gave IT one centralized console that had an aggregated view of the infrastructure and could point IT to where specific hotspots were when there were physical infrastructure problems.
From the 1980s through the Y2K era, that model was adequate and fulfilled a need that IT had. Today, IT’s requirements have changed and that model is no longer sufficient. The reason is simple: The foundational design concepts are no longer correct. Managing the siloed infrastructures is no longer the most important thing. What is most important is managing the integration of all the business elements—which is effectively a composite mesh of the endpoints (desktop, server, storage device, smart phone, etc.), the applications they produce and consume, and ultimately the services they create. All of these things have equal stature because a change in any one of these areas can change the equilibrium that affects them all. Therefore, it’s important to put the endpoints and the applications at the top of the pyramid instead of the infrastructure piece parts.
The old management model, which Yankee Group defines as the bottom-up approach, fails because it knows nothing of the interdependencies just elaborated. This deficit in knowledge puts IT in a situation where it has two tremendous challenges. The first is that it is stuck in a role where it consistently fails to identify the real-time application performance issues that often transcend silos (e.g., the dropping of VoIP calls). This is proven by the fact that end users are the ones who report their issues to IT and not the other way around. The second is that once an issue is identified by the user, there is an incredible set of steps ahead to then get to the source of the problem. Yankee Group research estimates that identification of a problem accounts for 90% of an issue’s mean time to repair.
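The arithmetic behind that estimate can be made concrete. The following is a minimal sketch, not a Yankee Group model: it assumes identification is a fixed share of MTTR and that the repair phase itself does not change. The function name and the 4-hour incident are hypothetical.

```python
def mttr_after_speedup(mttr_hours, identification_share, speedup):
    """Return the new MTTR when only the identification phase gets faster.

    mttr_hours: current mean time to repair.
    identification_share: fraction of MTTR spent isolating the problem (0..1).
    speedup: factor by which identification gets faster (2 = twice as fast).
    """
    if not 0 <= identification_share <= 1:
        raise ValueError("identification_share must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    identification = mttr_hours * identification_share
    repair = mttr_hours - identification  # the fix itself is unchanged
    return identification / speedup + repair


# Hypothetical incident: 4 hours in total, 90% of it spent on identification.
for factor in (2, 10, 100):
    print(factor, round(mttr_after_speedup(4.0, 0.9, factor), 3))
```

Under these assumptions, identifying the problem twice as fast takes the 4-hour incident down to roughly 2.2 hours, and ten times as fast to roughly 0.76 hours. The repair share (0.4 hours here) sets the floor, so the best possible outcome is about a tenfold improvement.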
Getting to the source of the problem is anything but formulaic. In fact, we have an expression at Yankee Group to describe it: We call it the problem resolution ping-pong (see Exhibit 3). The ping-pong effect is driven by the fact that the escalation process begins with an end-user call to complain about a certain application. That complaint is a symptomatic issue that reflects a deeper problem that caused it. No one in IT knows where that deeper problem sits—certainly not the help desk, which for all practical purposes is just providing desktop support. The result is that it gets escalated to a silo and in that silo the team uses its silo-specific tool, which tells it there is nothing wrong in its silo, and so the escalation gets passed over to another silo and so on. This unproductive escalation model has been ever-present and is not likely to change given the state of the traditional tools. The unfortunate footnote here is that network operations can no longer afford to pass around the problem; that group is now directly responsible for managing application performance. The Yankee Group 2005 Network Management Survey showed that 70% of network managers deem improving application performance their number-one issue. Needless to say, this is a rather thankless task given that they are rarely actually consulted when the application team decides to roll out a new application; and it’s not entirely uncommon for them to not even know of the application’s existence.
Even though there have been advances in application performance management (APM) that were supposed to help the problem resolution ping-pong, in general they have not. That is because the APM solutions are still products of the silo management model and are not aware of the necessary interdependencies. For instance, these solutions may tell you an application is experiencing high latency, but they won’t be able to pinpoint for you the cause of the latency. Given that the IT team has seen a continuous stream of cutting-edge technologies everywhere else in its IT asset portfolio, it’s time for it to start demanding the same treatment from management vendors.
III. Top-Down Management Is a Must
The bottom-up management model has not served IT well. IT is therefore still left with a compelling operational need to manage the IT assets it has amassed in a way that correctly reflects how they have been deployed to run the business. The assets have been deployed as a set of interwoven, interdependent endpoints that produce and consume applications and services that effectively run the business from potentially anywhere, anytime across a common infrastructure. Yankee Group calls this business-oriented model “top-down management” because the endpoint devices and their applications, which are the fundamental productivity tools of the business user and business services, should be at the top.
Knowing details about these endpoints and applications, how they normally behave over the infrastructure and when they are adversely affected is becoming mandatory information for identifying complex problems. This top-down model will not be constrained in the same way the bottom-up solutions were. IT will be aware when there is a problem because the interdependencies among endpoints, applications and infrastructure are well understood. Moreover, identifying a problem is also a much simpler task because getting to a problem is a matter of knowing what has changed among all the elements that are affecting the other endpoints. This is something else that top-down management provides.
Therefore, the only way to manage these things is from the top down, where the focus is on understanding the nature of the endpoints and how they interact with each other through applications (see Exhibit 4). Application performance cannot be managed in isolation; rather, it needs to be managed in conjunction with other parts of the infrastructure. Every infrastructure object has the potential to affect not only the business, but also other objects; not just in its own domain, but in other domains as well. The infrastructure must be managed holistically.
To understand the top-down concept, it may help to consider it in the context of VoIP management. VoIP is particularly interesting because it brings together many environments that were previously in separate silos. Voice must now leverage the same infrastructure as every other service.
Problems in the traditional TDM telephony world were few and far between, so bottom-up management was adequate. But take communication that used to run on a completely independent, highly available and static infrastructure, move it to a combined infrastructure that is less available and much more dynamic, and the results will be fairly challenging. For example, what happens when a VoIP user angrily calls the help desk from his or her cell phone, stating that VoIP connections were dropped three times in a minute? Telling the user that network quality is just fine according to the latest mean opinion score (MOS) readout, a measure of call quality standardized by the International Telecommunication Union (ITU), will not make him or her feel like the IT department is providing good support. This is a perfect example of where bottom-up management breaks down: The MOS is the infrastructure’s perspective on a VoIP application, not the endpoint’s or its end user’s. How will this top-down approach work in the VoIP world? It will start by calling out all the component parts.
In a basic VoIP infrastructure, there will be three types of endpoints: the IP phones, the IP call managers and the IP voicemail system. And although we talk in terms of VoIP service, there are really a number of underlying applications including the call signaling, the RTP voice stream and voicemail. Understanding how these endpoints and applications normally interact with each other is essential for properly managing this infrastructure.
Let’s take the example of the angry end user with dropped VoIP connections. The top-down approach should have an understanding of the application experience of the VoIP phone and how it differed from its normal state at the exact time that the calls began to drop. Was the call fluctuating with bursty traffic streams? Did it just stop emitting packets? Was the communication getting starved on the way to the other phone? Being able to answer these questions is a great start to help determine what was really going on at the time. Although problematic from the end user’s perspective, the dropped calls are actually just a symptomatic result of some core problem yet to be determined. And very likely that one VoIP endpoint is not alone in experiencing those issues.
The top-down approach sets up a model where IT is in a much stronger position (see Exhibit 4) to communicate with end users. In fact, IT is seeing its IT assets in the very same way that the business is leveraging them. The result is a situation where IT is far more responsive to problems and for the first time has a structured chance of seeing problems before its end users do.
IV. Top-Down Management in Action
Xangati’s Rapid Problem Identification Solution
Xangati delivers a leading top-down management framework in the form of a rapid problem identification (RPI) solution. The concept of RPI leverages the top-down approach (what Xangati refers to as “endpoint-in”) to help IT quickly uncover the source of application performance-related issues. Xangati focuses on delivering RPI by providing IT with knowledge of the application experience of every endpoint. This model is centered on the concept that any problem with a networked application will ultimately reflect itself in a change in the endpoint’s experience with the application; conversely, there is no real problem to be reckoned with if it does not manifest itself at an endpoint.
The heart of the Xangati RPI model is the precision profile that is generated by leveraging flow data (e.g., NetFlow) from designated switches and routers. The profile is a way to provide incredibly detailed information about the application behavior of an endpoint without investing in costly software agents or hardware probes. These details establish a normalized pattern of application behavior for every individual endpoint, letting the RPI system know the following things:
- Which applications does an endpoint produce or consume?
- At what performance levels does that endpoint use these applications?
- During which time periods is an endpoint busiest with an application?
- How many other endpoints does the endpoint normally communicate with for each application?
- What networks does this endpoint normally function on (e.g., specific subnet, VPN, wireless network)?
The RPI solution automatically learns and establishes this detailed application behavior (and re-learns behavior when it changes) for every endpoint, to a scale of tens of thousands of endpoints. The solution then continuously compares each profile against the real-time application behavior of the endpoint. Through this active comparison the application experience of the endpoint is understood; and when the experience differs from the norm established by the profile, the IT organization is alerted. These abnormal application experiences documented by the RPI system actually catch two things: a set of symptoms of adverse application behaviors for a variety of endpoints, and the core problem causing the symptoms in the other endpoints. As the following case study shows, IT can use this symptom information to then correlate to the core problem.
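The learn-then-compare idea described above can be illustrated with a generic baseline-and-deviation check. This is a minimal sketch and not Xangati's algorithm, which the source does not describe in detail. It assumes flow data has already been aggregated into (endpoint, application, bytes per interval) tuples, and every name, address, value and threshold in it is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_profiles(history):
    """Learn a per-(endpoint, application) baseline from past intervals.

    history: iterable of (endpoint, application, bytes_per_interval) tuples,
    a simplified stand-in for aggregated flow records.
    Returns {(endpoint, application): (mean, standard deviation)}.
    """
    samples = defaultdict(list)
    for endpoint, application, volume in history:
        samples[(endpoint, application)].append(volume)
    return {key: (mean(vals), pstdev(vals)) for key, vals in samples.items()}


def find_deviations(profiles, current, threshold=3.0):
    """Compare current observations with the learned baseline.

    Flags an observation when it is more than `threshold` standard deviations
    from the mean, or when the endpoint has never used that application.
    """
    alerts = []
    for endpoint, application, volume in current:
        key = (endpoint, application)
        if key not in profiles:
            alerts.append((endpoint, application, "new application"))
            continue
        avg, spread = profiles[key]
        # Guard against a zero spread when the history is perfectly flat.
        if abs(volume - avg) > threshold * max(spread, 1.0):
            alerts.append((endpoint, application, "abnormal volume"))
    return alerts


# Hypothetical history for one endpoint, followed by one real-time interval.
history = [
    ("10.0.0.5", "dns", 100), ("10.0.0.5", "dns", 110), ("10.0.0.5", "dns", 90),
    ("10.0.0.5", "http", 5000), ("10.0.0.5", "http", 5200), ("10.0.0.5", "http", 4800),
]
current = [("10.0.0.5", "dns", 105), ("10.0.0.5", "http", 50000), ("10.0.0.5", "p2p", 700)]

profiles = build_profiles(history)
alerts = find_deviations(profiles, current)
print(alerts)
```

In this toy data the DNS traffic stays within its normal range, while the tenfold jump in HTTP volume and the never-before-seen P2P traffic are both flagged. A real profile would also track the other dimensions listed above, such as time of day, peer counts and network location.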
Bernalillo County Case Study
Bernalillo County, which includes the city of Albuquerque, is the largest county in New Mexico in terms of both area and population. Under the leadership of CIO Paul Roybal, the county’s IT organization had executed very successfully on its strategic objectives during a dynamic 2-year period in which it evolved from an internal IT shop to one that also delivers services that can be used by every member of its community. To do so, the organization had a sharp focus on delivering innovations that had direct business impact on productivity and customer satisfaction.
More recently, the IT organization was hit with a spike in end-user complaints tied to the onset of sluggishness in the centralized applications that were serving the county’s various departments. Bernalillo’s IT team was unable to resolve these issues with the aid of its traditional tools, which is when it contacted Xangati. Within the first couple of weeks of deployment, the Xangati RPI solution was able to shed light on Bernalillo’s application performance issues.
The precision profiling was the key to unearthing these issues. The precision profiles identified the collective set of endpoints that were receiving their applications at a slower rate than normal. At the same time, precision profiles discovered that the domain name service (DNS) was beginning to degrade in performance and that one of the servers that delivers DNS was intermittently responding to name resolution queries at a much slower rate. This set of symptoms provided fertile ground for Bernalillo’s IT team to correlate the issues to understand the source of the problem. DNS is an essential service required by nearly all client endpoints to get to their desired servers (both intranet and internet) by resolving a name to an IP address. Without a fully functioning DNS server pool, end users will feel the application performance sluggishness that Bernalillo did. By being able to correlate the symptoms and to tie them together under the common aspect of sharing the same intermittently failing DNS server, the IT department could get to the heart of the problem. The resolution was simple at that point, which was to replace the ailing DNS server. From there, application performance was restored and the user issues disappeared.
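The correlation step in this case study, tying many symptomatic endpoints back to one shared service, can be sketched as a simple count of common dependencies. This is an illustration under stated assumptions rather than the procedure Bernalillo’s team actually followed: the endpoint names, service names and the dependency map are all hypothetical.

```python
from collections import Counter


def rank_shared_dependencies(symptomatic, dependencies):
    """Rank services by how many symptomatic endpoints depend on them.

    symptomatic: endpoints whose application experience deviated from normal.
    dependencies: {endpoint: set of services it was observed talking to}.
    Returns a list of (service, count) pairs, most widely shared first.
    """
    counts = Counter()
    for endpoint in symptomatic:
        counts.update(dependencies.get(endpoint, set()))
    return counts.most_common()


# Hypothetical observations: three sluggish desktops and one healthy one.
dependencies = {
    "pc-01": {"dns-2", "erp"},
    "pc-02": {"dns-2", "mail"},
    "pc-03": {"dns-2", "erp", "mail"},
    "pc-04": {"dns-1", "erp"},
}
symptomatic = ["pc-01", "pc-02", "pc-03"]

ranking = rank_shared_dependencies(symptomatic, dependencies)
print(ranking[0])
```

Here every sluggish endpoint shares the same DNS server, which makes it the first candidate to investigate. The healthy endpoint, which resolves names through a different server, is consistent with that hypothesis.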
Through the RPI model, Bernalillo IT now has the ability to track down the sources of a comprehensive set of application performance degradation issues such as the following:
- A backup process that does not occur when normally scheduled
- VoIP call managers that are not functioning at their normal level
- A failing RADIUS server that is causing problems for Wi-Fi access
- An endpoint using so much bandwidth that others cannot access the WAN and internet
- A P2P server on the internal network
To summarize the experience, Paul Roybal remarked: “My team and I were incredibly pleased to see the immediate benefits of the Xangati solution. We had other products on hand that allowed us to monitor interactions, but these tools were not putting us in front of the problem.”
V. Conclusion: IT Has a Mandate for Top-Down Management
Many trends are coming that will radically change the IT landscape, creating an environment where any information can be delivered to any user, anywhere at any time. This will further strain the IT department’s ability to manage the technology and to meet increased user expectations. Spending 80% of a company’s IT budget to “keep the lights on” will not allow organizations to continue to innovate. To free up resources for innovation, the management of the IT infrastructure must also go through a major transformation, because the traditional bottom-up method of managing IT is no longer sufficient. Top-down management will give IT managers a better idea of how the technology is performing through the user’s eyes, and will allow IT to be much more proactive and productive because the majority of its day will not be spent in front of a monitoring station. The top-down approach to infrastructure management leads us closer to the holy grail of management: an organization in which IT, not the end users, is the first to respond to problems. Only once all the pieces are in place and end users no longer experience issues before IT identifies them can we move toward business productivity management as opposed to reactive element management.
With enterprise networks rapidly changing, IT management has become increasingly difficult and complex. With business initiatives becoming increasingly dependent on IT, it will be essential for enterprises to have a holistic and comprehensive top-down management system in place to support these business initiatives. To help companies get started with top-down management strategies, Yankee Group makes the following recommendations.
Recommendations
- Cap investment in bottom-up models. Although most companies have spent millions of dollars on traditional management platforms, further investment may be throwing good money after bad. Take the time to understand where the management gaps are and buy a solution that meets that need versus investing further in the frameworks.
- Stop playing ping-pong with end-user support. An enterprise’s IT and communications teams can no longer operate as independent silos. Companies need to bring together the different operating groups within IT and deploy a management tool that can identify the source of the problem, not just alert that there is one.
- Reduce mean time to identification, not repair. To fix problems faster, companies need to focus on where the bulk of their energy is spent—and that’s identifying problems. Once the problem has been identified, resolution is relatively quick. Invest in management solutions that focus on problem identification.
- Change the process when you change the management tools. Companies that deploy new management tools but do not change the IT department’s process will not get as much bang for the buck as companies that think about problem resolution as a process, not just something that’s in the way.
Vendor Profile - Xangati provides a rapid problem identification (RPI) system which IT organizations leverage when first responding to end-user, application and network performance and availability issues. Xangati RPI enables its users to swiftly and effectively track down root cause. Enterprises, government organizations and service providers alike use Xangati to accelerate problem identification efforts by at least twenty percent, which directly results in major productivity gains for the technical teams and their end users.