
March 19, 2008

What’s Luck got to do with it? (by Scott Turkow)

Author Profile - Scott Turkow has 8 years of experience in the Enterprise Software space, primarily in Operations and Sales Ops roles. Scott is the Senior Operations Manager at Integrien Corporation, the leading intelligent systems management company that enables the predictable operation of mission-critical applications. Prior to Integrien, Scott was with the Resource Management Software Group of EMC, which focused on the development and sale of automated network management products. A triathlete in training, Scott tries to be outdoors when he’s unshackled from his computer.


Q: What’s Luck got to do with it?

A: For some, everything.


You’ve got a firewall, a disaster recovery plan, redundant hardware, change management, and trouble ticketing. And then there’s monitoring. Monitors for your network and app performance data – check. Monitors for your O/S and Server data – check. Monitors for Database metrics – check. Storage device metrics – check. Transaction response metrics – collected. In duplicate!

Your 1985-2007 IT Strategy was recently revamped from “If we collect more we’ll have a better chance of detecting problems” to a superior 2008 Strategy “If we collect more and collect it more often, no problem can sneak by us.”

You’ve done it; you’re the master of your IT cosmos! You can proudly don your “Best IT Manager Ever” t-shirt to the office on Monday. Truth is, you were going to wear it anyway, but better to wear it like you mean it. All is good in your world, when suddenly a system slowdown elevates to an outage in a matter of minutes and thousands of users are dead in the water. Feeling like an actor in a Southwest “Want to Get Away” commercial, you begin to ask yourself – what went wrong? With St. Patrick fresh in your mind (and Guinness not so fresh on your breath) you realize your luck simply ran out. Or for those of you who don’t believe in luck – your poor reasoning and wishful thinking caught up with you.

You may be surprised, but the above theory – “collect more stuff as fast as possible and we’ll be okay,” combined with a heavy dose of luck – is an actual strategy employed by some companies (not you, of course).

We recently met with an online banking company that was collecting 2,500 metrics from a load balancer every 10 seconds. When asked the obvious question – “Why?” – the answer given was “We don’t really know what to collect, so we collect it all, and we do so at as fast an update rate as possible. We don’t want anything to slip by…” But the team agreed that problems did slip by, again and again.
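
To put that collection rate in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 2,500 metrics and the 10-second interval come from the conversation above; the function name and the single-device framing are illustrative assumptions, not anything the bank's team actually ran.

```python
# Back-of-the-envelope estimate of the sample volume implied by the
# collection rate quoted above (2,500 metrics every 10 seconds).
# The helper name and single-device scope are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(metric_count: int, interval_seconds: int) -> int:
    """Return how many individual samples one device produces per day."""
    polls_per_day = SECONDS_PER_DAY // interval_seconds
    return metric_count * polls_per_day


if __name__ == "__main__":
    daily = samples_per_day(metric_count=2500, interval_seconds=10)
    print(f"{daily:,} samples per day from a single load balancer")
```

That works out to 21,600,000 samples a day from one device, before anyone has decided which of them actually matter.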

Luck cannot save you (although the lottery might!). Collecting more metrics at faster update rates will not save you. The reality is that more data collection will not lead to better chances of problem identification. In fact, the costs outweigh any perceived benefits. No matter how many defense systems and processes you put in place, things slip through and problems occur. And given that your organization’s productivity, value of brand, and a little thing I like to call “revenue” are at stake, it is your fiduciary responsibility to find a better way.

To switch to a metaphor, consider flooding: detecting water spilling over the levee is insufficient. You must become more proactive in order to reduce, or avoid, impact to your customers. Alert storms of “levee breaches” are not helping anyone. Detecting early warning signs and ultimately preventing the failure of the entire levee system is a much better option.

IT has become far too complex for old defensive methods of collecting and monitoring. System interdependencies, not individual component failures, are the primary threat to enterprise IT performance and availability. Tools that look at the business service holistically and utilize sophisticated, real-time analytics to forecast possible problem areas provide the best opportunity to become truly proactive. While luck may help you draw that inside straight during your weekly poker games, it has no place in Business Service Management.


Comments

Scott - Great article!! Many network managers do not know the network business case or real value, and that is a major mistake many upper managers make. As you pointed out, it is essential not to get caught in the "more is better" trap in network management. No, it is vigilance and understanding of the network, as well as its value to the success of the company, that matter. Great job... Tim
