If there’s any one gripe most users have with their monitoring toolset, it’s got to be a lack of effective event suppression. Virtually every organization I visit struggles with notification fatigue. The modern datacenter has a lot going on, and a lot of potential signs of failure occurring: high CPU utilization, services occasionally failing to respond to requests, drives filling up. One switch goes down and administrators suddenly get hundreds of notifications. Most of this can be ignored, but how do we suppress effectively?
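To make the idea concrete, here's a minimal sketch of one common suppression technique, dependency-based suppression: if a device's upstream dependency (say, the switch in front of it) is already known to be down, alerts from that device are symptoms rather than root causes and can be held back. This is an illustrative example only, not how any particular product implements it; the class and method names are my own.

```python
class AlertSuppressor:
    """Suppress alerts from devices whose upstream dependency is already down."""

    def __init__(self):
        self.parent = {}   # device -> the upstream device it depends on
        self.down = set()  # devices currently known to be down

    def add_dependency(self, device, upstream):
        self.parent[device] = upstream

    def mark_down(self, device):
        self.down.add(device)

    def mark_up(self, device):
        self.down.discard(device)

    def should_notify(self, device):
        # Walk up the dependency chain; if anything upstream is down,
        # this alert is a downstream symptom and gets suppressed.
        node = self.parent.get(device)
        while node is not None:
            if node in self.down:
                return False
            node = self.parent.get(node)
        return True
```

With this in place, a failed switch produces one notification instead of hundreds: the switch itself still notifies (it has no downed upstream), while every server behind it stays quiet.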
For the last quarter century, the general way we’ve measured the availability of a networked asset has been the ping, or connectivity, test. If we get a response from a ping test, the asset is Up; no response, it’s Down. Simple as that. But is this the best measurement of availability? Consider this […]
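The binary up/down logic described above boils down to a few lines. Here's a sketch using a TCP connect as a stand-in for an ICMP ping (which needs raw-socket privileges); the function names are mine, chosen for illustration:

```python
import socket

def is_up(host, port, timeout=2.0):
    """Classic connectivity test: if we can open a connection, the asset
    is considered Up; any failure means Down."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def status(host, port):
    return "Up" if is_up(host, port) else "Down"
```

Note how coarse the measurement is: a host can answer this check while its application is hung, overloaded, or serving errors, which is exactly the gap the post goes on to explore.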
APM Digest recently posted an interesting article discussing the differences between agentless and agent-based monitoring. I can’t count the number of times we’ve heard customers tell us that agents were out of the question. And I can’t blame them: agents have garnered a bad reputation over the years, and the appeal of monitoring without the overhead […]
According to a recent survey by the Ponemon Institute, the average reported incident lasted 86 minutes in 2013. While this is an improvement over results from the same study in 2010 (97 minutes), the impact to the business was actually greater ($690,200 versus $505,500 in 2010). That makes it a quite painful hour and a half, and an area […]
FireScope’s head of cloud development, Pete Whitney, recently published a great article called Genesis of a Genetic Algorithm. It’s a great read that really dives behind the scenes of the thinking that drives development here at FireScope, and it offers some great insight into how to think about managing your infrastructure.
Radware’s annual 2013 State of the Union: Mobile Ecommerce Performance is out, with a really nice infographic that makes the findings easier to digest. This is a follow-up to their State of the Union for Ecommerce Page Speed & Web Performance [Summer 2013], which focused on the standard websites for the top 500 internet […]
The average network switch can expose upwards of 10,000 metrics describing availability, performance and security. I’ve seen filers with over 30,000. So the question I’m posing is: for effective monitoring, should I be collecting and analyzing every one of these?
It seems like such a basic question, yet the lack of a consistent definition is a major problem for ITSM. If we can’t define it, how can we achieve it?