About a year ago I drove in the snow from Portland, Maine to the outskirts of Boston to meet the regional IT director of a popular five-star hotel chain. After I had waited for 2 hours, a sweaty, apologetic regional IT director came rushing to greet me. He sat down with me and told me that he could spend 10 minutes with me. During my 6 minutes, he was beeped once, received two mobile calls and was called over his radio link. This was a man living in a reactive IT firefighting hell. His was an IT department out of control. Driving back in an amazing snow storm, I was reminded of my own days of firefighting while working as a Network and IT manager in London, England. Although I had never found myself in quite the same spot, I knew that if our organization had not taken a proactive approach and automated our network management, monitoring and notification, I could quite easily have found myself in the same situation. This article focuses on the fundamentals and advantages of network monitoring.
Early in my IT career, when I was working as a network engineer, my mentor advised me that it was good practice to start every day by opening a remote console to our Novell servers, checking the system logs and looking at the system parameters. These were then recorded in the log. With 5 Novell servers this took a good 20 minutes each day. When I started training networking staff I religiously passed on this tradition. When I was made head of a newly formed Enterprise Management team, I was astonished to find that a member of the operations team was spending 14 days each month looking through error logs on multiple systems. Clearly things had got a little out of hand.
At this stage we were looking after 300 systems over a VPN mesh (see previous articles), across 35 countries, and had started to move from a site-centric organization (bodies on site) to a system-centric model, where groups with a particular technical expertise would look after the systems that fell into their area. The grand job of checking all the parameters (CPU, file allocation buffers, disk space, /var/log/messages, NT Event Viewer, Exchange link, etc.) fell to our operations department. It occurred to me very quickly that we were no longer being proactive. If it took us two weeks to look through every system, then we were looking at each system only twice a month. A disk space problem or critical error could occur 20 minutes after you checked the system concerned, and you would not realize it for two weeks or until the system fell over.
Alongside this, as an IT outsourcing company we had customers running many different network operating systems, with a myriad of different applications, all connecting back to our Network Operations Centers in the UK and USA over a VPN mesh. Clearly it was time to invest in a network monitoring tool.
What is Enterprise Network Monitoring?
Enterprise Network monitoring is the systematic analysis of varying network environments, and the presentation of the information in such a way as to clearly identify immediate and future problems.
Where to begin
We started by compiling a list of questions and interviewing vendors, with the goal of picking a single vendor to provide us with the tools we needed to proactively monitor our networks. These were some of our requirements.
1. Monitoring Network Operating Systems (Windows, Novell, Linux, Unix)
- IP connectivity
- CPU utilization
- Available Disk Space
- Running processes and services
- View system logs
- Trend analysis of disk, CPU and memory utilization over time
2. Monitoring infrastructure (Hubs, Switches, Routers, VPN)
- IP Connectivity
- Availability of management services such as SSH, Telnet
- Change in available services
- Ability to receive information on a change of status via the Simple Network Management Protocol (SNMP), such as a change in the configuration of a port on a switch
- Other options include monitoring available bandwidth on an ISP or CPU utilization on a router.
3. Applications monitoring (Messaging, Accounting, Time Sheets, Databases, Intranets)
- Application availability, such as HTTP on a web platform
- Availability of back end SQL database
- Other options you may wish to consider are the interaction between applications, such as monitoring the ability for one Exchange server to talk to another.
4. Monitoring support structures (Backups, UPS, Environmental)
- Temperature of a cabinet and/or a hosting environment
- UPS power
5. Security Monitoring (IDS, Port availability)
- Looking for intrusion patterns within a DMZ or on the LAN
- Monitoring the ports on the firewall to make sure that these have not changed
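Two of the simpler requirements above, checking that a service answers on its TCP port and checking available disk space, can be sketched with nothing but the Python standard library. This is a minimal illustration and not how any particular product implements its checks; the function names and the 10% free-space threshold are assumptions made for the example.

```python
import shutil
import socket


def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def disk_status(free_percent: float, min_free_percent: float = 10.0) -> str:
    """Classify free space against a hypothetical minimum threshold."""
    return "ok" if free_percent >= min_free_percent else "low"


# Local example: report on the filesystem holding the current directory.
free = disk_free_percent(".")
print(f"free space: {free:.1f}% ({disk_status(free)})")
```

A call such as `tcp_service_up("mail.example.com", 25)` (a placeholder host name) would cover the SSH, Telnet, HTTP or SQL availability items in the same way. Run from a scheduler every few minutes, checks like these close the two-week gap described earlier.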
Among the technologies we evaluated were many high-end, sophisticated and costly products from well-known vendors. In most cases the cost lay not in the purchase price but in the months of consultancy required to implement the product, plus the training and the ongoing maintenance. The total cost of ownership was high, and the return on investment was some way off on a foggy horizon.
There was also the well-known industry quote telling us that 70% of all in-house implementations of Enterprise Management systems failed.
The answer to our problems came from an unexpected corner of the IT world, the open source market.
The single outstanding product in our research was Big Brother, now owned by Quest Software. Big Brother allows for the real-time monitoring of networks and networking environments via dynamic web pages. A demonstration can be found at http://www.bb4.com/bb/.
Big Brother works by both gathering information from agents that sit on the system being monitored and performing tests against the system being monitored. These systems can be arranged and structured in a distributed model where local network probes can gather data and pass this data upstream. One of the exceptional features of Big Brother is the ability to automatically fail over from one system to another.
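The idea of combining reports from on-host agents with remote network tests, and rolling them up for display, can be sketched as follows. This is an illustrative model only and not Big Brother's actual protocol or data format; the `StatusReport` type, the host names and the three-color severity scale are assumptions made for the example.

```python
from dataclasses import dataclass

# Severity order for the illustrative color scheme; higher is worse.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass(frozen=True)
class StatusReport:
    host: str    # system being monitored
    test: str    # e.g. "disk", "cpu", "http"
    color: str   # "green", "yellow" or "red"
    source: str  # "agent" (runs on the host) or "network" (remote test)


def worst_color_per_host(reports):
    """Collapse many reports into one overall color per host."""
    summary = {}
    for report in reports:
        current = summary.get(report.host, "green")
        if SEVERITY[report.color] > SEVERITY[current]:
            current = report.color
        summary[report.host] = current
    return summary


# Hypothetical reports, as a local probe might pass them upstream.
reports = [
    StatusReport("mail01", "disk", "yellow", "agent"),
    StatusReport("mail01", "smtp", "green", "network"),
    StatusReport("web01", "http", "red", "network"),
    StatusReport("web01", "cpu", "green", "agent"),
]
print(worst_color_per_host(reports))
# {'mail01': 'yellow', 'web01': 'red'}
```

Taking the worst color per host means a single failing test is never hidden by several passing ones, which is the property you want from a summary page.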
Although Big Brother works on a number of platforms, all our implementations were based on Red Hat Linux builds, as there are a number of excellent plugins for this platform.
An extensive list of plugins for Big Brother can be found at Deadcat.net. I have listed a very few below, but it is worth looking through the full list to understand the scope of this outstanding product.
Some of the most useful plugins include the ability to plug in MRTG (Multi Router Traffic Grapher) and to perform trend analysis. A demonstration for this can be found at Big Brother. MRTG will graph data gathered via SNMP over periods of hours, days, months and a year. You can set thresholds and be alerted when they are met.
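The two ideas in that paragraph, alerting when a value crosses a threshold and using history to see where a value is heading, can be sketched in a few lines. This is not MRTG's code or configuration; the function names, the sample data and the threshold values are assumptions for illustration, and the projection is a simple least-squares straight line.

```python
def classify(value, warn, panic):
    """Map a utilization figure to a color using two thresholds."""
    if value >= panic:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


def slope(samples):
    """Least-squares slope of (time, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def time_until(samples, limit):
    """Estimate the time for the latest value to reach limit at the fitted rate.

    Returns None when the trend is flat or falling.
    """
    rate = slope(samples)
    if rate <= 0:
        return None
    last_value = samples[-1][1]
    return max(0.0, (limit - last_value) / rate)


# Hypothetical daily disk utilization readings: (day, percent used).
history = [(0, 50.0), (1, 52.0), (2, 54.0)]
print(classify(history[-1][1], warn=80.0, panic=95.0))  # green
print(time_until(history, limit=100.0))                 # 23.0
```

The disk in this example is still green today, but the trend says it will be full in about 23 days, which is exactly the kind of warning a twice-monthly manual check cannot give.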
There is a bridge between Big Brother and the open source intrusion detection program Snort, allowing a sophisticated enterprise-level intrusion detection network to be built (see earlier articles). There are also plugins that allow you to use NMAP to look for a change in the available ports on a system and to alert on the changed port configuration.
There are a number of SNMP modules that allow you to receive SNMP traps from SNMP-enabled devices.
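The port-change idea comes down to comparing the ports found open now against a known-good baseline. The sketch below uses plain TCP connection attempts from the Python standard library in place of NMAP, and does not reproduce any actual Big Brother plugin; the function names and the example port numbers are assumptions.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the set of ports on host that accept a TCP connection."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def diff_ports(baseline, current):
    """Report ports that have appeared or disappeared since the baseline."""
    return {
        "opened": sorted(current - baseline),
        "closed": sorted(baseline - current),
    }


# Hypothetical baseline and a later scan result for the same system.
baseline = {22, 80}
latest = {22, 443}
print(diff_ports(baseline, latest))
# {'opened': [443], 'closed': [80]}
```

In practice `latest` would come from something like `open_ports("192.0.2.10", range(1, 1025))` (a documentation address used as a placeholder), and any non-empty "opened" list would raise an alert.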
The client modules are available for a number of platforms and are small, easy to configure and most importantly, very stable.
There are also utilities such as Little Brother, which will sit in your Windows system tray and give you an unobtrusive status report of how Big Brother is running.
Last but not least, there is the ability to send notifications based on different events.
Although Big Brother was a clear favorite, there are a number of other systems that are worth considering. The first is the Big Brother clone, Big Sister. Big Sister will interoperate with systems such as Big Brother and HP OpenView.
MS is one to watch for in the future. It has been getting some good reviews from Dell and IBM, but has some ways to go to catch up to Big Brother.
For those who would like to take a proactive approach to IT and to managing IT resources and systems, network management is an invaluable tool.
US Technical Manager, Nexus.
Senior Technical Consultant, News Views