One of the goals of a good system administrator is being able to respond to problems before they affect operations. To this end we use various monitoring tools. Over time I have successfully used the following:
“mon” is a tool for monitoring the availability of services, and sending alerts on prescribed events. Services are defined as anything tested by a “monitor” program, which can be something as simple as pinging a system, or as complex as analyzing the results of an application-level transaction. Alerts are actions such as sending emails, making submissions to ticketing systems, or triggering resource fail-over in a high-availability cluster.
Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.
Zabbix offers advanced monitoring, alerting and visualisation features today which are missing in other monitoring systems, even some of the best commercial ones.
Cacti is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy-to-use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.
Munin is a networked resource monitoring tool that can help analyze resource trends and “what just happened to kill our performance?” problems. It is designed to be very plug and play. A default installation provides a lot of graphs with almost no work.
OpenNMS is an award winning network management application platform with a long track record of providing solutions for enterprises and carriers.
Of these, I have used Mon, Nagios, and Zabbix more than any of the others. Zabbix is, for me, the newest one, and I am currently migrating from a Nagios-based solution to a Zabbix solution.
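To make the monitor/alert model described for mon a little more concrete, here is a minimal sketch in Python. It is not mon's own code or configuration: mon runs separate monitor and alert programs, whereas this sketch stands in plain Python callables for both. The names `tcp_monitor`, `run_checks`, `Monitor`, and `Alert` are hypothetical and exist only for illustration.

```python
import socket
from typing import Callable, Dict, List

# Hypothetical types for illustration only: a monitor reports whether a
# service is healthy, and an alert receives a message to deliver.
Monitor = Callable[[], bool]
Alert = Callable[[str], None]


def tcp_monitor(host: str, port: int, timeout: float = 2.0) -> Monitor:
    """Build a monitor that succeeds if a TCP connection can be opened."""
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return check


def run_checks(services: Dict[str, Monitor], alert: Alert) -> List[str]:
    """Run every monitor once and call the alert for each failing service."""
    failed = []
    for name, monitor in services.items():
        if not monitor():
            failed.append(name)
            alert(f"service {name} is DOWN")
    return failed


# Usage with stub monitors, so the sketch runs without network access.
# A real check could be tcp_monitor("localhost", 22), for example.
services = {"web": lambda: True, "db": lambda: False}
failed = run_checks(services, print)  # prints "service db is DOWN"
```

In a real deployment the alert callable would send an email or open a ticket instead of printing, and the checks would be run on a schedule rather than once.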
A short comparison of some of these tools (this table is excerpted from Wikipedia):
|Name||IP SLA Reports||Logical Grouping||Trending||Trend Prediction||Auto Discovery||Agent||SNMP||Syslog||Plugins||Triggers / Alerts||WebApp||Distributed Monitoring||Inventory||Data Storage Method||License||Maps||Access Control||IPv6|
|Cacti||Yes||Yes||Yes||Yes||Via plugin||No||Yes||Yes||Yes||Yes||Full Control||Yes||Yes||RRDtool, MySQL||GPL||Plugin||Yes||Yes|
|Nagios||Via plugin||Yes||Yes||No||Via plugin||Supported||Via plugin||Via plugin||Yes||Yes||Full Control||Yes||Via plugin||Flat file, SQL||GPL||Yes||Yes||Yes|
|OpenNMS||Yes||Yes||Yes||Unknown||Yes||Supported||Yes||Yes||Yes||Yes||Full Control||Yes||Limited||JRobin, PostgreSQL||GPL||Yes||Yes||Limited|
(One further row survives only in part, without its name or earlier columns. Its last columns read: Data Storage Method: MySQL, PostgreSQL, SQLite; License: GPL; Maps: Yes; Access Control: Yes; IPv6: Yes.)
- Product Name: The name of the software, linked to its Wikipedia article. Any software listed without being linked to its article, demonstrating its notability, will be removed.
- IP SLA Reports: Features reports on IP SLAs.
- Logical Grouping: Supports arranging the hosts or devices it monitors into groups.
- Trending: Provides trending of network data over time.
- Trend Prediction: The software features algorithms designed to predict future […]
- Auto Discovery: The software automatically discovers hosts or network devices it is connected to.
- Agent: The product relies on a software agent that must run on the hosts it is monitoring, so that data can be pushed back to a central server. “Supported” means that an agent may be used, but is not mandatory. An SNMP daemon does not count as an agent.
- SNMP: Able to retrieve and report on SNMP statistics.
- Syslog: Able to receive and report on syslog messages.
- Plugins: The architecture of the software is based on a number of ‘plugins’ that provide additional functionality.
- Triggers / Alerts: Capable of detecting threshold violations in network data and alerting the administrator in some form.
- WebApp: Runs as a web-based application. The possible values are:
  - No: There is no web-based frontend for this software.
  - Viewing: Network data can be viewed in a graphical web-based frontend.
  - Acknowledging: Users can interact with the software through the web-based frontend to acknowledge alarms or manipulate other […]
  - Reporting: Specific reports on network data can be configured by the user and executed through the web-based frontend.
  - Full Control: All aspects of the product can be controlled through the web-based frontend, including low-level maintenance tasks such as software configuration and upgrades.
- Distributed Monitoring: Able to leverage more than one server to distribute the load of monitoring.
- Inventory: Keeps a record of hardware and/or software inventory for the hosts and devices it monitors.
- Data Storage Method: The method used to store the network data it monitors.
- License: The license the software is released under (e.g. GPL, BSD license).
- Maps: Features graphical network maps that represent the hosts and devices it monitors, and the links between them.
- Access Control: Features user-level security, allowing an administrator to prevent access to certain parts of the product on a per-user or […]
- IPv6: Supports monitoring IPv6 hosts and/or devices, receiving IPv6 data, and running on an […]
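
The “Triggers / Alerts” column is about detecting threshold violations in collected data. As a rough illustration of the idea, and not of how Nagios, Zabbix, or any other tool listed here actually implements it, the sketch below reports only the state changes of a single metric, so that a value which stays above the limit raises one alert instead of one per sample. The function name `evaluate_trigger` and the event labels are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def evaluate_trigger(values: Sequence[float], limit: float) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for each change of trigger state.

    "PROBLEM" is emitted when a value first exceeds the limit, and "OK"
    when a value first drops back to or below it. Both labels are
    hypothetical and chosen for this sketch only.
    """
    events = []
    in_problem = False
    for index, value in enumerate(values):
        violated = value > limit
        if violated and not in_problem:
            events.append((index, "PROBLEM"))
        elif not violated and in_problem:
            events.append((index, "OK"))
        in_problem = violated
    return events


# Example: CPU load percentages sampled at regular intervals, limit 90.
cpu_load = [10, 95, 97, 40, 99]
print(evaluate_trigger(cpu_load, 90))
# → [(1, 'PROBLEM'), (3, 'OK'), (4, 'PROBLEM')]
```

Reporting transitions rather than every violating sample is one common way to keep alert volume manageable; production systems typically add further conditions on top, such as requiring several consecutive violations before alerting.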