Monitoring Tools Comparison

Here are some of the monitoring tools and their features:


StatusCake
  • StatusCake is used for website monitoring.
  • Its features include uptime monitoring, domain monitoring, server monitoring, page speed monitoring, SSL monitoring and virus scanning.
  • You can integrate StatusCake with a number of applications to receive alerts, including Slack, PagerDuty, HipChat and SMS.
  • We have implemented StatusCake for monitoring this site! You can find more details here.
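To make uptime and page speed monitoring concrete, here is a minimal Python sketch of a single HTTP probe. It does not describe how StatusCake works internally; the function names `probe` and `classify`, the 2-second "slow" threshold and the UP/SLOW/DOWN labels are all illustrative assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def classify(status: Optional[int], elapsed: float, slow_after: float = 2.0) -> str:
    """Turn one probe result into UP, SLOW or DOWN (labels are illustrative)."""
    if status is None or not 200 <= status < 400:
        return "DOWN"
    return "SLOW" if elapsed > slow_after else "UP"


def probe(url: str, timeout: float = 5.0) -> Tuple[Optional[int], float]:
    """Issue one GET request and return (status code or None, seconds taken)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except OSError:
        status = None  # DNS failure, refused connection, timeout and similar
    return status, time.monotonic() - start
```

A scheduler would call `probe` at a fixed interval and hand the result to `classify`, for example `classify(*probe("https://example.com"))`.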


Zabbix
"Zabbix is an open source network monitoring tool that works with a centralized Linux-based Zabbix server."
"The Zabbix server automatically collects and parses data from monitored hardware so administrators can check availability and see trends in network performance. The server communicates to the native software agents that are available for many operating systems, including Linux, UNIX and Windows.
For operating systems without an agent, generic monitoring protocols such as Simple Network Management Protocol (SNMP) or Intelligent Platform Management Interface (IPMI) can be used."
You can find more details here.

Nagios
  • "Nagios is a free and open source application that monitors systems, networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches, applications and services. It alerts users when things go wrong and alerts them a second time when the problem has been resolved."
  • "Nagios sets itself up as the “Industry Standard In IT Infrastructure Monitoring” while Zabbix says it is “the Enterprise-class Monitoring Solution for Everyone.”"
    You can find more details here.
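The Nagios behaviour quoted above, one alert when something breaks and a second when it recovers, amounts to notifying on state transitions rather than on every check. A minimal Python sketch of that idea follows. It is not Nagios code, and the class name `StateChangeNotifier` and the service name `web-01/http` are hypothetical; real Nagios adds retries and other refinements that are left out here.

```python
from typing import Callable, Dict

OK, PROBLEM = "OK", "PROBLEM"


class StateChangeNotifier:
    """Send one notification per state transition, not one per check."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify
        self._last_state: Dict[str, str] = {}

    def record(self, service: str, healthy: bool) -> None:
        new_state = OK if healthy else PROBLEM
        # Assumption: a service counts as healthy until its first failing check.
        old_state = self._last_state.get(service, OK)
        self._last_state[service] = new_state
        if new_state == old_state:
            return  # no transition, so stay quiet
        if new_state == PROBLEM:
            self._notify(f"PROBLEM: {service} is failing")
        else:
            self._notify(f"RECOVERY: {service} is healthy again")


messages = []
notifier = StateChangeNotifier(messages.append)
for healthy in [True, False, False, True]:
    notifier.record("web-01/http", healthy)
print(messages)
```

Four check results produce only two notifications: one for the failure and one for the recovery.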
Sensu
  • Sensu is an application and infrastructure monitoring solution. It is not a SaaS solution.
  • Sensu is easy to integrate with the tools used by the organisation to send alerts and notifications.
  • Newly provisioned servers can automatically register with Sensu without any manual intervention.
  • Configuration files are written in JSON which makes it easy to automate. You can find more details here.
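Because the configuration is plain JSON, it can be generated by a script instead of being written by hand. The Python sketch below shows the idea. The field names (`checks`, `command`, `subscribers`, `interval`) follow the classic Sensu check definition layout as best recalled and should be verified against the documentation for your Sensu version; the helper `build_check` and the command path are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def build_check(name: str, command: str, subscribers: Iterable[str],
                interval: int = 60) -> Dict[str, Any]:
    """Build one check definition as a plain dictionary."""
    return {
        "checks": {
            name: {
                "command": command,          # hypothetical script path
                "subscribers": list(subscribers),
                "interval": interval,        # seconds between runs
            }
        }
    }


config = build_check("check_disk", "/opt/checks/check_disk.sh", ["base"])
text = json.dumps(config, indent=2, sort_keys=True)
print(text)
```

A provisioning tool could write `text` to a file for each new server, which is what makes JSON configuration easy to automate.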
Prometheus
  • Prometheus is an open source monitoring and alerting tool.
  • It is used for recording time series data, where each series is identified by a metric name and a set of key/value pairs.
  • It can be used for monitoring machines as well as microservices.
  • Prometheus offers reliability and lets you diagnose a problem quickly during an outage. It can display statistics even under failure conditions. You can find more details here.
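To illustrate what "metric name and key/value pairs" means in practice, the Python sketch below treats the name plus its sorted label pairs as the identity of a series, and prints samples in the style of the Prometheus text format. It deliberately avoids any client library; the helpers `series_key`, `render` and `inc` are hypothetical, and label value escaping is not handled.

```python
from typing import Dict, Mapping, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Mapping[str, str]) -> SeriesKey:
    """Identity of a series: metric name plus its sorted label pairs."""
    return name, tuple(sorted(labels.items()))


def render(key: SeriesKey, value: float) -> str:
    """Format one sample as name{label="value",...} value."""
    name, labels = key
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{k}="{v}"' for k, v in labels)
    return f"{name}{{{body}}} {value}"


counters: Dict[SeriesKey, float] = {}


def inc(name: str, labels: Mapping[str, str], amount: float = 1.0) -> None:
    key = series_key(name, labels)
    counters[key] = counters.get(key, 0.0) + amount


inc("http_requests_total", {"method": "GET", "status": "200"})
inc("http_requests_total", {"status": "200", "method": "GET"})  # same series
inc("http_requests_total", {"method": "POST", "status": "500"})

lines = [render(key, value) for key, value in sorted(counters.items())]
print("\n".join(lines))
```

The first two calls land in the same series because label order does not matter; changing any label value creates a new series.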
Sysdig
  • "Sysdig is open source, system-level exploration that captures system state and activity from a running Linux instance, then save, filter and analyze."
  • "Sysdig Monitor combines container monitoring, alerting + troubleshooting with intelligent Kubernetes, Mesos and Swarm integration."
    It is the first container-native Docker monitoring solution.
  • "Sysdig gives you the deployment flexibility you need for clouds."
    You can find more details here.
New Relic Infrastructure
  • "New Relic Infrastructure provides flexible, dynamic server monitoring."
  • "It runs more efficiently in the cloud with visibility at every layer. By using New Relic APM and Infrastructure together, you get a comprehensive view of the health of your servers and hosts as well as the applications and services they depend on, creating shared context for development and ops teams."
  • "Infrastructure empowers modern operations teams to make intelligent decisions about complex systems, from a physical datacenter to thousands of Amazon Elastic Compute Cloud (Amazon EC2) or Microsoft Azure instances."
  • "Real-time metrics and powerful analytics reduce your mean-time-to-resolution (MTTR) by connecting changes in host performance to changes in your configuration."
    You can find more details here.
AWS CloudWatch
  • "Amazon CloudWatch is a web service that provides real-time monitoring to Amazon's EC2 customers on their resource utilization such as CPU, disk and network."
  • "You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources. Amazon CloudWatch can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB tables, and Amazon RDS DB instances, as well as custom metrics generated by your applications and services, and any log files your applications generate."
  • "CloudWatch Events can send system events from AWS resources to AWS Lambda functions, Amazon SNS topics, streams in Amazon Kinesis, and other target types. CloudWatch Logs can monitor, store, and access your log files from Amazon EC2 instances, AWS CloudTrail, or other sources."
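The "set alarms" part can be pictured as a threshold rule evaluated over the most recent datapoints. The Python sketch below is a simplified model, not the CloudWatch API: the function `alarm_state` is hypothetical, and it only borrows the state names OK, ALARM and INSUFFICIENT_DATA. Real alarms have more options, such as how missing data is treated.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int) -> str:
    """Evaluate a simple threshold alarm over the most recent datapoints."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # ALARM only when every recent datapoint breaches the threshold.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


cpu_percent = [42.0, 55.0, 91.0, 93.0, 97.0]
print(alarm_state(cpu_percent, threshold=90.0, evaluation_periods=3))
print(alarm_state(cpu_percent, threshold=90.0, evaluation_periods=4))
```

Requiring several consecutive breaches is one common way to keep a short spike from triggering an automatic reaction.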
Google Stackdriver
  • "Google Stackdriver is a freemium cloud computing systems management service offered by Google."
  • "Google Stackdriver provides powerful monitoring, logging, and diagnostics. It equips you with insight into the health, performance, and availability of cloud-powered applications, enabling you to find and fix issues faster."
  • "Stackdriver collects metrics, events, and metadata from Google Cloud Platform, Amazon Web Services, hosted uptime probes, application instrumentation, and a variety of common application components including Cassandra, Nginx, Apache Web Server, Elasticsearch, and many others. Stackdriver Monitoring discovers and monitors your cloud resources automatically, whether you are running on Google Cloud Platform or Amazon Web Services."
    You can find more details here.
New Relic
  • New Relic is an APM based on the SaaS model.
  • The New Relic dashboard provides real-time insights into the health of the software.
  • Custom alert policies can be created for faster incident reporting.
  • New Relic can offer suggestions by applying machine learning techniques to the collected data to find patterns or anomalies. You can find more details here.
AppDynamics
  • AppDynamics is an APM tool that measures application performance.
  • It provides continuous monitoring, which makes it possible to detect the slowest transactions so that they can be optimised later.
  • If the logger logs an error related to a transaction, AppDynamics helps to detect the error.
  • Different releases of the application can be compared in AppDynamics. You can find more details here.
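Detecting the slowest transactions boils down to grouping timing samples by transaction and ranking them. The Python sketch below does this with mean response time. It is only an illustration of the idea, not AppDynamics functionality; the function `slowest_transactions` and the sample data are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def slowest_transactions(samples: Iterable[Tuple[str, float]],
                         top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank transactions by mean duration (ms), slowest first."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for name, duration_ms in samples:
        durations[name].append(duration_ms)
    averages = {name: mean(values) for name, values in durations.items()}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


samples = [
    ("/checkout", 900.0),
    ("/search", 120.0),
    ("/checkout", 1100.0),
    ("/login", 300.0),
    ("/search", 80.0),
]
ranking = slowest_transactions(samples, top_n=2)
print(ranking)
```

Running the same ranking on samples from two releases would give a simple basis for comparing them.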
DynaTrace
  • "DynaTrace, previously known as Compuware APM, is touted as the first self-learning Application Performance Monitoring Software. Through its agent it provides auto-discovered topology visualizations of applications and their components."
  • This is where DynaTrace stands out as an application performance tool.
  • Drawback: DynaTrace can produce incorrect results when it has too little data. It needs time and data to learn and to reduce false positives.
    You can find more details here.
Boundary
  • Boundary is an APM solution based on the SaaS model.
  • It can monitor in-house, cloud or hybrid environments. It requires zero change to the application.
  • Boundary collects a massive amount of data from different sources and combines it.
  • It can display this data as a real-time visual map.
BigPanda
  • "BigPanda Solutions, transform your IT alerts into actionable insights."
    The main features of BigPanda are:
    (i) Algorithmic Alert Correlation that reduces noise,
    (ii) Centralized Visibility,
    (iii) Custom Views that help concentrate on key features and
    (iv) Smart Ticketing to get real-time notifications.
  • "It integrates with existing IT infrastructure monitoring tools, including traditional monitoring systems from HP and IBM along with others like New Relic, AppDynamics, Splunk, Nagios, Zabbix, Amazon CloudWatch, and more. The platform uses clustering algorithms to aggregate data across multiple monitoring systems."
    You can find more details here.
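The source does not describe BigPanda's correlation algorithm, so the Python sketch below substitutes a deliberately simple heuristic to show how correlation reduces noise: alerts from any tool that hit the same host within a time window are folded into one incident. The `Alert` and `Incident` types, the `correlate` function and the 5-minute window are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Alert:
    source: str       # monitoring tool that raised the alert
    host: str
    timestamp: float  # seconds since some common epoch
    message: str


@dataclass
class Incident:
    host: str
    alerts: List[Alert] = field(default_factory=list)


def correlate(alerts: Iterable[Alert], window_seconds: float = 300.0) -> List[Incident]:
    """Group alerts per host; a gap longer than the window starts a new incident."""
    incidents: List[Incident] = []
    latest_by_host: Dict[str, Incident] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = latest_by_host.get(alert.host)
        if incident is None or alert.timestamp - incident.alerts[-1].timestamp > window_seconds:
            incident = Incident(host=alert.host)
            incidents.append(incident)
            latest_by_host[alert.host] = incident
        incident.alerts.append(alert)
    return incidents


alerts = [
    Alert("Nagios", "db-01", 0.0, "disk usage high"),
    Alert("Zabbix", "db-01", 60.0, "I/O wait high"),
    Alert("New Relic", "web-01", 90.0, "error rate up"),
    Alert("Nagios", "db-01", 1000.0, "disk usage high"),
]
incidents = correlate(alerts)
print([(i.host, len(i.alerts)) for i in incidents])
```

Four raw alerts collapse into three incidents here; with noisier inputs the reduction would be larger.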
PagerDuty
  • PagerDuty is a SaaS incident reporting tool.
  • Escalation rules and notifications can be defined for a service so that each incident is routed to the person best able to resolve it.
  • Incident notifications can be received in various ways, such as phone call, SMS, email or Slack. They can be customised according to the severity of the incident and the time it occurred.
  • PagerDuty provides visual analytics that can help find the root cause of an incident faster. You can find more details here.
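Escalation rules can be thought of as an ordered list of "if nobody has acknowledged after N minutes, notify this person". The Python sketch below models that idea only. It is not the PagerDuty API, and the `EscalationRule` type, the `current_responder` function and the responder names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EscalationRule:
    after_minutes: int  # minutes unacknowledged before this level is notified
    responder: str


def current_responder(rules: Iterable[EscalationRule],
                      minutes_unacknowledged: int) -> Optional[str]:
    """Return the highest escalation level reached so far, if any."""
    chosen = None
    for rule in sorted(rules, key=lambda r: r.after_minutes):
        if minutes_unacknowledged >= rule.after_minutes:
            chosen = rule.responder
    return chosen


policy = [
    EscalationRule(0, "primary on-call"),
    EscalationRule(15, "secondary on-call"),
    EscalationRule(30, "engineering manager"),
]
for minutes in (0, 20, 45):
    print(minutes, current_responder(policy, minutes))
```

A fuller model would also choose the notification channel from the incident severity and the time of day, as the list above describes.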