The best server monitoring tools for developers
— Last updated: 2021-09-15
From 2009-2018 David was CEO at Server Density, a SaaS server monitoring startup. He has been a developer for 15+ years and is now co-founder of Console.
✦ Disclosure: All reviews are editorially independent and partners must meet our selection criteria. Where indicated, we work with some partners to provide extras to our audience, but do not accept payment for reviews.
The best server monitoring tools have lots of high quality integrations and plugins, have highly flexible graphs, and provide advanced alerting functionality so you can be notified when things go wrong.
We tested 18 server monitoring services using our independent selection criteria and the requirements described below. The best server monitoring tools for developers and devops engineers in 2021 are:
In this article, we explain why. We assessed the key features all server monitoring tools should have - plugins, graphs and alerts - and also considered features like dashboards, events, incident management, runbooks (usually part of alerts) and team collaboration.
This review will help you decide whether to pick hosted SaaS server monitoring or self-hosted monitoring, and then recommend which monitoring product is the best based on our 15+ years of development experience.
tl;dr the best server monitoring tools
The best hosted SaaS server monitoring tools:
The best self-hosted on premises server monitoring tools:
But do you still need server monitoring? New applications might be serverless and cloud-first, but servers still exist! Whether you are running a Kubernetes cluster, training machine learning models on GPUs, or just running an Nginx load balancer that you want to control directly, you are going to need to monitor those servers.
Even when you are using cloud services, modern monitoring doesn’t just mean monitoring servers - the best monitoring tools cover your entire infrastructure, from cloud services to serverless and from log search to application performance monitoring. Legacy applications still rely on servers, both virtual machines (VMs) and bare metal, and with cloud proving to be extremely expensive as you scale, more engineering teams are going back to deploying servers.
You might not be at the scale of Dropbox, which famously moved away from Amazon S3 for storage (but not analytics), and you may not need to build your own data centers like Apple, but spinning up EC2 VMs, or deploying code on DigitalOcean droplets, is still common. Servers are where it starts.
The three most important features every server monitoring tool must have are: integrations / plugins, graphs, and alerts. Other features might be useful to improve the product experience, but modern server monitoring tools in 2021 must have these three.
For each of our server monitoring tool reviews we will assess functionality based on our standard selection criteria and these three requirements:
Integrations / plugins
Every monitoring tool should collect basic system stats like CPU usage, process lists and disk space, usually via a monitoring agent that works on Linux, Windows, and other platforms or operating systems. However, monitoring is useless without integration into everything in your tech stack. Can it monitor your database, web servers, load balancers, network, and the application itself? The list of integrations is the most important factor in picking a server monitoring tool.
The number of integrations is important, but so is the quality. The advantage of plugin systems is that anyone can build integrations, but that is not helpful if they are later abandoned or not kept up to date. There might be a long list of integrations, but do they work well?
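As a rough illustration of the agent-plus-plugins model described above, the sketch below uses only the Python standard library. The `PLUGINS` registry, the `plugin` decorator and the metric names are hypothetical and do not correspond to any particular vendor's agent.

```python
import os
import shutil
import time
from typing import Callable, Dict

# Hypothetical plugin registry: plugin name -> function returning metrics.
PLUGINS: Dict[str, Callable[[], Dict[str, float]]] = {}


def plugin(name: str):
    """Register a function as a metrics plugin under the given name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("disk")
def disk_metrics() -> Dict[str, float]:
    # Usage of the root filesystem (or the current drive on Windows).
    usage = shutil.disk_usage(os.path.abspath(os.sep))
    return {"used_percent": 100.0 * usage.used / usage.total}


@plugin("load")
def load_metrics() -> Dict[str, float]:
    # os.getloadavg() is not available on every platform (e.g. Windows).
    if not hasattr(os, "getloadavg"):
        return {}
    one, five, fifteen = os.getloadavg()
    return {"load_1m": one, "load_5m": five, "load_15m": fifteen}


def collect() -> Dict[str, float]:
    """Run every plugin; one failing plugin must not stop the others."""
    sample: Dict[str, float] = {"timestamp": time.time()}
    for name, func in PLUGINS.items():
        try:
            for key, value in func().items():
                sample[f"{name}.{key}"] = value
        except Exception as exc:
            sample[f"{name}.error"] = 1.0
            print(f"plugin {name} failed: {exc}")
    return sample


if __name__ == "__main__":
    print(collect())
```

Isolating each plugin behind a try/except reflects the quality concern raised above: an abandoned or broken integration should degrade one set of metrics, not the whole agent.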
Modern monitoring software tools must graph and visualize the monitoring data. Simple graphs may be all that is needed, but the best server monitoring services have sophisticated graphing capabilities such as filtering, different types of graphs, percentile breakdowns, trend analysis, annotations, etc. Graphs should be customizable and load quickly. Collecting lots of monitoring data is no good if you can’t visualize it.
Monitoring has two purposes - helping you debug problems that have already occurred, and alerting you when something is going wrong (ideally before it causes an outage). Alerts are a key feature in all server monitoring systems.
Alerting is broken down into two parts:
- Triggers: Metrics cause alerts to trigger. This can be based on simple thresholds e.g. is CPU load over 1.5? It can also be much more complex. Different values can trigger at different thresholds e.g. warning vs error. It can be based on relative values or % change over a specified time period (delta alerts). Alerts could trigger for anomalies, or as composites of multiple metrics with conditions that must all exist before an alert is triggered. This can get complicated so it is important to have flexibility to define what you care about.
- Notifications: Once an alert is triggered you need to be notified about it. Email notifications are fine if you are in your inbox and/or the alert isn’t time-critical, but you need rules to define when, how, and who is notified. Perhaps you want text messages or push notifications to a mobile app. Maybe Slack notifications that someone has to acknowledge are better. Or integrations into incident management tools like PagerDuty and ServiceNow. The ability to customize notification configurations is important.
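The trigger types listed above can be sketched in a few lines of Python. This is a minimal illustration, not how any specific product implements alerting. The 1.5 CPU load threshold comes from the example above, while the error level of 3.0, the request-rate series and the 30% drop condition are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Threshold:
    """Static threshold with separate warning and error levels."""
    warning: float
    error: float

    def evaluate(self, value: float) -> Optional[str]:
        if value > self.error:
            return "error"
        if value > self.warning:
            return "warning"
        return None


def delta_percent(series: List[float]) -> float:
    """Percentage change between the first and last value in a window."""
    first, last = series[0], series[-1]
    if first == 0:
        raise ValueError("cannot compute % change from a zero baseline")
    return 100.0 * (last - first) / abs(first)


def composite(*conditions: bool) -> bool:
    """Composite alert: every condition must hold before it triggers."""
    return all(conditions)


# Assumed levels: warn when load is over 1.5, error when over 3.0.
cpu_load = Threshold(warning=1.5, error=3.0)
print(cpu_load.evaluate(1.2))   # None
print(cpu_load.evaluate(1.7))   # warning
print(cpu_load.evaluate(4.0))   # error

# Delta alert on a hypothetical requests-per-second window.
drop = delta_percent([200.0, 210.0, 120.0])
print(drop)                     # -40.0

# Composite: elevated CPU load AND a traffic drop of 30% or more.
print(composite(cpu_load.evaluate(1.7) is not None, drop <= -30.0))  # True
```

Keeping trigger evaluation separate from notification routing mirrors the two-part split described above: the same trigger result can then be sent to email, Slack or an incident management tool depending on its severity.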
The best monitoring tools in 2021 don’t just monitor servers, they also have lots of other features. These include incident management, runbooks, team collaboration, machine learning for anomaly detection and suggested remediations, error tracking, and many other complementary features.
You should expect to find monitoring vendors offering other products like application performance monitoring (APM), log search, security monitoring, real user monitoring (RUM), profiling and tracing.
In this review we are only reviewing server monitoring products. We’ll comment on complementary features if they are present, but other monitoring products are out of scope.
Hosted SaaS server monitoring tools
Pros & cons of hosted SaaS server monitoring
Back in the old days there was no choice - you had to run your own monitoring on premises, either using an open source product like Nagios or Zabbix, or by paying to license an enterprise monitoring solution.
Today, there is a much greater choice between whether you self-host your monitoring or you pay to use a hosted SaaS cloud monitoring tool. Not only is there a choice between these two deployment models, but there are now lots of tools in each category.
The key difference is whether you want to manage your own monitoring, or whether you want to pay someone to do it for you. Whether you self-host or buy hosted SaaS monitoring determines which tools are appropriate - most tools are one or the other.
What are the pros and cons of hosted SaaS monitoring?
- Let someone else deal with it. Is monitoring part of your core business? If not, investing in your product is a better use of time and money. Let someone else deal with ensuring the monitoring is working, scaling, and under active development.
- Keeping up with integrations. One of the reasons we created Console is to help developers stay up to date with the high velocity of releases in the tech industry. There are always new software releases, new cloud services, new APIs. Keeping up to date is difficult, but when you pick your tech stack you want to be sure you can monitor it. The best monitoring products are always up to date with the latest releases and will integrate with new services as they are announced. Keeping self-hosted on-premises monitoring up to date is a challenge.
- Saving engineering time. Deploying hosted SaaS monitoring still needs time from your engineers, but it is mostly integration work that is specific to your environment rather than the undifferentiated heavy lifting of ensuring reliable alert delivery or managing a time series database. SaaS monitoring tools handle all that for you.
- Expensive licensing. If you are operating at large scale, you probably have hundreds or thousands of servers, millions of metrics, and many TBs of logs. Because hosted SaaS monitoring is billed on usage rather than being free and open source, it becomes very expensive at this scale, and the cost can be difficult to predict.
- High network traffic egress. Hosted SaaS monitoring is outside your network so you have to pay for all the outbound network traffic. At large scale you may be able to set up a peering or interconnection relationship but most of the time your monitoring traffic will egress over the public internet. This can become expensive.
- Data protection. The best SaaS server monitoring services have lots of certifications to ensure compliance with data protection regulations, but there is still a nervousness about sending sensitive monitoring data to third parties. In reality, most monitoring data is numerical and lacks the context needed to infer what’s going on by itself. However, there is the potential for accidentally leaking data into the monitoring environment. This is more relevant for log monitoring than server monitoring.
Best hosted SaaS server monitoring tools
Datadog is the industry leader with the most comprehensive and up to date set of integrations, however this also means it has a more complex UI.
New Relic is best known for application performance monitoring (APM) but can now monitor everything, including server monitoring. It has a clean UI but has more limited alerting capabilities.
Also considered: Dynatrace, Logic Monitor + 2 more
These are the other hosted SaaS server monitoring services we tested. They are not as highly rated as the two options above, but may be worth considering.
AppOptics is part of the Solarwinds cloud monitoring suite and integrates into their APM and log management tools. On the surface it has many of the same features as Datadog and New Relic - integrations into cloud and open source infrastructure software, configurable dashboards, and alerting on metrics with thresholds based around trigger conditions and aggregations.
However, when you start using the product you discover that it lacks depth. AppOptics is missing the same degree of flexibility as Datadog’s alerting and there is nothing similar to the power of New Relic’s query language. All the major integrations are there - they state 150+ on the website - but if you have anything new or unusual in your stack it may not be supported.
With pricing ranging from $10-$13/server/month depending on annual or monthly contract it’s definitely cheaper, but if you’re going to spend $10/server/month then you may as well pay a bit more for Datadog or New Relic.
Dynatrace has been around in various forms since 2006, starting as an application performance monitoring tool and then expanding to cover all aspects of monitoring, now including server monitoring. It has a specific focus on full stack “platform” monitoring which covers your entire infrastructure, from logs to APM as well as code level analytics.
Dynatrace can do the job - the server monitoring agent installation went smoothly and it reports the standard system metrics you would expect. However, alert configuration is confusing - there is no “alerts” section in the main UI, for example. Instead, Dynatrace uses “Problems” that are detected based on anomalies, either from automated baselines or built in static thresholds. Evaluation happens on sliding 5 or 15 minute time intervals. These are configured in a separate “Settings” section of the UI which is an unusual approach.
Unfortunately, Dynatrace has the most complicated pricing structure we’ve ever seen. For example, although the Dynatrace website lists 400+ integrations, only 150 of those are agent integrations and these consume a billing metric called Davis Data Units (DDU). Each data point consumes 0.001 DDUs but is weighted depending on the amount of RAM a server has, with different weights depending on whether the monitoring agent is in “Full Stack” or “Infrastructure Mode”. The former gets 1000 custom metrics per host and the latter gets 200 custom metrics, which don’t consume DDUs. Integrations into cloud APIs and serverless products also consume DDUs.
For 8GB hosts, pricing starts at $69/server/month in the default Full Stack Monitoring mode, or $21/server/month in Infrastructure Monitoring mode. A free tier of 200,000 DDUs is included which equates to 381 metrics collected at 1 minute intervals, and the two different monitoring modes include a metric quota. It’s not clear how that relates to real-world monitoring, but you can buy top ups. The documentation states DDUs are bought in groups of 1 million, but the pricing page says $25/month for 100k DDUs (billed annually). It’s unclear if alerting is billable.
This makes it difficult to calculate how much monitoring a server would cost. It’s not clear why pricing is different for servers with more RAM and the distinction between full stack and infrastructure mode is not intuitive. $69/server/month is high relative to the competition.
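To show where a figure like 381 metrics can come from, here is a rough worked calculation using only the numbers quoted above (0.001 DDUs per data point and a 200,000 DDU free tier). It assumes the free allowance covers one year and that each metric reports one data point per minute. That reading is an assumption that happens to reproduce the quoted figure, not something the pricing page confirms. RAM weighting and the per-host metric quotas are ignored.

```python
DDU_PER_DATA_POINT = 0.001          # figure quoted in the review
FREE_TIER_DDUS = 200_000            # figure quoted in the review
MINUTES_PER_YEAR = 365 * 24 * 60    # assumption: the allowance covers one year


def ddus_per_year(metrics: int, interval_minutes: int = 1) -> float:
    """DDUs consumed in a year by `metrics` reporting every `interval_minutes`."""
    data_points = metrics * MINUTES_PER_YEAR / interval_minutes
    return data_points * DDU_PER_DATA_POINT


def metrics_covered(ddus: float, interval_minutes: int = 1) -> float:
    """Number of metrics a DDU allowance pays for over one year."""
    return ddus / ddus_per_year(1, interval_minutes)


print(round(ddus_per_year(1), 1))                  # 525.6 DDUs per metric per year
print(round(metrics_covered(FREE_TIER_DDUS), 1))   # 380.5, i.e. roughly 381 metrics
```

Under these assumptions a single metric at 1 minute resolution costs about 525.6 DDUs per year, so the number of metrics you collect per host drives the cost directly.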
Given the limited number of integrations, unusual approach to alerting, and complex pricing, we do not recommend Dynatrace for server monitoring.
Logic Monitor (no rating)
Logic Monitor tricks you into thinking you are signing up for a trial but you can’t actually test the product without speaking to someone - you are actually “requesting” a trial, not signing up. Coupling this with the fact that they do not list their pricing anywhere makes for a frustrating experience for a developer who just wants to try the product themselves.
Our selection criteria require self-service signup so we were unable to evaluate Logic Monitor.
Lightstep is designed primarily as a tracing tool that also supports infrastructure metrics. It doesn’t have its own monitoring agent so it requires you to push data from one of several supported sources - Datadog, Prometheus, AWS, Google Cloud or OpenTelemetry. As such, we decided not to review it as part of this article on server monitoring. It will be covered in a future Console review of tracing and observability tools.
Self-hosted server monitoring tools
Pros & cons of self-hosted on premises monitoring
Self-hosted monitoring is different from SaaS monitoring where everything is done for you in a single product. Although there are all-in-one self-hosted server monitoring software tools, for the best setup you really need to deploy several products and integrate them. This is more operationally challenging - demonstrating the value of SaaS monitoring - but allows each tool to focus on what it does best.
Once you have everything set up, running your own self-hosted monitoring means you have more control, you can keep network egress costs low and you don’t have to pay monthly fees. However, you have to ensure your monitoring is reliable, and scaling data storage as you collect more data is a difficult problem. Is this really a good use of your engineering team?
What are the pros and cons of self-hosted on premises monitoring?
- You have complete control. You can choose which software you prefer, how it is deployed and integrated, how long you keep data for and where that data is stored. This last point is important in some regulated sectors where data cannot leave your environment and/or you need to control which countries data is stored in.
- Traffic stays in your network. You can manage the security of your monitoring by ensuring that monitoring data is only transmitted over specific network infrastructure such as a dedicated monitoring subnet, VPC, and/or encrypted links. It also means traffic remains internal - monitoring software can generate large volumes of network traffic, which can be costly if it has to egress your network to a hosted SaaS monitoring product.
- No licensing costs. Hosted SaaS monitoring is usually billed on a usage basis - per server, per GB of log storage or per metric. This starts out cheap but as you scale, monitoring can become a significant expense. For volume or metric based pricing this provides a negative incentive - the more you monitor the better the view you have of your infrastructure, but the more it costs. Running your own monitoring usually means using free, open source software although there are still some legacy on-premises enterprise monitoring products.
- Monitoring your monitoring. Deploying your own monitoring means you need to consider reliability, redundancy and backups. It’s not very useful if your monitoring also goes down when production goes down! This means deploying in entirely separate infrastructure, setting up monitoring for your monitoring, and regularly testing things like whether alerts are being delivered.
- Scaling time series is hard. Storing time series data with high availability and low retrieval latency is a difficult problem. This is why hosted SaaS monitoring is more expensive the longer you want to retain your data - it’s expensive to keep large volumes of data for fast querying. Running your own monitoring means you need to deal with storing everything.
- Higher engineering costs. You might not pay for a software license but you will pay for the engineering time to deploy and maintain the software. With modern monitoring tools focusing on a single component, you will have to install and maintain several separate tools e.g. Grafana for visualization and Prometheus for storing the time series data. Maintaining and scaling independent monitoring infrastructure only looks like a trivial problem if you’ve never done it before, especially at scale. Is this something you want to dedicate your engineering teams to?
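One common way to approach the "monitoring your monitoring" point above is a heartbeat check, sometimes called a dead man's switch: the monitoring system checks in regularly with an independent watchdog, which raises the alarm if the check-ins stop. This technique is not taken from the review itself, and the `HeartbeatWatchdog` class and the 120 second limit below are hypothetical.

```python
import time
from typing import Optional


class HeartbeatWatchdog:
    """Flags the monitoring system as unhealthy if it stops checking in."""

    def __init__(self, max_silence_seconds: float):
        self.max_silence_seconds = max_silence_seconds
        self.last_seen: Optional[float] = None

    def beat(self, now: Optional[float] = None) -> None:
        """Record a check-in from the monitoring system."""
        self.last_seen = time.time() if now is None else now

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True if no check-in has arrived within the allowed silence."""
        now = time.time() if now is None else now
        if self.last_seen is None:
            return True
        return (now - self.last_seen) > self.max_silence_seconds


# Explicit timestamps are passed so the example is deterministic.
watchdog = HeartbeatWatchdog(max_silence_seconds=120)
watchdog.beat(now=1_000.0)
print(watchdog.is_stale(now=1_060.0))  # False: last check-in 60 seconds ago
print(watchdog.is_stale(now=1_200.0))  # True: silent for 200 seconds
```

For this to be useful, the watchdog has to run on infrastructure that is separate from both production and the monitoring system it is watching.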
Best self-hosted server monitoring tools
Also considered: 8 more
These are the other server monitoring tools we tested. They are not as highly rated as the two options above, but may be worth considering.
checkmk is an unusual project because the free open source edition (checkmk Raw) is based on Nagios but the other “Enterprise” editions (available in Free, Standard and Managed Services variants) are unrelated to Nagios.
The Nagios-based Raw edition is in maintenance mode but does have an active community around plugins and a discussion forum. There is an upgrade path to the Enterprise products for when you need a higher performance backend, high availability, and/or technical support.
The main reason to use checkmk Raw is if you have decided to use Nagios (or perhaps are already using it), but want extra functionality like a modern UI and graphs. This scales up to a few hundred hosts but then starts to hit resource limits (because of the Nagios core).
Checkmk enterprise has improved scalability and adds support for container monitoring, high availability deployments, 1 second time series granularity and more visualization options. The free version of the Enterprise edition supports up to 25 hosts. Above that, it is priced based on the number of “services” monitored - the average host will report 30 services. A license for 3000 services (~100 hosts) costs $720/year.
We did not evaluate Checkmk Enterprise because this section only covers open source software. We don’t recommend Checkmk Raw because it is based on Nagios Core and is only receiving bug fix updates. There are more powerful options to choose from.
Graphite is a time series datastore that can generate graphs. It was originally released in 2006 and open sourced in 2008. It’s still in active development but is focused on a very simple use case - storing and displaying monitoring data. It relies on third-party tools to get data in.
Graphite has its place, but doesn’t compare to the other tools in this category. If you have very simple requirements for graphical monitoring it will do the job, but if you need more sophisticated visualizations or alerting then there are better options.
Icinga was originally a fork of Nagios and continues to maintain compatibility. It has a more modern UI but is not really on-par with the power of Grafana or InfluxDB. It understands the concepts of cloud and containers but is still very much host-based.
The biggest omission is the lack of any meaningful visualizations. It represents data through single, static numbers and color associations such as red = bad. If you think server monitoring should include graphs then Icinga isn’t for you.
Nagios is one of the oldest monitoring products around, having originally been released in 2002. Its age shows, and although it is based around an open source core it is really just a lead generation tool for the enterprise version - Nagios XI. This is a UI on top of the Nagios Core backend.
Nagios comes from the era of individual servers that sit around for a long time. Its architecture is based on a pull model where a central Nagios server connects to agents running on each node, triggering a check cycle and then returning the results. This model struggles in modern cloud environments where instances are ephemeral, especially with containers.
Everything is defined in agent config files, including alerts, which limits the flexibility if you need anything more than simple threshold-based monitoring. As such, it’s difficult to recommend Nagios for modern deployments given the sophistication of the alternatives like Grafana, Prometheus and InfluxDB.
Netdata is an open source monitoring product that has an installable agent which defaults to local storage of metrics on a single host. Users connect to the monitoring UI running on each host, but it supports streaming data to a cloud UI hosted by Netdata themselves. This is unusual because Netdata does not store any metrics, instead pitching their cloud service as a “war room” that can be used by multiple users for incident management. It is also provided free of charge, with a paid service planned for the future.
Metrics are stored in memory. With the default configuration, around 2,000 metrics collected per second can be stored for approximately 2 days (see the Netdata storage calculator). If you want longer term storage, it integrates with a range of metrics database backends, such as Prometheus, AWS Kinesis, Elasticsearch, InfluxDB, New Relic and PostgreSQL.
Alerts and notifications are configured in local config files per host, with alarm status reported back to the UI. This means if you want to configure the same alert on many hosts, you need to figure out how to distribute the config files across them all.
Netdata is a good solution for a small number of nodes where you are only interested in live streaming metrics, but can get complicated quickly as you scale or if you need consistent alerting and want to store data for more than a few days.
OpenTSDB is a time series database built on top of Hadoop and HBase. It’s a good option if you need to store a huge volume of time series data but it’s not a monitoring tool by itself.
Running HBase/Hadoop can also be a huge operational challenge, so if you are interested in OpenTSDB then you may want to consider Google Cloud Bigtable which is compatible with the HBase API.
Opstrace packages several open source monitoring tools (including Grafana and Prometheus) to make deployment and management easier. This is still an early-stage project so we did not evaluate it, and there are some limitations such as having to use a *.opstrace.io domain unless you contact them about a commercial version. We’ll reevaluate Opstrace in the future.
Sensu is open source monitoring software, but only if you compile it from source code yourself, then combine it with the web UI project. This is not obvious because when you download Sensu from their website, you are actually downloading Sensu Go binaries - a commercial release. This is free for up to 100 hosts but then costs $3-5/host/month. The open source version is also missing features such as dashboards and many third-party integrations.
This section only covers open source options so we did not evaluate Sensu Go.
SigNoz is a metrics monitoring tool built around OpenTelemetry that uses Clickhouse or Kafka + Druid for its backend (you can choose which one you prefer). It is marketed as an open source alternative to Datadog and New Relic, but currently doesn’t come close to their functionality. This is an interesting project to follow, but is still very early in development and not on par with our main recommendations above.
Zabbix is a monolithic monitoring tool with a PHP web frontend served by Apache or Nginx and several options for the database backend (MySQL, PostgreSQL, TimescaleDB, Oracle). It’s actively developed and supported by a commercial organization but remains an open source project anyone can install without any limits.
Zabbix assumes you know what you’re doing when it comes to managing the backend, which means understanding how to properly deploy, secure and scale the database. This includes deploying the database and web server components in a high availability configuration, which you must manage yourself.
Zabbix consists of several components, the main two being the Zabbix Server and the monitoring agent installed onto each system you want to monitor.
Agents can either communicate with the server directly or they route through a Zabbix Proxy that alleviates load on the server. This is because Zabbix comes from the same world as Nagios where checks are actively polled. The server contacts every agent and requests data, which then triggers the checks on each monitored host. This is the opposite of the approach used by tools like Datadog where each agent runs independently, posting data back to the central server. The Zabbix Proxy sits in the middle, requesting checks from its connected agents so that the Zabbix Server only needs to deal with a smaller number of Proxies.
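The difference between the polled (pull) model, the proxy arrangement and the push model described above can be sketched as a small in-process simulation. All class names here are hypothetical and no networking is involved. It only shows who initiates the exchange and how a proxy reduces the number of sources the server has to contact.

```python
from typing import Dict, List, Protocol

Metrics = Dict[str, float]


class Pollable(Protocol):
    def collect(self) -> Dict[str, Metrics]: ...


class Server:
    def __init__(self) -> None:
        self.store: Dict[str, Metrics] = {}

    def poll(self, sources: List[Pollable]) -> None:
        """Pull model: the server contacts each source and requests data."""
        for source in sources:
            self.store.update(source.collect())

    def receive(self, host: str, metrics: Metrics) -> None:
        """Push model: the server just accepts whatever agents post."""
        self.store[host] = metrics


class Agent:
    def __init__(self, host: str) -> None:
        self.host = host

    def run_checks(self) -> Metrics:
        # Placeholder value; a real agent would read system metrics here.
        return {"cpu_percent": 12.5}

    def collect(self) -> Dict[str, Metrics]:
        """Pull model: checks run only when something asks for them."""
        return {self.host: self.run_checks()}

    def push(self, server: Server) -> None:
        """Push model: the agent runs on its own schedule and posts results."""
        server.receive(self.host, self.run_checks())


class Proxy:
    """Polls its own agents so the server only deals with a few proxies."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents = agents

    def collect(self) -> Dict[str, Metrics]:
        merged: Dict[str, Metrics] = {}
        for agent in self.agents:
            merged.update(agent.collect())
        return merged


agents = [Agent(f"web{i}") for i in range(4)]

pull_server = Server()
pull_server.poll([Proxy(agents[:2]), Proxy(agents[2:])])  # 2 sources, 4 hosts

push_server = Server()
for agent in agents:
    agent.push(push_server)

print(sorted(pull_server.store) == sorted(push_server.store))  # True
```

Both servers end up with the same data. The practical difference is that in the pull model the server has to know about every agent or proxy in advance, which is harder when instances are short-lived.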
Zabbix supports collecting data at 1 second granularity and has a range of built-in monitoring metrics, which Zabbix calls “items”. Using the newest version of the Zabbix agent brings built-in plugins written in Go for products like Ceph, MongoDB, MySQL, Docker and Redis. However, these are limited in number so you need to rely on community plugins.
Alerting is configured through triggers and notifications, with expressions defining the alert conditions that send notifications through email, SMS and web hooks. Examples are provided for common destinations like PagerDuty, Opsgenie, Slack and Microsoft Teams. Zabbix’s age shows when configuring SMS alerts because it assumes you have a GSM modem connected via a serial port! You’re probably better off using a webhook to Twilio.
Zabbix is a powerful open source monitoring tool but it feels somewhat dated compared to more modern options like Grafana and InfluxDB. We like its open source philosophy and it covers most of what you would expect in a server monitoring tool in 2021, but doesn’t quite get our recommendation compared to our two finalists above.
What is server monitoring?
Server monitoring is a type of computer system monitoring that focuses on collecting metrics from servers and the applications running on them. This is usually by installing a software agent onto the server which then collects server metrics like CPU usage, disk space, memory usage, and network traffic. The agent integrates into other software in the tech stack, such as databases, web servers, mail servers, queues, etc, so every aspect of your infrastructure can be monitored.
How does server monitoring work?
Server monitoring works by installing a software agent onto the server that can then collect system metrics like CPU usage and network traffic. The agent usually works on different operating systems such as Windows, Linux and macOS, integrating into the full tech stack to allow you to monitor databases, web servers, application servers, etc. Data is reported back to the monitoring tool which then saves the metrics in a time series database so that you can visualize the data on graphs. Alerts are triggered based on the reported data to notify you when there are problems.
Does server monitoring need an agent?
Most server monitoring products support both agent-based and agentless monitoring. An agent must be installed onto an operating system such as Linux or Windows, or inside a container, which is not always possible - for example, with cloud products like load balancers and managed databases. Agentless monitoring connects to the APIs offered by the software or cloud vendor so that data can be collected even without a monitoring agent.
Why do we need server monitoring?
Server monitoring is essential for any production infrastructure so that you know when things are going wrong, what happened, and how to fix problems. Without server monitoring, you will not know whether your servers are running, how they are performing, and what is causing issues.
What are the best open source server monitoring tools?
Open source monitoring tools are usually self-hosted because you get access to the source code to deploy it yourself. In our review above, the best open source server monitoring tool is: Grafana + Prometheus. We also like: InfluxDB + Telegraf.
The main benefit of open source server monitoring is that you can write your own plugins and integrations. Most developers will not want to modify the core product, which is why SaaS monitoring tools usually have an open source agent - so that their users can contribute new integrations. For example, the Datadog monitoring agent is open source and the source code is available on GitHub. The Datadog service itself - the dashboards, alerts, web UI, etc - is closed source.
How much network traffic do server monitoring tools generate?
How much bandwidth server monitoring tools use is important if you have a lot of servers. It can be a problem when using a hosted SaaS monitoring service because you have to pay for outbound traffic. In cloud environments, data egress fees can be expensive so you end up paying not just for the monitoring product but also for sending data into it.
The amount of data used will depend on how many metrics you are collecting, but it starts at tens of megabytes per month per server for basic system monitoring. More plugins generate more metrics, which generates more network traffic.
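A back-of-the-envelope estimate shows how the "tens of megabytes" figure scales with metric count and reporting interval. Every input below is a hypothetical assumption, in particular the 10 bytes per data point on the wire. Real agents batch and compress differently, so treat this as a way to reason about the order of magnitude rather than a measurement.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30 day month


def monthly_egress_mb(metrics: int, interval_seconds: int,
                      bytes_per_point: float) -> float:
    """Estimated outbound monitoring traffic per server, in megabytes per month."""
    points = metrics * SECONDS_PER_MONTH / interval_seconds
    return points * bytes_per_point / 1_000_000


# Hypothetical server: 100 system metrics, reported every 60 seconds,
# roughly 10 bytes per data point after batching and compression.
per_server = monthly_egress_mb(metrics=100, interval_seconds=60, bytes_per_point=10)
print(round(per_server, 1))                 # 43.2 MB per server per month

# The same assumptions across a fleet of 1,000 servers, in gigabytes.
print(round(per_server * 1_000 / 1_000, 1))  # 43.2 GB per month
```

Doubling the number of metrics or halving the interval doubles the traffic, which is why adding plugins increases egress.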
In most cases, this will not be a problem until you have thousands of servers, but this can change if you are also using log monitoring. Logs can easily reach hundreds of gigabytes per month per server, especially if the server is doing a lot of work. The trick is to tweak the log verbosity settings and then ensure that the monitoring agent can filter out the unimportant logs on the server before they are sent back to the SaaS monitoring service.
Alternatively, you can use self-hosted monitoring to keep all the traffic within your internal network. For most cloud providers, zone-based traffic is free or very cheap.
Our editorial policy
Why you can trust us
Console is written by developers for developers. Using our decades of experience building software at scale, we apply strict selection criteria to decide which tools we feature.
This includes asking questions like “Would this form part of a daily-use set of developer tools?”, “Would this be used by advanced, power-users?” and “Does it have a good graphical and/or command line interface? Shortcuts? Accessibility?”. The more of these questions we can answer positively, the more likely a tool is to be featured.
We do not accept payment for inclusion and where we work with partners, they must fit our selection criteria before we consider working with them.
About the author
David Mytton is co-founder of Console. From 2009-2018, David was CEO at Server Density, a SaaS server monitoring startup used by hundreds of customers to collect billions of time series metrics from millions of servers. He is also a researcher in sustainable computing at Uptime Institute and affiliated with Imperial College London. David has been a developer for 15+ years.
Console is the place developers go to find the best tools. Each week, our newsletter picks out the most interesting tools and new releases. We keep track of everything - dev tools, devops, cloud, and APIs - so you don’t have to.