
Metrist

metrist.io @Metrist_io

Monitor your cloud dependencies.

Observability

Our review

What we like

Monitors not just service status pages but also runs synthetic tests against APIs, checking things like creating an S3 bucket, uploading objects, and accessing them. Supports cloud providers (AWS, GCP, Azure) and SaaS products, e.g. GitHub, Cloudflare, and Google Drive. Alerts distinguish between degraded (average latency has increased) and down (not responding). An optional agent runs tests from within your own account.
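The degraded/down distinction can be illustrated with a small classifier. This is a minimal sketch, not Metrist's actual logic: it assumes each check records a latency in milliseconds (or `None` when there was no response), and the 1.5× threshold, the `CheckResult` type, and the `classify` function are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence

# Hypothetical threshold, not Metrist's: "degraded" when the recent average
# latency exceeds the baseline average by more than this factor.
DEGRADED_FACTOR = 1.5


@dataclass
class CheckResult:
    latency_ms: Optional[float]  # None means the API did not respond


def classify(baseline: Sequence[CheckResult], recent: Sequence[CheckResult]) -> str:
    """Return 'down', 'degraded', 'up', or 'unknown' for one monitored API."""
    # Down: every recent check went unanswered.
    if recent and all(r.latency_ms is None for r in recent):
        return "down"
    base = [r.latency_ms for r in baseline if r.latency_ms is not None]
    now = [r.latency_ms for r in recent if r.latency_ms is not None]
    if not base or not now:
        return "unknown"  # not enough successful checks to compare
    # Degraded: average latency has increased well beyond the baseline.
    if mean(now) > DEGRADED_FACTOR * mean(base):
        return "degraded"
    return "up"


baseline = [CheckResult(100.0), CheckResult(120.0)]
print(classify(baseline, [CheckResult(200.0), CheckResult(220.0)]))  # degraded
print(classify(baseline, [CheckResult(None), CheckResult(None)]))    # down
print(classify(baseline, [CheckResult(115.0)]))                      # up
```

Comparing against a baseline average rather than a fixed cutoff lets one rule cover APIs whose normal latencies differ widely.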

What we don't like

The Metrist agent can auto-discover API calls using eBPF at the OS level. This allows unknown APIs to be monitored, but it limits the traffic the agent can inspect: outgoing TLS/SSL calls, for example, can only be inspected on Linux, and only if the application uses a dynamically linked OpenSSL.

Reviewed: 2022-12-01

Developer Interview

With Ryan Duffield, CTO

2022-11-21

What is Metrist and why did you build it?

Our mission at Metrist is to give software developers and IT leaders the same level of visibility into the third-party cloud products that they build on as they have into the software that they build themselves. Apps are built on top of other apps. That starts with anything from a dozen different services at AWS to APIs like Twilio, Stripe, and EasyPost, and cloud tools like CircleCI and GitHub. If one of those tools goes down, you risk going down, or at least having a degraded user experience or being unable to ship code.

The problem that we identified is twofold. One is that it's really hard to find out about, or verify, that a third party is causing the problem, not you, your code, and the things you control. The second is that it's really hard to hold your vendors accountable to their SLAs. Metrist empowers people to monitor the services that they rely on. We put the health of all of your third-party cloud dependencies into a single dashboard, alerting you about outages typically 10 to 20 minutes before a status page gets updated. We give you enough detail to answer not only the question "is it me or is it them?" but also: what is the problem, will it impact me, and is there anything I can do about it?

One of the reasons I was excited to start this was that, while working at PagerDuty and talking to people about incident response and observability, I kept hearing over and over again that their downtime was tied back to a third party, not their own software. But I didn't see monitoring tools changing or adapting to focus more on those things. The New Relic and Datadog agents can tell you that there's a problem with a call to a third party, but there was still a sense of uncertainty: is it me or is it them?

Current synthetic tools such as Datadog, Grafana, and New Relic hit a URL; if it returns a certain status code, they do a thing, and maybe run some logic if it calls another thing. We go a step further: we actually stitch together an end-to-end workflow of what to expect. If you are creating a bucket in S3, we verify that the bucket exists, then start uploading files to that bucket, deleting things, and finally removing the bucket itself. If an endpoint is supposed to send you an email or a webhook, we wait for those things and report back how long they took to arrive.
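To make the difference from a single status-code check concrete, here is a minimal sketch of a multi-step workflow check that times each step and stops at the first failure. It is not Metrist's implementation: `FakeObjectStore` is a hypothetical in-memory stand-in for S3 so the example runs without credentials, and `run_workflow`, `bucket_lifecycle`, and the bucket name `probe-bucket` are illustrative names.

```python
import time
from typing import Callable, Dict, List, Tuple


class FakeObjectStore:
    """In-memory stand-in for an S3-like object store (hypothetical)."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        self.buckets[name] = {}

    def bucket_exists(self, name: str) -> bool:
        return name in self.buckets

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self.buckets[bucket][key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.buckets[bucket][key]

    def delete_object(self, bucket: str, key: str) -> None:
        del self.buckets[bucket][key]

    def delete_bucket(self, name: str) -> None:
        if self.buckets[name]:
            raise RuntimeError("bucket not empty")
        del self.buckets[name]


Step = Tuple[str, Callable[[], None]]
StepResult = Tuple[str, float, str]  # (step name, elapsed ms, status)


def run_workflow(steps: List[Step]) -> List[StepResult]:
    """Run steps in order, timing each one, and stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            status = "ok"
        except Exception as exc:  # any error fails this step and ends the run
            status = f"failed: {exc}"
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append((name, elapsed_ms, status))
        if status != "ok":
            break
    return results


def bucket_lifecycle(store: FakeObjectStore, bucket: str = "probe-bucket") -> List[Step]:
    """Create a bucket, verify it, upload/read/delete an object, remove the bucket."""

    def verify_bucket() -> None:
        if not store.bucket_exists(bucket):
            raise RuntimeError("bucket missing after create")

    def read_back() -> None:
        if store.get_object(bucket, "probe.txt") != b"hello":
            raise RuntimeError("object content mismatch")

    return [
        ("create_bucket", lambda: store.create_bucket(bucket)),
        ("verify_bucket", verify_bucket),
        ("upload_object", lambda: store.put_object(bucket, "probe.txt", b"hello")),
        ("read_object", read_back),
        ("delete_object", lambda: store.delete_object(bucket, "probe.txt")),
        ("delete_bucket", lambda: store.delete_bucket(bucket)),
    ]


store = FakeObjectStore()
for name, ms, status in run_workflow(bucket_lifecycle(store)):
    print(f"{name}: {status} ({ms:.3f} ms)")
```

Stopping at the first failed step keeps a broken create from cascading into misleading failures further down. Against a real account, each step would wrap the corresponding S3 API call (for example through boto3) instead of the fake store.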

And then the bigger problem was holding them accountable. How do I know if they hit their SLA last month or last quarter? We aim to solve that visibility problem that is becoming a bigger piece of the developer's operational workload.
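How Metrist builds its SLA reports isn't described here. As a rough sketch of the underlying arithmetic, assuming you have a list of outage windows recorded by your own monitoring, availability over a period is uptime divided by total time. The `availability` and `met_sla` functions, the sample outages, and the 99.9% target are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of an observed outage


def availability(period_start: datetime, period_end: datetime, outages: List[Outage]) -> float:
    """Percentage of [period_start, period_end) during which the service was up."""
    total = (period_end - period_start).total_seconds()
    # Clip outages to the reporting period and sort them by start time.
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in outages
        if e > period_start and s < period_end
    )
    # Merge overlapping windows so downtime is not counted twice.
    down = 0.0
    cur: Optional[List[datetime]] = None
    for s, e in clipped:
        if cur is None or s > cur[1]:
            if cur is not None:
                down += (cur[1] - cur[0]).total_seconds()
            cur = [s, e]
        else:
            cur[1] = max(cur[1], e)
    if cur is not None:
        down += (cur[1] - cur[0]).total_seconds()
    return 100.0 * (total - down) / total


def met_sla(availability_pct: float, sla_pct: float) -> bool:
    return availability_pct >= sla_pct


nov_start, nov_end = datetime(2022, 11, 1), datetime(2022, 12, 1)
outages = [
    (datetime(2022, 11, 8, 14, 0), datetime(2022, 11, 8, 14, 30)),
    (datetime(2022, 11, 8, 14, 15), datetime(2022, 11, 8, 14, 45)),  # overlaps the first
]
pct = availability(nov_start, nov_end, outages)
print(f"{pct:.4f}% available; meets 99.9% SLA: {met_sla(pct, 99.9)}")
```

Merging overlapping windows matters when two checks report the same incident. For scale, a 99.9% target over a 30-day month leaves roughly 43 minutes of allowed downtime.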

About Console

Console is the place developers go to find the best tools. Each week, our newsletter picks out the most interesting tools and new releases. We keep track of everything (dev tools, devops, cloud, and APIs) so you don't have to.