Monitor your cloud dependencies.
What is Metrist and why did you build it?
Our mission at Metrist is to give software developers and IT leaders the same level of visibility into the third-party cloud products they build on as they have into the software they build themselves. Apps are built on top of other apps. That ranges from a dozen different services at AWS to APIs like Twilio, Stripe, and EasyPost and cloud tools like CircleCI and GitHub. If one of those tools goes down, you risk going down yourself, or at least having a degraded user experience or an inability to ship code.
The problem that we identified is twofold. First, it's really hard to find out, or to verify, that a third party is causing the problem rather than you, your code, and the things you control. Second, it's really hard to hold your vendors accountable to their SLAs. Metrist empowers people to monitor the services that they rely on. We put the health of all of your third-party cloud dependencies into a single dashboard, alerting you about outages typically 10 to 20 minutes before a status page gets updated. We provide enough detail to answer not only the question "is it me or is it them?" but also "what is the problem, will it impact me, and is there anything I can do about it?"
One of the reasons I was excited to start this was that while working at PagerDuty, talking to people about incident response and observability, I kept hearing over and over again that their downtime was tied back to a third party, not their own software. But I didn't see monitoring tools changing or adapting to focus more on those things. The New Relic and Datadog agents can tell you that there's a problem with a call to a third party, but there was still this sense of uncertainty: is it me or is it them?
Current synthetic monitoring tools such as Datadog, Grafana, and New Relic hit a URL and, if it returns a certain status code, do a thing and maybe run some logic if it calls another thing. We go a step further: we stitch together an end-to-end workflow of what to expect. If you are creating a bucket in S3, we verify that the bucket exists, then upload files to that bucket, delete things, and finally remove the bucket itself. If an endpoint is supposed to send you an email or a webhook, we wait for those things and report back how long they took to arrive.
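To make the idea of a stitched-together workflow concrete, here is a minimal Python sketch of a bucket lifecycle check in which every step is timed and the run stops at the first failure. It is an illustration only, not Metrist's monitor code: `FakeBucketStore` is a hypothetical in-memory stand-in for a real object store client, and `run_workflow`, `bucket_lifecycle_steps`, and the step names are assumptions made for this example.

```python
import time
from typing import Callable, Dict, List, Tuple


class FakeBucketStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        self.buckets[name] = {}

    def bucket_exists(self, name: str) -> bool:
        return name in self.buckets

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self.buckets[bucket][key] = body

    def delete_object(self, bucket: str, key: str) -> None:
        del self.buckets[bucket][key]

    def delete_bucket(self, name: str) -> None:
        if self.buckets[name]:
            raise RuntimeError("bucket not empty")
        del self.buckets[name]


def run_workflow(steps: List[Tuple[str, Callable[[], None]]]) -> List[dict]:
    """Run steps in order, timing each one, and stop at the first failure."""
    results = []
    for name, step in steps:
        started = time.perf_counter()
        try:
            step()
            ok, error = True, None
        except Exception as exc:  # any failed step marks the workflow as failed
            ok, error = False, repr(exc)
        results.append({
            "step": name,
            "ok": ok,
            "error": error,
            "seconds": time.perf_counter() - started,
        })
        if not ok:
            break
    return results


def check_exists(store: FakeBucketStore, bucket: str) -> None:
    """Fail loudly if the bucket we just created cannot be seen."""
    if not store.bucket_exists(bucket):
        raise RuntimeError("bucket missing after create")


def bucket_lifecycle_steps(store: FakeBucketStore, bucket: str):
    """Create, verify, upload, delete the file, then delete the bucket."""
    return [
        ("create_bucket", lambda: store.create_bucket(bucket)),
        ("verify_bucket", lambda: check_exists(store, bucket)),
        ("upload_file", lambda: store.put_object(bucket, "probe.txt", b"hello")),
        ("delete_file", lambda: store.delete_object(bucket, "probe.txt")),
        ("delete_bucket", lambda: store.delete_bucket(bucket)),
    ]


if __name__ == "__main__":
    store = FakeBucketStore()
    for result in run_workflow(bucket_lifecycle_steps(store, "probe-bucket")):
        print(result)
```

Recording a duration per step, rather than a single pass or fail for one URL, is what lets a check like this say which part of a dependency is slow or broken.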
And then the bigger problem was holding vendors accountable. How do I know if they hit their SLA last month or last quarter? We aim to solve that visibility problem, which is becoming a bigger piece of the developer's operational workload.
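The SLA question comes down to simple arithmetic once outages have been recorded. The Python sketch below is an assumption-laden illustration, not how Metrist computes anything: the function names `availability_percent` and `met_sla`, the outage data, and the 99.9% target are all made up for the example, and the outage intervals are assumed not to overlap.

```python
from datetime import datetime
from typing import List, Tuple

# One observed outage as (start, end). Intervals are assumed not to overlap.
Outage = Tuple[datetime, datetime]


def availability_percent(outages: List[Outage],
                         period_start: datetime,
                         period_end: datetime) -> float:
    """Percentage of the period during which the service was up."""
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total


def met_sla(outages: List[Outage],
            period_start: datetime,
            period_end: datetime,
            target_percent: float) -> bool:
    """True if measured availability meets or exceeds the target."""
    return availability_percent(outages, period_start, period_end) >= target_percent


if __name__ == "__main__":
    november = (datetime(2022, 11, 1), datetime(2022, 12, 1))
    # Hypothetical single 45-minute outage.
    outages = [(datetime(2022, 11, 14, 9, 0), datetime(2022, 11, 14, 9, 45))]
    print(availability_percent(outages, *november))
    print(met_sla(outages, *november, target_percent=99.9))
```

A 30-day month has 43,200 minutes, so a 99.9% target leaves a budget of 43.2 minutes of downtime. The hypothetical 45-minute outage above exceeds that budget, so the target is missed for the month.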
What does a "day in the life" look like for you?
One thing I don't do anymore is code. In the early days I put together a prototype, but now my work depends on what's going on. Early on, it was helping with fundraising. Then it was hiring a team. As CTO, I manage both engineering and product. Day to day, that's literally going into the Jira board, making sure all the lanes are full, helping in the stand-up, identifying blockers, and managing the engineering team. It’s really about managing people: making sure everyone has enough to work on and making sure the engineers are happy.
What is the team structure around Metrist?
We are a 100% remote distributed team. I'm in San Francisco, Jeff is in Portland and most of our engineers are in Canada. We also hire University of Waterloo co-op students every term. They're with us for four months and work full time on the same sort of tickets as everyone else. Currently, we're five engineers, two co-op engineers and then we have a business development rep and a content manager.
How did you first get into software development?
I started when I was in high school. The local internet service provider that I was using was both a free-net and a dial-up internet service provider. If you met certain criteria they offered free internet access; otherwise you could pay for dial-up. I joined them as a volunteer in what they called their mentorship program. I would help people in the community learn how to use email and newsgroups and browse the information superhighway, these sorts of things.
And when I was there, an older student right-clicked on a page, opened the source code, and said, "Did you know this is how webpages are built?" I was blown away, and my mind started racing. Then I got into HTML, not coding, but generating stuff dynamically and that sort of thing. I built some systems to report to users how much dial-up time they were using, a little search engine, that sort of thing.
I went to school, did computer science, and then spent the first 10 years of my career not building anything terribly interesting. It was actually at Server Density that I worked for my first real software startup, which was awesome. And that really helped when I got to PagerDuty and beyond.
My favorite language is C#; it's the one I've used the most in the 15 years of my career. If I need to just code and get something done, I always use that, but I do really like Elixir as well. I've also dealt with Perl, PHP, Classic ASP, Python, Ruby, and Scala. I hated Scala, it's terrible.
What is your tech stack?
For our back end we're using Elixir, with Phoenix LiveView for the web UI. We also have monitors, the things that do the end-to-end tests. Those can be written in any programming language, but the ones we have are a mix of C#, Python, Node, and Java.
What is the most interesting development challenge you've faced working on Metrist?
After the prototype phase was over and we had to get serious about things, we decided to use event sourcing and CQRS for our entire platform. That means every human or machine interaction with our software runs commands that emit events. We have a long event log, from the beginning of time until now, of everything that's happened in our system.
The alternative to this was the classic CRUD application, where you're creating, reading, updating, and deleting records. We don't do that. We have an event stream and, based on what we want to see, we create projections on that event data. It's an architecture that’s been around for a while, but it was a paradigm shift for us to adopt. It was like, "Oh wait, if I want that thing to show up over here, I have to run a command, which will generate an event, and then the projection will pick it up." That can be a challenge for new hires, but it really fits the problem we’re trying to solve.
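The command, event, and projection flow described above can be sketched in a few lines of Python. This is a generic illustration of event sourcing with a read-side projection, not Metrist's code and not the API of any particular library; the event types, `EventStore`, `StatusProjection`, and the command functions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Events: immutable facts about what happened (hypothetical event types).
@dataclass(frozen=True)
class MonitorRegistered:
    monitor_id: str
    name: str


@dataclass(frozen=True)
class CheckRecorded:
    monitor_id: str
    healthy: bool


class EventStore:
    """Append-only log that notifies subscribers of each new event."""

    def __init__(self) -> None:
        self.events: List[object] = []
        self.subscribers: List[Callable[[object], None]] = []

    def subscribe(self, handler: Callable[[object], None]) -> None:
        self.subscribers.append(handler)

    def append(self, event: object) -> None:
        self.events.append(event)
        for handler in self.subscribers:
            handler(event)


class StatusProjection:
    """Read model built purely from events; nothing writes to it directly."""

    def __init__(self) -> None:
        self.status: Dict[str, dict] = {}

    def apply(self, event: object) -> None:
        if isinstance(event, MonitorRegistered):
            self.status[event.monitor_id] = {"name": event.name, "healthy": None}
        elif isinstance(event, CheckRecorded):
            self.status[event.monitor_id]["healthy"] = event.healthy


def is_registered(store: EventStore, monitor_id: str) -> bool:
    """Derive current state by scanning the event log."""
    return any(isinstance(e, MonitorRegistered) and e.monitor_id == monitor_id
               for e in store.events)


# Commands: validate against the log, then emit an event.
def register_monitor(store: EventStore, monitor_id: str, name: str) -> None:
    if is_registered(store, monitor_id):
        raise ValueError(f"monitor {monitor_id!r} already registered")
    store.append(MonitorRegistered(monitor_id, name))


def record_check(store: EventStore, monitor_id: str, healthy: bool) -> None:
    if not is_registered(store, monitor_id):
        raise ValueError(f"unknown monitor {monitor_id!r}")
    store.append(CheckRecorded(monitor_id, healthy))


if __name__ == "__main__":
    store = EventStore()
    projection = StatusProjection()
    store.subscribe(projection.apply)

    register_monitor(store, "s3", "AWS S3")
    record_check(store, "s3", healthy=False)
    print(projection.status)

    # A new projection can be rebuilt at any time by replaying the log.
    rebuilt = StatusProjection()
    for event in store.events:
        rebuilt.apply(event)
    print(rebuilt.status == projection.status)
```

The point of the shape is that the only way to change what the projection shows is to run a command that appends an event, which is the adjustment from CRUD thinking described above.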
What is the most interesting tool you are playing around with at the moment?
I just want to give a shout-out to the Commanded library. It is an open source event sourcing module that we use. It essentially runs our whole business and we've contributed back to it.
Personally I try to not do tech stuff in my spare time!
Describe your computer hardware setup
I'm a PC enthusiast, and I always try to build what I use. Currently I am using a 10th-gen Intel overclocked to 4.7 gigahertz with 32 gigs of RAM and an RTX 3080 for gaming. That's my primary driver. I've also got a BenQ 240 Hertz gaming monitor, which I think only does 1080p but it's good enough for games and coding. I've got another HP monitor on the side here and an MSI gaming laptop that I use when I travel. I would like to get a mechanical keyboard, but I can’t decide which one to get. I use all the gaming stuff, not a Mac in sight.
Describe your computer software setup
- OS: Windows.
- Browser: Brave.
- Email: Gmail Web UI.
- Chat: Slack and Discourse.
- IDE: VS Code.
- Source control: Git and GitHub.
Describe your desk setup
I have a gaming chair. There are built-in shelves in my condo, which unfortunately means my desk is not a standing desk. I used to have one.
Daytime or nighttime? Night.
Tea or coffee? Coffee.
What non-tech activities do you like to do?
Although I haven't done it in the last couple of years, I enjoy competing in triathlons. I did that a lot from about 2015 to 2019. I also enjoy gaming and reading, and I practice drumming a lot.
Find out more
Metrist is a tool for monitoring your dependencies. It was featured as an "interesting tool" in the Console newsletter on 1 Dec 2022. This interview was conducted on 21 Nov 2022.