More than a video. Replay lets you jump to any point in execution, add Console logs on the fly, and squash bugs as a team.
Tech stack
ReactTypeScriptC++
How engineering
works at Replay
How are the teams structured?
We have a bunch of talented people working really closely together. There used
to be eight people in a team and it was hard to imagine going up to 10 or 12. So
we divided it into a front-end team with two people, a back-end team with four
people and a runtime team with two people.
What tools do engineers use?
Project management: Linear
Communication: Discord
Documentation: Notion
Builds and Deploys: GitHub.
Infrastructure: Kubernetes on AWS
Error Tracking and Monitoring: Sentry, Honeycomb, Datadog
Can developers pick their own tools?
We are at a stage where the first 20 people are going to set the environment for
the next 200. We’re looking for people who are really creative and want to
design that environment. So if you want to use Emacs instead of Vim then you can
do that, but the more exciting thing is thinking about how to design the
environment for Replay as a whole and shape it for the future.
How does the development process work? What’s the process for working through bugs, features and tech debt?
To us, the most important thing is quality. We have a score called “the butter
score”, which was there to say, “Hey, if it’s a hundred, we’re giving our users
a buttery smooth experience.” You can start the Replay quickly, you can interact
with it well, it doesn’t get in your way, it never lies to you.” We just got to
the point where half of the sessions have a butter score of 90. When we started
measuring this, our P50 was 30. When we get to a P50 of 100, it will be the
first time that half of all sessions are buttery smooth, but we won’t stop until
we get to a P90 of 100.
How does code get reviewed, merged, and deployed?
We use GitHub. We do code reviews, we trust you to do great work so we try not
to get in your way, but at the same time we know that we’re all solving really
difficult problems and so we try to support one another as well.
What is the QA process?
We have the best automated tests. We fuzz at every single level, a standard test
would be a unit test where you have very defined inputs and outputs for one
component - you can go up to a functional test where you say, “Hey, I care about
this piece of functionality and I’m going to test multiple systems at once.” And
then you have an end-to-end test where I’m going to pretend to be the user.
That’s all well and good and we have all of those but the highest level of
testing, in my opinion, is generative testing where you try to stress test the
system and you let the computer come up with new cases that we’ve never thought
of. So we are constantly fuzzing at the runtime level, going to random websites
and recording them, at the protocol level, opening recordings and issuing
thousands of requests to try automated debugging in the most strenuous way
possible to discover new issues.
What are some recent examples of interesting development challenges solved by internal teams as part of building the product?
On the runtime side, one of the very interesting problems we faced was the
ability to snapshot a runtime state at the C++ level so that you can write it to
disk and then load it up later. It’s like a save button in your browser so you
can shut it down and then open your browser up exactly at that point in time,
then in that space we have the ability to do lightweight forking. The snapshot
persists to memory or disk and it’s hundreds of megabytes; you get a new content
process in under a second and it just keeps on running.
On the back end, the most interesting thing we’re doing is turning compute -
browser compute and NodeJS compute - into a state in S3 that we can query as a
database. You’re turning imperative execution into something functional and
queryable later, which you do to improve performance and reduce further
computation.
And then on the front end, we’re building dev tools: elements, console, network
monitor, debugger, and trying to do that in a functional way because we have a
time travel debugger. We started with something like Chrome dev tools and
Firefox dev tools, but runtimes are by nature imperative so a tool like Redux
made a lot of sense. Now that we have something more functional.
How does on-call work?
We don’t have a formal on-call, but we might have to bring that on in the next
six months. Currently, we have Slack Connect with our top customers, and we have
great observability and alerting. If there is an issue, we use Discord to start
the conversation and the right people can come in.
Hiring process at Replay
How does the application process work? What are the stages and what is the timeline?
As a general rule of thumb, we believe everyone applying here is technically
strong. I don’t think you should apply to work with this team if you aren’t
excited about the future of software and making software more understandable.
Everything else in the interview process is testing for fit. “How do you like to
work? What are your biases and trade offs and priorities? How do you like to
collaborate?” Getting to know us. I think, if we design the process right, you
will have a chance to say, “Hell, no,” to us. ‘cause you’ve got to know us just
as well as we’ve gotten to know you.
We think that the best people aren’t looking, we try to talk to them three
months, six months, two years before the timing works. Some people really want
to come on when the team’s five people, others really want to come on when the
team’s 20 to 50 and some people are just happy where they are. If someone is
actively looking, we can get to a decision on their timeline very quickly, even
within a week, but usually it would be two to four weeks.
What is the career progression framework? How are promotions and performance reviews managed?
At this stage, the most important thing we do is equity and ownership. The first
20 will shape the next 200. So as a technical leader, you help define the
technical primitives that we will build on top of. You also get to shape the
technical processes for how we collaborate, plan, prioritize, diagnose issues,
recruit. We don’t really do titles, but we do recognize ownership in a pretty
big way.
Console is the place developers go to find the best
tools. Each week, our weekly newsletter picks out the most interesting tools and new releases. We keep
track of everything - dev tools, devops, cloud, and APIs - so you don’t have to.
Subscribe to the weekly Console newsletter
An email digest of the best tools and beta releases for developers. Every Thursday. See the latest email.