Working at Ably
Ably is a suite of simple APIs to develop powerful realtime experiences. Start with pub/sub messaging that is guaranteed to deliver and automatically translates between protocols. Then add device presence, check stream history, and never worry about abrupt disconnections.
How engineering works at Ably
How are the teams structured?
Ably has multiple teams each focused on a particular set of services, experiences or libraries. The real-time team works on the pub/sub system and the web team works on the developer experience of our web console. There’s also an Infrastructure team, working on AWS-based infrastructure to host our services and an SDK team working on client libraries. Each team has their own processes, tools, onboarding materials, playbooks and whatever else they need as a team.
There are also cross functional teams for the bigger projects. Quite often they need input from a real-time engineer, a web engineer, or an SDK engineer to get together and implement a feature. These cross-functional teams, that we call pods, are led by a product manager or an engineer depending on if it's a commercial customer facing feature, an optimization problem, a tech debt project, or something else entirely.
What tools do engineers use?
- Version Control: GitHub
- Infrastructure Management: Terraform
- Communication: Zoom, Slack
- Issue Management: Jira
- Project Planning: Confluence
- Deployment: GitHub Actions, TeamCity and AWS
- Monitoring: VictoriaMetrics
- Alerting: PagerDuty
Can developers pick their own tools?
Developers can use whatever they want - a lot of the team uses VS Code or VIM, but some developers use proprietary IDEs. For example, if you’re working with C# or Java, you probably want to use Visual Studio or JetBrains. Ably buys licenses for these proprietary IDEs if developers ask for them.
There are some choices that are made at a team level and are hard to change, like using Terraform or using Node.js and Go. Developers can pick whatever tool they want to use individually, but there is less flexibility with tools that are used by entire teams.
How does the development process work? What's the process for working through bugs, features and tech debt?
Big features that are driven by product management will typically be delivered through a cross-functional team, but each team has their own backlog of tech debt and improvements that they want to make. Anybody who's not staffed to a cross-functional initiative at any given time will be working through that backlog of bugs and tech debt.
Pods also fix bugs, but the home teams, as we call them, act as a catch all. For example, if there is a problem with the real-time system, it will be assigned to the real-time team; it can also be triaged to a pod if it's related to their work, but that's typically where it starts.
How does code get reviewed, merged, and deployed?
Ably uses GitHub pull request workflows. Each team has their own pull request workflow that they own. If you want to make a change, be it to the real-time system or to an SDK, then you have to follow that respective team's workflow. The SDK teams have a slightly different workflow because they develop completely in public. All of Ably’s SDKs are open source, that means the SDK team has to use tools that anybody can use publicly, whereas the realtime team can use private CI servers.
After the code is merged, it gets built and containers get released to ECR. For deployments, we have an interesting system where Ably has completely immutable infrastructure in production. No EC2 instance lives for more than two or three days in Ably - we're constantly tearing them down and redeploying. We have a rolling redeploy where each week somebody just changes the image tags and that's how new code gets deployed.
We've historically used Jenkins, but most teams are moving away from that now and using GitHub Actions, with the exception of the real-time team - they have some tricky build requirements and use TeamCity.
What is the QA process?
All of the services and components have unit and integration tests and they run as part of CI. We also have an incremental rollout process - we currently deploy to seven AWS regions, that gives us the option to test the canary deployments in individual regions and make sure that everything looks good in the metrics before we make the image default and use it in all regions.
What are some recent examples of interesting development challenges solved by internal teams as part of building the product?
People are quite excited when they join Ably because they're in a team of less than 10 people, but they're deploying to a truly global distributed system with a five-nines uptime guarantee. The engineering department currently staffs 40 people - Ably has a small company startup vibe, but with engineering challenges that you would normally find in big tech.
There is one pod that's completely focused on just scaling us up, they give us traffic projections orders of magnitude more than we can currently handle. The tech lead for that team owns that problem, does some profiling, finds the bottlenecks, looks at the architecture then finds the design level bottlenecks and plans to evolve the system to handle orders of magnitude higher load. That's one of the engineering-led pods where there's not so much need for product management there, that's just an optimization problem.
In contrast, we recently had another pod that was focused on cost management and latency. We wanted to figure out the best balance between service latency and overall infrastructure cost. These are obviously competing objectives, so working through the right balance was a good challenge.
Then on the feature development side, we just shipped a feature called message interactions. The pub/sub subsystem is quite heavily used in chat use cases, which is interesting because we've never added any chat specific features previously. We're just a fairly un-opinionated messaging pub/sub service but a large portion of our traffic is chat. A common use case in chat these days is to be able to edit messages, or delete them - which translates to publishing a change to a message that's already gone out.
Although pub/sub seems like a good design for doing messaging generally, you can't recall or mutate a message that you've already published to an unknown number of subscribers. So, we provided an extra API for people to back-reference a message that's already gone out, then attach arbitrary metadata to it or make changes. We've just shipped some APIs to make us a bit more chat friendly by supporting those message interactions.
We've also got developers working on our first use experience, making it nicer to get set up with Ably. We've got a lot of people working on developer education trying to get our docs experience as good as some of the market leaders.
How does on-call work?
In Ably, the real-time team and the infrastructure team join the same on-call rota. There's a primary and a secondary on-call who will get paged in the night on PagerDuty if an alarm goes off. People get paid for out of hours support and for actually fielding issues. The web team also has an on-call rota, but it's much quieter, so they don't get paged very often.
Our teams are growing and the code base is getting bigger; we need to start dividing things up a bit so that there's separate rotas for different systems. That's a thing that we're looking into doing next year.
Hiring process at Ably
How does the application process work? What are the stages and what is the timeline?
There's a few stages to the interviews and it takes about two to three weeks depending on the candidate's availability.
- Introduction: Applicants meet just the hiring manager and have a quick call to make sure mutually it's a good fit. They will tell them a bit about the team, learn a bit about them and maybe ask a few questions.
- Tech interviews: Different teams have interview processes that are designed for their particular area of expertise. We give candidates an option to do their technical interviews as a homework problem or as a live coding interview, whichever they prefer
- Mission alignment interview: Applicants meet a member of the leadership team, which is somebody from the C-suite. This stage is about making sure that the candidate’s approach and ways of working align with Ably’s and there’s a long-term, mutual fit.
What is the career progression framework? How are promotions and performance reviews managed?
Ably has a leveling framework that has all of the normal levels, engineers, 1, 2, 3, up to staff principal, et cetera. There's a parallel path for both leadership and individual contributors, so you can get all the way up to the VP level and remain an engineer - you don't get forced into management. We do reviews twice a year - there's a promotion and pay review every six months. Reviews provide an opportunity to reflect on how you're doing, refresh your goals, aim for the next level and aim higher.
Console is the place developers go to find the best tools. Each week, our weekly newsletter picks out the most interesting tools and new releases. We keep track of everything - dev tools, devops, cloud, and APIs - so you don't have to.