Move Fast & Avoid Sharp Edges

Daniel Klein
Landing Engineering
11 min read · Jun 13, 2023


Introduction

Delivering value quickly and safely is a core competency of our software engineering organization at Landing. Our team’s entire mission is to deliver useful products and tools (for our members, partners, and internal staff) as quickly as possible while maintaining high quality (things work correctly) and safety (systems are reliable and protect our users’ data). A significant piece of that value delivery chain is the tooling, automation, and infrastructure involved in building and deploying all this software. In 2022, our team rethought those systems from the ground up and migrated from Heroku to Kubernetes, in conjunction with the Porter platform. This strategic move laid the foundation for a well-oiled machine for building, deploying, and scaling our web applications and services. Now, having caught our breath after the transition, we’re excited to share a bit about our approach, the outcomes we achieved, and the valuable lessons we learned along the way. Let’s dive right in!

About Landing

You may be wondering, what exactly is Landing? In a nutshell, Landing aims to make living flexible for everyone. Freed from being tied down by a lease, our members enjoy access to a vast network of beautifully furnished apartments across the United States. A Landing member can, for example, live in Miami for a month, transfer to Portland for a few weeks, and then try out Boise for a while, all while enjoying a consistent level of high-quality furnishings and accoutrements. Imagine having the freedom to immerse yourself in new cities, knowing that wherever you go, you’ll have a comfortable and stylish home awaiting your arrival.

Where did we start?

From the founding of the company, Landing’s custom applications were hosted on Heroku, providing our lean engineering team with tremendous leverage and a delightful developer experience. Heroku allowed us to defer hiring dedicated DevOps or infrastructure specialists; things just worked, and the team could focus on solving core business problems instead of fiddling with infrastructure. In the Spring of 2022, as Landing continued to experience rapid growth, we looked around and realized that, for a number of reasons such as scalability, cost optimization, and flexibility, staying on Heroku would not be sustainable for Landing in the longer term. Determined to find a solution that aligned with our expanding needs, we embarked on an ambitious project. Our mission was to identify a platform that could scale alongside Landing’s growth, ensuring a smooth migration of all our application infrastructure, from web servers to databases and beyond, without disrupting our operations.

As we considered our options, our focus centered around optimizing several crucial qualities in the new environment. We prioritized cost efficiency, flexibility, developer experience, and robust security. We aspired to find a platform that would be less expensive than Heroku, with a flatter cost curve as our infrastructure expanded. From a security perspective, we wanted a platform that would allow us to make the right thing the easy thing and allow our engineers to ship safely without getting in their way. Simultaneously, we sought a platform that provided an exceptional developer experience and the flexibility to avoid being locked into a solution that might require another large-scale migration in the future. It became evident that migrating to another pure PaaS (Platform-as-a-Service) like Heroku was not our cup of tea. While products in this segment, such as Render and Fly.io, offered exceptional developer experiences, the inherent lock-in risk and potential for future migrations did not align with our long-term vision.

On the other end of the spectrum, we briefly considered using a managed Kubernetes service directly with a product like Amazon’s Elastic Kubernetes Service (EKS) or Google Kubernetes Engine (GKE), coupled with Helm charts or something similar. However, it was quickly apparent that pursuing this path would require us to hire or cultivate significant Kubernetes expertise within our team or make a LOT of mistakes along the way (or both). Even with a managed Kubernetes (k8s) service involved, without an additional abstraction layer, it was clear that this path would be complicated and painful. We loved the power, flexibility, and ubiquity of Kubernetes, but we also deeply desired a developer experience reminiscent of Heroku. Luckily, as we continued to research our options and build small proof-of-concept demos, we ultimately landed on Porter.

Where did we go?

Porter is a platform service, maintained and operated by the creators of the open source project of the same name, that provides an abstraction layer on top of managed Kubernetes infrastructure running in your own cloud account. The Porter platform provides a web dashboard along with CLI tooling for creating, deploying, and managing applications on Kubernetes without having to know much about the underlying Kubernetes internals. Furthermore, the Porter team also manages the maintenance and upgrades of our Porter-connected AWS EKS clusters (the platform also supports clusters on GCP and Digital Ocean in addition to AWS).

As we explored options, we realized that Porter presents a wonderful combination of developer experience and flexibility. The web dashboard is practical and easy to understand, and the CLI makes easy things easier and hard things possible. Moreover, the Porter team is constantly absorbing our product feedback and shipping enhancements to the platform. The level of support we have received from Porter is incredible. The most powerful long-term benefit of Porter, however, is that it is ultimately Kubernetes (and Helm charts) under the hood. While we have no intention of outgrowing the platform, knowing that it is rooted in these industry-standard technologies provides peace of mind. If the need ever arises to scale beyond Porter’s capabilities, the migration effort will be significantly streamlined compared to our previous transition off of Heroku.

Over the course of several months in 2022, we executed a project to migrate all of our applications on Heroku over to Porter. In conjunction with this move, we also shifted all of our Heroku Postgres databases to Amazon Aurora Postgres and our Heroku Redis cache instances to Amazon Elasticache. This migration, along with the continuous improvements we are making to our infrastructure and workload creation and management tooling and processes, has established a robust foundation for scaling our engineering teams and infrastructure. These advancements empower us to efficiently support the needs of our growing business and ensure a solid platform for future growth.

Abstractions

One of the standout features of the Porter platform is the abstractions it places on top of Kubernetes workloads, making them much easier to manage and understand. In Porter, one manages one or more clusters, with each cluster in Porter mapping to a cluster in the relevant managed Kubernetes service (in our case, EKS). Naturally, we don’t need to worry about how many nodes are in a cluster or what’s running on what nodes — Kubernetes and EKS manage that automatically.

Perhaps the most crucial abstraction within Porter is the concept of applications. Applications can be “web” or “worker” apps, similar to how things look in Heroku. Creating an application in Porter is straightforward: one simply needs to specify a container image repository, the desired number of containers, the allocated RAM and CPU resources per container, and any necessary environment variables. From there, Porter (and k8s under the hood) takes care of fetching the image, configuring the deployment, and keeping everything up and running. Another important concept in Porter is that of Environment Groups. Each Environment Group acts as a bucket for storing environment variables and can be associated with one or more Applications. The last key Porter abstraction is the Job — Jobs are similar to Applications, but instead of running constantly, they run on a schedule or on demand. We have found these useful for scheduled tasks and the like.
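Conceptually, an Application definition boils down to a handful of fields. The pseudo-configuration below is purely illustrative (it is not Porter’s actual schema, and the names are made up), but it captures what an engineer actually specifies:

```yaml
# Illustrative pseudo-config only -- not Porter's real file format.
# An Application is essentially: an image, a scale, resources, and env vars.
application:
  name: example-web
  type: web                       # "web" (serves traffic) or "worker" (background work)
  image: registry.example.com/example-app:latest
  replicas: 3                     # desired number of containers
  resources:
    cpu: 500m                     # CPU allocated per container
    memory: 1Gi                   # RAM allocated per container
  env_groups:
    - example-production          # shared variables from an Environment Group
```

Everything below this level of detail (pods, services, ingress, node placement) is handled by Porter and Kubernetes.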

Let’s examine a real-world example to illustrate the practical usage of Porter. In our production environment, we utilize Porter to run our Core Ruby on Rails app. In Porter, we have one Web Application responsible for serving web traffic and one Worker Application dedicated to running Sidekiq, with both applications using the same container image and Environment Group. Additionally, we have a Job that runs on-demand before each deployment to execute rails db:migrate. This ensures smooth database migrations as part of our deployment process.

Building a Playbook

As we transitioned our existing services and infrastructure to Porter and AWS, we began to think about how to streamline the creation of new services. Similarly, given the relatively small size of our engineering organization, we also aimed to provide as much familiar “look and feel” across service codebases as is reasonable. Our goal was to enable engineers to navigate easily between different codebases, quickly grasping how applications are tested, built, and deployed, in order to make engineers as productive as possible, as quickly as possible. Leveraging GitHub Actions for continuous integration and continuous deployment (CI/CD), coupled with Porter’s robust support for GitHub Actions workflows, we embarked on developing a shared ‘lingua franca’ for fundamental development and deployment workflows.

A fundamental aspect of our approach revolves around the Dockerfile, which serves, as one might expect, as the source of truth for “what do we need in the container to run this app?”. Consequently, every application’s GitHub repository includes a Dockerfile and a GitHub Actions workflow configuration for building the app’s container image from the Dockerfile. This container image represents the deployment unit for applications within Porter. By standardizing the use of Dockerfiles and embracing GitHub Actions workflows, we establish a consistent and repeatable process across the organization, simplifying the deployment pipeline and fostering familiarity among engineers working across different codebases.

We have landed on a standardized ‘template’ workflow that is used across the majority of our application repositories for building, testing, and deployment processes. While there might be slight variations from one codebase to another, the workflows are largely consistent throughout our organization. Once an engineer becomes familiar with the workflows in one repo, they can easily navigate and understand the others. Here is an overview of the key components:

  • Whenever changes are pushed to a relevant branch, such as a staging or production branch, we initiate a test run that includes unit tests, code linters, and other code validation tools.
  • Concurrently, we launch a suite of security scanners, including dependency vulnerability scanning and static code analysis.
  • In parallel to the aforementioned steps, we initiate the container image build process; this leverages Porter’s tooling to automatically retrieve the necessary environment variables from the corresponding Porter Environment Group. This step also includes a vulnerability scan of the container image once built. Upon successful completion, the image is pushed to the relevant container image repository.
  • Once the above steps complete successfully, we get to the business of deploying the new code.
  • If a database migration, like rails db:migrate, or any other similar job needs to execute prior to deploying the new code, the workflow updates the relevant job to use the new image tag and runs it accordingly.
  • Next, the workflow uses Porter’s tooling to update each relevant application to utilize the new container image tag. Under the hood, this ultimately triggers Kubernetes to roll out new containers (running the new container image) gradually to each application deployment.
  • In the event of any failures during the preceding steps, the workflow aborts and notifies the relevant application’s dedicated Slack channel.
  • Conversely, if all the preceding steps succeed, the workflow sends a notification to the Slack channel, informing the team that the new code is live!
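Stitched together, a heavily simplified version of this template workflow might look like the sketch below. The job names, action versions, and `bin/ci-*` scripts are illustrative placeholders, not our exact configuration:

```yaml
# Simplified sketch of the template workflow -- names and steps are illustrative.
name: deploy
on:
  push:
    branches: [staging, production]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: bin/ci-test          # unit tests, linters, code validation

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: bin/ci-scan          # dependency and static-analysis scanners

  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: bin/ci-build-push    # build image, scan it, push to the registry

  deploy:
    needs: [test, security, build]   # deploy only if everything above passed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: bin/ci-migrate       # run the pre-deploy Job (e.g. db:migrate) on the new tag
      - run: bin/ci-deploy        # point each Porter application at the new image tag
      - run: bin/ci-notify        # Slack notification on success or failure
```

The key structural idea is that test, security, and build jobs run in parallel, and the deploy job gates on all three.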

In order to help illustrate the key components of this setup for our engineers, along with other shared practices and configurations across codebases, we have created a template repository. This repository contains a minimal application accompanied by all the necessary workflows and tooling for building and deploying the app to Porter. The goal of the template repository is to present only the essential elements, excluding any specifics tied to particular codebases or tech stacks. Engineers can use this repository as a living document to grasp the core concepts of our build, deployment, and validation workflows. Additionally, this repository serves another valuable purpose: when a team wants to launch a new application, all the fundamental bits of these workflows are near at hand, ready to plug into the repo for their new application!

Furthermore, we have started to push this notion of a standardized experience across codebases to a higher level by developing a standard ‘service launch checklist.’ This checklist outlines all the essential requirements for a new service to be considered ‘production-ready’ at Landing. Alongside the standard build, validation, security, and deployment workflows mentioned earlier, the checklist also expects elements such as a clear developer setup guide, a thorough architecture diagram, the implementation of baseline observability instrumentation, and other aspects that help ensure that the new service will be reliable, maintainable, and secure. Beyond its usefulness for new service rollouts, this checklist also eases the evaluation of existing services to identify gaps, shortcomings, and technical debt that need to be addressed. While we must admit that there is no ‘one size fits all’ solution here, the checklist provides an excellent starting rubric for pragmatically assessing a software service.
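To make this concrete, a condensed and hypothetical excerpt of such a checklist (the real one is longer and more detailed) might read:

```markdown
## Service Launch Checklist (illustrative excerpt)
- [ ] CI workflow runs tests, linters, and security scanners on every push
- [ ] Container image builds from a Dockerfile and is vulnerability-scanned
- [ ] Deployment workflows configured for staging and production
- [ ] Developer setup guide included in the repository
- [ ] Architecture diagram checked in and up to date
- [ ] Baseline observability instrumentation in place
- [ ] Slack channel wired up for deploy and failure notifications
```

Each unchecked box on an existing service is, in effect, a named piece of technical debt.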

Why It Works

This approach proves to be effective for several reasons. First of all, the convention and consistency it establishes across codebases are crucial. It empowers engineers to enter an unfamiliar codebase with the safe assumption that the build and deployment workflows for that application are at least similar to those in their “home” codebase. This alleviates a significant cognitive burden, allowing the engineer to focus on understanding the code itself without worrying about the intricacies of deploying it out into the world.

The abstractions that Porter provides on top of Kubernetes are also invaluable. Our engineers don’t need in-depth expertise in all the core Kubernetes concepts, even though our workloads run on Kubernetes, because Porter provides friendly abstractions that allow engineers to operate at a higher level, above the k8s weeds.

On that note, it’s worth taking a moment to highlight some of the benefits we get “for free” from Kubernetes which are essential for building and operating robust systems. For instance, Kubernetes vigilantly ensures that the desired number of replicas of each application workload is always running. Similarly, if an application container starts consuming excessive memory, Kubernetes automatically restarts it. The practice of “rolling deploys” mentioned earlier, where new containers with the updated container image are gradually introduced into service (and rolled back if necessary), also happens seamlessly thanks to Kubernetes. These are just a few of the best practices of modern software operations that are baked into Kubernetes.
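For the curious, these behaviors are encoded declaratively in the Kubernetes Deployment objects that Porter manages on our behalf. A hand-written equivalent (with illustrative names and values) looks roughly like:

```yaml
# Illustrative Deployment sketch -- in practice Porter generates this for us.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web
spec:
  replicas: 3                    # Kubernetes keeps exactly this many pods running
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # bring new pods up before retiring old ones
      maxSurge: 1
  selector:
    matchLabels:
      app: example-web
  template:
    metadata:
      labels:
        app: example-web
    spec:
      containers:
        - name: web
          image: registry.example.com/example-app:abc123
          resources:
            limits:
              memory: 1Gi        # exceeding this gets the container killed and restarted
```

Updating the `image` tag in this spec is what triggers the gradual rollout described above.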

The use of GitHub Actions is another huge asset for our teams. Instead of having to navigate to a different repository or system to understand how the deployment workflow is configured for an application, each engineer can simply navigate to the .github/workflows directory and see it for themselves. They can even contribute pull requests to add updates and improvements to the workflows as they see fit. Since deployments occur through GitHub Actions as well, the workflow logs are conveniently accessible with just a few clicks, right within the system that engineers already spend a significant amount of their time in. To be clear, the specific CI/CD platform at play (in our case GitHub Actions) is less important than the fundamental practices of defining deployment workflows as code and keeping them (and their actual execution) close to the code itself, where the engineers who own the service spend so much of their time.

Looking Ahead

Our journey to adopt Porter and Kubernetes has already yielded tremendous benefits, but we believe we’re just scratching the surface. We are excited to continue refining and optimizing our infrastructure, tooling, and processes to make our engineers’ lives easier and ultimately better serve our customers. A key focus for us is to continuously assess and enhance the security posture of our cloud environments and applications. By emphasizing the principle of making the right thing the easy thing, the workflows described above have proven to be incredibly powerful in bolstering our security efforts. As we embrace a shift-left approach and integrate more security into our software development lifecycle, the investments we’ve made in our tooling will continue to compound.

While it is already relatively straightforward to spin up a new service at Landing using our checklist and standardized tooling, we are committed to making this process even easier and safer moving forward. Additionally, we are determined to improve the speed at which our teams can deliver new features and fixes to our existing services. All in all, we know that the faster our engineers can deliver secure, effective software to solve the problems of our business and delight our customers, the better Landing can execute on our mission to make living flexible for everyone. In writing this post, we realized there are many more topics we would love to explore in the future, so stay tuned for more blog posts from our team. And please don’t hesitate to reach out if you have any questions!
