Improving Developer Productivity with Policy Automation at DoorDash
DoorDash recently used Open Policy Agent to improve the productivity of its developers. The infrastructure team at DoorDash observed several benefits, including faster reviews of changes to infrastructure policies, more comprehensive tagging of resources, and a notable decrease in the number of incidents resulting from policy violations.
At QCon New York 2023, Lin Du, senior software engineer at DoorDash, provided a comprehensive overview of a self-serve infrastructure that utilizes policy automation.
A few years ago, DoorDash experienced an incident that caused its order volume to drop. While the infrastructure team resolved the incident within an hour, the root cause was the accidental deletion of critical AWS resources. This unintended deletion occurred within a Terraform change that also included around 90 other resources, all seemingly innocuous. This realization prompted the use of policy automation as a safeguard against such critical oversights in the future.
At DoorDash, the team uses Atlantis, an open-source orchestrator for Terraform plans. This orchestrator manages the Terraform plan lifecycle. When users create infrastructure pull requests on GitHub, a webhook event is sent to an Atlantis worker. This worker retrieves Open Policy Agent (OPA) policies from a designated S3 bucket.
DoorDash writes the policy rules using Rego queries to detect deviations from the expected system state. The conftest tool, used by DoorDash, leverages these OPA policies to validate data against policy assertions.
Atlantis then runs conftest against the Terraform plan, checking it against the OPA-defined policies. The results, along with the Terraform plan details, are added as comments on the GitHub pull request.
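The check-and-comment step can be sketched in Python. This is only an illustration of the idea, not DoorDash's implementation: in the workflow described here the rules are written in Rego and evaluated by conftest. The rule `deny_missing_tags`, the required tag name `owner`, and the comment wording are all hypothetical. The `resource_changes`, `change.actions`, and `change.after` fields are assumed to follow the JSON plan representation produced by `terraform show -json`.

```python
from typing import Any, Callable, Dict, List, Sequence

Plan = Dict[str, Any]
Rule = Callable[[Plan], List[str]]  # a rule returns a list of denial messages


def deny_missing_tags(plan: Plan, required: Sequence[str] = ("owner",)) -> List[str]:
    """Hypothetical rule: every newly created resource must carry the required tags."""
    messages: List[str] = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if "create" not in change["actions"]:
            continue  # only inspect resources the plan would create
        # "after" is null for deletions, and "tags" may be absent or null.
        tags = (change.get("after") or {}).get("tags") or {}
        missing = [tag for tag in required if tag not in tags]
        if missing:
            messages.append(f"{rc['address']} is missing tags: {', '.join(missing)}")
    return messages


def evaluate(plan: Plan, rules: List[Rule]) -> List[str]:
    """Run every rule against the plan and collect all denial messages."""
    failures: List[str] = []
    for rule in rules:
        failures.extend(rule(plan))
    return failures


def format_comment(failures: List[str]) -> str:
    """Render the result as text suitable for a pull request comment."""
    if not failures:
        return "Policy check passed."
    lines = [f"Policy check failed ({len(failures)} violation(s)):"]
    lines += [f"- {message}" for message in failures]
    return "\n".join(lines)


if __name__ == "__main__":
    example_plan = {
        "resource_changes": [
            {
                "address": "aws_s3_bucket.logs",
                "change": {"actions": ["create"], "after": {"tags": {"owner": "infra"}}},
            },
            {
                "address": "aws_sqs_queue.orders",
                "change": {"actions": ["create"], "after": {"tags": None}},
            },
        ]
    }
    print(format_comment(evaluate(example_plan, [deny_missing_tags])))
```

Keeping each rule as a separate function that returns messages mirrors how conftest collects the results of individual deny rules, so new checks can be added without touching the reporting step.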
DoorDash further streamlines the process with PullApprove, a GitHub integration that handles code review assignment and policy. With the required approvals in place, Atlantis applies the changes to AWS resources as per the Terraform plan.
Du further illustrated the policies that can be written using this automation. He categorized the policies into four types: Reliability, Velocity, Efficiency, and Security.
For reliability, consider a scenario in which it is crucial to protect critical resources from deletion. Du illustrated this by presenting an example where a policy was set up to detect these critical resources. Subsequently, a verification step was introduced, requiring an admin review before any changes to these resources could take place. To optimize review velocity, Du showcased an example where the policy checked the Terraform module in a given PR against the list of already-approved modules. If the team uses a module that is not listed, the policy encourages using an already-approved Terraform module.
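The two example policies can be approximated with a short Python sketch. It is not the Rego that Du presented; it only shows the shape of the checks under stated assumptions. The resource address in `CRITICAL_ADDRESSES` and the module sources in `APPROVED_MODULE_SOURCES` are made-up placeholders, and the plan layout (`resource_changes` and `configuration.root_module.module_calls`) is assumed to match the output of `terraform show -json`.

```python
from typing import Any, Dict, List, Set

Plan = Dict[str, Any]

# Hypothetical values, for illustration only.
CRITICAL_ADDRESSES: Set[str] = {"aws_dynamodb_table.orders"}
APPROVED_MODULE_SOURCES: Set[str] = {"example-org/sqs/aws", "example-org/vpc/aws"}


def needs_admin_review(plan: Plan) -> List[str]:
    """Reliability: list critical resources that the plan would update or delete."""
    flagged: List[str] = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        modifies = any(action in ("update", "delete") for action in actions)
        if rc["address"] in CRITICAL_ADDRESSES and modifies:
            flagged.append(rc["address"])
    return flagged


def unapproved_modules(plan: Plan) -> List[str]:
    """Velocity: list module calls whose source is not on the approved list."""
    calls = plan.get("configuration", {}).get("root_module", {}).get("module_calls", {})
    return sorted(
        name
        for name, call in calls.items()
        if call.get("source") not in APPROVED_MODULE_SOURCES
    )


if __name__ == "__main__":
    example_plan = {
        "resource_changes": [
            {"address": "aws_dynamodb_table.orders", "change": {"actions": ["delete"]}},
            {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
        ],
        "configuration": {
            "root_module": {
                "module_calls": {
                    "queue": {"source": "example-org/sqs/aws"},
                    "cache": {"source": "someone-else/redis/aws"},
                }
            }
        },
    }
    for address in needs_admin_review(example_plan):
        print(f"{address}: admin review required before this change can be applied")
    for name in unapproved_modules(example_plan):
        print(f"module.{name}: consider using an already-approved module")
```

In this sketch the reliability check does not block the change outright; it only reports which resources need an extra reviewer, which matches the admin review step described above.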
As a result of the policy automation, the DoorDash infrastructure team saved time spent reviewing pull requests, which freed up time for product improvements as a whole. The team could also reduce incidents caused by policy violations, as they could identify policy issues in pull requests early. Finally, the team improved their resource tagging coverage and standardization from 20% to 97.9%, leading to cost and team member optimization.