Tales of Kafka at Cloudflare

At QCon London, Andrea Medda, Senior Systems Engineer at Cloudflare, and Matt Boyle, Engineering Manager at Cloudflare, shared the lessons their platform services team learned from enabling the use of Apache Kafka at the scale of 1 trillion messages.

Matt began by outlining the problems that Cloudflare needs its technology to solve, namely providing its own private and public cloud, and the operational challenge of coupling between teams that arose as their business needs grew and evolved. He went on to describe how Apache Kafka was chosen as their implementation of the message bus pattern.

While the message bus pattern enabled the decoupling of load between microservices, Matt discussed how services still ended up being tightly coupled because of an unstructured approach to schema management. To solve this problem, they opted to migrate from JSON messages to Protobuf and to build a client-side library to validate messages prior to publishing them.
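The idea of validating on the client before publishing can be illustrated with a minimal sketch. It is not Cloudflare's library: `UserCreated`, `SchemaError`, `ValidatingPublisher`, and the topic name are hypothetical, a dataclass stands in for a generated Protobuf type, and the `send` callable stands in for a real Kafka producer.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class UserCreated:
    """Hypothetical message type; stands in for a generated Protobuf class."""
    user_id: int
    email: str


class SchemaError(ValueError):
    """Raised when a message does not match the contract for its topic."""


class ValidatingPublisher:
    """Checks each message against its topic's contract before sending it."""

    def __init__(self, send, contracts):
        # send: callable(topic, message); assumed hook for the real transport.
        # contracts: topic name -> the single message type allowed on that topic.
        self._send = send
        self._contracts = contracts

    def publish(self, topic, message):
        expected = self._contracts.get(topic)
        if expected is None:
            raise SchemaError(f"no contract registered for topic {topic!r}")
        if type(message) is not expected:
            raise SchemaError(
                f"topic {topic!r} expects {expected.__name__}, "
                f"got {type(message).__name__}"
            )
        # Field-level check: every value must match its declared type.
        for f in fields(message):
            value = getattr(message, f.name)
            if not isinstance(value, f.type):
                raise SchemaError(
                    f"field {f.name!r} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )
        # Only messages that passed validation reach the transport.
        self._send(topic, message)


# Usage: an in-memory list plays the role of the broker.
sent = []
publisher = ValidatingPublisher(
    send=lambda topic, msg: sent.append((topic, msg)),
    contracts={"users.created": UserCreated},
)
publisher.publish("users.created", UserCreated(user_id=1, email="a@example.com"))
```

Rejecting a malformed message in the producer means consumers never have to defend against it, which is the coupling problem the schema work was meant to remove.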

As the adoption of Apache Kafka grew across their teams, they built a Connector Framework to make it easier for teams to stream data between Apache Kafka and other systems while transforming the messages in the process.
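The general shape of such a connector (read from a source, transform, write to a sink) can be sketched as below. This is an illustration only, not the Connector Framework itself: `run_connector`, `redact_email`, and the record layout are hypothetical, and plain Python iterables and callables stand in for Kafka topics and external systems. Letting a transform drop a record by returning `None` is also an assumption of this sketch.

```python
from typing import Callable, Iterable, Optional

Record = dict


def run_connector(
    source: Iterable[Record],
    transform: Callable[[Record], Optional[Record]],
    sink: Callable[[Record], None],
) -> int:
    """Move records from source to sink, applying transform to each one.

    Returns the number of records delivered to the sink.
    """
    delivered = 0
    for record in source:
        out = transform(record)
        if out is None:
            # The transform chose to drop this record.
            continue
        sink(out)
        delivered += 1
    return delivered


def redact_email(record: Record) -> Optional[Record]:
    """Hypothetical transform: drop test traffic and mask the email field."""
    if record.get("test"):
        return None
    return {**record, "email": "<redacted>"}


# Usage: a list plays the source topic, another list plays the target system.
incoming = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com", "test": True},
]
target = []
count = run_connector(incoming, redact_email, target.append)
```

Keeping the source, transform, and sink as separate pluggable pieces lets a team reuse the same loop for different systems and only write the transform that is specific to their data.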

During the pandemic, as load on Cloudflare’s systems grew, the team began to notice bottlenecks on a key consumer, which had begun to breach its Service Level Agreements. Andrea explained how the team’s initial struggle to identify the root cause of the issue prompted them to enrich their software development kits (SDKs) with tooling from the OpenTelemetry ecosystem to gain better visibility of interactions across their stack.

Andrea went on to highlight how the success of their SDKs brought more internal users, which spurred a need for better support in the form of documentation and ChatOps.

Andrea summarized the essential lessons as:


  • Striking the balance between highly configurable and simple standardized approaches when providing developer tooling for Apache Kafka.
  • Opting for a simple and strict 1:1 contract interface to ensure maximum visibility into the workings of topics and their usage.
  • Investing in metrics on development tooling to allow problems to be easily surfaced.
  • Prioritizing clear documentation on patterns for application developers to enable consistency in adoption and use of Apache Kafka.

Finally, Matt shared a new internal product, called Gaia, that the team was building to enable push-button creation of services according to Cloudflare’s best practices.