A third of the cost of a typical Kafka cluster is storage. Beyond the cost, fluctuating usage means disk space has to be monitored, which has been a source of on-call pain for us. Tiered storage for Kafka is a newly released feature that promises to dramatically reduce storage costs by offloading most data to cheap storage (e.g. S3) rather than expensive local or network-attached disks (e.g. EBS). It's marked as production-ready, but it's not widely adopted yet. Stripe is currently migrating to tiered storage across our fleet of more than 50 Kafka clusters. We've already encountered some problems, such as JVM crashes and metadata calls that occasionally time out only for tiered storage topics, and we're still early in the migration (though we'll be done one way or the other by the time this conference takes place!). In this talk, you'll learn about the problems we encountered, which either made us abandon tiered storage or had to be solved before we could run it successfully in production.
