At our company, we use Apache Druid to provide our customers with real-time analytics tools for various use cases, including in-flight analytics, reporting, and building target audiences.
The common challenge across these use cases is counting distinct elements in real time at scale.
We've been using Druid to solve these problems for the past 5 years and have gained a lot of experience with it.
In this talk, we will share some of the best practices and tips we've gathered over the years.
We will cover the following topics:
- Data modeling
- Retention and deletion
- Query optimization