Metadata has been a key data infrastructure need since the beginning of our team's history at Stitch Fix.

We began this journey in 2015 with the setup of the Hive Metastore to work with Spark, Presto, and the rest of the platform infrastructure. But as our business needs grew, we felt the need to enhance and extend our metadata ecosystem.

In this talk, we want to share our journey of building additional capabilities with metadata to solve data and business challenges. Starting with our base infrastructure, the Hive Metastore, we will highlight each capability that led us to build the extensions that make up our present-day metadata infrastructure. These include improvements made to the Hive Metastore itself, extending the use of metadata beyond table schemas, and additional microservices we added to make metadata easier to access and use.

Building these capabilities has helped our team use metadata to power internal use cases. We want to share how we went about building this ecosystem and the lessons we learned along the way.

Kesselhaus
16.06.2021 16:50 – 17:20
Talk
Intermediate

Speakers