Data Engineers face many challenges with Data Lakes. GDPR requests, data quality issues, handling large metadata, merges and deletes are a few of the tough challenges usually every Data Engineer encounters with a Data Lake with formats like Parquet, ORC, Avro, etc. This session showcases how you can effortlessly apply updates, upserts and deletes on a Delta Lake table with a very few lines of code and use time travel to go back in time for reproducing experiments & reports very easily, how we can avoid challenges due to small files as well. Delta Lake was developed by Databricks and has been donated to Linux Foundation, the code for which could be found at http://delta.io. Delta Lake is being used by a huge number of companies across the world due to its advantages for Data Lakes. We will discuss, demo and showcase how Delta Lake can be helpful for your Data Lakes because of which many enterprises have Delta Lake as the default data format in their architecture. We will will use SQL or its equivalent Python or Scala API to perform showcase various Delta Lake features.

The slides are available here:  tinyurl.com/bbuzz-delta-lake or attached below.

Video

Maschinenhaus
15.06.2021 17:50 – 18:20
Talk
Beginner

Speakers