Blog: DataNation Podcast: Episode on Table Formats
Blog: How to maintain Apache Iceberg Tables
Blog: Apache Iceberg's Hidden Partitioning
Blog: Migrating Apache Iceberg tables from Hive
Blog: Table Format Comparison (Iceberg, Hudi, Delta Lake)
Blog: Table Format Comparison - Governance
Blog: Table Format Comparison - Partitioning

For this tutorial you do need to have Docker installed, as we will be using a Docker image I created for easy hands-on experimenting with Apache Iceberg, Apache Hudi and Delta Lake. You can get this up and running easily with the following command:

Note: This container was built on a 64-bit Linux machine, so the image may not work on an M1/ARM chipset. If it doesn't, all you have to do is rebuild the image; you can find the Dockerfiles for this image in this repo.

Once the Docker image is running, you can easily open up Spark with any of the table formats using the following commands:

- `iceberg-init` - to open Spark Shell with Apache Iceberg configured.
- `hudi-init` - to open Spark Shell with Apache Hudi configured.
- `delta-init` - to open Spark Shell with Delta Lake configured.

This blog will focus on Apache Iceberg, but feel free to play with the other table formats using their documentation.

Start the Docker Container

`docker run -it --name format-playground alexmerced/table-format-playground`

After running `iceberg-init`, we are inside Spark SQL, where we can run SQL statements against the Iceberg catalog configured by the script. If you are curious about the settings I used, you can run `cat iceberg-init.bash` back in the terminal.

Keep in mind, we are not working with a traditional database but with a data lakehouse. We are creating and reading files that would live in your data lake storage (AWS/Azure/Google Cloud). It may feel like working with a traditional database, and that is the beauty of table formats like Iceberg: they let us work with files stored in our data lake the same way we work with data in a database or data warehouse.

To create a new Iceberg table, we can just run the following command:

If you want to use this container again in the future:

Now you know how to quickly set yourself up so you can experiment with Apache Iceberg. Check out the Apache Iceberg docs for many of the great features that exist in Iceberg, such as Time Travel, Hidden Partitioning, Partition Evolution, Schema Evolution, ACID transactions and more. One of the best aspects of Iceberg is that so many tools are building support for it, such as Dremio (which is also an Iceberg contributor).
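As a sketch of the table-creation step described above, the Spark SQL below creates a table, writes a couple of rows, reads them back, and then queries two of Iceberg's metadata tables (`snapshots` and `files`) to show what backs the table in storage. It assumes `iceberg-init` registered an Iceberg catalog named `iceberg`. That catalog name, the `demo` namespace, and the `people` table are hypothetical, so check `cat iceberg-init.bash` for the catalog name your session actually uses and substitute it.

```sql
-- Assumption: iceberg-init configured an Iceberg catalog named `iceberg`.
-- `demo` and `people` are hypothetical names used only for illustration.

-- Create a namespace (database) to hold the table.
CREATE NAMESPACE IF NOT EXISTS iceberg.demo;

-- Create an Iceberg table; in storage this becomes a metadata file plus,
-- once data is written, data files under the table's location.
CREATE TABLE iceberg.demo.people (
  id   BIGINT,
  name STRING
) USING iceberg;

-- Write some rows; this commit produces a new table snapshot.
INSERT INTO iceberg.demo.people VALUES (1, 'Alice'), (2, 'Bob');

-- Read the rows back like any other SQL table.
SELECT * FROM iceberg.demo.people;

-- Metadata table: one row per snapshot committed to the table.
SELECT snapshot_id, operation, committed_at
FROM iceberg.demo.people.snapshots;

-- Metadata table: the data files that currently make up the table.
SELECT file_path, record_count
FROM iceberg.demo.people.files;
```

The last two queries illustrate the lakehouse point above: what Spark presents as a table is a set of data files plus Iceberg metadata sitting in your storage, and the snapshot history recorded there is also what features like Time Travel build on.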