ngods: new generation open-source data stack

ZD
1 min read · May 20, 2022


I want to share my attempt to assemble a new-generation data stack built entirely from open-source technologies.

My initial motivation was to try Apache Iceberg features such as git-like data snapshots, schema evolution, and partitioning. Once those experiments were done, I couldn’t stop extending the stack.

The result is a proof-of-concept of an open-source data stack that consists of Apache Iceberg, Apache Spark, and Trino.

Figure: ngods high-level architecture and component options

I was expecting yet another slow data-lake experiment, but I was thrilled with the results: the stack is fast and feature-rich.

To share my excitement, I’ve created a GitHub repo with a docker-compose script.

I plan to add more components, such as dbt, Dagger, Flink, and Postgres, soon.

I’m really interested in your opinions on my choice of components, potential alternatives, and any other suggestions.

Would you recommend trying Delta Lake or Hudi? Is there an alternative to Trino? Let me know.
