Iceberg 14 + Spark 3.3 = FAST! I upgraded the ngods data stack to Apache Spark 3.3.0 and Iceberg 14.0, which is now visibly faster! (Aug 5, 2022)
Published in Towards Dev. Trino & dbt: excellent fit for cross-database ELT and data connectors. Trino (aka Presto or Starburst) is an open-source component for querying multiple databases simultaneously. It can execute distributed… (Jul 13, 2022)
Are you throwing money out of the window by using Snowflake? I recently came across this excellent blog post in which Kris argues that the hosted data stacks are becoming quite expensive. I agree with… (Jul 11, 2022)
I agree. The cost is one of the reasons why I run my own Spark/Trino/Iceberg stack… (Jul 10, 2022)
Orchestrating dbt and Pyspark. I use dbt for my data projects to implement the medallion data pipeline architecture that processes data in three: bronze, silver, and gold… (Jul 10, 2022)
Hi, I noticed that Snowpark allows you to define a stored procedure in Scala, Java, or Python and… (Jul 8, 2022)
Published in Dev Genius. Iceberg + Spark + Trino + Dagster: modern, open-source data stack demo. I assembled the ngods (new generation open-source data stack) two months back and have used it for two projects since then. (Jul 4, 2022)
ngods: new generation open-source data stack. I wanted to quickly share my attempt to assemble a new generation data stack that is composed of open-source technologies. (May 20, 2022)
Headless BI: metrics vs SQL. Headless BI’s metrics are much better than queries for your “non-SQL speaking” users. (May 16, 2022)