Iceberg 14 + Spark 3.3 = FAST!
I upgraded the ngods data stack to Apache Spark 3.3.0 and Iceberg 14.0, which is now visibly faster!
Aug 5, 2022

Published in Towards Dev
Trino & dbt: excellent fit for cross-database ELT and data connectors
Trino (aka Presto or Starburst) is an open-source component for querying multiple databases simultaneously. It can execute distributed…
Jul 13, 2022

Are you throwing money out of the window by using Snowflake?
I recently came across this excellent blog post in which Kris argues that the hosted data stacks are becoming quite expensive. I agree with…
Jul 11, 2022

I agree. The cost is one of the reasons why I run my own Spark/Trino/Iceberg stack…
Jul 10, 2022

Orchestrating dbt and Pyspark
I use dbt for my data projects to implement the medallion data pipeline architecture that processes data in three layers: bronze, silver, and gold…
Jul 10, 2022

Hi, I noticed that Snowpark allows you to define a stored procedure in Scala, Java, or Python and…
Jul 8, 2022

Published in Dev Genius
Iceberg + Spark + Trino + Dagster: modern, open-source data stack demo
I assembled the ngods (new generation open-source data stack) two months back and have used it for two projects since then.
Jul 4, 2022

ngods: new generation open-source data stack
I wanted to quickly share my attempt to assemble a new generation data stack that is composed of open-source technologies.
May 20, 2022

Headless BI: metrics vs SQL
Headless BI’s metrics are much better than queries for your “non-SQL speaking” users.
May 16, 2022