I recently came across this excellent blog post in which Kris argues that hosted data stacks are becoming quite expensive. I agree with the article's conclusions, especially the recommendation to use data stack components that are cloud-platform agnostic and can also be deployed locally.
I experienced this myself and recently decided to migrate my data projects from Snowflake and BigQuery to an open-source data stack that I’m deploying on Google Compute Engine instances (production) and my local machine (development). I’m using Google Cloud Storage and AWS S3 to store data, both of which cost much less than what cloud data warehouse companies charge for data storage.
My cloud infrastructure bill has been roughly 15 times lower in the few months I’ve been running this new setup. Performance-wise, I haven’t noticed any significant difference except in the development environment, which is noticeably faster.
Operation-wise, I spend a few hours per month managing the data stack once it is installed and configured. Larger organizations should expect some operational spending here, but I think it should still be much less than what they pay today for their hosted data stack.
The data stack I use costs $0 in software licenses because it is built exclusively from open-source components. It can be deployed to any cloud platform (AWS, GCP, or Azure) or even locally with simple Terraform or docker-compose recipes. This platform independence gives me more flexibility and better cost control.
Are you in the same situation? Would you like to explore open-source alternatives to your current cloud data warehouse-based data stack?
You can start with this open-source data stack. It contains the same components (e.g., Apache Iceberg, Apache Spark, Trino) that Netflix uses for processing petabytes of data. This GitHub repo contains a lightweight configuration of this data stack that you can install on your local machine for development or exploration purposes. The installation is Docker-based and requires the docker-compose command.
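Since the installation depends on the docker-compose command, it can be worth checking that it is available before cloning anything. The snippet below is only a minimal sketch of such a check and is not part of the repo: the function name `find_compose_command` is my own, and the fallback to `docker compose` assumes a Docker installation that ships Compose as a CLI plugin.

```python
import shutil
import subprocess


def find_compose_command(which=shutil.which, run=subprocess.run):
    """Return the Compose invocation available on this machine, or None.

    `which` and `run` are injectable only to make the check easy to test;
    by default the real PATH lookup and subprocess call are used.
    """
    # Standalone `docker-compose` binary found on PATH.
    if which("docker-compose"):
        return ["docker-compose"]

    # Assumption: newer Docker installs may expose Compose as a plugin,
    # invoked as `docker compose`. Confirm it actually responds.
    if which("docker"):
        result = run(["docker", "compose", "version"], capture_output=True)
        if result.returncode == 0:
            return ["docker", "compose"]

    # Neither variant is available.
    return None
```

Calling `find_compose_command()` returns a list such as `["docker-compose"]` that can be used as the prefix of a subprocess call, or `None` if Compose still needs to be installed.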
Thank you for reading my article. Please let me know what you think in the comments!