Analyze COVID data in less than 5 minutes!

4 min readJan 8, 2022

In this short tutorial, you’ll learn how to create a database with US COVID data that you can later analyze with your favorite analytics or data science tool. You’ll also be able to refresh the COVID data by invoking one command-line statement. The whole process won’t take you more than 5 minutes.

In this tutorial, we’ll use SQLite database, but you can easily do the same steps with Postgres, MySQL, or Snowflake.

Install dbd

First, you need to install the dbd tool that we’ll use for the database creation and loading.

python3 -m venv dbd-env
source dbd-env/bin/activate
pip3 install dbd

Create dbd project

This command-line statement creates a new dbd project with an example content that we can use.

dbd init covid19
cd covid19
dbd run .

A new SQLite database states.dbis created after executing the last

dbd run .

command. The database contains the following tables:

SQLite database generated from dbd default project

The database has been created from the model directory.

The three CSV files have been directly loaded to the SQLite database. The data stored in the us_states table are generated from the SQL SELECT stored in the us_states.sql file.

The us_states.yaml file is used for configuration of the us_states table (e.g. column data type, primary key, etc.)

Adding COVID data

We’ll use the NY Times COVID-19 data repository for getting fresh COVID data. We’ll start with a simple statistic that shows daily cases and deaths by US states. The data are in the us_states.csv file. Github provides the raw URL that points directly to this file.

https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv

The only thing you need to do is to copy this raw URL to a new us_covid.ref file that we create in the model directory of your project.

Then you can re-run the dbd from the covid19 directory:

cd covid19
dbd run .

This will create a new us_covid table in the target database.

As we haven’t created any yaml file with additional configuration, dbd have used the defaults and typed all columns as TEXT . It would be nice to configure the table but we won’t need it now.

Putting data together

The last step is to join the new us_covid data to the existing us_states table. You can simply do it by adding a new us_states_covid.sql file with the SQL statement below to the model directory.

SELECT
        us_covid.date AS date,
        us_states.state_code,
        us_states.state_name,
        us_states.state_population,
        us_states.state_area_sq_mi,
        us_covid.cases AS state_covid_cases,
        us_covid.deaths AS state_covid_deaths
    FROM us_covid
        JOIN us_states ON us_states.state_name = us_covid.state

We’ll now re-run the dbd command again:

cd covid19
dbd run .

There is now a new us_states_covid table in the database.

This is all great, but the date, state_covid_cases , and state_covid_deaths columns have the default TEXT datatype.

Now is the time to provide the additional YAML configuration for the new us_states_covid table.

Let's create a new us_states_covid.yaml file with the following content:

table:
  columns:
    date:
      type: DATE
    state_covid_cases:
      type: INTEGER
    state_covid_deaths:
      type: INTEGER

and re-run the dbd command for the last time:

cd covid19
dbd run .

Now the data types are fixed.

You can use an analytics or data science tool of your choice to connect to your new database and plot a few nice charts and dashboards! I can recommend GoodData.CN.

You can also choose a different database, perhaps Postgres or Snowflake, and execute your dbd project on top of it. It will do the same trick ;)

Complete code for this tutorial is available here.

The dbd tool is open-sourced under the permissive BSD license. You can find the code and documentation in the dbd repo.

Analyze COVID data in less than 5 minutes!

Install dbd

Create dbd project

Adding COVID data

Putting data together

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by ZD

No responses yet