Add a resource

We've now created our own assets and combined them with assets from a component. In this step, we will revisit the ingestion assets we defined and add another Dagster object, a resource, to manage our DuckDB connection. Currently, each asset handles its own connection; a resource lets us centralize the DuckDB connection in a single object that is shared across all of our assets.

1. Define the DuckDB resource

In Dagster, resources are reusable components that provide external context or functionality such as database connections, clients, or configurations. Resources can be used by a number of different Dagster objects, but we will first apply them to our assets.
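To make the idea concrete outside of Dagster, here is a minimal plain-Python sketch of the pattern a resource captures: one object owns the connection details, and every function that needs the database receives that object. It uses the standard library's sqlite3 purely as a stand-in for DuckDB, and the names SharedDatabase, load_rows, and count_rows are hypothetical; none of them are part of Dagster or dagster-duckdb.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


class SharedDatabase:
    """Hypothetical stand-in for a resource: owns the connection details."""

    def __init__(self, path: str):
        self.path = path

    @contextmanager
    def get_connection(self):
        # Open a connection, commit on success, and always close it.
        conn = sqlite3.connect(self.path)
        try:
            yield conn
            conn.commit()
        finally:
            conn.close()


def load_rows(db: SharedDatabase, table: str, rows: list[tuple[int, str]]) -> None:
    with db.get_connection() as conn:
        conn.execute(f"create table if not exists {table} (id integer, name text)")
        conn.executemany(f"insert into {table} values (?, ?)", rows)


def count_rows(db: SharedDatabase, table: str) -> int:
    with db.get_connection() as conn:
        return conn.execute(f"select count(*) from {table}").fetchone()[0]


# Both functions receive the same object, so the path is defined exactly once.
db = SharedDatabase(os.path.join(tempfile.mkdtemp(), "example.db"))
load_rows(db, "customers", [(1, "Ada"), (2, "Grace")])
customer_count = count_rows(db, "customers")
```

Changing the database location now means editing one constructor call rather than every function that touches the database, which is the same benefit the Dagster resource gives our assets.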

First, we will need to install the dagster-duckdb and pandas libraries:

uv pip install dagster-duckdb pandas

Next, we need to scaffold our resources object with dg:

dg scaffold defs dagster.resources resources.py

This adds a file, resources.py, to the etl_tutorial module:

src
└── etl_tutorial
    └── defs
        └── resources.py

Within this file, we will define our resources in a function decorated with @dg.definitions, which returns a dg.Definitions object:

src/etl_tutorial/defs/resources.py
from dagster_duckdb import DuckDBResource

import dagster as dg

database_resource = DuckDBResource(database="/tmp/jaffle_platform.duckdb")


@dg.definitions
def resources():
    return dg.Definitions(
        resources={
            "duckdb": database_resource,
        }
    )

2. Add a resource to our assets

With our resource defined, we need to update our asset code. Since all of our ingestion assets rely on the import_url_to_duckdb to execute the query, we will first update that function to use the DuckDBResource to handle query execution:

src/etl_tutorial/defs/assets.py
from dagster_duckdb import DuckDBResource


def import_url_to_duckdb(url: str, duckdb: DuckDBResource, table_name: str):
    with duckdb.get_connection() as conn:
        row_count = conn.execute(
            f"""
            create or replace table {table_name} as (
                select * from read_csv_auto('{url}')
            )
            """
        ).fetchone()
        assert row_count is not None
        row_count = row_count[0]

The DuckDBResource is designed to handle concurrent queries, so we no longer need the serialize_duckdb_query function.
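This tutorial does not describe how the resource copes with concurrent access. One general technique for contention on a single database file is to retry the connection attempt with increasing delays, and the plain-Python sketch below illustrates that technique. It is an assumption for illustration only, not a description of dagster-duckdb's internals; connect_with_retry and flaky_connect are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    max_retries: int = 5,
    base_delay: float = 0.01,
) -> T:
    """Call `connect`, retrying with exponential backoff on the given errors."""
    for attempt in range(max_retries + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_retries:
                raise  # Out of retries: surface the last error.
            time.sleep(base_delay * (2**attempt))
    raise AssertionError("unreachable")


# Hypothetical connection function that fails twice (as if the file were
# locked by another process) before succeeding on the third call.
attempts = {"count": 0}


def flaky_connect() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("database file is locked")
    return "connection"


result = connect_with_retry(flaky_connect)
```

Keeping this logic inside one shared object means callers such as import_url_to_duckdb never need their own serialization or retry code.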

Now we can update the assets themselves. Each asset now takes a DuckDBResource parameter named duckdb, which matches the key we set in resources.py:

src/etl_tutorial/defs/assets.py
@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_customers"],
)
def raw_customers(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_customers",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_orders"],
)
def raw_orders(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_orders",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_payments"],
)
def raw_payments(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_payments",
    )

Each asset then passes the DuckDBResource to the import_url_to_duckdb function, which opens the connection and runs the query.
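The link between the parameter name duckdb and the "duckdb" key in resources.py can be pictured with a toy example. The sketch below looks up each function parameter by name in a dictionary of resources and passes the match in. It is a simplified illustration and not Dagster's implementation (which also takes the type annotation into account); call_with_resources, FakeDuckDB, and registry are hypothetical names.

```python
import inspect
from typing import Any, Callable


def call_with_resources(fn: Callable[..., Any], resources: dict[str, Any]) -> Any:
    """Call `fn`, filling each parameter from `resources` by parameter name."""
    kwargs = {}
    for name in inspect.signature(fn).parameters:
        if name not in resources:
            raise KeyError(f"no resource registered under key {name!r}")
        kwargs[name] = resources[name]
    return fn(**kwargs)


class FakeDuckDB:
    """Hypothetical placeholder that stands in for a real resource object."""


# Mirrors the shape of the `resources={"duckdb": ...}` mapping.
registry = {"duckdb": FakeDuckDB()}


def raw_customers(duckdb: FakeDuckDB) -> str:
    # The parameter is named `duckdb`, so it receives registry["duckdb"].
    return type(duckdb).__name__


received = call_with_resources(raw_customers, registry)
```

In this toy version, a parameter whose name has no matching key raises a KeyError, which is a reminder to keep the parameter name and the resource key in sync.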

Back in the UI, your assets will not appear any different, but you can view the resource in the Definitions tab:

  1. Click Deployment, then click "etl-tutorial" to see your deployment.
  2. Click Definitions.
  3. Navigate to the "Resources" section to view all of your resources and select "duckdb".

You can see that this resource has three uses that line up with our three assets:

Summary

We have now introduced resources for our project. The etl_tutorial module should look like this:

src
└── etl_tutorial
    ├── __init__.py
    ├── definitions.py
    └── defs
        ├── __init__.py
        ├── assets.py
        ├── resources.py
        └── transform
            └── defs.yaml

Resources are very helpful as projects grow more complex, as they help ensure that all assets are using the same connection details and reduce the amount of custom code that needs to be written. We will also see that resources can be used by other Dagster objects.

Next steps

In the next step, we will ensure data quality with asset checks.