Add a resource
We've now created our own assets and combined them with assets from a component. In this step, we will revisit the ingestion assets we defined and introduce another Dagster object, a resource, to manage our DuckDB database connection. Currently, each asset opens its own connection. A resource centralizes the DuckDB connection details in a single object that all of our assets can share.
1. Define the DuckDB resource
In Dagster, resources are reusable components that provide external context or functionality such as database connections, clients, or configurations. Resources can be used by a number of different Dagster objects, but we will first apply them to our assets.
First, we will need to install the dagster-duckdb and pandas libraries:

```shell
uv pip install dagster-duckdb pandas
```
Next, we need to scaffold a resources file with dg:

```shell
dg scaffold defs dagster.resources resources.py
```
This adds a file, resources.py, to the etl_tutorial module:

```
src
└── etl_tutorial
    └── defs
        └── resources.py
```
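Within this file, we will define our resources inside a function decorated with @dg.definitions:

```python
from dagster_duckdb import DuckDBResource

import dagster as dg

# One DuckDB resource pointing at the tutorial database file
database_resource = DuckDBResource(database="/tmp/jaffle_platform.duckdb")


@dg.definitions
def resources():
    # Register the resource under the key "duckdb"; assets request it by this name
    return dg.Definitions(
        resources={
            "duckdb": database_resource,
        }
    )
```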
2. Add a resource to our assets
With our resource defined, we need to update our asset code. Since all of our ingestion assets rely on the import_url_to_duckdb function to execute their query, we will first update that function to accept a DuckDBResource and use it to run the query:

```python
from dagster_duckdb import DuckDBResource


def import_url_to_duckdb(url: str, duckdb: DuckDBResource, table_name: str):
    # Open a connection through the shared resource
    with duckdb.get_connection() as conn:
        # Load the CSV at `url` into `table_name`, replacing any existing table
        row_count = conn.execute(
            f"""
            create or replace table {table_name} as (
                select * from read_csv_auto('{url}')
            )
            """
        ).fetchone()
        assert row_count is not None
        row_count = row_count[0]
```
The DuckDBResource is designed to handle concurrent queries, so we no longer need the serialize_duckdb_query function.
Now we can update the assets themselves. Each asset now takes a DuckDBResource parameter named duckdb, matching the key we set in resources.py:

```python
import dagster as dg
from dagster_duckdb import DuckDBResource

# Assumes import_url_to_duckdb from the previous snippet is defined in this module


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_customers"],
)
def raw_customers(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_customers",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_orders"],
)
def raw_orders(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_orders",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_payments"],
)
def raw_payments(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_payments",
    )
```
When an asset runs, Dagster injects the configured DuckDBResource, and the asset passes it to import_url_to_duckdb, which opens a connection and runs the query.
Back in the UI, your assets will look no different, but you can view the new resource on the Definitions tab:
- Click Deployment, then click "etl-tutorial" to see your deployment.
- Click Definitions.
- Navigate to the "Resources" section to view all of your resources and select "duckdb".
You can see that this resource has three uses, one for each of our three ingestion assets.
Summary
We have now introduced resources to our project. The etl_tutorial module should look like this:

```
src
└── etl_tutorial
    ├── __init__.py
    ├── definitions.py
    └── defs
        ├── __init__.py
        ├── assets.py
        ├── resources.py
        └── transform
            └── defs.yaml
```
Resources become more valuable as projects grow more complex: they ensure that all assets use the same connection details and reduce the amount of custom connection code you need to write. We will also see that resources can be used by other Dagster objects.
Next steps
In the next step, we will ensure data quality with asset checks.