Dagster & dlt (Component)
The dagster-dlt library provides a DltLoadCollectionComponent, which can be used to represent a collection of dlt sources and pipelines as assets in Dagster.
1. Prepare a Dagster project
To begin, you'll need a Dagster project. You can use an existing components-ready project or create a new one:
uvx create-dagster project my-project && cd my-project/src
Activate the project virtual environment:
source ../.venv/bin/activate
Finally, add the dagster-dlt library to the project:
uv add dagster-dlt
2. Scaffold a dlt component
Now that you have a Dagster project, you can scaffold a dlt component. You may optionally provide the source and destination types, which will pull in the appropriate dlt source:
dg scaffold defs dagster_dlt.DltLoadCollectionComponent github_snowflake_ingest \
  --source github --destination snowflake
The scaffold call will generate a basic defs.yaml file and a loads.py file:
tree my_project/defs
my_project/defs
├── __init__.py
└── github_snowflake_ingest
    ├── defs.yaml
    ├── github
    │   ├── __init__.py
    │   ├── helpers.py
    │   ├── queries.py
    │   ├── README.md
    │   └── settings.py
    └── loads.py

3 directories, 8 files
The loads.py file contains a skeleton dlt source and pipeline, which are referenced by Dagster but can also be run directly using dlt:
import dlt


@dlt.source
def my_source():
    @dlt.resource
    def hello_world():
        yield "hello, world!"

    return hello_world


my_load_source = my_source()
my_load_pipeline = dlt.pipeline(destination="snowflake")
Each source and pipeline is referenced in the defs.yaml file by a fully scoped Python identifier, and the file pairs them into a set of loads:
type: dagster_dlt.DltLoadCollectionComponent

attributes:
  loads:
    - source: .loads.my_load_source
      pipeline: .loads.my_load_pipeline
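The leading dot in identifiers such as .loads.my_load_source reads like a Python relative import. How Dagster resolves these strings internally is not shown here; the following is only a sketch of the general technique using the standard library. The helper name resolve_relative and the package name demo_defs are hypothetical, and the package is built in memory so the example runs without any files on disk.

```python
import importlib
import sys
import types


def resolve_relative(identifier: str, package: str):
    """Resolve '.module.attr' relative to `package` and return the attribute."""
    module_path, _, attr_name = identifier.rpartition(".")
    # importlib handles the leading dot when a package anchor is supplied.
    module = importlib.import_module(module_path, package=package)
    return getattr(module, attr_name)


# Build a throwaway in-memory package (hypothetical name) for the demo.
pkg = types.ModuleType("demo_defs")
pkg.__path__ = []  # an empty __path__ marks the module as a package
loads = types.ModuleType("demo_defs.loads")
loads.my_load_source = "source object placeholder"
sys.modules["demo_defs"] = pkg
sys.modules["demo_defs.loads"] = loads

resolved = resolve_relative(".loads.my_load_source", package="demo_defs")
print(resolved)
```

In this sketch the identifier is split into a module path (.loads) and an attribute name (my_load_source), which mirrors how each entry under loads names one object defined in loads.py.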