Use Components within an existing project

info

dg and Dagster Components are under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please join the #dg-components channel in the Dagster Community Slack.

note

This guide shows how to add Components to an existing Dagster project without fully migrating to the Components architecture. If you want to fully migrate your project to be Components-compatible, see "Converting an existing project".

Adding Components to an existing project lets you use them for specific functionality while keeping your current project structure intact. It is also a low-risk way to try out Components before committing to restructuring the project.

Example project structure

Let's walk through an example with an existing project that has the following structure:

tree
.
├── my_existing_project
│   ├── __init__.py
│   ├── analytics
│   │   ├── __init__.py
│   │   ├── assets.py
│   │   └── jobs.py
│   ├── definitions.py
│   └── elt
│       ├── __init__.py
│       ├── assets.py
│       └── jobs.py
├── pyproject.toml
└── README.md

4 directories, 10 files

This project has existing assets and jobs organized in analytics and elt modules, with a top-level definitions.py file that loads everything together:

my_existing_project/definitions.py


from pathlib import Path
import dagster as dg
from my_existing_project.analytics import assets as analytics_assets
from my_existing_project.analytics.jobs import (
    regenerate_analytics_hourly_schedule,
    regenerate_analytics_job,
)
from my_existing_project.elt import assets as elt_assets
from my_existing_project.elt.jobs import sync_tables_daily_schedule, sync_tables_job

defs = dg.Definitions(
    assets=dg.load_assets_from_modules([elt_assets, analytics_assets]),
    jobs=[sync_tables_job, regenerate_analytics_job],
    schedules=[sync_tables_daily_schedule, regenerate_analytics_hourly_schedule],
)

Add component configuration

For this example, we'll use the dagster-sling component for data replication. Add it to your project's virtual environment:

uv add dagster-sling

The Sling component relies on a Sling replication.yaml file to define how to replicate data. Create a new elt/sling directory to store it:

mkdir -p my_existing_project/elt/sling

my_existing_project/elt/sling/replication.yaml
source: LOCAL
target: DUCKDB

defaults:
  mode: full-refresh
  object: "{stream_table}"

streams:
  file://raw_customers.csv:
    object: "sandbox.raw_customers"
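
The directory and file creation steps above can also be scripted. The following is a minimal sketch using only the Python standard library. The helper name `write_replication_config` and its `project_root` argument are hypothetical and are not part of Dagster or Sling.

```python
from pathlib import Path

# Contents of replication.yaml, matching the configuration shown above.
REPLICATION_YAML = """\
source: LOCAL
target: DUCKDB

defaults:
  mode: full-refresh
  object: "{stream_table}"

streams:
  file://raw_customers.csv:
    object: "sandbox.raw_customers"
"""


def write_replication_config(project_root: Path) -> Path:
    """Create elt/sling inside the package and write replication.yaml there."""
    sling_dir = project_root / "my_existing_project" / "elt" / "sling"
    # Equivalent to `mkdir -p`: create missing parents, ignore if it exists.
    sling_dir.mkdir(parents=True, exist_ok=True)
    config_path = sling_dir / "replication.yaml"
    config_path.write_text(REPLICATION_YAML)
    return config_path
```

Calling `write_replication_config(Path("."))` from the project root would produce the same layout as the manual steps.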

Update your definitions

Now you can configure an instance of SlingReplicationCollectionComponent in your definitions.py file and pass it to the build_defs_for_component utility. This function creates a Definitions object from a component, which you can then merge with your existing definitions using Definitions.merge:

my_existing_project/definitions.py
from pathlib import Path

import dagster_sling
from dagster_sling import (
    SlingConnectionResource,
    SlingReplicationCollectionComponent,
    SlingReplicationSpecModel,
)
from my_existing_project.analytics import assets as analytics_assets
from my_existing_project.analytics.jobs import (
    regenerate_analytics_hourly_schedule,
    regenerate_analytics_job,
)
from my_existing_project.elt import assets as elt_assets
from my_existing_project.elt.jobs import sync_tables_daily_schedule, sync_tables_job

import dagster as dg

defs = dg.Definitions.merge(
    dg.Definitions(
        assets=dg.load_assets_from_modules([elt_assets, analytics_assets]),
        jobs=[sync_tables_job, regenerate_analytics_job],
        schedules=[sync_tables_daily_schedule, regenerate_analytics_hourly_schedule],
    ),
    dg.build_defs_for_component(
        component=SlingReplicationCollectionComponent(
            connections=[
                SlingConnectionResource(
                    name="DUCKDB",
                    type="duckdb",
                    instance="/tmp/jaffle_platform.duckdb",
                )
            ],
            replications=[
                SlingReplicationSpecModel(
                    path=(
                        Path(__file__).parent / "elt" / "sling" / "replication.yaml"
                    ).as_posix(),
                )
            ],
        )
    ),
)
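
Because the replication path is built relative to definitions.py, a mistake in the path can be hard to spot from the component configuration alone. As a hedged sketch, you could resolve the path with the same `Path(__file__).parent` pattern and raise early if the file is missing. The helper `resolve_replication_path` is hypothetical and is not part of Dagster or dagster-sling.

```python
from pathlib import Path


def resolve_replication_path(definitions_file: str) -> str:
    """Return the POSIX path to elt/sling/replication.yaml beside definitions_file."""
    path = Path(definitions_file).parent / "elt" / "sling" / "replication.yaml"
    # Fail at import time with a clear message instead of deep inside Sling.
    if not path.is_file():
        raise FileNotFoundError(f"Sling replication file not found: {path}")
    return path.as_posix()
```

In definitions.py, you would call `resolve_replication_path(__file__)` and pass the result as the `path` argument of SlingReplicationSpecModel.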

Next steps