Fine-tune an LLM
note
To see video of this example
In this example, you'll build a pipeline with Dagster that:
- Loads a public Goodreads JSON dataset into DuckDB
- Performs feature engineering to enhance the data
- Creates and validates the data files needed for an OpenAI fine-tuning job
- Generate a custom model and validate it
Prerequisites
To follow the steps in this guide, you'll need:
- Basic Python knowledge
- Python 3.9+ installed on your system. Refer to the Installation guide for information.
- Familiarity with SQL and Python data manipulation libraries, such as Pandas.
- Understanding of data pipelines and the extract, transform, and load process (ETL).
Step 1: Set up your Dagster environment
First, set up a new Dagster project.
-
Clone the Dagster repo and navigate to the project:
cd examples/docs_projects/project_llm_fine_tune
-
Install the required dependencies with
uv
:uv sync
-
Activate the virtual environment:
- MacOS
- Windows
source .venv/bin/activate
.venv\Scripts\activate
Step 2: Launch the Dagster webserver
To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:
dg dev
Next steps
- Continue this example with ingestion