Skip to main content

Fine-tune an LLM

In this example, you'll build a pipeline with Dagster that:

  • Loads a public Goodreads JSON dataset into DuckDB
  • Performs feature engineering to enhance the data
  • Creates and validates the data files needed for an OpenAI fine-tuning job
  • Generate a custom model and validate it
Prerequisites

To follow the steps in this guide, you'll need:

  • Basic Python knowledge
  • Python 3.9+ installed on your system. Refer to the Installation guide for information.
  • Familiarity with SQL and Python data manipulation libraries, such as Pandas.
  • Understanding of data pipelines and the extract, transform, and load process (ETL).

Step 1: Set up your Dagster environment

First, set up a new Dagster project.

  1. Clone the Dagster repo and navigate to the project:

    cd examples/docs_projects/project_llm_fine_tune
  2. Install the required dependencies with uv:

    uv sync
  3. Activate the virtual environment:

    source .venv/bin/activate

Step 2: Launch the Dagster webserver

To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:

dg dev

Next steps