How to set up dbt with AI - create your first dbt model without a single line of code
Complete step-by-step guide to setting up dbt, creating your first models, and leveraging AI to accelerate your data transformation workflow

1 September 2025
By Claire, Co-founder & CEO

Table of contents
- Introduction
- What is dbt?
- 4 key decisions to prepare your dbt setup
- Creating your first dbt model
- How to use AI to set up dbt faster
- Conclusion
- Frequently Asked Questions
Introduction
dbt is one of the go-to tools for transformation in the modern data stack. In this guide, we'll show teams new to dbt how to plan their setup, configure it properly, and leverage AI to fast-track the process.
What is dbt?
dbt is an open-source transformation tool that helps analysts and engineers:
- Transform raw data into trusted datasets using SQL.
- Organize transformations into layers (bronze, silver, gold).
- Add version control, testing, and documentation.
- Create a clear lineage of data transformations.
Concretely, the impact is that ETL is no longer reserved only for software or data engineers who can write Python - anyone who knows SQL can now create their own ETL workflows.
4 key decisions to prepare your dbt setup
1. How to organize your dev/prod environments
Before using dbt, make sure you:
- Have your source data already ingested in a database or data warehouse.
- Define whether your development environment will be a separate dataset or an entirely different project.
- Determine where dbt should create your production data.
👉 Best practice: always work in a dedicated development schema (dev) separate from production. This avoids accidental overwrites and lets you experiment safely.
2. Choose: dbt Core vs dbt Cloud
There are two versions of dbt you can choose from:
- dbt Core → free, open-source CLI tool you run locally. You manage the environment, installation, and scheduling yourself.
- dbt Cloud → SaaS offering with a UI, built-in scheduler, IDE, and easier team collaboration.
Pros and cons:
| | dbt Core (CLI) | dbt Cloud (SaaS) |
|---|---|---|
| Cost | Free | Free for 1 user, then $100/user |
| Scheduling | You handle orchestration (Airflow, GitHub Actions) | Built-in scheduler |
| Tooling | Local IDE + extensions | Cloud IDE |
| AI features | AI IDEs (nao) | Limited |
3. Choose your coding tool
Options you can work with:
- dbt Cloud → Run dbt commands and preview data directly in the browser.
- VS Code → Local setup that requires Python and dbt extension.
- nao → All-in-one local setup with dbt, environment management, and AI copilot built in.
4. Define your data modeling strategy
It is important to define your data modeling and layering strategy from the start to build a structure that will scale.
Data layers help separate datasets by their quality and level of preparation, while also controlling which users can access which data.
The typical approach used by data teams is the medallion architecture (bronze / silver / gold):
| Layer | Description | Naming Convention |
|---|---|---|
| Bronze | - Direct mapping of raw tables. - Clean column names, apply type casting, handle nulls. - Goal: make raw data usable, but not business-ready. | stg_<table_name> |
| Silver | - Joins and enrichments across staging models. - Apply business rules (e.g., filter out test accounts, standardize country codes). - Fact and dimension tables that BI tools connect to. | fct_<topic> (facts) and dim_<topic> (dimensions) |
| Gold | - Final layer for analysis and reporting. - Aggregated data for self-serve analytics. - Metrics-ready. | mart_<KPIs> |
👉 Facts vs Dimensions:
- Fact tables = events or transactions (e.g., orders, messages, pageviews). They are long tables with many rows, capturing measurable activities.
- Dimension tables = descriptive context (e.g., users, products, dates). They are wide tables with attributes that add context to facts.
In a typical setup, business users only access the gold layer (which BI tools connect to), while data power users can also access the silver layer for more ad-hoc, in-depth analysis. This structure ensures business users never work directly with raw, uncleaned data, reducing the risk of wrong numbers.
Creating your first dbt model
This guide covers setting up dbt Core on BigQuery, following a medallion architecture (bronze → silver → gold) to organize your models and transformations.
1. Create a project folder
Open a blank folder on your computer using your preferred code editor and create a directory for your dbt project. This is just organizing your workspace - think of it as your "dbt repo," where all models, config files, and tests will live.
2. Set up a virtual environment and install dbt with dependencies
Make sure Python is already installed on your system. Check your version by running:
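For example, assuming `python3` is available on your PATH:

```shell
# Print the installed Python version (dbt requires Python 3)
python3 --version
```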
A Python virtual environment provides an isolated space for your project's dependencies, preventing conflicts with other projects.
Then install dbt and the BigQuery adapter:
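With the virtual environment active, installing the adapter package is enough, since `dbt-bigquery` pulls in `dbt-core` as a dependency:

```shell
# Installs the BigQuery adapter together with dbt-core
pip install dbt-bigquery
```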
3. Initialize dbt project
Run:
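For example (the project name here is illustrative):

```shell
# Scaffold a new dbt project and prompt for connection details
dbt init my_dbt_project
```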
This command sets up the structure for your new dbt project and will ask you to provide information including:
- Project name
- Target schema (e.g., `dev`)
- Database connection (e.g., BigQuery)
After running it, dbt automatically creates:
- `dbt_project.yml` → project-level configuration such as model paths, naming conventions, and version. Learn more in dbt docs: dbt_project.yml
- `profiles.yml` (in `~/.dbt/`) → stores database connection settings you provided during setup. Learn more in dbt docs: About profiles.yml
Finally, verify connectivity with:
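From inside the project folder:

```shell
# Check connection, profile, and project configuration
dbt debug
```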
👉 dbt debug checks that dbt can connect to your database and confirms that your project and profile are correctly configured. It helps you identify authentication or configuration issues before you start building models - Learn more in dbt docs: About dbt debug command
4. Write your first model
a) Define sources (sources.yml):
Declare your raw tables in sources.yml file to tell dbt where your data lives and reference it in your models without hardcoding database or schema names.
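A minimal sketch of a `sources.yml`, assuming a BigQuery project and a raw dataset containing a `messages` table (the project and dataset names are placeholders; replace them with your own):

```yaml
# models/staging/sources.yml -- names below are illustrative
version: 2

sources:
  - name: raw
    database: my-gcp-project   # your BigQuery project ID
    schema: raw_data           # dataset holding the ingested tables
    tables:
      - name: messages
```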
b) Create staging model (stg_messages.sql):
Use a Common Table Expression (CTE) to keep the query organized and easier to extend.
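A sketch of what `stg_messages.sql` could look like; the column names (`id`, `created_at`, `channel`, `body`) are assumptions about the raw table, so adapt them to your schema:

```sql
-- models/staging/stg_messages.sql
-- Bronze layer: rename columns, cast types, handle nulls

with source as (

    select * from {{ source('raw', 'messages') }}

),

renamed as (

    select
        id as message_id,
        cast(created_at as timestamp) as created_at_ts,
        lower(trim(channel)) as channel,
        body as message_body
    from source
    where id is not null

)

select * from renamed
```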
About source(): references a table defined in `sources.yml` instead of hardcoding paths, improving maintainability and tracking lineage. Example: `{{ source('raw', 'messages') }}` uses the `messages` table from the `raw` source. Learn more in dbt official documentation: About source function
c) Create fact table (fct_messages.sql):
Following the same logic:
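A minimal `fct_messages.sql` sketch that builds on the staging model via `ref()`; in a real project this layer would also join in dimension models and apply business rules:

```sql
-- models/marts/fct_messages.sql
-- Silver layer: fact table built on top of the staging model

with messages as (

    select * from {{ ref('stg_messages') }}

),

final as (

    select
        message_id,
        created_at_ts,
        channel
    from messages

)

select * from final
```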
About ref(): dbt compiles this into a select from the referenced model's output, ensuring lineage tracking and environment portability - Learn more in dbt official documentation: About ref function
👉 Best practice: Always build silver models (fact/dimension) on top of bronze (staging) models using ref(), never directly from source tables. This keeps transformations modular, consistent, and easier to maintain.
d) Add documentation & tests (schema.yml):
This file documents your models and applies built-in tests to validate data quality - Learn more in dbt docs: Add data tests to your DAG
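A sketch of a `schema.yml` matching the models above; `not_null` and `unique` are dbt's built-in generic tests:

```yaml
# models/schema.yml -- documentation plus built-in data tests
version: 2

models:
  - name: stg_messages
    description: "Staging model for raw messages: renamed columns, typed timestamps."
    columns:
      - name: message_id
        description: "Primary key of a message."
        tests:
          - not_null
          - unique

  - name: fct_messages
    description: "Fact table of messages, one row per message."
    columns:
      - name: message_id
        tests:
          - not_null
          - unique
```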
5. Run and validate
Once your models and tests are ready, you can materialize the models and validate the data.
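Materialize the models with:

```shell
# Compile and execute all models in the target warehouse
dbt run
```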
What this does:
- Compiles your `.sql` models into raw SQL statements.
- Executes them in your data warehouse, creating views or tables.
- Example: `stg_messages` becomes a view, `fct_messages` becomes a table.
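Then validate the data with:

```shell
# Execute all tests declared in schema.yml
dbt test
```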
What this does:
- Runs the tests defined in your `schema.yml`.
- Example: verifies `message_id` is never null and always unique.
👉 Other useful dbt commands:
- `dbt build` → Runs all models, tests, snapshots, and seeds in one command.
- `dbt run -s model_name` → Runs only the specified model (model_name).
- `dbt run -s +model_name` → Runs the specified model and all its upstream dependencies.
- `dbt run -s model_name+` → Runs the specified model and all its downstream dependencies.
You can find the complete list of dbt commands on the official dbt documentation: Complete list of dbt commands
6. Push your dbt project to Git
Finally, you can version-control your project with Git:
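A typical sequence from the command line (the remote URL is a placeholder for your own repository):

```shell
# Initialize the repository and make the first commit
git init
git add .
git commit -m "Initial dbt project with staging and fact models"

# Connect to your remote and push (rename the branch first if needed: git branch -M main)
git remote add origin <your-repo-url>
git push -u origin main
```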
Or using your IDE:
- Initialize a Git repository (often via a "Git: Initialize Repository" button).
- Stage all project files for commit.
- Commit with a descriptive message (e.g., "Initial dbt project with staging and fact models").
- Connect to your remote repository (GitHub, GitLab, etc.) and push the commit.
How to use AI to set up dbt faster
AI can make building a data model dramatically faster by working directly from the context of your raw data.
nao simplifies this by packaging your data connection, dbt setup, and an AI copilot in one seamless workflow. The AI copilot has context of your codebase and data warehouse, and leverages tools around dbt to generate models, documentation, and environment setup automatically.
Here's an example workflow:
- Prompt the agent with a clear description of your dbt setup and the models you want to create.
- Let nao Agent generate the repository, Python environment, source definitions, staging and fact tables, and documentation automatically.
See it in action 👇
Example prompt:
👉 Try nao free for two weeks and fast-track your dbt setup – download here!
Conclusion
Setting up dbt may seem involved the first time, but the workflow is always the same:
- Prepare your environment.
- Structure your project into bronze/silver/gold layers.
- Define sources, write staging models, create marts, and add tests.
- Run and validate your models.
Once you master these basics, you can layer on automation, testing, and AI-driven acceleration to make your workflow even faster. A well-structured dbt project is also the fastest path to reliable AI analytics — read How to Build Production-Ready AI Agents for Data Analytics to see how dbt lineage powers agent context.
🔗 If you're into exploring AI use cases in data, you can check out more resources on nao's documentation, use case examples, and join our Slack community to connect with other data professionals speeding up their data work with AI.

Claire
For nao team
Frequently Asked Questions