How to set up dbt with AI - create your first dbt model without a single line of code

September 2025

Introduction

dbt is one of the go-to tools for transformation in the modern data stack. In this guide, we'll show teams new to dbt how to plan their setup, configure it properly, and leverage AI to fast-track the process.

What is dbt?

dbt is an open-source transformation tool that helps analysts and engineers:

  • Transform raw data into trusted datasets using SQL.
  • Organize transformations into layers (bronze, silver, gold).
  • Add version control, testing, and documentation.
  • Create a clear lineage of data transformations.

Concretely, the impact is that ETL is no longer reserved only for software or data engineers who can write Python - anyone who knows SQL can now create their own ETL workflows.

4 key decisions to prepare your dbt setup

1. How to organize your dev/prod environments

Before using dbt, make sure you:

  • Have your source data already ingested in a database or data warehouse.
  • Define whether your development environment will be a separate dataset or an entirely different project.
  • Determine where dbt should create your production data.

👉 Best practice: always work in a dedicated development schema (dev) separate from production. This avoids accidental overwrites and lets you experiment safely.

2. Choose: dbt Core vs dbt Cloud

There are two versions of dbt you can choose from:

  • dbt Core → free, open-source CLI tool you run locally. You manage the environment, installation, and scheduling yourself.
  • dbt Cloud → SaaS offering with a UI, built-in scheduler, IDE, and easier team collaboration.

Pros and cons:

|  | dbt Core (CLI) | dbt Cloud (SaaS) |
| --- | --- | --- |
| Cost | Free | Free for 1 user, then $100/user |
| Scheduling | You handle orchestration (Airflow, GitHub Actions) | Built-in scheduler |
| Tooling | Local IDE + extensions | Cloud IDE |
| AI features | AI IDEs (nao) | Limited |
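
If you go with dbt Core, scheduling can be as simple as a cron-triggered CI job. Here is a rough sketch using GitHub Actions; the file path, schedule, and Python version are assumptions to adapt, and credential handling (omitted here) depends on your warehouse, adapter, and profile setup:

# .github/workflows/dbt_run.yml - sketch only
name: scheduled-dbt-run
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-bigquery
      # credentials setup omitted - depends on your warehouse and profile
      - run: dbt build --profiles-dir .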

3. Choose your coding tool

Options you can work with:

  • dbt Cloud → Run dbt commands and preview data directly in the browser.
  • VS Code → Local setup that requires Python and the dbt extension.
  • nao → All-in-one local setup with dbt, environment management, and AI copilot built in.

4. Define your data modeling strategy

It is important to define your data modeling and layering strategy from the start to build a structure that will scale.

Data layers help separate datasets by their quality and level of preparation, while also controlling which users can access which data.

The typical approach used by data teams is the medallion architecture (bronze / silver / gold):

| Layer | Description | Naming Convention |
| --- | --- | --- |
| Bronze | Direct mapping of raw tables. Clean column names, apply type casting, handle nulls. Goal: make raw data usable, but not business-ready. | stg_<table_name> |
| Silver | Joins and enrichments across staging models. Apply business rules (e.g., filter out test accounts, standardize country codes). Fact and dimension tables that BI tools connect to. | fct_<topic> (facts) and dim_<topic> (dimensions) |
| Gold | Final layer for analysis and reporting. Aggregated data for self-serve analytics. Metrics-ready. | mart_<KPIs> |

👉 Facts vs Dimensions:

  • Fact tables = events or transactions (e.g., orders, messages, pageviews). They are long tables with many rows, capturing measurable activities.
  • Dimension tables = descriptive context (e.g., users, products, dates). They are wide tables with attributes that add context to facts.

In a typical setup, business users only access the gold layer (which BI tools connect to), while data power users can also access the silver layer for more ad-hoc, in-depth analysis. This structure ensures business users never work directly with raw, uncleaned data, reducing the risk of wrong numbers.
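
Once your project exists, one way to encode this layering is with folder-level configuration in dbt_project.yml. A minimal sketch - the project key, folder names, and schema names below are placeholders to adapt:

# dbt_project.yml (excerpt) - sketch only
models:
  my_dbt_project:        # must match the name of your dbt project
    staging:             # bronze: stg_ models, built as views
      +materialized: view
      +schema: staging
    marts:               # silver/gold: fct_, dim_ and mart_ models, built as tables
      +materialized: table
      +schema: marts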

Creating your first dbt model

This guide covers setting up dbt Core on BigQuery, following a medallion architecture (bronze β†’ silver β†’ gold) to organize your models and transformations.

1. Create a project folder

Create an empty folder for your dbt project and open it in your preferred code editor. This is just organizing your workspace - think of it as your "dbt repo," where all models, config files, and tests will live.
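
For example, from a terminal (the folder name dbt_tutorial is just a placeholder):

mkdir dbt_tutorial
cd dbt_tutorial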

2. Set up a virtual environment and install dbt with dependencies

Make sure Python is already installed on your system. Check your version by running:

python --version

A Python virtual environment provides an isolated space for your project's dependencies, preventing conflicts with other projects.

python -m venv dbt_venv
# Windows
.\dbt_venv\Scripts\activate
# Mac/Linux
source dbt_venv/bin/activate

Then install dbt and the BigQuery adapter:

pip install dbt-bigquery
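
You can confirm the installation (dbt Core plus the BigQuery adapter) with:

dbt --version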

3. Initialize dbt project

Run:

dbt init

This command sets up the structure for your new dbt project and will ask you to provide information including:

  • Project name
  • Target schema (e.g., dev)
  • Database connection (e.g. BigQuery)

After running it, dbt automatically creates the project scaffolding: a dbt_project.yml configuration file, a models/ directory with example models, and a connection profile in ~/.dbt/profiles.yml.

Finally, verify connectivity with:

dbt debug

👉 dbt debug checks that dbt can connect to your database and confirms that your project and profile are correctly configured. It helps you identify authentication or configuration issues before you start building models - Learn more in dbt docs: About dbt debug command
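
For reference, the connection details that dbt debug validates live in profiles.yml, which dbt init creates (usually under ~/.dbt/). Below is a minimal sketch for BigQuery with separate dev and prod targets; the profile, project, and dataset names are placeholders:

# ~/.dbt/profiles.yml - sketch only
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth              # or service-account with a keyfile
      project: my-gcp-project    # your GCP project ID
      dataset: dbt_dev           # development dataset
      threads: 4
    prod:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics         # production dataset
      threads: 4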

4. Write your first model

a) Define sources (sources.yml):

Declare your raw tables in a sources.yml file to tell dbt where your data lives, so you can reference them in your models without hardcoding database or schema names.

version: 2

sources:
  - name: raw
    description: "Raw tables imported from the source system, untransformed and ready for staging."
    tables:
      - name: messages
        description: "Table containing raw message data, including sender, recipient, timestamps, and subject."
        columns:
          - name: id
            description: "Unique identifier for each message."
          - name: sender
            description: "Email or user ID of the sender."
          - name: recipient
            description: "Email or user ID of the recipient."
          - name: sentAt
            description: "Timestamp when the message was sent."
          - name: subject
            description: "Subject line of the message."

b) Create staging model (stg_messages.sql):

Use a Common Table Expression (CTE) to keep the query organized and easier to extend.

with raw as (
  select * from {{ source('raw', 'messages') }}
)

select
  id as message_id,
  lower(trim(sender)) as sender_email,
  lower(trim(recipient)) as recipient_email,
  cast(sentAt as timestamp) as sent_timestamp,
  subject
from raw

About source(): references a table defined in sources.yml instead of hardcoding paths, improving maintainability and tracking lineage. Example: {{ source('raw', 'messages') }} uses the messages table from the raw source - Learn more in dbt official documentation: About source function
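
If you want to sanity-check the output of this model before materializing anything, dbt Core 1.5+ includes a preview command that runs the compiled query with a small row limit:

dbt show -s stg_messages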

c) Create fact table (fct_messages.sql):

Following the same logic:

with stg as (
  select * from {{ ref('stg_messages') }}
)

select
  message_id,
  sender_email,
  recipient_email,
  date_trunc(sent_timestamp, month) as month,
  subject,
  length(subject) as subject_length
from stg

About ref(): dbt compiles this into a select from the referenced model's output, ensuring lineage tracking and environment portability - Learn more in dbt official documentation: About ref function
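
To make this concrete: on BigQuery, the {{ ref('stg_messages') }} above compiles to a fully qualified reference along these lines (the project and dataset come from your profile, so the exact names will differ):

select * from `my-gcp-project`.`dbt_dev`.`stg_messages`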

👉 Best practice: Always build silver models (fact/dimension) on top of bronze (staging) models using ref(), never directly from source tables. This keeps transformations modular, consistent, and easier to maintain.

d) Add documentation & tests (schema.yml):

This file documents your models and applies built-in tests to validate data quality - Learn more in dbt docs: Add data tests to your DAG

version: 2
models:
  - name: fct_messages
    description: "Fact table for analyzing messages"
    columns:
      - name: message_id
        description: "Unique identifier of the message"
        data_tests:
          - not_null
          - unique
      - name: sender_email
        description: "Email of the sender"
        data_tests:
          - not_null

5. Run and validate

Once your models and tests are ready, you can materialize the models and validate the data.

dbt run

What this does:

  • Compiles your .sql models into raw SQL statements.
  • Executes them in your data warehouse, creating views or tables.
  • Example: stg_messages becomes a view; fct_messages becomes a table if you configure it that way - see the config sketch below.
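
dbt builds every model as a view unless told otherwise. To materialize fct_messages as a table, either set it per folder in dbt_project.yml (as sketched earlier) or add a config block at the top of the model file:

{{ config(materialized='table') }}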

dbt test

What this does:

  • Runs the tests defined in your schema.yml.
  • Example: verifies message_id is never null and always unique.

πŸ‘‰ Other useful dbt commands:

  • dbt build → Runs all models, tests, snapshots, and seeds in one command.
  • dbt run -s model_name → Runs only the specified model (model_name).
  • dbt run -s +model_name → Runs the specified model and all its upstream dependencies.
  • dbt run -s model_name+ → Runs the specified model and all its downstream dependencies.

You can find the complete list of dbt commands on the official dbt documentation: Complete list of dbt commands

6. Push your dbt project to Git

Finally, you can version-control your project with Git:

git init
git add .
git commit -m "Initial dbt project with staging and fact models"
git remote add origin <your-repo-url>
git push -u origin main
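
dbt init normally generates a .gitignore for you; if your project doesn't have one, a minimal version keeps compiled artifacts and your virtual environment out of version control:

# .gitignore
target/
dbt_packages/
logs/
dbt_venv/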

Or using your IDE:

  1. Initialize a Git repository (often via a "Git: Initialize Repository" button).
  2. Stage all project files for commit.
  3. Commit with a descriptive message (e.g., "Initial dbt project with staging and fact models").
  4. Connect to your remote repository (GitHub, GitLab, etc.) and push the commit.

How to use AI to set up dbt faster

AI can make building a data model directly from the context of your raw data up to 10x faster.

nao simplifies this by packaging your data connection, dbt setup, and an AI copilot into one seamless workflow. The copilot has context on your codebase and data warehouse, and uses dbt-aware tools to generate models, documentation, and environment setup automatically.

Here's an example workflow:

  • Prompt the agent with a clear description of your dbt setup and the models you want to create.
  • Let nao Agent generate the repository, Python environment, source definitions, staging and fact tables, and documentation automatically.

See it in action 👇

Example prompt:

I want to init a dbt repository for **<your company/project name>**  

Steps:
- init the dbt repository
- check the data schema I have in **<your data warehouse name>**
- create staging and fct tables for **<your available datasets>**
- write the commands for me to create the python virtual env with dbt

Output:
I want a fct table **<your fact table name>** to do analytics on **<your dataset context, e.g., emails, orders>**

Setup:
Use the '<your dev dataset/schema>' dataset in **<your data warehouse>** for dbt development
Start with the simplest dbt setup - no extra packages 
Keep all data in **<your region, e.g., EU>**

👉 Try nao free for two weeks and fast-track your dbt setup – download here!

Conclusion

Setting up dbt may seem involved the first time, but the workflow is always the same:

  1. Prepare your environment.
  2. Structure your project into bronze/silver/gold layers.
  3. Define sources, write staging models, create marts, and add tests.
  4. Run and validate your models.

Once you master these basics, you can layer on automation, testing, and AI-driven acceleration to make your workflow even faster.

🔗 If you want to explore more AI use cases in data, check out nao's documentation and use case examples, and join our Slack community to connect with other data professionals speeding up their data work with AI.


Frequently Asked Questions

What is dbt?

dbt (data build tool) is an open-source framework that lets analysts and engineers transform raw data in a data warehouse using SQL, while adding version control, testing, and documentation.

What is the difference between dbt Core and dbt Cloud?

dbt Core is the free, open-source command-line version you run locally, while dbt Cloud is a managed SaaS offering with a web-based IDE, built-in scheduler, and easier team collaboration.

What is a dbt model?

A dbt model is a single SQL file that defines a transformation, resulting in a table or view in your warehouse. Models can reference sources or other models.

What is the purpose of dbt sources and {{ source() }}?

Sources define the raw tables in your warehouse. Using {{ source() }} in SQL references these tables without hardcoding schema names, improving maintainability and enabling lineage tracking.

What is the purpose of {{ ref() }} in dbt?

ref() references other dbt models and builds dependencies. dbt uses it to determine the order of model execution and to track lineage automatically.

What are staging, intermediate, and mart layers in dbt?

dbt projects often use the medallion architecture:

  • Bronze (Staging): clean raw data for usability.
  • Silver (Dim/Fact): join, enrich, or apply business rules.
  • Gold (Marts): analytics-ready aggregated data.

How do I test my dbt models?

Use dbt test with built-in tests (e.g., not_null, unique) defined in schema.yml to validate data quality and catch issues early.

Can I use dbt with any data warehouse?

dbt supports many warehouses including BigQuery, Snowflake, Redshift, Postgres, and more. You just need the corresponding adapter (e.g., dbt-bigquery).

Do I need Python to run dbt?

Yes, dbt Core is a Python-based tool, so you need a recent version of Python 3 installed, and it's best to use a virtual environment to isolate dbt's dependencies. dbt Cloud, by contrast, runs in the browser and doesn't require a local Python setup.

How can AI accelerate my dbt workflow?

Tools like nao can automate environment setup, source and model creation, documentation, and even materialization of tables, saving time and reducing errors.

How do I structure my dbt project for best practices?

Organize your project with clear folders for staging, intermediate, and mart layers. Use consistent naming conventions (stg_, int_, fct_, dim_) and keep your SQL modular and well-documented.

What is the best way to start learning dbt?

Begin with dbt Core to understand project structure, sources, models, and tests. Then explore dbt Cloud for collaboration and automation. Following tutorials with example datasets (like messages or orders) helps build practical skills - Explore official dbt learning resources

Which code editor can I use to work on dbt?

You can use dbt Cloud (web IDE), VS Code locally, or nao, which bundles dbt, environment setup, and an AI copilot.

What is nao?

nao is a local AI development environment and agent for data teams. It connects to your warehouse, helps generate dbt models and docs, and runs commands from one interface.

Does nao work with dbt Core or Cloud?

nao works with dbt Core locally. If you're using dbt Cloud, you can still work locally in nao; you'll just need to define a local dbt profile (profiles.yml).

Does nao have access to my data?

nao runs locally and connects to your warehouse using your credentials. Queries execute in your environment; no raw data leaves your setup unless you explicitly allow the agent to access it.

nao team