How to set up dbt with AI - create your first dbt model without a single line of code
Complete step-by-step guide to setting up dbt, creating your first models, and leveraging AI to accelerate your data transformation workflow

1 September 2025
By Claire, Co-founder & CEO

Table of contents
- Introduction
- What is dbt?
- 4 key decisions to prepare your dbt setup
- Creating your first dbt model
- How to use AI to set up dbt faster
- Conclusion
- Frequently Asked Questions
Introduction
dbt is one of the go-to tools for transformation in the modern data stack. In this guide, we'll show teams new to dbt how to plan their setup, configure it properly, and leverage AI to fast-track the process.
What is dbt?
dbt is an open-source transformation tool that helps analysts and engineers:
- Transform raw data into trusted datasets using SQL.
- Organize transformations into layers (bronze, silver, gold).
- Add version control, testing, and documentation.
- Create a clear lineage of data transformations.
Concretely, the impact is that ETL is no longer reserved only for software or data engineers who can write Python - anyone who knows SQL can now create their own ETL workflows.
4 key decisions to prepare your dbt setup
1. How to organize your dev/prod environments
Before using dbt, make sure you:
- Have your source data already ingested in a database or data warehouse.
- Define whether your development environment will be a separate dataset or an entirely different project.
- Determine where dbt should create your production data.
👉 Best practice: always work in a dedicated development schema (dev) separate from production. This avoids accidental overwrites and lets you experiment safely.
2. Choose: dbt Core vs dbt Cloud
There are two versions of dbt you can choose from:
- dbt Core → free, open-source CLI tool you run locally. You manage the environment, installation, and scheduling yourself.
- dbt Cloud → SaaS offering with a UI, built-in scheduler, IDE, and easier team collaboration.
Pros and cons:
| | dbt Core (CLI) | dbt Cloud (SaaS) |
|---|---|---|
| Cost | Free | Free for 1 user, then $100/user |
| Scheduling | You handle orchestration (Airflow, GitHub Actions) | Built-in scheduler |
| Tooling | Local IDE + extensions | Cloud IDE |
| AI features | AI IDEs (nao) | Limited |
3. Choose your coding tool
Options you can work with:
- dbt Cloud → Run dbt commands and preview data directly in the browser.
- VS Code → Local setup that requires Python and dbt extension.
- nao → All-in-one local setup with dbt, environment management, and AI copilot built in.
4. Define your data modeling strategy
It is important to define your data modeling and layering strategy from the start to build a structure that will scale.
Data layers help separate datasets by their quality and level of preparation, while also controlling which users can access which data.
The typical approach used by data teams is the medallion architecture (bronze / silver / gold):
| Layer | Description | Naming Convention |
|---|---|---|
| Bronze | - Direct mapping of raw tables. - Clean column names, apply type casting, handle nulls. - Goal: make raw data usable, but not business-ready. | stg_<table_name> |
| Silver | - Joins and enrichments across staging models. - Apply business rules (e.g., filter out test accounts, standardize country codes). - Fact and dimension tables that BI tools connect to. | fct_<topic> (facts) and dim_<topic> (dimensions) |
| Gold | - Final layer for analysis and reporting. - Aggregated data for self-serve analytics. - Metrics-ready. | mart_<KPIs> |
👉 Facts vs Dimensions:
- Fact tables = events or transactions (e.g., orders, messages, pageviews). They are long tables with many rows, capturing measurable activities.
- Dimension tables = descriptive context (e.g., users, products, dates). They are wide tables with attributes that add context to facts.
In a typical setup, business users only access the gold layer (which BI tools connect to), while data power users can also access the silver layer for more ad-hoc, in-depth analysis. This structure ensures business users never work directly with raw, uncleaned data, reducing the risk of wrong numbers.
Creating your first dbt model
This guide covers setting up dbt Core on BigQuery, following a medallion architecture (bronze → silver → gold) to organize your models and transformations.
1. Create a project folder
Open a blank folder on your computer using your preferred code editor and create a directory for your dbt project. This is just organizing your workspace - think of it as your "dbt repo," where all models, config files, and tests will live.
2. Set up a virtual environment and install dbt with dependencies
Make sure Python is already installed on your system. Check your version by running:
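For example, assuming `python3` is available on your PATH:

```shell
# Print the installed Python version (dbt requires Python 3)
python3 --version
```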
A Python virtual environment provides an isolated space for your project's dependencies, preventing conflicts with other projects.
Then install dbt and the BigQuery adapter:
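With the virtual environment active, installing the adapter package is enough, since `dbt-bigquery` pulls in `dbt-core` as a dependency:

```shell
# Installs the BigQuery adapter together with dbt-core
pip install dbt-bigquery
```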
3. Initialize dbt project
Run:
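For example (the project name here is illustrative):

```shell
# Scaffold a new dbt project and prompt for connection details
dbt init my_dbt_project
```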
This command sets up the structure for your new dbt project and will ask you to provide information including:
- Project name
- Target schema (e.g., `dev`)
- Database connection (e.g., BigQuery)
After running it, dbt automatically creates:
- `dbt_project.yml` → project-level configuration such as model paths, naming conventions, and version. Learn more in dbt docs: dbt_project.yml
- `profiles.yml` (in `~/.dbt/`) → stores database connection settings you provided during setup. Learn more in dbt docs: About profiles.yml
Finally, verify connectivity with:
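From inside the project folder:

```shell
# Check connection, profile, and project configuration
dbt debug
```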
👉 dbt debug checks that dbt can connect to your database and confirms that your project and profile are correctly configured. It helps you identify authentication or configuration issues before you start building models - Learn more in dbt docs: About dbt debug command
4. Write your first model
a) Define sources (sources.yml):
Declare your raw tables in sources.yml file to tell dbt where your data lives and reference it in your models without hardcoding database or schema names.
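A minimal sketch of a `sources.yml`, assuming a BigQuery project and a raw dataset containing a `messages` table (the project and dataset names are placeholders; replace them with your own):

```yaml
# models/staging/sources.yml -- names below are illustrative
version: 2

sources:
  - name: raw
    database: my-gcp-project   # your BigQuery project ID
    schema: raw_data           # dataset holding the ingested tables
    tables:
      - name: messages
```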
b) Create staging model (stg_messages.sql):
Use a Common Table Expression (CTE) to keep the query organized and easier to extend.
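A sketch of what `stg_messages.sql` could look like; the column names (`id`, `created_at`, `channel`, `body`) are assumptions about the raw table, so adapt them to your schema:

```sql
-- models/staging/stg_messages.sql
-- Bronze layer: rename columns, cast types, handle nulls

with source as (

    select * from {{ source('raw', 'messages') }}

),

renamed as (

    select
        id as message_id,
        cast(created_at as timestamp) as created_at_ts,
        lower(trim(channel)) as channel,
        body as message_body
    from source
    where id is not null

)

select * from renamed
```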
About source(): references a table defined in `sources.yml` instead of hardcoding paths, improving maintainability and tracking lineage. Example: `{{ source('raw', 'messages') }}` uses the `messages` table from the `raw` source. Learn more in dbt official documentation: About source function
c) Create fact table (fct_messages.sql):
Following the same logic:
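A minimal `fct_messages.sql` sketch that builds on the staging model via `ref()`; in a real project this layer would also join in dimension models and apply business rules:

```sql
-- models/marts/fct_messages.sql
-- Silver layer: fact table built on top of the staging model

with messages as (

    select * from {{ ref('stg_messages') }}

),

final as (

    select
        message_id,
        created_at_ts,
        channel
    from messages

)

select * from final
```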
About ref(): dbt compiles this into a select from the referenced model's output, ensuring lineage tracking and environment portability - Learn more in dbt official documentation: About ref function
👉 Best practice: Always build silver models (fact/dimension) on top of bronze (staging) models using ref(), never directly from source tables. This keeps transformations modular, consistent, and easier to maintain.
d) Add documentation & tests (schema.yml):
This file documents your models and applies built-in tests to validate data quality - Learn more in dbt docs: Add data tests to your DAG
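A sketch of a `schema.yml` matching the models above; `not_null` and `unique` are dbt's built-in generic tests:

```yaml
# models/schema.yml -- documentation plus built-in data tests
version: 2

models:
  - name: stg_messages
    description: "Staging model for raw messages: renamed columns, typed timestamps."
    columns:
      - name: message_id
        description: "Primary key of a message."
        tests:
          - not_null
          - unique

  - name: fct_messages
    description: "Fact table of messages, one row per message."
    columns:
      - name: message_id
        tests:
          - not_null
          - unique
```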
5. Run and validate
Once your models and tests are ready, you can materialize the models and validate the data.
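Materialize the models with:

```shell
# Compile and execute all models in the target warehouse
dbt run
```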
What this does:
- Compiles your `.sql` models into raw SQL statements.
- Executes them in your data warehouse, creating views or tables.
- Example: `stg_messages` becomes a view, `fct_messages` becomes a table.
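Then validate the data with:

```shell
# Execute all tests declared in schema.yml
dbt test
```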
What this does:
- Runs the tests defined in your `schema.yml`.
- Example: verifies `message_id` is never null and always unique.
👉 Other useful dbt commands:
- `dbt build` → Runs all models, tests, snapshots, and seeds in one command.
- `dbt run -s model_name` → Runs only the specified model (model_name).
- `dbt run -s +model_name` → Runs the specified model and all its upstream dependencies.
- `dbt run -s model_name+` → Runs the specified model and all its downstream dependencies.
You can find the complete list of dbt commands on the official dbt documentation: Complete list of dbt commands
6. Push your dbt project to Git
Finally, you can version-control your project with Git:
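A typical sequence from the command line (the remote URL is a placeholder for your own repository):

```shell
# Initialize the repository and make the first commit
git init
git add .
git commit -m "Initial dbt project with staging and fact models"

# Connect to your remote and push (rename the branch first if needed: git branch -M main)
git remote add origin <your-repo-url>
git push -u origin main
```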
Or using your IDE:
- Initialize a Git repository (often via a "Git: Initialize Repository" button).
- Stage all project files for commit.
- Commit with a descriptive message (e.g., "Initial dbt project with staging and fact models").
- Connect to your remote repository (GitHub, GitLab, etc.) and push the commit.
How to use AI to set up dbt faster
AI can make building a data model dramatically faster by working directly from the context of your raw data.
nao simplifies this by packaging your data connection, dbt setup, and an AI copilot in one seamless workflow. The AI copilot has context of your codebase and data warehouse, and leverages tools around dbt to generate models, documentation, and environment setup automatically.
Here's an example workflow:
- Prompt the agent with a clear description of your dbt setup and the models you want to create.
- Let nao Agent generate the repository, Python environment, source definitions, staging and fact tables, and documentation automatically.
See it in action 👇
Example prompt:
👉 Try nao free for two weeks and fast-track your dbt setup – download here!
Conclusion
Setting up dbt may seem involved the first time, but the workflow is always the same:
- Prepare your environment.
- Structure your project into bronze/silver/gold layers.
- Define sources, write staging models, create marts, and add tests.
- Run and validate your models.
Once you master these basics, you can layer on automation, testing, and AI-driven acceleration to make your workflow even faster. A well-structured dbt project is also the fastest path to reliable AI analytics — read How to Build Production-Ready AI Agents for Data Analytics to see how dbt lineage powers agent context.
🔗 If you're into exploring AI use cases in data, you can check out more resources on nao's documentation, use case examples, and join our Slack community to connect with other data professionals speeding up their data work with AI.

Claire
For nao team
Frequently Asked Questions