Basics of Git data analysts should learn

October 2025

Introduction

If you work in data and have ever struggled with messy SQL queries, Jupyter notebooks, or Python scripts, Git will bring order to your workflow. This guide covers Git essentials for data professionals, a practical workflow, and how AI can help automate much of it.

What is Git & why it matters for data people

Git is a version control system - it tracks changes to your code and lets you collaborate safely with teammates.

For data teams, Git means:

  • Version control for SQL queries, Python scripts, R scripts, and notebooks.
  • Collaborating without overwriting each other's work.
  • Tracking changes and rolling back mistakes easily.
  • Reviewing and approving changes before they affect production dashboards.

If you've ever saved a file as query_final_v3_clean_reallyFINAL.sql, Git fixes that.

Git basics you should know

Where code lives

Git organizes your project across four areas:

  1. Working directory → your local files.
  2. Staging area → files you've marked for the next snapshot.
  3. Local repository → history of commits on your computer.
  4. Remote repository → shared repo on GitHub, GitLab, or Bitbucket.

Git commands move your work from one area to the next:

[Figure: Git workflow diagram]
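To see the four areas in practice, here is a minimal sketch that follows one file through all of them. It runs in a throwaway directory, and everything in it is an assumption made for illustration: the file name query.sql, the example identity, and a local bare repository standing in for GitHub or GitLab.

```shell
# Throwaway sandbox; all names below are made up for illustration.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"      # local stand-in for GitHub/GitLab
git init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

echo "SELECT 1;" > query.sql                  # 1. working directory
state_working=$(git status --short)           # "?? query.sql" (untracked)

git add query.sql                             # 2. staging area
state_staged=$(git status --short)            # "A  query.sql" (staged)

git commit -q -m "Add query"                  # 3. local repository
state_committed=$(git status --short)         # empty: nothing left to commit

git remote add origin "$sandbox/remote.git"
git push -q origin HEAD                       # 4. remote repository
remote_commits=$(git --git-dir="$sandbox/remote.git" rev-list --count --all)

echo "working:   $state_working"
echo "staged:    $state_staged"
echo "remote has $remote_commits commit(s)"
```

With a real project you would point origin at the URL your hosting platform gives you instead of a local path.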

Core Git workflow (Hands-on example)

Sometimes, you want to make changes to your repo but don't want to mess everything up. Here's how to do it safely using Git.

1. Create a branch - isolate your work

Use a branch for each report, analysis, or dataset update so main stays stable.

git checkout -b report/monthly-sales

Creates and switches to a new branch called report/monthly-sales.

2. Make changes - edit files

Edit SQL, Python, or Jupyter notebooks in your working directory.

These edits are local until you stage/commit them.

3. Check status - see what changed

Quick check before staging:

git status

Shows modified/untracked files so you know what to add.

4. Stage changes - prepare the snapshot

Select files for the next commit.

git add path/to/file.sql
# or stage everything:
git add .

Moves changes into the staging area; lets you control exactly what will be committed.
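As a rough illustration of that control, the sketch below stages one file and leaves another out of the next commit. The sandbox directory and both file names are hypothetical.

```shell
# Throwaway sandbox with two new files; names are made up for illustration.
sandbox=$(mktemp -d) && cd "$sandbox" && git init -q
echo "SELECT * FROM sales;" > monthly_sales.sql
echo "scratch notes"        > scratch.txt

git add monthly_sales.sql                     # stage only the report query

staged=$(git diff --cached --name-only)                  # what the next commit will contain
left_out=$(git ls-files --others --exclude-standard)     # untracked files not staged

echo "staged:   $staged"
echo "left out: $left_out"
```

Here git add . would have staged both files. When you only want part of a file, git add -p lets you pick individual changes interactively.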

5. Commit - save a snapshot

Create a recorded snapshot with a message explaining why you changed things.

git commit -m "Add monthly sales report"

Creates a local commit (a versioned checkpoint). Good commit messages help teammates understand intent.

6. Push - share your branch

Send your commits to the remote repo so others can see your work.

git push origin report/monthly-sales

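Uploads your branch and its commits to the remote (GitHub/GitLab). On the first push, adding -u (git push -u origin report/monthly-sales) sets the upstream branch, so a plain git push or git pull works afterwards.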

7. Open a Pull Request (PR) - propose and review changes

In GitHub/GitLab, open a PR from your branch into main (or the target branch).

Describe the purpose, attach screenshots or sample queries, and link related issues. This is where reviewers ask questions and request edits.

8. Review & merge - finalize into main

After approval, merge the PR through the web UI (or via CLI).

If merging locally:

git checkout main
git pull origin main        # ensure main is up to date
git merge report/monthly-sales
git push origin main

Combines the branch's changes into main and pushes the updated main branch.
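To try the branch-and-merge part without touching a real project, here is a small sandbox sketch. It assumes no remote, so the pull and push steps are skipped; the file names and the example identity are made up.

```shell
# Throwaway sandbox: one commit on main, one on a report branch.
sandbox=$(mktemp -d) && cd "$sandbox" && git init -q
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

echo "SELECT 1;" > base.sql
git add base.sql && git commit -q -m "Add base query"
git branch -M main                            # make sure the default branch is called main

git checkout -q -b report/monthly-sales
echo "SELECT SUM(amount) FROM sales;" > monthly_sales.sql
git add monthly_sales.sql && git commit -q -m "Add monthly sales report"

git checkout -q main
before_merge=$(git ls-files)                  # only base.sql: main was untouched by the branch
git merge -q report/monthly-sales
after_merge=$(git ls-files)                   # now also lists monthly_sales.sql

git log --oneline                             # both commits are on main
```

Until the merge runs, main knows nothing about the report file. That isolation is what keeps main stable while you work.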

How AI can help: Automating Git with nao

nao packages your Git workflow with an AI copilot that understands your project. For example, instead of running commands manually, you can prompt nao to do it for you.

Example prompt:

Ops team want to have country_name in this model

Steps:
- Create a new branch
- Add the column in the model  
- Commit with message & push
- Create the PR with a full description

nao handles branching, committing, pushing, and even drafting PR descriptions automatically. It saves data people from remembering every command and lets you focus on your analysis.

Conclusion

Git helps data people collaborate safely, track changes, and maintain reproducible workflows. To make the most of it:

  • Commit often with clear messages (e.g., "Add monthly_sales.sql").
  • Use branches for experiments or feature work.
  • Never commit raw datasets or credentials - keep them out of the repo with .gitignore and version the queries or scripts that produce the data instead.
  • Write clear PRs to explain your changes.
  • Sync frequently (git pull) to avoid conflicts.

Following these practices, and leveraging tools like nao to automate routine tasks, lets you focus on the actual analysis instead of version control.


Frequently Asked Questions

I'm not comfortable with the terminal - what should I do?

Use Git inside your IDE (e.g., VS Code, PyCharm) or GitHub's web UI. Same concepts, just point-and-click.

What is the difference between Git and GitHub?

Git is a version control system for tracking code changes locally. GitHub is a cloud platform to host Git repositories and collaborate.

Do I need GitHub to use Git?

No. Git works locally, but GitHub/GitLab/Bitbucket make collaboration easier.

Can I use Git for SQL queries and notebooks?

Yes. Git works with any text-based files, including SQL, Python, R scripts, and Jupyter notebooks.

How do I revert a commit?

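Use git revert <commit> to create a new commit that undoes the changes; this is the safe choice for commits that have already been pushed. git reset moves the branch pointer back instead, which rewrites history, so keep it for local commits you haven't shared yet.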
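A minimal sketch of git revert in a throwaway repository, with a made-up file and example identity:

```shell
# Throwaway sandbox with a good commit followed by a bad one.
sandbox=$(mktemp -d) && cd "$sandbox" && git init -q
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

echo "SELECT 1;" > query.sql
git add query.sql && git commit -q -m "Add query"

echo "SELECT 1 AS wrong_value;" > query.sql
git commit -q -am "Change query (turns out to be wrong)"

git revert --no-edit HEAD                     # adds a new commit that undoes the last one

content_after=$(cat query.sql)                # back to the original content
commit_count=$(git rev-list --count HEAD)     # 3 commits: the mistake stays in the history
git log --oneline
```

The bad commit is still visible in the log, followed by the commit that reverts it, so teammates who already pulled it are not disrupted.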

What is a Git branch and why should I use one?

A branch is a parallel workspace to test features or experiments without affecting the main code.

How often should I commit my changes?

Commit often with meaningful messages for easier tracking and collaboration.

What is the staging area?

The staging area lets you select which changes are included in your next commit.

Can I use Git without coding experience?

Yes. Basic Git commands are simple, and IDEs or platforms like GitHub make it visual.

What is .gitignore and why is it important?

.gitignore tells Git which files or folders to skip, such as raw datasets or credentials.
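For example, a .gitignore for a small analysis project might look like the sketch below. The data/ folder, the .env file, and the patterns are assumptions; adapt them to your own layout.

```shell
# Throwaway sandbox with a data extract, a credentials file, and a query.
sandbox=$(mktemp -d) && cd "$sandbox" && git init -q
mkdir data
echo "id,amount"           > data/raw_sales.csv
echo "DB_PASSWORD=example" > .env
echo "SELECT 1;"           > query.sql

cat > .gitignore <<'EOF'
# raw data extracts
data/
*.csv
# local credentials
.env
EOF

git add .                                     # ignored files are skipped automatically
staged=$(git diff --cached --name-only)       # only .gitignore and query.sql

git check-ignore -v data/raw_sales.csv .env   # shows which pattern matched each file
echo "$staged"
```

Note that .gitignore only affects files Git is not tracking yet. A file that was already committed stays tracked until you remove it from the index with git rm --cached <file>.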

How do I sync my work with the remote repository?

Use git pull to fetch and merge changes from the remote repository, and git push to upload your own commits.

How do pull requests work?

Pull requests let you propose changes, get feedback, and merge code safely after review.

Can Git track changes in large datasets?

Git is not ideal for large binary files; use queries, scripts, or tools like DVC.

What is a merge conflict and how do I resolve it?

Conflicts happen when multiple changes clash; you resolve them manually or using IDE tools.
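To make this concrete, the sketch below deliberately creates a conflict in a throwaway repository and resolves it by hand. The branch name add-country, the file report.sql, and the chosen resolution are all made up for illustration.

```shell
# Throwaway sandbox: two branches edit the same line of the same file.
sandbox=$(mktemp -d) && cd "$sandbox" && git init -q
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

echo "SELECT region FROM sales;" > report.sql
git add report.sql && git commit -q -m "Add report"
git branch -M main

git checkout -q -b add-country
echo "SELECT region, country FROM sales;" > report.sql
git commit -q -am "Add country"

git checkout -q main
echo "SELECT region, city FROM sales;" > report.sql
git commit -q -am "Add city"

# The merge stops because both branches changed the same line.
git merge add-country || echo "merge stopped: conflict in report.sql"
conflicted=$(git diff --name-only --diff-filter=U)   # files with unresolved conflicts
cat report.sql                                       # shows the <<<<<<< / ======= / >>>>>>> markers

# Resolve by writing the intended final content (here: keep both columns),
# then stage the file and conclude the merge with a commit.
echo "SELECT region, country, city FROM sales;" > report.sql
git add report.sql
git commit -q -m "Merge add-country, keeping both columns"
```

If you would rather back out than resolve, git merge --abort returns the branch to its state before the merge started.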

Can I use Git offline?

Yes. Committing, branching, and browsing history all work locally; only operations that talk to a remote, such as git push and git pull, need network access.

How can AI help with Git for data people?

AI tools like nao can automate branching, committing, PR creation, and keep workflows consistent.

Should I use Git for personal projects or only at work?

Use it for any project; Git helps you track changes, experiment safely, and revert mistakes.

How do I undo changes that haven't been committed?

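Use git restore <file> (Git 2.23 or later) or the older git checkout -- <file> to discard local changes in a file. Be careful: discarded uncommitted changes cannot be recovered through Git.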
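A small sandbox sketch of both cases, an unstaged edit and an edit that was already staged. The file name and identity are made up, and git restore assumes Git 2.23 or later.

```shell
# Throwaway sandbox with one committed file.
sandbox=$(mktemp -d) && cd "$sandbox" && git init -q
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

echo "SELECT 1;" > query.sql
git add query.sql && git commit -q -m "Add query"

# Case 1: an unstaged edit you regret.
echo "SELECT broken" > query.sql
git restore query.sql                         # same effect as: git checkout -- query.sql
after_unstaged=$(cat query.sql)

# Case 2: the edit was already staged with git add.
echo "SELECT also broken" > query.sql
git add query.sql
git restore --staged query.sql                # unstage; the edit is still in the file
git checkout -- query.sql                     # older syntax, discards the edit
after_staged=$(cat query.sql)

git status --short                            # prints nothing: the working directory is clean
```

In both cases the file ends up matching the last commit again.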

How do I set up a Git repository from scratch?

Use git init to start locally, then connect to a remote repo like GitHub.

Can Git track changes in Jupyter notebooks effectively?

Yes, but consider using tools like nbdime for better notebook diff visualization.

What is nao?

nao is a local AI development environment and agent for data teams. It connects to your warehouse, helps generate code, dbt models and documentation, and can automate Git workflows.

Can nao automate Git workflows for data projects?

Yes, nao can create branches, stage changes, commit, push, and even draft pull requests - all through simple AI commands.

Can I customize nao's AI for my company's data workflow?

Yes, nao can adapt to your code conventions and Git workflow by using .naorules to provide context-aware suggestions.

Can nao integrate Git with dbt workflows?

Yes. For example, you can ask nao to create a branch, update a dbt model, commit changes, and open a PR - all in one command.

Does nao support GitHub, GitLab, and Bitbucket?

Yes, nao works with all standard Git remote repositories. You just need to authenticate your account locally.

nao team