Using Taskfile with uv and pyproject.toml to manage your Python Machine Learning projects

"Use uv to manage your Python projects" is not a new take anymore, but have you tried Taskfile? 👀

I'm not even exaggerating when I say that I don’t want to work with Python without a tool like 'uv' again. (Creating virtual environments - with whatever Python version you might need - is SO painless and easy that this alone makes it worth it to me.) 🤩

BUT it’s still a command-line tool, and for many data scientists, that can be intimidating. Notebooks - whether Jupyter or a managed platform like Databricks - are very visual and interactive. Typing obscure commands into a terminal and configuring a project with something called a pyproject.toml (what even is this file ending?? 😟) can make you feel lost all the time.

Long story short, let me suggest Taskfile. It's similar in spirit to a Makefile: you define named tasks in a YAML file and run them with the task command, which helps you automate the build process of software projects. But I found that it can also be an excellent beginner cheat sheet 📋😊

1️⃣ You learn a new command somewhere and try it, for example, how to add a package with uv to the dev dependencies: uv add <package> --dev
2️⃣ If you're like me, you need to google this information at least 50 more times before you finally remember it for two weeks - only to forget it after the next long weekend.
2️⃣.5️⃣ Instead: Write it in your Taskfile.yml. You can add documentation and give it a nicer name, but most importantly, it's documented directly in your repo.
3️⃣ …
4️⃣ Profit. (You don’t need to google 50 times anymore, new teammates on your project already have a list of all the needed commands, and in the future you can explore combining tasks into full pipelines, task dependencies, and automations.)
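
For example, step 2️⃣.5️⃣ could look roughly like this in a Taskfile.yml. This is only a minimal sketch: the task name add_dev_package is made up for illustration, and it assumes you run it from inside a uv-managed project.

```yaml
# Taskfile.yml - minimal sketch, task name is hypothetical
version: '3'

tasks:
  add_dev_package:
    # The desc is what turns the Taskfile into a cheat sheet:
    # it is printed next to the task name by `task --list`.
    desc: "Add a package to the dev dependencies (usage: task add_dev_package -- <package>)"
    cmds:
      # CLI_ARGS holds everything typed after `--` on the command line
      - uv add --dev {{.CLI_ARGS}}
```

With that in place, task add_dev_package -- pytest runs uv add --dev pytest, and task --list prints every task that has a desc.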

Here's the documentation: https://taskfile.dev/

If you are using Homebrew (if you are not doing that yet, why??), you can get started simply by:

brew install go-task

# once installed, create your first Taskfile.yml by:
task --init

Here are some commands I put in my Taskfile, either to remember them or to avoid typing them over and over (like the cluster ID, which is fake for the purpose of this post):

# https://taskfile.dev

version: '3'


tasks:
  precommit:
    desc: "Runs all pre-commit hooks on every file in the repo"
    cmds:
      - pre-commit run --all-files
    silent: false

  start_cluster_data_eng:
    desc: "Starts Data Engineering Cluster on Databricks"
    cmds:
      - databricks clusters start --profile default 1597-824310-3qv9btxa

  list_clusters:
    desc: "Lists all pinned clusters (to exclude a never-ending list of job clusters)"
    cmds:
      - databricks clusters list --profile default --is-pinned

  fill_null_values:
    desc: "Runs the fill_null_values script via uv"
    cmds:
      - uv run scripts/fill_null_values.py
    silent: false

  install_dev_package:
    desc: "Add a package to the 'dev' optional dependencies via uv (usage: task install_dev_package -- <package>)"
    cmds:
      - uv add {{ .CLI_ARGS }} --optional dev
    silent: false
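
And as a teaser for the "combining tasks" part: Taskfile lets a task depend on other tasks (deps) or call them as steps (task: inside cmds). Here's a rough sketch that reuses two of the tasks from above. The check_and_fill task is hypothetical and only here for illustration, not something from my actual repo:

```yaml
# Taskfile.yml - sketch of combining tasks; check_and_fill is a hypothetical name
version: '3'

tasks:
  precommit:
    desc: "Runs all pre-commit hooks"
    cmds:
      - pre-commit run --all-files

  fill_null_values:
    desc: "Runs the fill_null_values script"
    cmds:
      - uv run scripts/fill_null_values.py

  check_and_fill:
    desc: "Runs pre-commit first, then fills the null values"
    # deps are executed before the task's own cmds
    deps: [precommit]
    cmds:
      # calls another task as a step instead of repeating its command
      - task: fill_null_values
```

Tasks listed under deps run before the task itself (and in parallel with each other if there are several), while task: entries inside cmds run one after another. So task check_and_fill only reaches the script if the pre-commit hooks pass.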

Disclaimer since I've never had an original thought in my life: I only started using Taskfile because Maria and Basak from MarvelousMLOps showed it in their MLOps course - and my colleague Vanessa convinced me to start using it in our current work repository afterward. (This is not sponsored, I just learned a lot from their course and good courses are sort of hard to find sometimes.)