
Module 2: Model & Data Versioning (The Time Machine)

📚 Module 2: Model & Data Versioning (DVC)

Course ID: OPS-602
Subject: The Data Time Machine

In software, we use Git to version code. In AI, datasets and model files are usually too big for Git, so we version them with DVC (Data Version Control), which works alongside Git.


🏗️ Step 1: DVC (The “Library Card”)

  1. The Library: A remote storage location (for example, an Amazon S3 bucket or Google Drive) that holds your 100GB datasets.
  2. The Card: A tiny .dvc file, committed to Git with your code, that points to the data.
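To make the "library and card" idea concrete, here is a minimal Python sketch. It is a toy, not DVC's actual implementation: the function names (`sha256_of`, `track`), the `.pointer.json` file format, and the use of SHA-256 are all assumptions made for illustration. It only shows the principle that the big file goes to storage while a tiny pointer stays next to the code.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_path: Path, library_dir: Path) -> Path:
    """Copy the data into the 'library' and write a tiny 'card' beside it.

    Hypothetical helper for illustration only; real DVC uses its own
    file format and cache layout.
    """
    file_hash = sha256_of(data_path)

    # The Library: store the big file under a name derived from its content.
    library_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(data_path, library_dir / file_hash)

    # The Card: a small text file that is cheap to commit to Git.
    card = {
        "sha256": file_hash,
        "size": data_path.stat().st_size,
        "path": data_path.name,
    }
    card_path = data_path.with_name(data_path.name + ".pointer.json")
    card_path.write_text(json.dumps(card, indent=2))
    return card_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        dataset = root / "train.csv"
        dataset.write_text("id,label\n1,cat\n2,dog\n")
        card_path = track(dataset, root / "library")
        print(card_path.read_text())
```

In this sketch only the small card file would go into Git. The dataset itself stays in the library, and the hash on the card identifies exactly which version of the data the code expects.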

🧩 The Analogy: The Restaurant Menu

Your menu (Code) doesn’t contain 1,000 lb of flour. It says: “Go to the warehouse and get Flour v1.0.”
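The "go to the warehouse" step can be sketched in Python as well. Again, this is a toy under stated assumptions, not how DVC is implemented: `fetch`, the JSON pointer layout, and the `warehouse` folder are hypothetical names chosen to match the analogy. The sketch reads a tiny pointer, copies the matching file out of the warehouse, and checks that the bytes are exactly the version the pointer asked for.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(pointer_path: Path, warehouse: Path, dest_dir: Path) -> Path:
    """Follow a pointer to the warehouse and bring back the exact data.

    Hypothetical helper for illustration only.
    """
    pointer = json.loads(pointer_path.read_text())
    source = warehouse / pointer["sha256"]
    if not source.exists():
        raise FileNotFoundError(f"warehouse has no object {pointer['sha256']}")

    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / pointer["path"]
    shutil.copy2(source, dest)

    # Reproducibility check: the delivered bytes must match the order.
    if sha256_of(dest) != pointer["sha256"]:
        raise ValueError("fetched data does not match the pointer")
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        warehouse = root / "warehouse"
        warehouse.mkdir()

        # Stock the warehouse with "Flour v1.0", stored under its content hash.
        flour = b"flour v1.0\n"
        flour_hash = hashlib.sha256(flour).hexdigest()
        (warehouse / flour_hash).write_bytes(flour)

        # The menu entry: a tiny pointer that names the exact version.
        menu_entry = root / "flour.pointer.json"
        menu_entry.write_text(json.dumps({"sha256": flour_hash, "path": "flour.txt"}))

        delivered = fetch(menu_entry, warehouse, root / "kitchen")
        print(delivered.read_text())
```

Because the pointer names the data by its content hash, changing the flour means changing the pointer, and that change shows up in Git like any other edit to the code.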


🥅 Module 2 Review

  1. DVC: Data Version Control.
  2. Reproducibility: Getting the same result tomorrow that you got today.
  3. Pointer: A tiny file that finds a giant one.

:::tip Slow Learner Note
Remember: Code in Git. Data in DVC. They are partners!
:::