Module 2: Model & Data Versioning (The Time Machine)
📚 Module 2: Model & Data Versioning (DVC)
Course ID: OPS-602
Subject: The Data Time Machine
In software, we use Git to version code. But in AI, datasets and models are usually too big to store in Git, so we version them with DVC (Data Version Control).
🏗️ Step 1: DVC (The “Library Card”)
- The Library: A cloud bucket (S3, Drive) for your 100GB datasets.
- The Card: A tiny .dvc file in your code that points to the data.
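To make the library card idea concrete, here is a minimal Python sketch of a pointer file. It is not DVC's actual implementation: the function name `make_pointer`, the file name `train.csv`, and the JSON layout are assumptions made for illustration (real .dvc files are small YAML files that DVC writes for you). The sketch hashes a data file and keeps only the hash, size, and file name.

```python
import hashlib
import json
import os
import tempfile


def make_pointer(data_path):
    """Return a small dict describing a (possibly huge) data file."""
    digest = hashlib.md5()
    with open(data_path, "rb") as f:
        # Read in 1 MiB chunks so a 100GB file never has to fit in memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return {
        "md5": digest.hexdigest(),
        "size": os.path.getsize(data_path),
        "path": os.path.basename(data_path),
    }


# Demo with a small stand-in for a large dataset (hypothetical file name).
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "train.csv")
with open(data_path, "wb") as f:
    f.write(b"x,y\n" * 100_000)

pointer = make_pointer(data_path)
pointer_text = json.dumps(pointer, indent=2)
print(pointer_text)
print("data bytes:", pointer["size"], "| pointer bytes:", len(pointer_text))
```

The pointer stays the same tiny size no matter how large the data file grows, which is why it is safe to commit to Git alongside the code.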
🧩 The Analogy: The Restaurant Menu
Your menu (Code) doesn’t contain 1,000lb of flour. It says: “Go to the warehouse and get Flour v1.0.”
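The warehouse lookup can be sketched in a few lines of Python. This is an illustration under assumed names (`store`, `fetch`, and the `warehouse` directory are hypothetical, not DVC commands or APIs): each version of a file is saved under its content hash, and a pointer holding that hash is enough to get the exact version back.

```python
import hashlib
import os
import shutil
import tempfile


def store(data_path, warehouse):
    """Copy a file into the warehouse under its content hash; return the hash."""
    with open(data_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()  # whole-file read is fine for a small demo
    shutil.copyfile(data_path, os.path.join(warehouse, digest))
    return digest


def fetch(digest, warehouse, dest_path):
    """Restore the file version identified by `digest` to dest_path."""
    shutil.copyfile(os.path.join(warehouse, digest), dest_path)


root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
os.makedirs(warehouse)
flour = os.path.join(root, "flour.csv")

# Version 1.0 of the data.
with open(flour, "w") as f:
    f.write("grade,kg\nA,450\n")
pointer_v1 = store(flour, warehouse)

# The data changes, producing a second version.
with open(flour, "w") as f:
    f.write("grade,kg\nA,450\nB,120\n")
pointer_v2 = store(flour, warehouse)

# "Go to the warehouse and get Flour v1.0."
fetch(pointer_v1, warehouse, flour)
with open(flour) as f:
    restored = f.read()
print(restored)
```

Both versions stay in the warehouse, so switching between them only requires knowing which pointer to follow.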
🥅 Module 2 Review
- DVC: Data Version Control.
- Reproducibility: Getting the same result tomorrow that you got today, because both the code and the exact data version are recorded.
- Pointer: A tiny file that finds a giant one.
:::tip Slow Learner Note
Remember: Code in Git. Data in DVC. They are partners!
:::