Step 6: Build your first Pipeline

The best way to demonstrate data engineering skills is to build a project that automatically moves data from a source to a destination.


🏗️ Project Spec: Weather Data Pipeline

1. The Source

Use the Open-Meteo API to fetch the current temperature of your city.
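A minimal sketch of the fetch step, using only the standard library. It assumes the Open-Meteo forecast endpoint accepts a `current=temperature_2m` query parameter and returns the reading under `current.temperature_2m`; check the Open-Meteo documentation before relying on that shape. The function names are hypothetical, and the coordinates in the usage note are placeholders for your own city.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Open-Meteo forecast API.
BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_url(latitude: float, longitude: float) -> str:
    """Build the request URL for the current temperature at a location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m",  # assumed parameter name
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_temperature(payload: dict) -> float:
    """Pull the Celsius temperature out of the decoded JSON response."""
    # Assumed response shape: {"current": {"temperature_2m": <number>, ...}, ...}
    return float(payload["current"]["temperature_2m"])


def fetch_current_temperature(latitude: float, longitude: float,
                              timeout: float = 10.0) -> float:
    """Call the API and return the current temperature in Celsius."""
    with urlopen(build_url(latitude, longitude), timeout=timeout) as response:
        payload = json.load(response)
    return extract_temperature(payload)
```

Calling `fetch_current_temperature(52.52, 13.41)` would request the current temperature for those example coordinates. Keeping URL building and parsing in separate functions lets you test them without touching the network.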

2. The Transformation

  • Convert the temperature from Celsius to Fahrenheit.
  • Add a “timestamp” column of when the data was fetched.
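The two transformations above can be sketched as a small function that turns one Celsius reading into a row. The field names `temperature_c` and `temperature_f` and the function names are assumptions made for illustration; only the "timestamp" column name comes from the spec.

```python
from datetime import datetime, timezone
from typing import Optional


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert Celsius to Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def transform(temperature_c: float,
              fetched_at: Optional[datetime] = None) -> dict:
    """Build one output row from a Celsius reading."""
    if fetched_at is None:
        # Record the fetch time in UTC so rows stay comparable across hosts.
        fetched_at = datetime.now(timezone.utc)
    return {
        "temperature_c": temperature_c,
        "temperature_f": celsius_to_fahrenheit(temperature_c),
        "timestamp": fetched_at.isoformat(),
    }
```

If you work with pandas, `pd.DataFrame([transform(20.0)])` wraps the row in a one-row DataFrame ready for loading.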

3. The Destination

Load the data into a PostgreSQL database (local or cloud-hosted).

4. The Automation

Schedule the script to run every hour.
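One way to do this without extra tooling is a small loop that runs the job and then sleeps until the top of the next hour. This is only a sketch: `run_hourly` and `seconds_until_next_hour` are hypothetical helper names, and in practice a system scheduler such as cron (schedule expression `0 * * * *`) or an orchestrator is a common alternative.

```python
import time
from datetime import datetime, timedelta, timezone


def seconds_until_next_hour(now: datetime) -> float:
    """Return the number of seconds from `now` to the next full hour."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()


def run_hourly(job, max_runs=None, sleep=time.sleep):
    """Run `job` immediately, then once at the top of every hour.

    `max_runs` limits the number of runs (None means run forever);
    `sleep` is injectable so the loop can be tested without waiting.
    """
    runs = 0
    while True:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return
        sleep(seconds_until_next_hour(datetime.now(timezone.utc)))
```

Note that an exception raised by `job` stops this loop, so a real deployment would want error handling and logging around the call.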


🛠️ Code Snippet: Loading to Postgres

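from sqlalchemy import create_engine

# Connection string: replace user, password, and host with your own values.
# SQLAlchemy uses the psycopg2 driver by default for postgresql:// URLs,
# so psycopg2 must be installed but does not need to be imported here.
conn_str = "postgresql://user:password@localhost:5432/weather_db"
engine = create_engine(conn_str)

# df is the transformed pandas DataFrame from the previous steps.
# Append its rows to the hourly_temp table, creating the table if it is missing.
df.to_sql('hourly_temp', engine, if_exists='append', index=False)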

📈 The Professional Portfolio

To showcase this project:

  1. Dockerize it: Define your script and Postgres as services in a docker-compose.yml file.
  2. README: Explain the architecture and how to run it.
  3. Visualization: Connect Metabase or Tableau to your Postgres DB to show a chart of the temperature over time.

🎯 Congratulations!

You have completed the Data Engineering beginner roadmap. You are now ready for Advanced DE Foundations.