Step 6: Build your first Pipeline
The best way to prove you are a Data Engineer is to build a project that moves data from a source to a destination automatically.
Project Spec: Weather Data Pipeline
1. The Source
Use the Open-Meteo API to fetch the current temperature of your city.
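Here is a minimal sketch of the extract step that uses only the standard library. It assumes Open-Meteo's forecast endpoint accepts the `current=temperature_2m` query parameter and returns the reading under `current.temperature_2m`. Confirm the exact parameter and field names in the Open-Meteo documentation before relying on them. The function names are illustrative, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; check the Open-Meteo docs for the current parameter names.
BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_url(latitude: float, longitude: float) -> str:
    """Build a request URL asking for the current 2 m air temperature (Celsius by default)."""
    query = urllib.parse.urlencode(
        {"latitude": latitude, "longitude": longitude, "current": "temperature_2m"}
    )
    return f"{BASE_URL}?{query}"


def parse_temperature(payload: dict) -> float:
    """Pull the current temperature out of the decoded JSON response (assumed shape)."""
    return float(payload["current"]["temperature_2m"])


def fetch_current_temperature(latitude: float, longitude: float, timeout: float = 10.0) -> float:
    """Call the API and return the current temperature in Celsius."""
    with urllib.request.urlopen(build_url(latitude, longitude), timeout=timeout) as resp:
        payload = json.load(resp)
    return parse_temperature(payload)
```

For example, `fetch_current_temperature(52.52, 13.41)` would request Berlin's current temperature. Replace the coordinates with your own city's. Keeping URL building and parsing as separate functions lets you test them without making a network call.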
2. The Transformation
- Convert the temperature from Celsius to Fahrenheit.
- Add a "timestamp" column recording when the data was fetched.
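Both transformation rules fit in one pure function, which keeps them easy to test. The conversion is °F = °C × 9/5 + 32. The column names `temperature_c`, `temperature_f`, and `timestamp` are assumptions, so rename them to suit your table. The sketch records the fetch time in UTC, which avoids ambiguity when runs cross a daylight-saving change.

```python
from datetime import datetime, timezone
from typing import Optional


def celsius_to_fahrenheit(celsius: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def transform(celsius: float, fetched_at: Optional[datetime] = None) -> dict:
    """Build one output row with both units and a fetch timestamp (UTC by default)."""
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    return {
        "temperature_c": celsius,
        "temperature_f": round(celsius_to_fahrenheit(celsius), 2),
        "timestamp": fetched_at,
    }
```

Passing the returned dict to `pd.DataFrame([row])` produces a one-row DataFrame that is ready to load into Postgres.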
3. The Destination
Load the data into a PostgreSQL database (Local or Cloud).
4. The Automation
Schedule the script to run every hour.
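A cron entry such as `0 * * * *` (minute 0 of every hour) or an orchestrator like Airflow usually handles this. For a dependency-free alternative, here is a sketch of a long-running loop that sleeps until the top of each hour. `run_pipeline` is a hypothetical placeholder for your own extract-transform-load function.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable


def seconds_until_next_hour(now: datetime) -> float:
    """Seconds from `now` to the next whole hour (e.g. 10:30:00 -> 1800)."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()


def run_hourly(run_pipeline: Callable[[], None]) -> None:
    """Call `run_pipeline` at the top of every hour, forever."""
    while True:
        time.sleep(seconds_until_next_hour(datetime.now(timezone.utc)))
        try:
            run_pipeline()
        except Exception as exc:  # keep the loop alive if a single run fails
            print(f"pipeline run failed: {exc}")
```

Because the wait is recomputed before every run, a slow run does not push later runs off the hour. With cron, by contrast, a crash does not stop future runs, which is one reason cron is the more common choice.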
🛠️ Code Snippet: Loading to Postgres
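import pandas as pd
from sqlalchemy import create_engine  # SQLAlchemy uses the psycopg2 driver, so install it (e.g. pip install psycopg2-binary)
# Connection string
conn_str = "postgresql://user:password@localhost:5432/weather_db"
engine = create_engine(conn_str)
# df is the DataFrame built in the transformation step; this one-row frame is only a placeholder
df = pd.DataFrame([{"temperature_f": 68.0, "timestamp": pd.Timestamp.now(tz="UTC")}])
# Load Pandas DataFrame to SQL (appends one row per run; creates the table if it does not exist)
df.to_sql('hourly_temp', engine, if_exists='append', index=False)
The Professional Portfolio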
To showcase this project:
- Dockerize it: Put your script and Postgres in a docker-compose.yml file.
- README: Explain the architecture and how to run it.
- Visualization: Connect Metabase or Tableau to your Postgres DB to show a chart of the temperature over time.
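When the script and Postgres run as separate Docker Compose services, `localhost` inside the script's container no longer points at the database. Instead, the database host is the Postgres service's name. A common pattern is to build the connection string from environment variables set in `docker-compose.yml`. The sketch below follows that pattern. The variable names and defaults (`POSTGRES_HOST`, a service named `db`, and so on) are assumptions, so match them to your own Compose file.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def connection_string(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a SQLAlchemy Postgres URL from environment variables.

    Defaults are placeholders; 'db' assumes the Compose service is named 'db'.
    """
    env = os.environ if env is None else env
    # Escape credentials so characters like '@' or ':' do not break the URL.
    user = quote_plus(env.get("POSTGRES_USER", "user"))
    password = quote_plus(env.get("POSTGRES_PASSWORD", "password"))
    host = env.get("POSTGRES_HOST", "db")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "weather_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

You can pass the result straight to `create_engine` in place of a hard-coded connection string. The same script then runs unchanged on your laptop (with `POSTGRES_HOST=localhost`) and inside Compose.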
🎯 Congratulations!
You have completed the Data Engineering beginner roadmap. You are now ready for Advanced DE Foundations.