Live Demo · Data Engineering · 2025

Job Market Analytics Pipeline

A complete end-to-end data engineering pipeline: extract, process, and visualize job market data with Airflow, APIs, and Streamlit.

End-to-End Pipeline Overview

This project demonstrates how to build a fully automated data pipeline for job market analytics, from data extraction to interactive visualization. The pipeline is orchestrated with Airflow, retrieves data from the Adzuna API, stores it in a SQLite database, processes and transforms the data, and finally presents insights through a Streamlit dashboard.

The architecture is modular and production-ready: each step (API extraction, storage, transformation, analytics) is decoupled and can be extended or replaced. Airflow ensures reliability and automation, while Streamlit provides a modern, interactive UI for business users.

  • Automated, scheduled data collection (Airflow DAGs)
  • Robust data storage and transformation (SQLite, Python, SQL)
  • Real-time analytics and visualization (Streamlit, Plotly)
  • Easy to extend for new data sources or analytics

Tech Stack

Python · Airflow · APIs · SQLite · Streamlit · Plotly · DBT

Key Features

Real-time Data Extraction

Automated retrieval of job data from the Adzuna API using Airflow.
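
The repository's extraction task isn't reproduced here, but a minimal standalone sketch looks like the following. The endpoint shape and the app_id/app_key query parameters follow Adzuna's public API documentation; the function name fetch_jobs and the environment-variable names are illustrative.

    import os
    import requests

    ADZUNA_URL = "https://api.adzuna.com/v1/api/jobs/{country}/search/{page}"

    def fetch_jobs(query: str, country: str = "gb", page: int = 1) -> list[dict]:
        """Fetch one page of job postings from the Adzuna search API."""
        params = {
            "app_id": os.environ["ADZUNA_APP_ID"],    # illustrative env-var names
            "app_key": os.environ["ADZUNA_APP_KEY"],
            "what": query,
            "results_per_page": 50,
        }
        url = ADZUNA_URL.format(country=country, page=page)
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()  # fail loudly so the orchestrator sees the error
        return resp.json().get("results", [])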

Database Storage

Store raw and processed data in a local SQLite database for reliability and easy querying.
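
As a sketch of the loading step (the raw_jobs table, the data/jobs.db path, and the helper name are assumptions, not the repo's actual schema), each raw API record can be stored as a JSON blob keyed by the Adzuna job id, which makes re-runs idempotent:

    import json
    import sqlite3

    def store_raw_jobs(jobs: list[dict], db_path: str = "data/jobs.db") -> None:
        """Insert raw API records into SQLite, one JSON payload per row."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            """CREATE TABLE IF NOT EXISTS raw_jobs (
                   id TEXT PRIMARY KEY,
                   payload TEXT NOT NULL,
                   ingested_at TEXT DEFAULT CURRENT_TIMESTAMP
               )"""
        )
        # INSERT OR IGNORE skips postings already loaded by an earlier run.
        conn.executemany(
            "INSERT OR IGNORE INTO raw_jobs (id, payload) VALUES (?, ?)",
            [(str(job["id"]), json.dumps(job)) for job in jobs],
        )
        conn.commit()
        conn.close()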

Data Transformation

Clean, enrich, and aggregate job data using Python and SQL workflows.
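
A hedged sketch of that transform step, flattening the raw JSON into a typed table: the salary_min/salary_max and nested company/location fields follow Adzuna's response format, and the jobs_clean table name simply continues the storage sketch above.

    import json
    import sqlite3
    import pandas as pd

    def transform_jobs(db_path: str = "data/jobs.db") -> None:
        """Flatten raw JSON payloads into a clean, queryable table."""
        conn = sqlite3.connect(db_path)
        raw = pd.read_sql("SELECT payload FROM raw_jobs", conn)
        records = raw["payload"].map(json.loads)
        jobs = pd.DataFrame({
            "title": records.map(lambda r: r.get("title")),
            "company": records.map(lambda r: (r.get("company") or {}).get("display_name")),
            "location": records.map(lambda r: (r.get("location") or {}).get("display_name")),
            # Midpoint of the advertised range as one comparable salary figure.
            "salary": records.map(
                lambda r: ((r.get("salary_min") or 0) + (r.get("salary_max") or 0)) / 2 or None
            ),
        })
        jobs.dropna(subset=["title"]).to_sql("jobs_clean", conn, if_exists="replace", index=False)
        conn.close()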

Interactive Analytics

Visualize trends and insights with Streamlit and Plotly dashboards.
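
A minimal dashboard page in that style might look like this sketch (the jobs_clean table and its columns come from the transformation example above, not necessarily the repo's schema); launch it with streamlit run:

    import sqlite3
    import pandas as pd
    import plotly.express as px
    import streamlit as st

    st.title("Job Market Analytics")

    conn = sqlite3.connect("data/jobs.db")
    jobs = pd.read_sql(
        "SELECT location, salary FROM jobs_clean WHERE salary IS NOT NULL", conn
    )
    conn.close()

    # Median salary for the ten best-paying locations.
    top = (
        jobs.groupby("location", as_index=False)["salary"]
        .median()
        .nlargest(10, "salary")
    )
    st.plotly_chart(px.bar(top, x="location", y="salary",
                           title="Median salary by location"))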

Orchestration with Airflow

Schedule and monitor the entire pipeline for full automation.
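
A minimal Airflow 2.x DAG wiring the three stages together might look like this sketch; the dag_id, task names, and daily schedule are assumptions, and the three callables stand in for the pipeline's real task functions:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholders for the pipeline's real task callables.
    def extract_jobs(): ...
    def load_raw(): ...
    def transform(): ...

    with DAG(
        dag_id="job_market_pipeline",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",   # Airflow 2.4+ keyword; one full refresh per day
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_jobs)
        load = PythonOperator(task_id="load", python_callable=load_raw)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)

        extract >> load >> transform_task  # linear extract -> load -> transform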

Key Insights Delivered

  • Salary trends by role and location (see the query sketch after this list)
  • Top hiring companies analysis
  • Skills demand tracking
  • Geographic job distribution
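
As a taste of the first insight, here is a hedged query against the jobs_clean table from the transformation sketch above; the table and column names are that sketch's assumptions, not necessarily the repo's schema:

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("data/jobs.db")
    salary_trends = pd.read_sql(
        """
        SELECT title, location,
               COUNT(*)           AS postings,
               ROUND(AVG(salary)) AS avg_salary
        FROM jobs_clean
        WHERE salary IS NOT NULL
        GROUP BY title, location
        HAVING COUNT(*) >= 5     -- ignore roles with too few postings
        ORDER BY avg_salary DESC
        """,
        conn,
    )
    conn.close()
    print(salary_trends.head(10))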

Quick Start Tutorial: Build Your Own Automated Pipeline

  1. Clone the repository:
    git clone https://github.com/HumbledDS/job-market-pipeline
  2. Install dependencies:
    pip install -r requirements.txt
  3. Configure your API keys and settings (see the config/ folder; a sample settings sketch appears after this tutorial).
  4. Run the complete pipeline with Airflow:
    python scripts/run_complete_pipeline.py
    This script triggers the Airflow DAG to extract, load, and transform job data automatically.
  5. Launch the Streamlit dashboard:
    streamlit run dashboard/job_market_dashboard.py
    Explore salary trends, top companies, skills demand, and more in real time.
Tip: You can customize the pipeline by editing the Airflow DAGs, adding new data sources, or extending the dashboard with new visualizations.
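
For step 3, one common pattern is a small settings module that reads credentials from the environment. This is only a sketch; the config/settings.py file name and the variable names are assumptions, so check the repo's config/ folder for the actual layout:

    # config/settings.py (illustrative layout, not the repo's actual file)
    import os

    ADZUNA_APP_ID = os.environ.get("ADZUNA_APP_ID", "")
    ADZUNA_APP_KEY = os.environ.get("ADZUNA_APP_KEY", "")
    DB_PATH = os.environ.get("JOB_DB_PATH", "data/jobs.db")

    if not (ADZUNA_APP_ID and ADZUNA_APP_KEY):
        raise RuntimeError(
            "Set ADZUNA_APP_ID and ADZUNA_APP_KEY before running the pipeline; "
            "free keys are available from the Adzuna developer portal."
        )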

Ready to Build Your Own Data Pipeline?

Explore the code, try the live demo, and use this project as a template for your own data engineering workflows!