An end-to-end data analytics project demonstrating ETL pipeline design, data warehousing, and business intelligence using PostgreSQL and Power BI.
This project processes raw ecommerce datasets, transforms them into a structured analytical model, and delivers actionable business insights through an interactive Power BI dashboard.
It simulates a real-world data workflow β from raw data ingestion to business decision-making.
- Python (Pandas, SQLAlchemy)
- PostgreSQL (Data Warehouse)
- SQL (Transformations)
- Power BI (Visualization & Insights)
- Raw datasets loaded from CSV files:
- Customers
- Orders
- Payments
- Products
- Order Items
- Data cleaning & preprocessing:
- Handling missing values
- Data type corrections
- Removing inconsistencies
- Business logic applied:
- Revenue calculation
- Order aggregation
- Customer mapping
- Data loaded into PostgreSQL:
- Staging Tables (
stg_*) - Warehouse Tables
- Staging Tables (
fact_sales
dim_customersdim_productsdim_date
This structure enables efficient analytical queries and scalable reporting.
- Total Revenue
- Total Orders
- Average Order Value (AOV)
- Shipping Revenue
- Revenue shows a strong upward trend, indicating consistent business growth.
- A peak suggests a possible seasonal or promotional impact.
- A post-peak drop may indicate demand normalization or data irregularities.
- Orders increase significantly over time, reflecting strong customer acquisition.
- Growth aligns with revenue, confirming volume-driven expansion.
- Indicates increasing platform usage and transaction frequency.
- Rapid growth observed during early business phase (2016β2017).
- Sharp spike likely due to high-demand events or campaigns.
- Stabilization indicates transition to a mature business stage.
- AOV remains stable (~150β180), showing consistent pricing strategy.
- A sharp drop in early 2017 indicates a potential data anomaly.
- Stability confirms revenue growth is not driven by increased spending per order.
- A few product categories dominate overall revenue.
- Indicates dependency on high-performing categories.
- Lower-performing categories present optimization opportunities.
- π Revenue growth is primarily driven by increase in order volume
- β AOV remains stable β consistent pricing strategy
- π¦ Seasonal spikes suggest event-driven demand
- ποΈ Revenue concentration highlights key product categories
β οΈ Data anomaly detected in AOV β potential data quality issue
The business is scaling through customer acquisition and transaction growth, rather than increasing customer spend.
- Increase AOV via bundling and upselling
- Expand high-performing product categories
- Improve data validation processes
pip install -r requirements.txt
python scripts/load.py



