abhisheks008 · Naveen-Boddepalli · May 30, 2026
diff --git a/Real-Estate-Price-Prediction/Dataset/Bengaluru_House_Data.csv b/Real-Estate-Price-Prediction/Dataset/Bengaluru_House_Data.csv
diff --git a/Real-Estate-Price-Prediction/Dataset/README.md b/Real-Estate-Price-Prediction/Dataset/README.md
@@ -0,0 +1,32 @@
+# Dataset: Bengaluru House Price Data
+
+## Source
+[Kaggle - Bengaluru House Price Data](https://www.kaggle.com/datasets/amitabhajoy/bengaluru-house-price-data)
+
+## Description
+This dataset contains real estate listing information from Bengaluru, India, with features that influence housing prices.
+
+## Columns
+| Column | Description |
+|--------|-------------|
+| `area_type` | Type of area (Super built-up, Built-up, Plot, Carpet) |
+| `availability` | Possession status (Ready to Move / specific date) |
+| `location` | Locality in Bengaluru |
+| `size` | Number of BHK/Bedrooms (e.g., "2 BHK", "3 Bedroom") |
+| `society` | Name of the housing society (if applicable) |
+| `total_sqft` | Total square footage of the property |
+| `bath` | Number of bathrooms |
+| `balcony` | Number of balconies |
+| `price` | Price in lakhs of INR (1 lakh = 100,000 INR) |
+
+## Preprocessing Notes
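+- `size` must be parsed to extract the numeric BHK count (e.g., "2 BHK" becomes 2)
+- `total_sqft` may contain ranges (e.g., "2000-2500"); the notebook uses the midpoint and drops values it cannot parse
+- Missing values are present in `bath`, `balcony`, and `society`
+- `price_per_sqft` is an engineered feature, not a raw column; rows outside mean ± 1 standard deviation per location are removed as outliers
+- `location` has high cardinality; locations with fewer than 10 listings are grouped as "other" before one-hot encoding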
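
A minimal, dependency-free sketch of the two parsing notes above, assuming `size` labels look like "2 BHK" and `total_sqft` cells are either plain numbers or hyphenated ranges. The function names are illustrative (they are not part of the dataset or of any library), and unparseable input returns `None` here rather than NaN.

```python
from typing import Optional


def parse_bhk(size: object) -> Optional[int]:
    """Return the leading integer of a size label such as '2 BHK', else None."""
    tokens = str(size).split()
    if not tokens:
        return None
    try:
        return int(tokens[0])
    except ValueError:
        return None


def parse_total_sqft(value: object) -> Optional[float]:
    """Parse a square-footage cell; a 'low-high' range becomes its midpoint."""
    text = str(value).strip()
    try:
        if "-" in text:
            low, high = text.split("-")  # ValueError unless exactly two parts
            return (float(low) + float(high)) / 2
        return float(text)
    except ValueError:
        return None


print(parse_bhk("3 Bedroom"))               # 3
print(parse_total_sqft("2000-2500"))        # 2250.0
print(parse_total_sqft("34.46Sq. Meter"))   # None (unit-suffixed values are not handled)
```

Returning `None` for unit-suffixed cells mirrors the "drop what cannot be parsed" choice; converting such units would be a separate decision.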
+
+## Download Instructions
+1. Visit the [Kaggle dataset page](https://www.kaggle.com/datasets/amitabhajoy/bengaluru-house-price-data)
+2. Download `Bengaluru_House_Data.csv`
+3. Place it in this `Dataset/` folder
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.48.11_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.48.11_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.48.40_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.48.40_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.48.56_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.48.56_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.49.38_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.49.38_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.50.06_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.50.06_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.53.18_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.53.18_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.53.45_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.53.45_PM.png
diff --git a/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.54.02_PM.png b/Real-Estate-Price-Prediction/Images/Screenshot_2026-05-30_at_5.54.02_PM.png
diff --git a/Real-Estate-Price-Prediction/Model/README.md b/Real-Estate-Price-Prediction/Model/README.md
diff --git a/Real-Estate-Price-Prediction/Model/Real_Estate_Price_Prediction.ipynb b/Real-Estate-Price-Prediction/Model/Real_Estate_Price_Prediction.ipynb
@@ -0,0 +1,229 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "cell-01",
+   "metadata": {},
+   "source": "# Real Estate Price Prediction\n**Dataset:** Bengaluru House Price Data (Kaggle)  \n**Author:** Naveen-Boddepalli  \n**Event:** GSSoC 2026"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-02",
+   "metadata": {},
+   "source": "## 1. Imports & Setup"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-03",
+   "metadata": {},
+   "outputs": [],
+   "source": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport warnings\nwarnings.filterwarnings('ignore')\n\n# ML\nfrom sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV\nfrom sklearn.linear_model import LinearRegression, Lasso\nfrom sklearn.ensemble import RandomForestRegressor\nfrom sklearn.preprocessing import StandardScaler, OneHotEncoder\nfrom sklearn.compose import ColumnTransformer\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error\nfrom xgboost import XGBRegressor\nimport joblib\n\n# DL\nimport torch\nimport torch.nn as nn\nfrom torch.utils.data import DataLoader, TensorDataset\n\nprint(\"All imports successful!\")"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-04",
+   "metadata": {},
+   "source": "## 2. Load Dataset"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-05",
+   "metadata": {},
+   "outputs": [],
+   "source": "df = pd.read_csv('../Dataset/Bengaluru_House_Data.csv')\nprint(f\"Shape: {df.shape}\")\ndf.head()"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-06",
+   "metadata": {},
+   "outputs": [],
+   "source": "print(df.info())\nprint(\"\\nMissing values:\")\nprint(df.isnull().sum())"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-07",
+   "metadata": {},
+   "source": "## 3. Data Preprocessing"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-08",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Drop rows where location or size is null (very few)\ndf.dropna(subset=['location', 'size'], inplace=True)\n\n# Fill missing bath and balcony values with their column medians\ndf['bath'] = df['bath'].fillna(df['bath'].median())\ndf['balcony'] = df['balcony'].fillna(df['balcony'].median())\n\n# Drop society (too many nulls, low signal)\ndf.drop(columns=['society'], inplace=True)\n\nprint(\"After basic cleaning:\", df.shape)"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-09",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Parse 'size' -> BHK count (e.g. '2 BHK' -> 2); unparseable values become NaN\ndef parse_bhk(size):\n    try:\n        return int(str(size).split()[0])\n    except (ValueError, IndexError):\n        return np.nan\n\ndf['bhk'] = df['size'].apply(parse_bhk)\ndf.dropna(subset=['bhk'], inplace=True)\ndf['bhk'] = df['bhk'].astype(int)\ndf.drop(columns=['size'], inplace=True)\nprint(df['bhk'].value_counts().head(10))"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-10",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Parse 'total_sqft' (handles ranges like \"2000-2500\"); unparseable values become NaN\ndef parse_sqft(sqft):\n    text = str(sqft)\n    try:\n        if '-' in text:\n            low, high = text.split('-')  # ValueError unless exactly two parts\n            return (float(low) + float(high)) / 2\n        return float(text)\n    except ValueError:\n        return np.nan\n\ndf['total_sqft'] = df['total_sqft'].apply(parse_sqft)\ndf.dropna(subset=['total_sqft'], inplace=True)\nprint(f\"Shape after sqft parsing: {df.shape}\")"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-11",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Engineer price_per_sqft (price is in lakhs; 1 lakh = 1e5 INR)\ndf['price_per_sqft'] = df['price'] * 1e5 / df['total_sqft']\n\n# Group rare locations: strip whitespace first so the counts match the cleaned names\ndf['location'] = df['location'].str.strip()\nlocation_counts = df['location'].value_counts()\ndf['location'] = df['location'].where(df['location'].map(location_counts) >= 10, 'other')\nprint(f\"Unique locations after grouping: {df['location'].nunique()}\")"
+  },
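
The unit conversion and the grouping rule in this cell can be checked on toy values. The sketch below is plain Python rather than pandas, the helper names are hypothetical, and it strips whitespace before counting so that every lookup is made against names that actually appear in the counts.

```python
from collections import Counter

LAKH = 100_000  # 1 lakh = 100,000 INR


def price_per_sqft(price_lakhs: float, total_sqft: float) -> float:
    """Convert a price in lakhs to INR per square foot."""
    return price_lakhs * LAKH / total_sqft


def group_rare(locations: list[str], min_count: int = 10) -> list[str]:
    """Strip whitespace, then relabel locations seen fewer than min_count times."""
    cleaned = [loc.strip() for loc in locations]
    counts = Counter(cleaned)  # counted after stripping, so every lookup hits
    return [loc if counts[loc] >= min_count else "other" for loc in cleaned]


print(price_per_sqft(50, 1000))  # 5000.0
print(group_rare(["Whitefield", "Whitefield ", "Hebbal"], min_count=2))
# ['Whitefield', 'Whitefield', 'other']
```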
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-12",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Remove outliers: price_per_sqft outside mean +/- std per location\ndef remove_pps_outliers(df):\n    df_out = pd.DataFrame()\n    for loc, subdf in df.groupby('location'):\n        m = subdf['price_per_sqft'].mean()\n        s = subdf['price_per_sqft'].std()\n        reduced = subdf[(subdf['price_per_sqft'] > (m - s)) & (subdf['price_per_sqft'] < (m + s))]\n        df_out = pd.concat([df_out, reduced], ignore_index=True)\n    return df_out\n\ndf = remove_pps_outliers(df)\n\n# Remove extreme bath counts (bath > bhk+2 is suspicious)\ndf = df[df['bath'] <= df['bhk'] + 2]\n\nprint(f\"Shape after outlier removal: {df.shape}\")"
+  },
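
To see what the per-location filter keeps, here is a small pure-Python version of the same rule for a single location. It assumes the sample standard deviation (the default for pandas `.std()`), and the function name and prices are made up for illustration. On roughly normal data a one-standard-deviation band discards about a third of the rows, so this is a fairly aggressive cut.

```python
from statistics import mean, stdev


def within_one_std(values: list[float]) -> list[float]:
    """Keep values strictly inside (mean - std, mean + std), using the sample std."""
    if len(values) < 2:
        return []  # sample std is undefined for a single value, so nothing passes
    m, s = mean(values), stdev(values)
    return [v for v in values if m - s < v < m + s]


pps = [5000, 5200, 5100, 4900, 5300, 20000]  # made-up INR per sqft for one location
print(within_one_std(pps))  # [5000, 5200, 5100, 4900, 5300]
```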
+  {
+   "cell_type": "markdown",
+   "id": "cell-13",
+   "metadata": {},
+   "source": "## 4. Exploratory Data Analysis"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-14",
+   "metadata": {},
+   "outputs": [],
+   "source": "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n\n# Price distribution\naxes[0].hist(df['price'], bins=50, color='steelblue', edgecolor='white')\naxes[0].set_title('Price Distribution (Lakhs)', fontsize=13)\naxes[0].set_xlabel('Price'); axes[0].set_ylabel('Count')\n\n# Sqft vs Price scatter\naxes[1].scatter(df['total_sqft'], df['price'], alpha=0.3, color='coral', s=5)\naxes[1].set_title('Total Sqft vs Price', fontsize=13)\naxes[1].set_xlabel('Total Sqft'); axes[1].set_ylabel('Price (Lakhs)')\n\n# BHK distribution\ndf['bhk'].value_counts().sort_index().plot(kind='bar', ax=axes[2], color='mediumseagreen')\naxes[2].set_title('BHK Distribution', fontsize=13)\naxes[2].set_xlabel('BHK'); axes[2].set_ylabel('Count')\n\nplt.tight_layout()\nplt.savefig('../Images/eda_overview.png', dpi=150, bbox_inches='tight')\nplt.show()"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-15",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Correlation heatmap\nplt.figure(figsize=(8, 6))\nnumeric_cols = ['total_sqft', 'bath', 'balcony', 'bhk', 'price_per_sqft', 'price']\ncorr = df[numeric_cols].corr()\nsns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', square=True)\nplt.title('Correlation Heatmap', fontsize=14)\nplt.tight_layout()\nplt.savefig('../Images/correlation_heatmap.png', dpi=150, bbox_inches='tight')\nplt.show()"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-16",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Top 10 locations by median price\ntop10 = df.groupby('location')['price'].median().sort_values(ascending=False).head(10)\nplt.figure(figsize=(12, 5))\ntop10.plot(kind='bar', color='orchid', edgecolor='white')\nplt.title('Top 10 Locations by Median Price (Lakhs)', fontsize=14)\nplt.xticks(rotation=45, ha='right')\nplt.ylabel('Median Price (Lakhs)')\nplt.tight_layout()\nplt.savefig('../Images/top_locations.png', dpi=150, bbox_inches='tight')\nplt.show()"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-17",
+   "metadata": {},
+   "source": "## 5. Feature Engineering & Train/Test Split"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-18",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Drop derived feature before modeling (avoid leakage)\ndf.drop(columns=['price_per_sqft'], inplace=True)\n\nX = df.drop(columns=['price'])\ny = df['price']\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\nprint(f\"Train: {X_train.shape}, Test: {X_test.shape}\")"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-19",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Preprocessing pipeline\ncategorical_features = ['area_type', 'availability', 'location']\nnumerical_features = ['total_sqft', 'bath', 'balcony', 'bhk']\n\npreprocessor = ColumnTransformer(transformers=[\n    ('num', StandardScaler(), numerical_features),\n    ('cat', OneHotEncoder(handle_unknown='ignore', sparse_output=False), categorical_features)\n])"
+  },
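
As a rough illustration of what the two transformers compute, the sketch below reimplements them in plain Python. It assumes `StandardScaler` semantics (population standard deviation) and the `handle_unknown='ignore'` behaviour of encoding an unseen category as all zeros; the helper names and sample values are hypothetical, and constant columns (zero variance) are not handled.

```python
from statistics import fmean, pstdev


def standardize(column: list[float]) -> list[float]:
    """Zero-mean, unit-variance scaling using the population std (ddof=0)."""
    mu, sigma = fmean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def one_hot(value: str, categories: list[str]) -> list[float]:
    """One indicator per known category; an unseen value yields all zeros."""
    return [1.0 if value == cat else 0.0 for cat in categories]


train_locations = ["Hebbal", "Whitefield", "other"]  # categories seen during fit
print(standardize([1000.0, 2000.0, 3000.0]))
print(one_hot("Whitefield", train_locations))  # [0.0, 1.0, 0.0]
print(one_hot("Yelahanka", train_locations))   # [0.0, 0.0, 0.0]
```

An all-zero row means a test-time location that was never seen in training contributes no location signal at all, which is one reason for grouping rare locations into "other" beforehand.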
+  {
+   "cell_type": "markdown",
+   "id": "cell-20",
+   "metadata": {},
+   "source": "## 6. ML Models"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-21",
+   "metadata": {},
+   "outputs": [],
+   "source": "results = {}\n\ndef evaluate(name, y_true, y_pred):\n    r2 = r2_score(y_true, y_pred)\n    mae = mean_absolute_error(y_true, y_pred)\n    rmse = np.sqrt(mean_squared_error(y_true, y_pred))\n    results[name] = {'R2': round(r2, 4), 'MAE': round(mae, 2), 'RMSE': round(rmse, 2)}\n    print(f\"{name}: R2={r2:.4f} | MAE={mae:.2f} | RMSE={rmse:.2f}\")\n\n# Linear Regression\nlr_pipe = Pipeline([('pre', preprocessor), ('model', LinearRegression())])\nlr_pipe.fit(X_train, y_train)\nevaluate('Linear Regression', y_test, lr_pipe.predict(X_test))\n\n# Lasso\nlasso_pipe = Pipeline([('pre', preprocessor), ('model', Lasso(alpha=1.0))])\nlasso_pipe.fit(X_train, y_train)\nevaluate('Lasso', y_test, lasso_pipe.predict(X_test))\n\n# Random Forest\nrf_pipe = Pipeline([('pre', preprocessor), ('model', RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1))])\nrf_pipe.fit(X_train, y_train)\nevaluate('Random Forest', y_test, rf_pipe.predict(X_test))\n\n# XGBoost\nxgb_pipe = Pipeline([('pre', preprocessor), ('model', XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=6, random_state=42, n_jobs=-1))])\nxgb_pipe.fit(X_train, y_train)\nevaluate('XGBoost', y_test, xgb_pipe.predict(X_test))"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-22",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Save best ML model\nbest_ml_model = xgb_pipe  # Update if another model wins\njoblib.dump(best_ml_model, 'real_estate_best_model.pkl')\nprint(\"Saved: real_estate_best_model.pkl\")"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-23",
+   "metadata": {},
+   "source": "## 7. Deep Learning Models (PyTorch)"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-24",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Preprocess for DL: refit the shared preprocessor on the training split, then transform the test split\nX_train_proc = preprocessor.fit_transform(X_train)\nX_test_proc = preprocessor.transform(X_test)\n\nX_train_t = torch.FloatTensor(X_train_proc)\ny_train_t = torch.FloatTensor(y_train.values).unsqueeze(1)\nX_test_t  = torch.FloatTensor(X_test_proc)\ny_test_t  = torch.FloatTensor(y_test.values).unsqueeze(1)\n\ninput_dim = X_train_t.shape[1]\nprint(f\"Input dimension: {input_dim}\")\n\ntrain_ds = TensorDataset(X_train_t, y_train_t)\ntrain_dl = DataLoader(train_ds, batch_size=64, shuffle=True)"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-25",
+   "metadata": {},
+   "outputs": [],
+   "source": "# \u2500\u2500 MLP \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nclass MLP(nn.Module):\n    def __init__(self, input_dim):\n        super().__init__()\n        self.net = nn.Sequential(\n            nn.Linear(input_dim, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),\n            nn.Linear(256, 128),       nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),\n            nn.Linear(128, 64),        nn.ReLU(),\n            nn.Linear(64, 1)\n        )\n    def forward(self, x): return self.net(x)\n\n# \u2500\u2500 Wide & Deep \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nclass WideDeep(nn.Module):\n    def __init__(self, input_dim):\n        super().__init__()\n        self.wide = nn.Linear(input_dim, 1)\n        self.deep = nn.Sequential(\n            nn.Linear(input_dim, 256), nn.ReLU(), nn.Dropout(0.3),\n            nn.Linear(256, 128),       nn.ReLU(), nn.Dropout(0.2),\n            nn.Linear(128, 1)\n        )\n    def forward(self, x):\n        return self.wide(x) + self.deep(x)\n\ndef train_model(model, epochs=50, lr=1e-3):\n    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)\n    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=5, factor=0.5)\n    loss_fn = nn.MSELoss()\n    model.train()\n    for epoch in range(epochs):\n        total = 0\n        for xb, yb in train_dl:\n            
opt.zero_grad()\n            loss = loss_fn(model(xb), yb)\n            loss.backward()\n            opt.step()\n            total += loss.item()\n        avg = total / len(train_dl)\n        scheduler.step(avg)\n        if (epoch+1) % 10 == 0:\n            print(f\"  Epoch {epoch+1}/{epochs} \u2014 loss: {avg:.4f}\")\n    return model\n\ndef eval_dl(name, model):\n    model.eval()\n    with torch.no_grad():\n        preds = model(X_test_t).squeeze().numpy()\n    evaluate(name, y_test.values, preds)"
+  },
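
`train_model` feeds the average training loss to `ReduceLROnPlateau(opt, patience=5, factor=0.5)`. The simulation below is a simplified pure-Python approximation of that rule, not PyTorch's implementation: it ignores the scheduler's improvement threshold, cooldown, and minimum learning rate, and the function name is hypothetical. It shows that the rate is halved only once more than `patience` consecutive epochs fail to improve on the best loss.

```python
def simulate_plateau(losses, lr=1e-3, patience=5, factor=0.5):
    """Return the learning rate after each epoch under a simplified plateau rule."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:            # an improvement resets the counter
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # more than `patience` stale epochs in a row
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


stalled = [1.0] * 7  # one epoch sets the best loss, then six epochs without improvement
print(simulate_plateau(stalled))
```

Because the notebook monitors the training loss rather than a validation loss, the schedule reacts to optimisation stalls, not to overfitting.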
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-26",
+   "metadata": {},
+   "outputs": [],
+   "source": "print(\"Training MLP...\")\nmlp = MLP(input_dim)\nmlp = train_model(mlp, epochs=50)\neval_dl(\"MLP\", mlp)\n\nprint(\"\\nTraining Wide & Deep...\")\nwd = WideDeep(input_dim)\nwd = train_model(wd, epochs=50)\neval_dl(\"Wide & Deep\", wd)"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-27",
+   "metadata": {},
+   "source": "## 8. Model Comparison"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-28",
+   "metadata": {},
+   "outputs": [],
+   "source": "results_df = pd.DataFrame(results).T.sort_values('R2', ascending=False)\nprint(results_df.to_string())\n\nfig, axes = plt.subplots(1, 3, figsize=(16, 5))\n# Rows are sorted by R2, so the first bar (coral) is the top-R2 model in every panel\ncolors = ['coral' if i == 0 else 'steelblue' for i in range(len(results_df))]\nfor ax, metric in zip(axes, ['R2', 'MAE', 'RMSE']):\n    results_df[metric].plot(kind='bar', ax=ax, color=colors, edgecolor='white')\n    ax.set_title(f'{metric} by Model', fontsize=13)\n    ax.set_xticklabels(results_df.index, rotation=30, ha='right', fontsize=9)\n    ax.set_ylabel(metric)\n\nplt.tight_layout()\nplt.savefig('../Images/model_comparison.png', dpi=150, bbox_inches='tight')\nplt.show()"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-29",
+   "metadata": {},
+   "source": "## 9. Conclusions\n\n> *(Update after running \u2014 best model, key insights, feature importance)*\n\n- **Best model:** TBD\n- **Key features:** location, total_sqft, bhk\n- **Suggestions:** TabNet and XGBoost likely top performers on this tabular dataset"
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/Real-Estate-Price-Prediction/README.md b/Real-Estate-Price-Prediction/README.md
@@ -0,0 +1,93 @@
+# Real Estate Price Prediction — Model Documentation
+
+## Objective
+Predict real estate prices in Bengaluru using ML and Deep Learning models on the Bengaluru House Price dataset.
+
+---
+
+## Approach
+
+### 1. Data Preprocessing
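+- Handle missing values: fill `bath` and `balcony` with the median; drop `society` (too many nulls)
+- Parse `size`: extract numeric BHK (e.g., "2 BHK" → 2)
+- Handle `total_sqft` ranges by taking the midpoint (e.g., "2000-2500" → 2250)
+- Engineer `price_per_sqft` for outlier detection; drop it before modeling to avoid target leakage
+- Remove outliers: keep rows within mean ± 1 standard deviation of `price_per_sqft` per location
+- Group rare locations (< 10 listings) as "other"
+- One-hot encode `area_type`, `availability`, and `location`; standardize the numeric features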
+
+### 2. Exploratory Data Analysis (EDA)
+- Price distribution histogram and BHK count bar chart
+- Correlation heatmap of numerical features
+- Scatter: `total_sqft` vs `price`
+- Top 10 locations by median price (bar chart)
+
+### 3. Models Implemented
+
+#### Machine Learning Baseline
+| Model | Notes |
+|-------|-------|
+| Linear Regression | Baseline |
+| Lasso Regression | L1 regularization; feature selection |
+| Random Forest | Ensemble; handles non-linearity |
+| XGBoost | Gradient boosting; saved as the default best model until results are recorded |
+
+Tuning: GridSearchCV with 5-fold cross-validation (imported in the notebook but not yet applied; the models above use fixed hyperparameters)
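
To give a sense of the cost of an exhaustive grid search with 5-fold cross-validation, the sketch below enumerates a hypothetical XGBoost grid and the fold splits in plain Python. The grid values are illustrative and not taken from the notebook; the `model__` prefix assumes the pipeline step is named `model`, as it is in the notebook's pipelines. The helper names are made up for this example.

```python
from itertools import product

# Hypothetical XGBoost grid; these values are illustrative, not from the notebook.
param_grid = {
    "model__n_estimators": [100, 300],
    "model__max_depth": [4, 6, 8],
    "model__learning_rate": [0.05, 0.1],
}


def expand_grid(grid: dict) -> list[dict]:
    """Enumerate every parameter combination, as an exhaustive grid search would."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def kfold_indices(n_samples: int, n_splits: int = 5):
    """Yield (train, validation) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


candidates = expand_grid(param_grid)
folds = list(kfold_indices(n_samples=12, n_splits=5))
print(len(candidates), "candidates x", len(folds), "folds =", len(candidates) * len(folds), "fits")
```

With this grid that is 12 candidates and 60 model fits before any final refit, which is worth keeping in mind when choosing grid sizes for the tree-based models.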
+
+#### Deep Learning Models
+| Model | Notes |
+|-------|-------|
+| MLP (Feedforward NN) | ReLU + Dropout + BatchNorm layers |
+| Wide & Deep Network | Linear memorization + deep generalization (Google) |
+| TabNet | Attention-based; interpretable; designed for tabular data (planned, not yet in the notebook) |
+| Embedding-based DNN | Learned entity embeddings for `location` + DNN (planned, not yet in the notebook) |
+
+Stack: PyTorch (used for the MLP and Wide & Deep models in the notebook)
+
+### 4. Evaluation Metrics
+| Metric | Description |
+|--------|-------------|
+| R² Score | Proportion of variance explained |
+| MAE | Mean Absolute Error (Lakhs INR) |
+| RMSE | Root Mean Squared Error (Lakhs INR) |
+| Cross-val Score | 5-fold CV for reliability |
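
For reference, the three error metrics in the table can be written as follows. The symbols are assumptions introduced here: $y_i$ is the true price in lakhs, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean of the true prices, and $n$ the number of test rows.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

As a worked example with made-up values $y = (50, 100)$ and $\hat{y} = (60, 90)$: $\mathrm{MAE} = 10$, $\mathrm{RMSE} = 10$, $\bar{y} = 75$, and $R^2 = 1 - 200 / 1250 = 0.84$.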
+
+---
+
+## Results
+
+> *(Update after running the notebook)*
+
+| Model | R² | MAE | RMSE |
+|-------|----|-----|------|
+| Linear Regression | — | — | — |
+| Lasso Regression | — | — | — |
+| Random Forest | — | — | — |
+| XGBoost | — | — | — |
+| MLP | — | — | — |
+| Wide & Deep | — | — | — |
+| TabNet | — | — | — |
+| Embedding DNN | — | — | — |
+
+---
+
+## Visualizations
+
+> *(Add saved plots from the Images/ folder)*
+
+- `eda_overview.png`: price distribution, `total_sqft` vs `price` scatter, BHK distribution
+- `correlation_heatmap.png`: feature correlation matrix
+- `top_locations.png`: top 10 locations by median price
+- `model_comparison.png`: R² / MAE / RMSE comparison bar charts
+
+---
+
+## Conclusions
+
+> *(Fill in after training)*
+
+---
+
+## Saved Artifacts
+- `real_estate_best_model.pkl` — Best ML model (joblib)
+- `real_estate_dl_model.h5` (Keras) or `.pt` (PyTorch): best DL model; the notebook does not save one yet
diff --git a/Real-Estate-Price-Prediction/requirements.txt b/Real-Estate-Price-Prediction/requirements.txt
@@ -0,0 +1,28 @@
+# Core
+numpy==1.26.4
+pandas==2.2.2
+scikit-learn==1.4.2
+
+# ML Models
+xgboost==2.0.3
+
+# Deep Learning
+torch==2.3.0
+torchvision==0.18.0
+tensorflow==2.16.1
+pytorch-tabnet==4.1.0
+
+# Visualization
+matplotlib==3.9.0
+seaborn==0.13.2
+plotly==5.22.0
+
+# Notebook
+jupyter==1.0.0
+ipykernel==6.29.4
+
+# Model persistence
+joblib==1.4.2
+
+# Utilities
+tqdm==4.66.4