diff --git a/Text Summarizer using DL/Model/summarizer_models.ipynb b/Text Summarizer using DL/Model/summarizer_models.ipynb new file mode 100644 index 000000000..c19f5972f --- /dev/null +++ b/Text Summarizer using DL/Model/summarizer_models.ipynb @@ -0,0 +1,316 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "40c8fdcd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Defaulting to user installation because normal site-packages is not writeable\n", + "Requirement already satisfied: transformers in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (5.10.2)\n", + "Requirement already satisfied: torch in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (2.12.0)\n", + "Requirement already satisfied: huggingface-hub<2.0,>=1.5.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (1.18.0)\n", + "Requirement already satisfied: numpy>=1.17 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (2.4.6)\n", + "Requirement already satisfied: packaging>=20.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (26.1)\n", + "Requirement already satisfied: pyyaml>=5.1 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (6.0.3)\n", + "Requirement already satisfied: regex>=2025.10.22 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (2026.5.9)\n", + "Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (0.22.2)\n", + "Requirement already satisfied: typer in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (0.25.1)\n", + "Requirement already satisfied: safetensors>=0.4.3 in 
C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (0.7.0)\n", + "Requirement already satisfied: tqdm>=4.27 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from transformers) (4.68.1)\n", + "Requirement already satisfied: click>=8.4.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from huggingface-hub<2.0,>=1.5.0->transformers) (8.4.1)\n", + "Requirement already satisfied: filelock>=3.10.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from huggingface-hub<2.0,>=1.5.0->transformers) (3.29.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from huggingface-hub<2.0,>=1.5.0->transformers) (2026.4.0)\n", + "Requirement already satisfied: hf-xet<2.0.0,>=1.4.3 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from huggingface-hub<2.0,>=1.5.0->transformers) (1.5.0)\n", + "Requirement already satisfied: httpx<1,>=0.23.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from huggingface-hub<2.0,>=1.5.0->transformers) (0.28.1)\n", + "Requirement already satisfied: typing-extensions>=4.1.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from huggingface-hub<2.0,>=1.5.0->transformers) (4.15.0)\n", + "Requirement already satisfied: anyio in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from httpx<1,>=0.23.0->huggingface-hub<2.0,>=1.5.0->transformers) (4.13.0)\n", + "Requirement already satisfied: certifi in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from httpx<1,>=0.23.0->huggingface-hub<2.0,>=1.5.0->transformers) (2026.4.22)\n", + "Requirement already satisfied: httpcore==1.* in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from httpx<1,>=0.23.0->huggingface-hub<2.0,>=1.5.0->transformers) (1.0.9)\n", + "Requirement already satisfied: idna in 
C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from httpx<1,>=0.23.0->huggingface-hub<2.0,>=1.5.0->transformers) (3.13)\n", + "Requirement already satisfied: h11>=0.16 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from httpcore==1.*->httpx<1,>=0.23.0->huggingface-hub<2.0,>=1.5.0->transformers) (0.16.0)\n", + "Requirement already satisfied: shellingham>=1.3.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from typer->transformers) (1.5.4)\n", + "Requirement already satisfied: rich>=13.8.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from typer->transformers) (15.0.0)\n", + "Requirement already satisfied: annotated-doc>=0.0.2 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from typer->transformers) (0.0.4)\n", + "Requirement already satisfied: setuptools<82 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from torch) (81.0.0)\n", + "Requirement already satisfied: sympy>=1.13.3 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from torch) (1.14.0)\n", + "Requirement already satisfied: networkx>=2.5.1 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from torch) (3.6.1)\n", + "Requirement already satisfied: jinja2 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from torch) (3.1.6)\n", + "Requirement already satisfied: colorama in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from click>=8.4.0->huggingface-hub<2.0,>=1.5.0->transformers) (0.4.6)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from rich>=13.8.0->typer->transformers) (4.2.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from rich>=13.8.0->typer->transformers) (2.20.0)\n", + "Requirement 
already satisfied: mdurl~=0.1 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from markdown-it-py>=2.2.0->rich>=13.8.0->typer->transformers) (0.1.2)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from sympy>=1.13.3->torch) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in C:\\Users\\piyus\\AppData\\Roaming\\Python\\Python314\\site-packages (from jinja2->torch) (3.0.3)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 26.0.1 -> 26.1.2\n", + "[notice] To update, run: C:\\Python314\\python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "%pip install transformers torch" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "349d13bc", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "c:\\Users\\piyus\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Original Text Length: 114 words\n", + "\n", + "--- Loading T5 Model ---\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Device set to use cpu\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "[T5 Summary]:\n", + "leading AI textbooks define the field as the study of \"intelligent agents\" some popular accounts use the term \"artificial intelligence\" to describe machines that mimic \"cognitive\" functions that humans associate with the human mind \n", + "--------------------------------------------------\n", + "--- Loading BART Model ---\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Device set to use cpu\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "[BART Summary]:\n", + "Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. Some popular accounts use the term \"artificial intelligence\" to describe machines that mimic \"cognitive\" functions that humans\n", + "--------------------------------------------------\n" + ] + } + ], + "source": [ + "from transformers import pipeline\n", + "\n", + "# A dummy text that we will summarize\n", + "sample_text = \"\"\"\n", + "Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. Leading AI textbooks define the field as the study of \"intelligent agents\": any system that perceives its environment and takes actions that maximize its chance of achieving its goals. 
Some popular accounts use the term \"artificial intelligence\" to describe machines that mimic \"cognitive\" functions that humans associate with the human mind, such as \"learning\" and \"problem solving\". As machines become increasingly capable, tasks considered to require \"intelligence\" are often removed from the definition of AI, a phenomenon known as the AI effect. A quip in Tesler's Theorem says \"AI is whatever hasn't been done yet.\"\n", + "\"\"\"\n", + "\n", + "print(\"Original Text Length:\", len(sample_text.split()), \"words\\n\")\n", + "\n", + "# ==========================================\n", + "# 1. T5 (Text-to-Text Transfer Transformer)\n", + "# ==========================================\n", + "print(\"--- Loading T5 Model ---\")\n", + "# framework=\"pt\" forces the pipeline to use PyTorch\n", + "t5_summarizer = pipeline(\"summarization\", model=\"t5-small\", framework=\"pt\") \n", + "\n", + "t5_summary = t5_summarizer(sample_text, max_length=50, min_length=15, do_sample=False)\n", + "print(\"\\n[T5 Summary]:\")\n", + "print(t5_summary[0]['summary_text'])\n", + "print(\"-\" * 50)\n", + "\n", + "\n", + "# ==========================================\n", + "# 2. 
BART (Bidirectional and Auto-Regressive Transformers)\n", + "# ==========================================\n", + "print(\"--- Loading BART Model ---\")\n", + "bart_summarizer = pipeline(\"summarization\", model=\"facebook/bart-large-cnn\", framework=\"pt\")\n", + "\n", + "bart_summary = bart_summarizer(sample_text, max_length=50, min_length=15, do_sample=False)\n", + "print(\"\\n[BART Summary]:\")\n", + "print(bart_summary[0]['summary_text'])\n", + "print(\"-\" * 50)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "c008092d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: sumy in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (0.12.0)\n", + "Requirement already satisfied: nltk in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (3.9.1)\n", + "Requirement already satisfied: breadability>=0.1.20 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from sumy) (0.1.20)\n", + "Requirement already satisfied: docopt-ng>=0.6.1 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from sumy) (0.9.0)\n", + "Requirement already satisfied: lxml-html-clean in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from sumy) (0.4.5)\n", + "Requirement already satisfied: pycountry>=18.2.23 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from sumy) (26.2.16)\n", + "Requirement already satisfied: requests>=2.7.0 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from sumy) (2.32.5)\n", + "Requirement already satisfied: setuptools>=65.0.0 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from sumy) (80.9.0)\n", + "Requirement already satisfied: click in 
c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from nltk) (8.1.8)\n", + "Requirement already satisfied: joblib in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from nltk) (1.4.2)\n", + "Requirement already satisfied: regex>=2021.8.3 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from nltk) (2024.11.6)\n", + "Requirement already satisfied: tqdm in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from nltk) (4.67.1)\n", + "Requirement already satisfied: docopt<0.7,>=0.6.1 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from breadability>=0.1.20->sumy) (0.6.2)\n", + "Requirement already satisfied: chardet in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from breadability>=0.1.20->sumy) (5.2.0)\n", + "Requirement already satisfied: lxml>=2.0 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from breadability>=0.1.20->sumy) (6.1.1)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from requests>=2.7.0->sumy) (3.4.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from requests>=2.7.0->sumy) (2.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from requests>=2.7.0->sumy) (2.3.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from requests>=2.7.0->sumy) (2025.1.31)\n", + "Requirement already satisfied: colorama in c:\\users\\piyus\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from click->nltk) (0.4.6)\n", + "Note: you may need 
to restart the kernel to use updated packages.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: Ignoring invalid distribution ~orch (c:\\Users\\piyus\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages)\n", + "WARNING: Ignoring invalid distribution ~orch (c:\\Users\\piyus\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages)\n", + "WARNING: Ignoring invalid distribution ~orch (c:\\Users\\piyus\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages)\n", + "\n", + "[notice] A new release of pip is available: 25.1.1 -> 26.1.2\n", + "[notice] To update, run: python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "%pip install sumy nltk" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "dd404b4b", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package punkt to\n", + "[nltk_data] C:\\Users\\piyus\\AppData\\Roaming\\nltk_data...\n", + "[nltk_data] Package punkt is already up-to-date!\n", + "[nltk_data] Downloading package punkt_tab to\n", + "[nltk_data] C:\\Users\\piyus\\AppData\\Roaming\\nltk_data...\n", + "[nltk_data] Package punkt_tab is already up-to-date!\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--- Running TextRank Model ---\n", + "\n", + "[TextRank Summary]:\n", + "Leading AI textbooks define the field as the study of \"intelligent agents\": any system that perceives its environment and takes actions that maximize its chance of achieving its goals.\n", + "As machines become increasingly capable, tasks considered to require \"intelligence\" are often removed from the definition of AI, a phenomenon known as the AI effect.\n", + "--------------------------------------------------\n", + "--- Running LSA Model ---\n", + "\n", + "[LSA Summary]:\n", + "Some popular accounts use the term \"artificial intelligence\" to describe machines that mimic 
\"cognitive\" functions that humans associate with the human mind, such as \"learning\" and \"problem solving\".\n", + "A quip in Tesler's Theorem says \"AI is whatever hasn't been done yet.\"\n", + "--------------------------------------------------\n" + ] + } + ], + "source": [ + "import nltk\n", + "# The NLTK tokenizer data must be downloaded to split the text into sentences\n", + "nltk.download('punkt')\n", + "nltk.download('punkt_tab')\n", + "\n", + "from sumy.parsers.plaintext import PlaintextParser\n", + "from sumy.nlp.tokenizers import Tokenizer\n", + "from sumy.summarizers.text_rank import TextRankSummarizer\n", + "from sumy.summarizers.lsa import LsaSummarizer\n", + "\n", + "# Reuse the same sample text as before\n", + "parser = PlaintextParser.from_string(sample_text, Tokenizer(\"english\"))\n", + "sentences_count = 2 # We want exactly 2 important sentences in the summary\n", + "\n", + "# ==========================================\n", + "# 3. TextRank Summarizer (Based on Google's PageRank algorithm)\n", + "# ==========================================\n", + "print(\"--- Running TextRank Model ---\")\n", + "tr_summarizer = TextRankSummarizer()\n", + "tr_summary = tr_summarizer(parser.document, sentences_count)\n", + "\n", + "print(\"\\n[TextRank Summary]:\")\n", + "for sentence in tr_summary:\n", + " print(sentence)\n", + "print(\"-\" * 50)\n", + "\n", + "# ==========================================\n", + "# 4. 
LSA Summarizer (Latent Semantic Analysis)\n", + "# ==========================================\n", + "print(\"--- Running LSA Model ---\")\n", + "lsa_summarizer = LsaSummarizer()\n", + "lsa_summary = lsa_summarizer(parser.document, sentences_count)\n", + "\n", + "print(\"\\n[LSA Summary]:\")\n", + "for sentence in lsa_summary:\n", + " print(sentence)\n", + "print(\"-\" * 50)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd202355", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Text Summarizer using DL/readme.md b/Text Summarizer using DL/readme.md new file mode 100644 index 000000000..3bdd00b4b --- /dev/null +++ b/Text Summarizer using DL/readme.md @@ -0,0 +1,26 @@ +# Text Summarizer using Deep Learning + +This project implements and compares two approaches to text summarization: **Abstractive Summarization** (using pretrained Transformer models) and **Extractive Summarization** (using graph-based and statistical NLP methods). + +All four models are run on the same sample paragraph to highlight differences in sentence selection, sentence structure, and wording. + +## 🛠️ Models Implemented + +### 1. Abstractive Summarization (Hugging Face Transformers) +Abstractive models encode the input text and generate the summary token by token, so they can rephrase or compress the source instead of copying it verbatim. +* **T5 (Text-to-Text Transfer Transformer):** Evaluated using `t5-small`, a compact checkpoint that runs on CPU in the notebook. 
+* **BART (Bidirectional and Auto-Regressive Transformers):** Evaluated using `facebook/bart-large-cnn`, a checkpoint fine-tuned on CNN/DailyMail news articles; in the notebook run it produces more fluent, coherent output than `t5-small`. + +### 2. Extractive Summarization (`sumy` library) +Extractive models score the sentences of the existing text with a ranking algorithm and return the highest-ranked sentences verbatim, without altering any words. +* **TextRank:** Graph-based ranking algorithm inspired by Google's PageRank; it scores each sentence by its word overlap with the other sentences. +* **LSA (Latent Semantic Analysis):** Algebraic/statistical method that applies Singular Value Decomposition (SVD) to a term-sentence matrix to capture latent topics across sentences. + +--- + +## 🚀 How to Set Up and Run + +### Step 1: Install Dependencies +Install the required Python packages in your active environment: +```bash +pip install -r requirements.txt \ No newline at end of file diff --git a/Text Summarizer using DL/requirements.txt b/Text Summarizer using DL/requirements.txt new file mode 100644 index 000000000..6df66655f --- /dev/null +++ b/Text Summarizer using DL/requirements.txt @@ -0,0 +1,5 @@ +transformers +torch +sumy +nltk +tf-keras \ No newline at end of file