Name Entity Recognition (NER) Project
AIM
To develop a system that identifies and classifies named entities (such as persons, organizations, locations, dates, etc.) in text using Named Entity Recognition (NER) with SpaCy.
DATASET LINK
N/A (This project uses text input for NER analysis, not a specific dataset) - It uses real time data as input .
NOTEBOOK LINK
https://colab.research.google.com/drive/1pBIEFA4a9LzyZKUFQMCypQ22M6bDbXM3?usp=sharing
LIBRARIES NEEDED
- SpaCy
DESCRIPTION
What is the requirement of the project?
- Named Entity Recognition (NER) is essential to automatically extract and classify key entities from text, such as persons, organizations, locations, and more.
- This helps in analyzing and organizing data efficiently, enabling various NLP applications like document analysis and information retrieval.
Why is it necessary?
- NER is used for understanding and structuring unstructured text, which is widely applied in industries such as healthcare, finance, and e-commerce.
- It allows users to extract actionable insights from large volumes of text data
How is it beneficial and used?
- NER plays a key role in tasks such as document summarization, information retrieval.
- It automates the extraction of relevant entities, which reduces manual effort and improves efficiency.
How did you start approaching this project? (Initial thoughts and planning)
- The project leverages SpaCy's pre-trained NER models, enabling easy text analysis without the need for training custom models.
Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)
- SpaCy Documentation: SpaCy NER
- NLP in Python by Steven Bird et al.
EXPLANATION
DETAILS OF THE DIFFERENT ENTITY TYPES
The system extracts the following entity types:
Entity Type | Description |
---|---|
PERSON | Names of people (e.g., "Anuska") |
ORG | Organizations (e.g., "Google", "Tesla") |
LOC | Locations (e.g., "New York", "Mount Everest") |
DATE | Dates (e.g., "January 1st, 2025") |
GPE | Geopolitical entities (e.g., "India", "California") |
WHAT I HAVE DONE
Step 1: Data collection and preparation
- Gathered sample text for analysis (provided by users in the app).
- Explored the text structure and identified entity types.
Step 2: NER model implementation
- Integrated SpaCy's pre-trained NER model (
en_core_web_sm
). - Extracted named entities and visualized them with labels and color coding.
Step 3: Testing and validation
- Validated results with multiple test cases to ensure entity accuracy.
- Allowed users to input custom text for NER analysis in real-time.
PROJECT TRADE-OFFS AND SOLUTIONS
Trade Off 1: Pre-trained model vs. custom model
- Pre-trained models provide quick results but may lack accuracy for domain-specific entities.
- Custom models can improve accuracy but require additional data and training time.
Trade Off 2: Real-time analysis vs. batch processing
- Real-time analysis in a web app enhances user interaction but might slow down with large text inputs.
- Batch processing could be more efficient for larger datasets.
SCREENSHOTS
NER Example
``` mermaid graph LR A[Start] --> B[Text Input]; B --> C[NER Analysis]; C --> D{Entities Extracted}; D -->|Person| E[Anuska]; D -->|Location| F[New York]; D -->|Organization| G[Google]; D -->|Date| H[January 1st, 2025];