
Named Entity Recognition (NER) Project

AIM

To develop a system that identifies and classifies named entities (such as persons, organizations, locations, dates, etc.) in text using Named Entity Recognition (NER) with SpaCy.

N/A. This project has no fixed dataset; it runs NER on text supplied by the user in real time.

https://colab.research.google.com/drive/1pBIEFA4a9LzyZKUFQMCypQ22M6bDbXM3?usp=sharing

LIBRARIES NEEDED

  • SpaCy

DESCRIPTION

What is the requirement of the project?

  • Named Entity Recognition (NER) is essential to automatically extract and classify key entities from text, such as persons, organizations, locations, and more.
  • This helps in analyzing and organizing data efficiently, enabling various NLP applications like document analysis and information retrieval.
Why is it necessary?
  • NER is used for understanding and structuring unstructured text, which is widely applied in industries such as healthcare, finance, and e-commerce.
  • It allows users to extract actionable insights from large volumes of text data.
How is it beneficial and used?
  • NER plays a key role in tasks such as document summarization and information retrieval.
  • It automates the extraction of relevant entities, which reduces manual effort and improves efficiency.
How did you start approaching this project? (Initial thoughts and planning)
  • The project leverages SpaCy's pre-trained NER models, enabling easy text analysis without the need for training custom models.

Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)

  • SpaCy Documentation: SpaCy NER
  • Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper.

EXPLANATION

DETAILS OF THE DIFFERENT ENTITY TYPES

The system extracts the following entity types:

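| Entity Type | Description |
|-------------|-------------|
| PERSON | Names of people (e.g., "Anuska") |
| ORG | Organizations (e.g., "Google", "Tesla") |
| LOC | Non-geopolitical locations such as mountains and bodies of water (e.g., "Mount Everest") |
| DATE | Dates (e.g., "January 1st, 2025") |
| GPE | Geopolitical entities: countries, cities, states (e.g., "India", "California", "New York") |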
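
To restrict output to the five types listed above, the entities returned by the pipeline can be filtered by label. The following is a minimal sketch, assuming entities are available as (text, label) pairs. The names `TARGET_LABELS` and `group_by_label` are hypothetical and not taken from the project code. spaCy's English models also emit other labels (e.g., MONEY, CARDINAL), which this filter discards.

```python
from collections import defaultdict

# Labels from the table above; spaCy's English models emit more (e.g. MONEY).
TARGET_LABELS = {"PERSON", "ORG", "LOC", "DATE", "GPE"}


def group_by_label(entities, labels=TARGET_LABELS):
    """Group (text, label) pairs by label, keeping only the wanted labels."""
    grouped = defaultdict(list)
    for text, label in entities:
        if label in labels:
            grouped[label].append(text)
    return dict(grouped)


# Pairs as they would come from [(ent.text, ent.label_) for ent in doc.ents].
sample = [("Google", "ORG"), ("India", "GPE"), ("$5", "MONEY"), ("Tesla", "ORG")]
print(group_by_label(sample))
```

The MONEY pair is dropped because its label is not in `TARGET_LABELS`; the remaining texts are grouped under ORG and GPE in input order.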

WHAT I HAVE DONE

Step 1: Data collection and preparation

  • Gathered sample text for analysis (provided by users in the app).
  • Explored the text structure and identified entity types.

Step 2: NER model implementation

  • Integrated SpaCy's pre-trained NER model (en_core_web_sm).
  • Extracted named entities and visualized them with labels and color coding.
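
Step 2 could look roughly like the sketch below. It assumes spaCy v3 with the `en_core_web_sm` model already installed (for example via `python -m spacy download en_core_web_sm`). The helper names `load_pipeline`, `extract_entities`, `count_labels`, and `render_entities_html` are illustrative, not the project's actual functions.

```python
from collections import Counter


def load_pipeline(model_name="en_core_web_sm"):
    """Load a pre-trained spaCy pipeline (the model must be installed first)."""
    import spacy  # imported lazily so the helpers below work without spaCy

    return spacy.load(model_name)


def extract_entities(doc):
    """Return (text, label, start_char, end_char) for each entity in a Doc."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def count_labels(entities):
    """Count how many entities were found per label."""
    return Counter(label for _, label, _, _ in entities)


def render_entities_html(doc):
    """Return displaCy's color-coded entity markup as an HTML string."""
    from spacy import displacy

    return displacy.render(doc, style="ent", page=False, jupyter=False)
```

A typical call sequence would be `nlp = load_pipeline()`, then `doc = nlp(text)`, then `extract_entities(doc)`. In a Colab or Jupyter notebook, `displacy.render(doc, style="ent", jupyter=True)` displays the highlighted text inline instead of returning HTML.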

Step 3: Testing and validation

  • Validated results with multiple test cases to ensure entity accuracy.
  • Allowed users to input custom text for NER analysis in real time.
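
The source does not describe how the test cases were scored. One possible approach, sketched here under that assumption, is to compare predicted (text, label) pairs against hand-labelled expectations and report precision and recall. The function name `score_entities` and the sample data are hypothetical.

```python
def score_entities(predicted, expected):
    """Return (precision, recall) for two collections of (text, label) pairs."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical test case: the expected pairs are hand-labelled, and the
# predicted pairs stand in for [(ent.text, ent.label_) for ent in doc.ents].
expected = {("Anuska", "PERSON"), ("Google", "ORG"), ("New York", "GPE")}
predicted = {("Google", "ORG"), ("New York", "GPE"), ("Anuska", "ORG")}

precision, recall = score_entities(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predictions match exactly, so a mislabelled entity (the right text with the wrong label) lowers both precision and recall.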

PROJECT TRADE-OFFS AND SOLUTIONS

Trade Off 1: Pre-trained model vs. custom model

  • Pre-trained models provide quick results but may lack accuracy for domain-specific entities.
  • Custom models can improve accuracy but require additional data and training time.
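
A possible middle ground between the two options, not described in the source, is to keep the pre-trained model and add rule-based patterns for known domain terms. The sketch below assumes spaCy v3, where the built-in `entity_ruler` component can be placed before the statistical `ner` component. The helper names and the example terms are hypothetical.

```python
def build_patterns(terms_by_label):
    """Turn {"LABEL": ["term", ...]} into EntityRuler phrase patterns."""
    return [
        {"label": label, "pattern": term}
        for label, terms in terms_by_label.items()
        for term in terms
    ]


def add_domain_rules(nlp, patterns):
    """Add an EntityRuler ahead of the statistical NER component."""
    ruler = nlp.add_pipe("entity_ruler", before="ner")
    ruler.add_patterns(patterns)
    return nlp


# Hypothetical domain terms; real ones would come from the target domain.
patterns = build_patterns({"ORG": ["Acme Health"], "PRODUCT": ["CardioScan"]})
```

Calling `add_domain_rules(nlp, patterns)` on a loaded pipeline requires no training data, but it only covers terms that are listed explicitly, so it complements rather than replaces a custom-trained model.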

Trade Off 2: Real-time analysis vs. batch processing

  • Real-time analysis in a web app enhances user interaction but might slow down with large text inputs.
  • Batch processing could be more efficient for larger datasets.
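
For the batch case, spaCy provides `nlp.pipe`, which streams documents and processes texts in batches rather than one call at a time. A minimal sketch, assuming `nlp` is an already loaded pipeline; the function name `extract_batch` and the default batch size are illustrative choices, not values from the project.

```python
def extract_batch(nlp, texts, batch_size=64):
    """Yield one list of (text, label) pairs per input text, in input order."""
    # nlp.pipe streams Docs and processes the texts in batches internally.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_) for ent in doc.ents]
```

Because the function is a generator, results can be consumed one document at a time, which keeps memory use flat on large collections.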

SCREENSHOTS

NER Example

```mermaid
graph LR
    A[Start] --> B[Text Input];
    B --> C[NER Analysis];
    C --> D{Entities Extracted};
    D -->|Person| E[Anuska];
    D -->|Location| F[New York];
    D -->|Organization| G[Google];
    D -->|Date| H[January 1st, 2025];
```