Named Entity Recognition (NER) with PyTorch + BERT
Let’s be real: language models like ChatGPT and BERT are super smart. But how do they actually know who “Elon Musk” is, or what counts as a "location"? That is where Named Entity Recognition (NER) comes in, and this project dives into building an NER system using PyTorch and the powerful BERT model.
The Dataset: Tagging the World, One Word at a Time
The dataset we used is made for NER tasks. Basically, it is a bunch of sentences where each word is labeled as a specific type of entity—or not an entity at all. Common entity tags include:
- B-PER (Beginning of a Person's name)
- I-ORG (Inside an Organization name)
- O (Just a regular word—nothing special)
Imagine this sentence: "Ayushi is an amazing Data Scientist."
| Word | Label |
|---|---|
| Ayushi | B-PER |
| is | O |
| an | O |
| amazing | O |
| Data | O |
| Scientist | O |
| . | O |
Each word is paired with a tag. It's simple, but super powerful for training models to recognize patterns.
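To make that concrete in code, here is a minimal sketch of one tagged sentence and a tag-to-id mapping. The variable names and the exact tag set are my own illustrative choices, not from any particular dataset:

```python
# One tagged sentence; the tag set here is illustrative.
words = ["Ayushi", "is", "an", "amazing", "Data", "Scientist", "."]
tags  = ["B-PER", "O", "O", "O", "O", "O", "O"]

# Models predict integer class ids, so each tag gets an index.
tag2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-ORG": 3, "I-ORG": 4, "B-LOC": 5, "I-LOC": 6}
tag_ids = [tag2id[t] for t in tags]
print(tag_ids)  # [1, 0, 0, 0, 0, 0, 0]
```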
Why This Project?
This was not just about making BERT do NER tricks; it was also about learning how things actually work under the hood. I wanted to:
- Get hands-on with PyTorch, especially compared to other tools like TensorFlow and Keras.
- Understand how BERT handles language.
- Learn what is really happening during tokenization and fine-tuning.
Even though modern models are crazy accurate, it’s still essential to grasp the basics if you want to build, customize, or improve them.
PyTorch vs. the Rest: A Quick Comparison
Here is the lowdown on how PyTorch stacks up:
- PyTorch: Super flexible, perfect for research, and easy to debug thanks to its dynamic computation graph.
- Keras: Great for beginners and prototyping—simple and high-level, but a bit limited for custom tweaks.
- TensorFlow: Optimized for performance and production, but trickier to work with due to its static graph approach.
TL;DR: PyTorch = best for experimenting. TensorFlow = best for deployment.
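That "easy to debug" point about PyTorch is concrete: because the graph is built as the code runs, ordinary Python like a print() or a debugger breakpoint works right inside forward(). A tiny sketch (the model and shapes are made up purely for illustration):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        h = self.linear(x)
        # Plain Python executes mid-forward-pass: this is the dynamic graph at work.
        print("hidden shape:", h.shape)
        return torch.relu(h)

TinyNet()(torch.randn(3, 4))
```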
What I Built
The core of the project was creating an NER model using BERT + PyTorch. The goal? Train the model to spot entities like names, places, and organizations in raw text. The result is a system that can help power real-world tools like search engines, chatbots, and information extraction tools (think: pulling key info from legal or medical documents).
Meet BERT: The Language Model That Changed the Game
BERT (from Google, 2018) is a transformer-based model that processes language bidirectionally. That means it looks at the whole sentence—not just left to right like older models (e.g., early GPT versions). BERT was trained on huge text corpora like Wikipedia and BookCorpus, using something called Masked Language Modeling—predicting missing words in a sentence using surrounding context.
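If you want to see Masked Language Modeling for yourself, the transformers library ships a fill-mask pipeline. A quick sketch (the checkpoint name is an illustrative choice; any BERT-style model works):

```python
from transformers import pipeline

# BERT's masked-language-modeling head predicting a blanked-out word.
fill = pipeline("fill-mask", model="bert-base-cased")
for pred in fill("Paris is the capital of [MASK]."):
    print(f'{pred["token_str"]!r} (score {pred["score"]:.3f})')
```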
For this project, I used `BertForTokenClassification`, a BERT variant that tags each word (or word-piece) in a sentence with an entity label.
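Loading it from the Hugging Face transformers library looks roughly like this; the checkpoint and label count are my illustrative choices, matching the tag set sketched earlier:

```python
from transformers import BertForTokenClassification

# Pre-trained BERT encoder plus a fresh token-classification head,
# one output per entity tag (7 matches the illustrative tag2id above).
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=7)
```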
Tokenization: Breaking Words Down
Before BERT sees the text, it breaks it into tokens using WordPiece tokenization. For instance, a rare name like "Ayushi" might get split into:
['[CLS]', 'Ay', '##ushi', '[SEP]']
- `[CLS]` = classification token (used mostly for classification tasks)
- `[SEP]` = separator token (used between segments)
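You can reproduce this split with the transformers tokenizer. The exact word-pieces depend on the checkpoint's vocabulary, so your output may differ slightly:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
enc = tokenizer("Ayushi is an amazing Data Scientist.")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# Something like: ['[CLS]', 'Ay', '##ushi', 'is', ..., '[SEP]']
```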
Each subword still gets labeled individually. So `Ay` might be tagged as `B-PER` and `##ushi` as `I-PER`.
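Propagating the word-level tags down to the word-pieces is the fiddly part. Here is one common recipe, sketched with the fast tokenizer's word_ids(); the tag2id mapping is the illustrative one from earlier:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
words = ["Ayushi", "is", "an", "amazing", "Data", "Scientist", "."]
tags  = ["B-PER", "O", "O", "O", "O", "O", "O"]
tag2id = {"O": 0, "B-PER": 1, "I-PER": 2}

enc = tokenizer(words, is_split_into_words=True)
labels, prev = [], None
for idx in enc.word_ids():
    if idx is None:                       # [CLS]/[SEP] have no word behind them
        labels.append(-100)               # -100 is ignored by PyTorch's cross-entropy
    elif idx != prev:                     # first piece of a word keeps the word's tag
        labels.append(tag2id[tags[idx]])
    else:                                 # continuation piece: B-PER becomes I-PER
        labels.append(tag2id[tags[idx].replace("B-", "I-")])
    prev = idx
print(labels)
```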
Wrapping Up
In this project, I built a working NER model using BERT and PyTorch. Along the way, I got to:
- Explore how tokenization works.
- Understand the BERT architecture.
- Fine-tune a pre-trained model for a custom NLP task (a minimal training-step sketch follows below).
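Here is that fine-tuning step boiled down to a self-contained toy. The checkpoint, label count, learning rate, and the tiny hand-made label tensor are all illustrative choices, not the project's actual training setup:

```python
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForTokenClassification

# A sketch of a single fine-tuning step on one toy sentence.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=3)

enc = tokenizer(["Ayushi", "is", "a", "scientist", "."],
                is_split_into_words=True, return_tensors="pt")
labels = torch.zeros_like(enc["input_ids"])  # default every piece to O (id 0)
labels[0, 0] = -100                          # [CLS]: ignored by the loss
labels[0, -1] = -100                         # [SEP]: ignored by the loss
labels[0, 1] = 1                             # first piece of "Ayushi" -> B-PER
# (continuation pieces of "Ayushi" are left as O just to keep the toy short)

optimizer = AdamW(model.parameters(), lr=3e-5)
model.train()
out = model(**enc, labels=labels)            # loss is token-level cross-entropy
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
```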
Even though tools like GPT-4 can do this stuff out of the box, learning to build it yourself helps you understand why it works—and how to make it better for your own use cases.
See BERT and PyTorch in Action!
In this demo video, I show how the model takes a user-provided sentence, breaks it down into tokens using BERT’s tokenizer (including subwords and special tokens), and then identifies named entities like people, organizations, or locations. The model highlights each token along with its predicted label (like B-PER for the beginning of a person’s name), showcasing how fine-tuned transformers perform real-time entity recognition.
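Under the hood, that demo boils down to a few lines like the sketch below. Note the hedge: a freshly loaded classification head predicts arbitrary labels until it has been fine-tuned, and the id2tag mapping here is my illustrative one:

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=3)
id2tag = {0: "O", 1: "B-PER", 2: "I-PER"}  # illustrative label set

model.eval()
enc = tokenizer("Ayushi is an amazing Data Scientist.", return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits            # shape: (1, num_tokens, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()
for tok, pid in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), pred_ids):
    print(f"{tok:12s} {id2tag[pid]}")       # each token with its predicted tag
```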
You can test how Named Entity Recognition takes place here!