202511191259 Status: idea Tags: Datascience, NLP, Text representation Technique
One-Hot Encoding
One-hot encoding is a method that converts categorical values into numerical vectors so that machine learning models can process them. Each category becomes a binary vector in which exactly one position is 1 and all others are 0. Because every category vector is equally distant from the others, no unintended ordering is implied between categories.
Why it is used
- Most models cannot process raw text categories directly.
- Prevents the model from assuming numeric relationships between categories.
- Simple and effective for small to medium category sets.
How it works
Example 1: Simple categories
Original values:
| Value |
|---|
| Red |
| Blue |
| Green |
One-hot encoded:
| Value | Red | Blue | Green |
|---|---|---|---|
| Red | 1 | 0 | 0 |
| Blue | 0 | 1 | 0 |
| Green | 0 | 0 | 1 |
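The table above can be reproduced with a few lines of plain Python; this is a minimal sketch in which the `one_hot` helper and the fixed category order are illustrative assumptions, not part of any library.

```python
# Minimal one-hot encoding of the three colors by hand.
categories = ["Red", "Blue", "Green"]

def one_hot(value, categories):
    """Return a binary vector with a 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

print(one_hot("Red", categories))    # [1, 0, 0]
print(one_hot("Blue", categories))   # [0, 1, 0]
print(one_hot("Green", categories))  # [0, 0, 1]
```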
Example 2: Repeated categories
Original data:
| Item | Color |
|---|---|
| A | Red |
| B | Green |
| C | Red |
| D | Blue |
One-hot encoded:
| Item | Red | Blue | Green |
|---|---|---|---|
| A | 1 | 0 | 0 |
| B | 0 | 0 | 1 |
| C | 1 | 0 | 0 |
| D | 0 | 1 | 0 |
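In practice this encoding is rarely done by hand; a sketch with `pandas.get_dummies` (assuming pandas is installed) produces the same table, with the note that pandas orders the dummy columns alphabetically rather than in first-seen order.

```python
import pandas as pd

# The original Item/Color table from the example above.
df = pd.DataFrame({"Item": ["A", "B", "C", "D"],
                   "Color": ["Red", "Green", "Red", "Blue"]})

# get_dummies creates one binary column per category (alphabetical order).
encoded = pd.get_dummies(df["Color"]).astype(int)
result = pd.concat([df["Item"], encoded], axis=1)
print(result)
```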
Advantages
- Removes false numeric relationships.
- Easy to interpret.
- Works well for many ML algorithms.
Limitations
- Produces wide vectors when categories are numerous.
- High memory usage for large vocabularies.
- Sparse representation may slow down some models.
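The width and sparsity limitations can be made concrete with a small sketch; the 10,000-word vocabulary below is a hypothetical example, not from the note.

```python
# Hypothetical large vocabulary: each one-hot vector is as wide as the
# vocabulary, and all but one of its positions are zero.
vocabulary = [f"word_{i}" for i in range(10_000)]
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    vec = [0] * len(vocabulary)
    vec[index[word]] = 1
    return vec

vec = one_hot("word_42")
print(len(vec), sum(vec))  # 10000 positions, only one of them nonzero
```

This is why libraries such as scikit-learn default to a sparse matrix representation for one-hot output, storing only the nonzero entries.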
When to use it
- When the number of categories is small enough that the resulting vectors stay manageable.
- When preserving non-ordinal relationships is important.
- When working with simple classical ML models.
References
- This is material we cover for Data Science; the information comes from the Avans 2-2 Data Science lecture of 2025-11-12 and its accompanying slides.
- Mentioned while writing a note about NLP.
- GeeksforGeeks: https://www.geeksforgeeks.org/machine-learning/ml-one-hot-encoding/