202511191250 Status: idea Tags: Datascience, NLP
NLP Stopword Removal
- Stopwords : Common words (articles, prepositions, pronouns) with little semantic value.
- Examples : “the”, “and”, “is”, “in”.
- Importance : Handling stopwords properly (removing or keeping them, depending on the task) improves NLP performance and efficiency.
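A minimal sketch of what stopword removal looks like in practice, using only the Python standard library. The tiny `STOPWORDS` set and the `remove_stopwords` helper are assumptions made up for illustration; real projects usually take a longer list from an NLP library.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "and", "is", "in", "a", "on"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]

print(remove_stopwords("The cat is in the garden and the dog is on the sofa"))
# → ['cat', 'garden', 'dog', 'sofa']
```

Only the content words survive, so the vocabulary a model has to handle gets smaller.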
When do you remove stopwords?
Beneficial for:
- Text classification
- Sentiment analysis
- Information retrieval
- Search engines
- Topic modeling
- Clustering
- Keyword extraction
Keep stopwords for:
- Machine translation (grammar structure)
- Text summarization (sentence coherence)
- Question answering systems (syntactic relations)
- Grammar checking, parsing
Types of stopwords
- Standard stopwords : Function words (“a”, “the”, “in”, “on”)
- Domain-specific stopwords : Field-dependent terms (e.g., “patient” in medical texts)
- Contextual stopwords : Extremely frequent dataset-specific words
- Numerical stopwords : Digits, punctuation marks, single characters
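The four types above can be combined into one filter. This is a rough sketch, not a standard recipe: the word lists, the example documents, the helper names, and the 80% document-frequency threshold for contextual stopwords are all assumptions chosen for illustration.

```python
import re
from collections import Counter

STANDARD = {"a", "the", "in", "on", "and", "is"}  # assumed mini list of function words
DOMAIN = {"patient"}                              # domain-specific, as in the medical example

def tokenize(text):
    # \w+ keeps letters and digits; punctuation marks are dropped here already.
    return re.findall(r"\w+", text.lower())

def contextual_stopwords(docs, min_doc_ratio=0.8):
    """Words that occur in at least min_doc_ratio of the documents (assumed threshold)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return {w for w, n in doc_freq.items() if n / len(docs) >= min_doc_ratio}

def is_numeric_or_single(token):
    """Numerical stopwords: pure digits or single characters."""
    return token.isdigit() or len(token) == 1

def clean(doc, stopwords):
    return [t for t in tokenize(doc)
            if t not in stopwords and not is_numeric_or_single(t)]

# Made-up example documents.
docs = [
    "The patient reported a headache in 2 days",
    "The patient reported nausea and a fever",
    "The patient reported dizziness on medication x",
]

stopwords = STANDARD | DOMAIN | contextual_stopwords(docs)
for doc in docs:
    print(clean(doc, stopwords))
# → ['headache', 'days']
# → ['nausea', 'fever']
# → ['dizziness', 'medication']
```

Here “reported” is in neither fixed list, but it appears in every document, so the frequency check catches it as a contextual stopword.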
References
- This is something we are learning for Data Science. The information comes from Avans 2-2 Data Science, 2025-11-12, and these slides go with it.
- I was writing a note about NLP which mentions this.