202511171507 Status: idea Tags: Datascience

Natural Language Processing NLP

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that enables machines to understand and process human languages, either in text or audio form.

NLP is widely used for speech recognition, language translation and text summarization.

NLP kan opgedeeld worden onder 2 categorieën:

Techniques

In general gaat de volgorde van gebruikte technieken ongeveer zo:

  1. Text Normalization
  2. Tokenization
  3. Stopword Removal
  4. Stemming
  5. Lemmatization
  6. POS Tagging

Tasks

Dingen waar NLP vooral voor gebruikt wordt zijn:

  • Text generation
  • Text summarization
  • Machine translation
  • Sentiment Analysis
  • Named Entity Recognition
  • Text Classification.

Applications

In wat voor eindproducten zit NLP dan bijvoorbeeld verwerkt?

  • Voice assistants
  • Speech-to-text
  • Text analysis
  • Chatbots
  • Information retrieval
  • Content Reccomendation

NLP vs NLU vs NLG

This table shows the differences between them.

AspectNLPNLGNLU
InputRaw or structured languageStructured dataNatural language text
OutputStructured or unstructured textHuman-readable textMachine-readable meaning
GoalInterpret & produce languageGenerate natural-sounding textUnderstanding meaning & intent
TechniquesParsing, tagging, vectorizationTemplates, ML models, transformersSyntax analysis, semantics, embeddings
TasksTranslation, speech-to-text, summarizationReports writing, product descriptionsIntent detection, sentiment analysis
ToolsspaCy, NLTK, Hugging FaceGPT, T5, SimpleNLGBERT, RoBERTa, Dialogflow

Text representation Techniques

Convert textual data into numerical vectors:


References