202511221446
Status: idea
Tags: Datascience

TF-IDF

TF-IDF (Term Frequency–Inverse Document Frequency) is a statistical method used in natural language processing and information retrieval to evaluate how important a word is to a document in relation to a larger collection of documents. TF-IDF combines two components:

1. Term Frequency (TF): Measures how often a word appears in a document. A higher frequency suggests greater importance. If a term appears frequently in a document, it is likely relevant to the document’s content.
2. Inverse Document Frequency (IDF): Reduces the weight of common words across multiple documents while increasing the weight of rare words. If a term appears in fewer documents, it is more likely to be meaningful and specific.

This balance allows TF-IDF to highlight terms that are both frequent within a specific document and distinctive across the document collection, making it a useful tool for tasks like search ranking, text classification, and keyword extraction.
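
To make the two components concrete, here is a minimal sketch in plain Python. It assumes one common formulation that this note does not itself specify: TF is the term count divided by the document length, and IDF is the natural log of (number of documents / number of documents containing the term). The function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed formulation (one of several variants):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = ln(number of docs / number of docs containing t)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus, tokenized by whitespace (hypothetical example data).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(docs)
```

In this toy corpus, "the" appears in every document, so its IDF (and therefore its score) is zero, while "mat" appears in only one document and scores higher than "cat", which appears in two. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and normalize the resulting vectors by default, so their numbers will differ from this sketch.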

References

I was working on the Avans 2-1 LU2 assignment when I found out that Bag of Words was not the most suitable option after all.
A GeeksforGeeks article: https://www.geeksforgeeks.org/machine-learning/understanding-tf-idf-term-frequency-inverse-document-frequency/

🌵OldMartijntje
