202511171448 Status: idea Tags: Datascience, Machine Learning
Silhouette Score
The Silhouette Score ($s$) measures how well each data point fits within its cluster. Its value ranges from -1 to +1.
1. Calculation for a Single Data Point ($s(i)$)
For a single data point $i$, the silhouette score is calculated as:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
Where:
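- $a(i)$ = average distance from $i$ to the other points in its own cluster $C_i$ (cohesion): $a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j)$, where $d$ is the distance measure. This measures how close the data point is to the other points in its own cluster. A smaller $a(i)$ is desirable.
- $b(i)$ = average distance from $i$ to the points in the nearest other cluster (separation): $b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j)$, i.e. the other cluster with the smallest average distance to $i$. This measures how far the data point is from the points in the nearest neighboring cluster. A larger $b(i)$ is desirable.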
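A minimal sketch of the per-point calculation in plain Python, assuming Euclidean distance (the note does not fix a distance measure) and at least two clusters. `silhouette_point` is a hypothetical helper name and the four points are made-up toy data. A point that is alone in its cluster gets $s(i) = 0$, which is the usual convention because $a(i)$ is undefined there.

```python
import math


def silhouette_point(points, labels, i):
    """Silhouette score s(i) of point i, using Euclidean distance (assumed)."""
    # Group the distances from point i to every other point by cluster label
    dists = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != i:
            dists.setdefault(lab, []).append(math.dist(points[i], p))

    own = labels[i]
    if own not in dists:
        # Point i is alone in its cluster: a(i) is undefined, s(i) = 0 by convention
        return 0.0

    # a(i): mean distance to the other points of its own cluster (cohesion)
    a = sum(dists[own]) / len(dists[own])
    # b(i): smallest mean distance to the points of any other cluster (separation)
    b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0


# Toy data (made up): two tight pairs of points, far apart
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]
print(round(silhouette_point(points, labels, 0), 3))  # 0.802
```

For point 0, $a(i) = 1$ (its only cluster mate is one unit away) and $b(i) = (5 + \sqrt{26})/2 \approx 5.05$, so $s(i) \approx 0.80$.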
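2. Interpretation of $s(i)$
- $s(i) \approx +1$: The data point is well clustered: it lies close to the other points of its own cluster and far from the neighboring clusters. This indicates strong cohesion and separation.
- $s(i) \approx 0$: The data point lies on or very close to the decision boundary between two neighboring clusters ($a(i) \approx b(i)$).
- $s(i) < 0$ (towards $-1$): The data point is probably assigned to the wrong cluster: on average it is closer to the points of the nearest neighboring cluster than to those of its own ($b(i) < a(i)$).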
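A quick check with made-up values for $a(i)$ and $b(i)$ (illustrative numbers only) reproduces the three cases. It also shows why $s(i)$ stays within $[-1, +1]$: with non-negative distances, $|b(i) - a(i)|$ can never exceed $\max\{a(i), b(i)\}$.

$$
\begin{aligned}
a(i) = 1,\; b(i) = 4 &\;\Rightarrow\; s(i) = \frac{4 - 1}{\max\{1, 4\}} = 0.75 \\
a(i) = 2,\; b(i) = 2 &\;\Rightarrow\; s(i) = \frac{2 - 2}{\max\{2, 2\}} = 0 \\
a(i) = 3,\; b(i) = 1 &\;\Rightarrow\; s(i) = \frac{1 - 3}{\max\{3, 1\}} \approx -0.67
\end{aligned}
$$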
3. The Mean Silhouette Score ($\bar{s}$)
The overall Mean Silhouette Score ($\bar{s}$) for the entire clustering result is the average of the silhouette scores $s(i)$ of all $n$ data points in the dataset:

$$\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i)$$

The mean silhouette score is used to evaluate the overall quality of the clustering solution. A higher mean score indicates denser, better-separated clusters.
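A self-contained sketch of the mean score in plain Python, again assuming Euclidean distance; `mean_silhouette` is a hypothetical helper and the labelings are made up. In practice scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` computes the same average (Euclidean by default).

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette score: average of s(i) over all n points (Euclidean distance assumed)."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    n = len(points)
    total = 0.0
    for i in range(n):
        # Mean distance from point i to the members of each cluster (excluding i itself)
        groups = {}
        for j in range(n):
            if j != i:
                groups.setdefault(labels[j], []).append(math.dist(points[i], points[j]))
        means = {lab: sum(d) / len(d) for lab, d in groups.items()}
        if labels[i] not in means:
            continue  # singleton cluster: s(i) = 0 by convention, adds nothing
        a = means.pop(labels[i])  # cohesion a(i)
        b = min(means.values())   # separation b(i): nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


points = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 3))  # 0.802  (nearby points grouped)
print(round(mean_silhouette(points, [0, 1, 1, 0]), 3))  # -0.412 (each cluster spans the gap)
```

Comparing two labelings of the same points shows the intended use: the labeling that keeps nearby points together gets the higher mean score.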
References
This is something we are learning for Data Science. This information comes from Avans 2-2 datascience, 2025-11-10, and these slides belong with it