202511060909 Status: idea Tags: Datascience, LLM
Cosine Similarity
Cosine similarity is a method for comparing how similar two non-zero vectors are.
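For reference, the usual definition written out. The symbols A, B, n and θ are my own notation here, not something from the lesson:

```latex
\cos\theta \;=\; \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
\;=\; \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

The result lies between -1 and 1: 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. Only the direction matters, not the length of the vectors.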
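<?php
/**
 * Cosine similarity of two numeric vectors that share the same keys.
 * Returns a float in [-1, 1], or 0.0 if either vector is (nearly) zero.
 */
function cosineSimilarity(array $vector1, array $vector2): float
{
    if (count($vector1) !== count($vector2)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }
    $dotProduct = 0.0;
    $magnitude1 = 0.0;
    $magnitude2 = 0.0;
    // Accumulate the dot product and both squared magnitudes in one pass
    foreach ($vector1 as $key => $value) {
        $dotProduct += $value * $vector2[$key];
        $magnitude1 += $value ** 2;
        $magnitude2 += $vector2[$key] ** 2;
    }
    $magnitude1 = sqrt($magnitude1);
    $magnitude2 = sqrt($magnitude2);
    $epsilon = 1e-8; // Guard against dividing by zero or a near-zero product
    if ($magnitude1 * $magnitude2 < $epsilon) {
        return 0.0;
    } else {
        return $dotProduct / ($magnitude1 * $magnitude2);
    }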
}
This can be used to compare how close two strings of text are: an LLM (or embedding model) turns each string into a vector, and the vectors are then compared.
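A minimal sketch of that idea, assuming the texts have already been turned into vectors. The 3-dimensional "embeddings" below are made up for illustration (real ones come from a model and are much longer), and `cosineSim` is a hypothetical compact helper, named differently so it does not clash with the function above:

```php
<?php
// Hypothetical compact helper so this sketch runs on its own.
function cosineSim(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }
    $denominator = sqrt($normA) * sqrt($normB);
    return $denominator < 1e-8 ? 0.0 : $dot / $denominator;
}

// Made-up vectors standing in for the embeddings of two FAQ questions.
$faq = [
    'How do I reset my password?'  => [0.9, 0.1, 0.0],
    'What are your opening hours?' => [0.0, 0.2, 0.9],
];

// Made-up vector standing in for the embedding of the user's question.
$question = [0.8, 0.3, 0.1];

// Score every FAQ entry against the question.
$scores = [];
foreach ($faq as $text => $embedding) {
    $scores[$text] = cosineSim($question, $embedding);
}

arsort($scores); // highest similarity first, keys preserved
$bestMatch = array_key_first($scores);

echo $bestMatch, PHP_EOL;
```

With these toy numbers the password question scores far higher than the opening-hours one, so it is picked as the best match.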
References
I first used this when trying to match questions asked to a chatbot against FAQ questions. I didn't really know what it did, just that it worked.
But now, after this data science lesson, I realize that it is just comparing dataset columns (ones that have been through dimensionality reduction).