topologic.embedding.distance package

topologic.embedding.distance.cosine(first_vector: numpy.ndarray, second_vector: numpy.ndarray) → float

Distance function for two vectors of equal length.

Cosine distance

See also: https://en.wikipedia.org/wiki/Cosine_similarity

Parameters
  • first_vector (numpy.ndarray) – Nonzero vector. Must be the same length as second_vector.

  • second_vector (numpy.ndarray) – Nonzero vector. Must be the same length as first_vector.

Returns

cosine distance - The result lies between 0 and 2. Values closer to 0 indicate more similar vectors; a value of 2 means the vectors point in opposite directions.

Return type

float

Examples
>>> cosine(np.array([1,3,5]), np.array([2,3,4]))
0.026964528109766017
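
To make the 0-to-2 range concrete, here is a minimal NumPy sketch of the same quantity, 1 - (u · v) / (‖u‖ ‖v‖). The name cosine_distance is hypothetical and this is not topologic's source; scipy.spatial.distance.cosine computes the same formula, though the last floating-point digits may differ.

```python
import numpy as np


def cosine_distance(first_vector: np.ndarray, second_vector: np.ndarray) -> float:
    """Hypothetical sketch: 1 - (u . v) / (|u| * |v|) for nonzero, equal-length vectors."""
    first = np.asarray(first_vector, dtype=float)
    second = np.asarray(second_vector, dtype=float)
    if first.shape != second.shape:
        raise ValueError("vectors must be the same length")
    norm_product = np.linalg.norm(first) * np.linalg.norm(second)
    if norm_product == 0:
        raise ValueError("vectors must be nonzero")
    # cos(theta) lies in [-1, 1], so 1 - cos(theta) lies in [0, 2]
    return float(1.0 - np.dot(first, second) / norm_product)


print(cosine_distance(np.array([1, 3, 5]), np.array([2, 3, 4])))  # approximately 0.026965
print(cosine_distance(np.array([1, 0]), np.array([-1, 0])))  # opposite directions: 2.0
```

Because only the angle matters, scaling either vector (for example [1, 3, 5] versus [2, 6, 10]) leaves the cosine distance at, or extremely close to, 0.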

topologic.embedding.distance.euclidean(first_vector: numpy.ndarray, second_vector: numpy.ndarray) → float

Distance function for two vectors of equal length.

Euclidean distance

See also: https://en.wikipedia.org/wiki/Euclidean_distance

Parameters
  • first_vector (numpy.ndarray) – Nonzero vector. Must be the same length as second_vector.

  • second_vector (numpy.ndarray) – Nonzero vector. Must be the same length as first_vector.

Returns

euclidean distance - The result is a non-negative real number, equal to 0 only when the vectors are identical. Values closer to 0 are more similar.

Return type

float

Examples
>>> euclidean(np.array([1,3,5]), np.array([2,3,4]))
1.4142135623730951
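
The example above is sqrt((1-2)² + (3-3)² + (5-4)²) = sqrt(2). A minimal NumPy sketch of that calculation follows; euclidean_distance is a hypothetical name, not topologic's source, and scipy.spatial.distance.euclidean computes the same quantity.

```python
import numpy as np


def euclidean_distance(first_vector: np.ndarray, second_vector: np.ndarray) -> float:
    """Hypothetical sketch: square root of the summed squared component differences."""
    first = np.asarray(first_vector, dtype=float)
    second = np.asarray(second_vector, dtype=float)
    if first.shape != second.shape:
        raise ValueError("vectors must be the same length")
    difference = first - second
    return float(np.sqrt(np.sum(difference ** 2)))


print(euclidean_distance(np.array([1, 3, 5]), np.array([2, 3, 4])))  # 1.4142135623730951
```

np.linalg.norm(first - second) is an equivalent one-line formulation.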

topologic.embedding.distance.mahalanobis(inverse_covariance: numpy.ndarray) → Callable[[numpy.ndarray, numpy.ndarray], float]

Unlike the cosine and euclidean distances provided by scipy, which take only two vectors, the Mahalanobis distance also requires an inverse covariance matrix. To use it, first pass that matrix to this function; it returns a curried function of two vectors, which can then be passed as the method argument to the vector_distance and embedding_distances_from functions.

See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.mahalanobis.html

Parameters

inverse_covariance (np.ndarray) – The inverse covariance matrix

Returns

A curried function that takes two vectors and returns their Mahalanobis distance, computed with the inverse_covariance provided.
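A minimal sketch of the currying pattern described above, using NumPy only. make_mahalanobis and the sample data are hypothetical and this is not topologic's source. The outer function captures the inverse covariance matrix once, and the inner function has the two-vector signature that a method argument expects.

```python
from typing import Callable

import numpy as np

DistanceFunction = Callable[[np.ndarray, np.ndarray], float]


def make_mahalanobis(inverse_covariance: np.ndarray) -> DistanceFunction:
    """Hypothetical sketch: capture the matrix, return a two-vector distance function."""
    matrix = np.asarray(inverse_covariance, dtype=float)

    def distance(first_vector: np.ndarray, second_vector: np.ndarray) -> float:
        delta = np.asarray(first_vector, dtype=float) - np.asarray(second_vector, dtype=float)
        # sqrt(delta^T . VI . delta), the formula scipy.spatial.distance.mahalanobis documents
        return float(np.sqrt(delta @ matrix @ delta))

    return distance


# Made-up sample rows, used only to build an invertible covariance matrix.
samples = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 3.0], [4.0, 5.5]])
inverse_covariance = np.linalg.inv(np.cov(samples, rowvar=False))

curried = make_mahalanobis(inverse_covariance)
print(curried(samples[0], samples[1]))

# With the identity matrix, the Mahalanobis distance reduces to the euclidean distance.
print(make_mahalanobis(np.eye(3))(np.array([1, 3, 5]), np.array([2, 3, 4])))  # 1.4142135623730951
```

The identity-matrix case is a quick sanity check: it reproduces the euclidean example for the same two vectors.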

topologic.embedding.distance.valid_distance_functions() → KeysView[str]

The topologic built-in list of valid distance function names. Any function that returns a float when given two 1d np.ndarray vectors is a valid choice, but the only ones supported without additional work are cosine and euclidean.

Returns

A set-like view of the string names of the supported functions.
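Because the return value is a set-like KeysView, it supports membership tests and set operations directly. The sketch below uses a hypothetical registry (_DISTANCE_FUNCTIONS, valid_names, and check_configured_method are illustrative names, not topologic's) to show how such a view might be used to validate a configured method name early.

```python
from typing import Callable, Dict, KeysView

import numpy as np

DistanceFunction = Callable[[np.ndarray, np.ndarray], float]

# Hypothetical registry mapping string names to distance functions.
_DISTANCE_FUNCTIONS: Dict[str, DistanceFunction] = {
    "cosine": lambda u, v: float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))),
    "euclidean": lambda u, v: float(np.linalg.norm(np.subtract(u, v))),
}


def valid_names() -> KeysView[str]:
    """Return a live, set-like view of the registered names."""
    return _DISTANCE_FUNCTIONS.keys()


def check_configured_method(name: str) -> str:
    """Fail early if a configured method name is not registered."""
    names = valid_names()
    if name not in names:
        raise ValueError(f"unknown distance method {name!r}; expected one of {sorted(names)}")
    return name


print(sorted(valid_names()))  # ['cosine', 'euclidean']
print(valid_names() & {"cosine", "manhattan"})  # {'cosine'}
print(check_configured_method("euclidean"))  # euclidean
```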

topologic.embedding.distance.vector_distance(first_vector: numpy.ndarray, second_vector: numpy.ndarray, method: Union[str, Callable[[numpy.ndarray, numpy.ndarray], float]] = <function cosine>) → float

vector_distance applies any distance function you choose to two vectors. It is most commonly used by passing the method parameter as a string, for example “cosine” or “euclidean”, which lets you switch distance functions through configuration instead of changing code.

Parameters
  • first_vector (np.ndarray) – A 1d array-like (list, tuple, np.array) that represents the first vector

  • second_vector (np.ndarray) – A 1d array-like (list, tuple, np.array) that represents the second vector

  • method (Union[str, Callable[[np.ndarray, np.ndarray], float]]) – Any distance function that takes two vectors and returns a float, or the string name mapped to such a function (see valid_distance_functions()). Other functions, such as mahalanobis, can also be used, but they need more information than just the two vectors; mahalanobis must first be given an inverse covariance matrix, and the curried function it returns is what you pass here.

Returns

A float indicating the distance between two vectors.
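A minimal sketch of the string-or-callable dispatch described above, assuming a simple name-to-function mapping. vector_distance_sketch, _BY_NAME, and the config dictionary are hypothetical and not topologic's implementation; the point is that a string is resolved to a function first, and a callable is used as-is.

```python
from typing import Callable, Dict, Union

import numpy as np

DistanceFunction = Callable[[np.ndarray, np.ndarray], float]


def _cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def _euclidean(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))


# Hypothetical name-to-function mapping standing in for valid_distance_functions().
_BY_NAME: Dict[str, DistanceFunction] = {"cosine": _cosine, "euclidean": _euclidean}


def vector_distance_sketch(
    first_vector, second_vector, method: Union[str, DistanceFunction] = _cosine
) -> float:
    """Resolve a string method name to a function, then apply it to both vectors."""
    if isinstance(method, str):
        if method not in _BY_NAME:
            raise ValueError(f"unknown method {method!r}; expected one of {sorted(_BY_NAME)}")
        method = _BY_NAME[method]
    # Accept any 1d array-like (list, tuple, np.array), as the parameters above allow.
    first = np.asarray(first_vector, dtype=float)
    second = np.asarray(second_vector, dtype=float)
    return float(method(first, second))


config = {"distance": "euclidean"}  # e.g. loaded from a settings file
print(vector_distance_sketch([1, 3, 5], [2, 3, 4], method=config["distance"]))  # 1.4142135623730951
print(vector_distance_sketch((1, 3, 5), (2, 3, 4), method=lambda u, v: float(np.abs(u - v).sum())))  # 2.0
```

Changing config["distance"] to "cosine" switches the metric without touching the calling code.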

topologic.embedding.distance.embedding_distances_from(vector: numpy.ndarray, embedding: Union[topologic.embedding.embedding_container.EmbeddingContainer, numpy.ndarray], method: Union[str, Callable[[numpy.ndarray, numpy.ndarray], float]] = <function cosine>) → numpy.ndarray

Returns a 1d np.ndarray of floats holding the distance from the given vector to each vector stored in the embedding (which likely includes the given vector itself).

The distance calculation can be provided either as a function reference or as a string naming one of the two distance functions supported natively: cosine and euclidean, both scipy implementations. A mahalanobis generator function is also available, but you must first provide it with the inverse covariance matrix required for the distance calculation.

Parameters
  • vector (np.ndarray) – A 1d array-like (list, tuple, np.array) that represents the vector to compare against every other vector in the embedding

  • embedding (Union[EmbeddingContainer, np.ndarray]) – Either an EmbeddingContainer or a 2d np array. For the array form, each row is a vector and the number of columns equals the length of the vector to compare against.

  • method (Union[str, Callable[[np.ndarray, np.ndarray], float]]) – Any distance function that takes two vectors and returns a float, or the string name mapped to such a function (see valid_distance_functions()). Other functions, such as mahalanobis, can also be used, but they need more information than just the two vectors; mahalanobis must first be given an inverse covariance matrix, and the curried function it returns is what you pass here.

Returns

An np.ndarray of dtype float whose length equals the number of embedded vectors.

Examples
>>> vector = [0.3, 0.4, 0.5]
>>> embedding = np.array([[0.3, 0.4, 0.5], [0.31, 0.44, 0.7]])
>>> embedding_distances_from(vector, embedding, method="cosine") # using string version of method name
array([0.        , 0.00861606])
>>> embedding_distances_from(vector, embedding, method=euclidean) # using function handle
array([0.        , 0.20420578])
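
A minimal sketch of the row-by-row calculation for the 2d np.ndarray case, reusing the example data above. embedding_distances_sketch is a hypothetical name and not topologic's source; how topologic unpacks an EmbeddingContainer is not shown here.

```python
from typing import Callable

import numpy as np

DistanceFunction = Callable[[np.ndarray, np.ndarray], float]


def embedding_distances_sketch(vector, embedding, method: DistanceFunction) -> np.ndarray:
    """Distance from `vector` to every row of a 2d embedding array (ndarray case only)."""
    target = np.asarray(vector, dtype=float)
    matrix = np.asarray(embedding, dtype=float)
    if matrix.ndim != 2 or matrix.shape[1] != target.shape[0]:
        raise ValueError("embedding must be 2d with one column per vector component")
    # One distance per row, so the output length equals the number of embedded vectors.
    return np.array([method(target, row) for row in matrix], dtype=float)


def euclidean(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))


vector = [0.3, 0.4, 0.5]
embedding = np.array([[0.3, 0.4, 0.5], [0.31, 0.44, 0.7]])
distances = embedding_distances_sketch(vector, embedding, euclidean)
print(distances)  # approximately [0.0, 0.2042]

# For euclidean specifically, broadcasting gives the same values without a Python loop.
print(np.linalg.norm(embedding - np.asarray(vector), axis=1))
```

The per-row loop keeps any two-argument callable usable (including a curried mahalanobis function), at the cost of speed; the vectorized form only applies when the metric can be expressed with array operations.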