# dissimilarity

## summed_dissimilarity_sampler(X, X_ref, n=1, metric='euclidean')

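This dissimilarity sampler re-arranges the pool of independent variable (IV) conditions according to their dissimilarity with respect to a reference pool `X_ref` and returns the `n` most dissimilar conditions. The dissimilarity of a condition is calculated as the sum of its pairwise distances to the conditions in `X_ref`, which ranks conditions identically to the average of those distances.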
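As a minimal sketch of the ranking described above, the following uses plain NumPy and Euclidean distance only. The helper name `rank_by_summed_distance` is hypothetical and is not part of the library; it is meant only to make the scoring concrete.

```python
import numpy as np


def rank_by_summed_distance(X, X_ref, n=1):
    """Hypothetical helper: return the n rows of X farthest (in total) from X_ref."""
    X = np.asarray(X, dtype=float)
    X_ref = np.asarray(X_ref, dtype=float)
    scores = []
    for row in X:
        # Euclidean distance from this condition to every reference condition
        distances = np.sqrt(((X_ref - row) ** 2).sum(axis=1))
        scores.append(distances.sum())
    # Largest summed distance first
    order = np.argsort(scores)[::-1]
    return X[order][:n]


X = np.array([[0.0], [2.0], [10.0]])
X_ref = np.array([[0.0], [1.0]])

# Summed distances: 0.0 -> 1, 2.0 -> 3, 10.0 -> 19
print(rank_by_summed_distance(X, X_ref, n=2))  # rows 10.0 and 2.0, in that order
```

Dividing each score by the number of reference conditions would give the average distance instead of the sum without changing the order.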

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `np.ndarray` | pool of IV conditions to evaluate dissimilarity | required |
| `X_ref` | `np.ndarray` | reference pool of IV conditions | required |
| `n` | `int` | number of samples to select | `1` |
| `metric` | `str` | dissimilarity measure. Options: 'euclidean', 'manhattan', 'chebyshev', 'minkowski', 'wminkowski', 'seuclidean', 'mahalanobis', 'haversine', 'hamming', 'canberra', 'braycurtis', 'matching', 'jaccard', 'dice', 'kulsinski', 'rogerstanimoto', 'russellrao', 'sokalmichener', 'sokalsneath', 'yule'. See sklearn.metrics.DistanceMetric for more details. | `'euclidean'` |

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Sampled pool |
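The choice of `metric` can change which condition counts as most dissimilar. The sketch below illustrates this with hypothetical NumPy stand-ins for three of the listed metrics. The `METRICS` dictionary and `most_dissimilar` helper are assumptions made for illustration and do not call the library or scikit-learn.

```python
import numpy as np

# Hypothetical NumPy stand-ins for three of the listed metrics
METRICS = {
    "euclidean": lambda a, b: np.sqrt(((a - b) ** 2).sum(axis=-1)),
    "manhattan": lambda a, b: np.abs(a - b).sum(axis=-1),
    "chebyshev": lambda a, b: np.abs(a - b).max(axis=-1),
}


def most_dissimilar(X, X_ref, metric):
    """Hypothetical helper: the row of X with the largest summed distance to X_ref."""
    distance = METRICS[metric]
    # Compare every row of X with every row of X_ref via broadcasting
    scores = distance(X[:, None, :], X_ref[None, :, :]).sum(axis=1)
    return X[np.argmax(scores)]


X = np.array([[3.0, 3.0], [0.0, 5.0]])
X_ref = np.array([[0.0, 0.0]])

for name in METRICS:
    print(name, most_dissimilar(X, X_ref, name))
```

With these inputs, Euclidean and Chebyshev distance favour `[0, 5]` (scores 5 versus about 4.24 and 5 versus 3), while Manhattan distance favours `[3, 3]` (6 versus 5).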

Source code in `autora/experimentalist/sampler/dissimilarity.py`

```python
def summed_dissimilarity_sampler(
    X: np.ndarray, X_ref: np.ndarray, n: int = 1, metric: AllowedMetrics = "euclidean"
) -> np.ndarray:
    """
    This dissimilarity sampler re-arranges the pool of IV conditions according to their
    dissimilarity with respect to a reference pool X_ref. The default dissimilarity is calculated
    as the sum of the pairwise distances between the conditions in X and X_ref.

    Args:
        X: pool of IV conditions to evaluate dissimilarity
        X_ref: reference pool of IV conditions
        n: number of samples to select
        metric (str): dissimilarity measure. Options: 'euclidean', 'manhattan', 'chebyshev',
            'minkowski', 'wminkowski', 'seuclidean', 'mahalanobis', 'haversine',
            'hamming', 'canberra', 'braycurtis', 'matching', 'jaccard', 'dice', 'kulsinski',
            'rogerstanimoto', 'russellrao', 'sokalmichener', 'sokalsneath', 'yule'.
            See [sklearn.metrics.DistanceMetric][] for more details.

    Returns:
        Sampled pool

    """

    if isinstance(X, Iterable):
        X = np.array(list(X))

    if isinstance(X_ref, Iterable):
        X_ref = np.array(list(X_ref))

    if X.ndim == 1:
        X = X.reshape(-1, 1)

    if X_ref.ndim == 1:
        X_ref = X_ref.reshape(-1, 1)

    if X.shape[1] != X_ref.shape[1]:
        raise ValueError(
            f"X and X_ref must have the same number of columns.\n"
            f"X has {X.shape[1]} columns, while X_ref has {X_ref.shape[1]} columns."
        )

    if X.shape[0] < n:
        raise ValueError(
            f"X must have at least {n} rows matching the number of requested samples."
        )

    dist = DistanceMetric.get_metric(metric)

    # create a list to store the summed distances for each row in X
    summed_distances = []

    # loop over each row in X
    for row in X:
        # sum the distances between the current row in X and every row in X_ref
        summed_distance = 0

        for X_ref_row in X_ref:

            distance = dist.pairwise([row, X_ref_row])[0, 1]
            summed_distance += distance

        # store the summed distance for the current row
        summed_distances.append(summed_distance)

    # sort the rows in X by their summed distances, largest first
    sorted_X = X[np.argsort(summed_distances)[::-1]]

    return sorted_X[:n]
```
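The nested loops above can also be expressed with NumPy broadcasting. The following is a minimal sketch under stated assumptions: it supports Euclidean distance only, the name `summed_dissimilarity_numpy` is hypothetical, and it is not the library's implementation. It mirrors the same input handling (1-D input reshaped to a column, then shape checks).

```python
import numpy as np


def summed_dissimilarity_numpy(X, X_ref, n=1):
    """Hypothetical Euclidean-only sketch; not the library function."""
    X = np.asarray(X, dtype=float)
    X_ref = np.asarray(X_ref, dtype=float)

    # Treat 1-D input as a single column of conditions
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    if X_ref.ndim == 1:
        X_ref = X_ref.reshape(-1, 1)

    if X.shape[1] != X_ref.shape[1]:
        raise ValueError("X and X_ref must have the same number of columns.")
    if X.shape[0] < n:
        raise ValueError(f"X must have at least {n} rows.")

    # (rows of X, rows of X_ref) matrix of Euclidean distances
    diffs = X[:, None, :] - X_ref[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))

    # Sum over the reference pool, then take the n largest
    summed = distances.sum(axis=1)
    return X[np.argsort(summed)[::-1]][:n]


pool = [1, 2, 3, 4, 5]
reference = [1, 2]

# Summed distances: 1 -> 1, 2 -> 1, 3 -> 3, 4 -> 5, 5 -> 7
print(summed_dissimilarity_numpy(pool, reference, n=2))  # rows 5.0 and 4.0
```

The broadcasted difference array has shape `(len(X), len(X_ref), n_columns)`, so this variant trades memory for speed and may be unsuitable for very large pools.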