If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:
np.exp(-X ** 2 / (2. * delta ** 2))
Creates an affinity matrix for X using the selected affinity, then applies spectral clustering to this affinity matrix.,Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity='rbf'.,Affinity matrix used for clustering. Available only if after calling fit.,If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:
When calling fit
, an affinity matrix is constructed using either kernel function such the Gaussian (aka RBF) kernel of the euclidean distanced d(X, X)
:
np.exp(-gamma * d(X, X) ** 2)
If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:
np.exp(-dist_matrix ** 2 / (2. * delta ** 2))
Examples
>>> from sklearn.cluster
import SpectralClustering
>>>
import numpy as np >>>
X = np.array([
[1, 1],
[2, 1],
[1, 0],
...[4, 7],
[3, 5],
[3, 6]
]) >>>
clustering = SpectralClustering(n_clusters = 2,
...assign_labels = "discretize",
...random_state = 0).fit(X) >>>
clustering.labels_
array([1, 1, 1, 0, 0, 0]) >>>
clustering
SpectralClustering(affinity = 'rbf', assign_labels = 'discretize', coef0 = 1,
degree = 3, eigen_solver = None, eigen_tol = 0.0, gamma = 1.0,
kernel_params = None, n_clusters = 2, n_init = 10, n_jobs = None,
n_neighbors = 10, random_state = 0)
.SpectralClustering() , .spectral_clustering()
def test_spectral_clustering(eigen_solver, assign_labels):
S = np.array([
[1.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0],
[0.2, 0.2, 0.2, 1.0, 1.0, 1.0, 1.0],
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
])
for mat in (S, sparse.csr_matrix(S)):
model = SpectralClustering(random_state = 0, n_clusters = 2,
affinity = 'precomputed',
eigen_solver = eigen_solver,
assign_labels = assign_labels
).fit(mat)
labels = model.labels_
if labels[0] == 0:
labels = 1 - labels
assert adjusted_rand_score(labels, [1, 1, 1, 0, 0, 0, 0]) == 1
model_copy = pickle.loads(pickle.dumps(model))
assert model_copy.n_clusters == model.n_clusters
assert model_copy.eigen_solver == model.eigen_solver
assert_array_equal(model_copy.labels_, model.labels_)
def spectral_clustering(n_clusters, samples, size = False): "" " Run k - means clustering on vertex coordinates. Parameters: - - - - - n_clusters: int number of clusters to generate samples: array adjacency matrix of surface or region "" " # Run Spectral Clustering spectral = cluster.SpectralClustering( n_clusters = n_clusters, affinity = 'precomputed') spectral.fit(samples) labels = spectral.labels_.copy() labels = labels.astype(np.int32) + 1 return labels
def post_proC(C, K, d, alpha):
# C: coefficient matrix, K: number of clusters, d: dimension of each subspace
C = 0.5 * (C + C.T)
r = min(d * K + 1, C.shape[0] - 1)
U, S, _ = svds(C, r, v0 = np.ones(C.shape[0]))
U = U[: , ::-1]
S = np.sqrt(S[::-1])
S = np.diag(S)
U = U.dot(S)
U = normalize(U, norm = 'l2', axis = 1)
Z = U.dot(U.T)
Z = Z * (Z > 0)
L = np.abs(Z ** alpha)
L = L / L.max()
L = 0.5 * (L + L.T)
spectral = cluster.SpectralClustering(n_clusters = K, eigen_solver = 'arpack', affinity = 'precomputed', assign_labels = 'discretize', random_state = 66)
spectral.fit(L)
grp = spectral.fit_predict(L) + 1
return grp, L
def computeSourceNodes(A, C): "" " computeSourceNodes: compute source nodes for the source localization problem Input: A(np.array): adjacency matrix of shape N x N C(int): number of classes Output: sourceNodes(list): contains the indices of the C source nodes Uses the adjacency matrix to compute C communities by means of spectral clustering, and then selects the node with largest degree within each community "" " sourceNodes = [] degree = np.sum(A, axis = 0) # degree of each vector # Compute communities communityClusters = SpectralClustering(n_clusters = C, affinity = 'precomputed', assign_labels = 'discretize') communityClusters = communityClusters.fit(A) communityLabels = communityClusters.labels_ # For each community for c in range(C): communityNodes = np.nonzero(communityLabels == c)[0] degreeSorted = np.argsort(degree[communityNodes]) sourceNodes = sourceNodes + [communityNodes[degreeSorted[-1]]] return sourceNodes
def test_spectral_unknown_mode():
# Test that SpectralClustering fails with an unknown mode set.
centers = np.array([
[0., 0., 0.],
[10., 10., 10.],
[20., 20., 20.],
])
X, true_labels = make_blobs(n_samples=100, centers=centers,
cluster_std=1., random_state=42)
D = pairwise_distances(X) # Distance matrix
S = np.max(D) - D # Similarity matrix
S = sparse.coo_matrix(S)
assert_raises(ValueError, spectral_clustering, S, n_clusters=2,
random_state=0, eigen_solver="<unknown>")
def test_spectral_unknown_assign_labels():
# Test that SpectralClustering fails with an unknown assign_labels set.
centers = np.array([
[0., 0., 0.],
[10., 10., 10.],
[20., 20., 20.],
])
X, true_labels = make_blobs(n_samples=100, centers=centers,
cluster_std=1., random_state=42)
D = pairwise_distances(X) # Distance matrix
S = np.max(D) - D # Similarity matrix
S = sparse.coo_matrix(S)
assert_raises(ValueError, spectral_clustering, S, n_clusters=2,
random_state=0, assign_labels="<unknown>")
Spectral Clustering can also be used to partition graphs via their spectral embeddings. In this case, the affinity matrix is the adjacency matrix of the graph, and SpectralClustering is initialized with affinity='precomputed':,Each clustering algorithm comes in two variants: a class, that implements the fit method to learn the clusters on train data, and a function, that, given train data, returns an array of integer labels corresponding to the different clusters. For the class, the labels over the training data can be found in the labels_ attribute.,between two clusterings computed by considering all pairs of samples and counting pairs that are assigned into the same or into different clusters under the true and predicted clusterings., Adjustment for chance in clustering performance evaluation: Analysis of the impact of the dataset size on the value of clustering measures for random assignments.
Note that if the values of your similarity matrix are not well distributed, e.g. with negative values or with a distance matrix rather than a similarity, the spectral problem will be singular and the problem not solvable. In which case it is advised to apply a transformation to the entries of the matrix. For instance, in the case of a signed distance matrix, is common to apply a heat kernel:
similarity = np.exp(-beta * distance / distance.std())
This is not the case for completeness_score
and homogeneity_score
: both are bound by the relationship:
homogeneity_score(a, b) == completeness_score(b, a)
Invalid archive member string specified
Invalid archive member string specified