dclustval
- dclustval.cluster.catelogue_labs(temp_cell_labels)
Catalogs the indices at which each unique label occurs in a list.
- Parameters:
temp_cell_labels (list) – The list of cell labels.
- Returns:
out_dict – A dictionary mapping each unique label to a list of indices at which the label occurs.
- Return type:
dict
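The mapping described above can be illustrated with a minimal, dependency-free sketch. The helper name `catalog_label_indices` is hypothetical and is not part of dclustval; it only mirrors the documented behavior (label to list of indices).

```python
from collections import defaultdict


def catalog_label_indices(labels):
    """Map each unique label to the list of positions where it occurs.

    Hypothetical stand-in written for illustration; not the library's code.
    """
    index_by_label = defaultdict(list)
    for position, label in enumerate(labels):
        index_by_label[label].append(position)
    # Return a plain dict so missing labels raise KeyError instead of
    # silently creating empty entries.
    return dict(index_by_label)


example = catalog_label_indices(["a", "b", "a", "c", "b"])
print(example)  # {'a': [0, 2], 'b': [1, 4], 'c': [3]}
```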
- dclustval.cluster.comp_to_mat(comp_list)
Converts a list of components into an adjacency matrix.
- Parameters:
comp_list (list) – A list of components.
- Returns:
out_mat – A matrix that represents the adjacency of the components.
- Return type:
ndarray
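As a rough illustration of what a component-to-adjacency conversion can look like, the sketch below assumes that each component is a list of integer node indices and that every pair of nodes within a component is adjacent. Both the helper name `components_to_adjacency` and the extra `n_nodes` argument are hypothetical; the documented function takes only `comp_list` and returns an ndarray, whereas this sketch returns nested lists to stay dependency-free.

```python
def components_to_adjacency(components, n_nodes):
    """Build a symmetric 0/1 adjacency matrix from a list of components.

    Assumption: members of the same component are all mutually adjacent.
    Hypothetical helper for illustration only.
    """
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for members in components:
        for i in members:
            for j in members:
                if i != j:  # leave the diagonal at zero
                    adjacency[i][j] = 1
    return adjacency


adjacency = components_to_adjacency([[0, 1], [2, 3, 4]], n_nodes=5)
for row in adjacency:
    print(row)
```

Wrapping the result in `numpy.asarray` would give an ndarray of the same shape.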
- dclustval.cluster.dense_rank(in_vect)
- dclustval.cluster.dense_rank_both(in_vect1, in_vect2)
Performs a dense rank operation on two input vectors.
- Parameters:
in_vect1 (ndarray) – The first input vector.
in_vect2 (ndarray) – The second input vector.
- Returns:
out_vect1 (ndarray) – The dense-ranked version of the first input vector.
out_vect2 (ndarray) – The dense-ranked version of the second input vector.
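Dense ranking assigns consecutive integer ranks to the distinct sorted values, so tied values share a rank and no ranks are skipped. The sketch below assumes the two vectors are ranked jointly, on one shared scale; whether `dense_rank_both` does this or ranks each vector separately is not stated here. The helper name `dense_rank_jointly` is hypothetical, and plain lists stand in for ndarrays.

```python
def dense_rank_jointly(vec1, vec2):
    """Dense-rank two vectors on one shared scale (ranks start at 1).

    Assumption: equal values in either vector receive the same rank.
    Hypothetical helper for illustration only.
    """
    # Distinct values across both vectors, in ascending order.
    distinct = sorted(set(vec1) | set(vec2))
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [rank_of[v] for v in vec1], [rank_of[v] for v in vec2]


r1, r2 = dense_rank_jointly([10.0, 30.0, 10.0], [20.0, 30.0])
print(r1, r2)  # [1, 3, 1] [2, 3]
```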
- dclustval.cluster.do_cluster_validation(mat_1_dist, mat_2_dist, temp_cell_labels, alpha=0.01, plot_dir='', validation_merge=True)
Performs cluster validation and potentially merges clusters.
- Parameters:
mat_1_dist (ndarray) – The first distance matrix.
mat_2_dist (ndarray) – The second distance matrix.
temp_cell_labels (list) – A list of temporary cell labels.
alpha (float, optional) – The significance level, default is 0.01.
plot_dir (str, optional) – The directory to save the plots, default is an empty string.
validation_merge (bool, optional) – Whether to perform cluster merging, default is True.
- Returns:
stat_mat (ndarray) – The statistic matrix for each cluster pair.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.
final_labels (list) – The final labels for each cell.
Examples
>>> import numpy as np
>>> from sklearn.metrics.pairwise import euclidean_distances as euc
>>> from dclustval.cluster import do_cluster_validation
>>> np.random.seed(123456)
>>> n_obs = 400
>>> n_features = 2
>>> # Two independent random point sets, each turned into a distance matrix.
>>> dist1 = euc(np.random.random(size=(n_obs, n_features)))
>>> dist2 = euc(np.random.random(size=(n_obs, n_features)))
>>> # Arbitrary labels: first half 0, second half 1 (integer division keeps range() valid).
>>> bad_labels = np.array([0 for _ in range(n_obs // 2)] + [1 for _ in range(n_obs // 2)])
>>> stat_mat, p_mat_adj, final_labels = do_cluster_validation(dist1, dist2, bad_labels)
- dclustval.cluster.finalize_comp_list(comps_list, p_mat_adj)
Finalizes a list of components by merging component pairs into clusters based on their p-values.
- Parameters:
comps_list (list) – A list of components.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.
- Returns:
final_comp_list – A list of finalized components.
- Return type:
list
- dclustval.cluster.get_final_labels(temp_cell_labels, sig_mat, p_mat_adj)
Generates the final labels for each cell.
- Parameters:
temp_cell_labels (list) – A list of temporary cell labels.
sig_mat (ndarray) – A significance matrix.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.
- Returns:
final_labels – The final labels for each cell.
- Return type:
list
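The final relabeling step can be pictured as follows: once some original clusters have been grouped together, every cell receives the label of the group containing its original cluster. This is only a sketch under that assumption; the helper `relabel_cells` and its `merged_clusters` argument are hypothetical, and the real function derives the grouping from `sig_mat` and `p_mat_adj` rather than taking it as input.

```python
def relabel_cells(cell_labels, merged_clusters):
    """Assign each cell the index of the merged group holding its label.

    merged_clusters: list of groups, each a list of original labels.
    Hypothetical helper for illustration only.
    """
    new_label_of = {}
    for new_label, group in enumerate(merged_clusters):
        for old_label in group:
            new_label_of[old_label] = new_label
    return [new_label_of[label] for label in cell_labels]


# Original clusters 0 and 2 are merged; clusters 1 and 3 stay separate.
final = relabel_cells([0, 1, 2, 1, 3], [[0, 2], [1], [3]])
print(final)  # [0, 1, 0, 1, 2]
```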
- dclustval.cluster.get_merged_clusters(first, second, p)
Creates a network graph and recursively finds and removes highest-weight cliques.
- Parameters:
first (ndarray) – The first input vector.
second (ndarray) – The second input vector.
p (ndarray) – A vector of p-values, each corresponding to a pair of elements in the input vectors.
- Returns:
final_merged_clusters – A list of all highest-weight cliques removed from the graph.
- Return type:
list
- dclustval.cluster.get_ordered_list_by_p(comp, p_mat_adj)
Generates a list of merged clusters by ordering component pairs by their p-values.
- Parameters:
comp (list) – A list of components.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.
- Returns:
merged_comps_list – A list of merged clusters, with each cluster represented as a list of its component labels.
- Return type:
list
- dclustval.cluster.get_recursive_cliques(G)
Finds and removes highest-weight cliques recursively from a network graph.
- Parameters:
G (NetworkX graph) – The input network graph.
- Returns:
final_out_mergers – A list of all highest-weight cliques removed from the graph.
- Return type:
list
- dclustval.cluster.get_weighted_cliques(G)
Finds and removes the highest-weight clique in a network graph.
- Parameters:
G (NetworkX graph) – The input network graph.
- Returns:
winner_clique (list) – The highest-weight clique in the input graph.
G (NetworkX graph) – The input graph with the highest-weight clique removed.
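The clique-peeling idea behind `get_weighted_cliques` and `get_recursive_cliques` can be sketched without NetworkX. The version below is an illustration under stated assumptions, not the library's implementation: the graph is a plain dict of positive edge weights keyed by `frozenset({u, v})`, a clique's weight is taken to be the sum of its edge weights, and the search is brute force (exponential, so suitable only for small graphs). The names `heaviest_clique` and `peel_cliques` are hypothetical.

```python
from itertools import combinations


def heaviest_clique(weights):
    """Return the clique (sorted node list) with the largest total edge weight.

    weights: dict mapping frozenset({u, v}) -> positive edge weight.
    """
    nodes = sorted({node for edge in weights for node in edge})
    best, best_weight = [], 0.0
    for size in range(2, len(nodes) + 1):
        for candidate in combinations(nodes, size):
            pairs = [frozenset(pair) for pair in combinations(candidate, 2)]
            # A clique requires every pair of its nodes to be connected.
            if all(pair in weights for pair in pairs):
                total = sum(weights[pair] for pair in pairs)
                if total > best_weight:
                    best, best_weight = list(candidate), total
    return best


def peel_cliques(weights):
    """Repeatedly remove the heaviest clique until no edges remain."""
    remaining = dict(weights)
    peeled = []
    while remaining:
        clique = heaviest_clique(remaining)
        if not clique:  # no positive-weight clique left
            break
        peeled.append(clique)
        # Drop every edge that touches a node of the removed clique.
        members = set(clique)
        remaining = {e: w for e, w in remaining.items() if not (e & members)}
    return peeled


edges = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.7,
    frozenset({"C", "D"}): 0.95,
    frozenset({"D", "E"}): 0.5,
}
print(peel_cliques(edges))  # [['A', 'B', 'C'], ['D', 'E']]
```

In this edge-only representation, a node left without edges after a removal is not reported; a graph object that tracks nodes separately could return such nodes as singletons.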