dclustval

dclustval.cluster.catelogue_labs(temp_cell_labels)

Catalogs the indices of unique labels in a list.

Parameters: temp_cell_labels (list) – The list of cell labels.
Returns: out_dict – A dictionary mapping each unique label to a list of indices at which the label occurs.
Return type: dict
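
The label-to-indices mapping described above can be illustrated with a minimal sketch. The helper name catalogue_labels_sketch is hypothetical and is not part of dclustval; it only mirrors the documented behaviour (each unique label mapped to the list of positions where it occurs).

```python
from collections import defaultdict


def catalogue_labels_sketch(labels):
    """Hypothetical stand-in: map each unique label to its indices."""
    out_dict = defaultdict(list)
    for index, label in enumerate(labels):
        # Indices are appended in order of appearance, so each list is sorted.
        out_dict[label].append(index)
    return dict(out_dict)


example = catalogue_labels_sketch(["a", "b", "a", "c", "b"])
print(example)
```

Whether the library returns plain lists or arrays of indices is not stated in the reference above, so the list output here is an assumption.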

dclustval.cluster.comp_to_mat(comp_list)

Converts a list of components into an adjacency matrix.

Parameters: comp_list (list) – A list of components.
Returns: out_mat – A matrix that represents the adjacency of the components.
Return type: ndarray

dclustval.cluster.dense_rank(in_vect)

dclustval.cluster.dense_rank_both(in_vect1, in_vect2)

Performs a dense rank operation on two input vectors.

Parameters:

in_vect1 (ndarray) – The first input vector.
in_vect2 (ndarray) – The second input vector.

Returns:

out_vect1 (ndarray) – The dense-ranked version of the first input vector.
out_vect2 (ndarray) – The dense-ranked version of the second input vector.
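
For readers unfamiliar with dense ranking, the sketch below shows the general idea: equal values share a rank, and ranks increase by one with no gaps. The helper dense_rank_sketch is hypothetical and is not the library's implementation. The reference above does not say whether the two vectors are ranked jointly or independently, nor whether ranks start at 0 or 1; this sketch assumes independent, 1-based ranking.

```python
import numpy as np


def dense_rank_sketch(in_vect):
    """Hypothetical dense rank: ties share a rank, ranks have no gaps."""
    # np.unique sorts the distinct values; the inverse index gives, for each
    # element, the position of its value among those sorted distinct values.
    _, inverse = np.unique(np.asarray(in_vect), return_inverse=True)
    return inverse + 1  # assumption: ranks start at 1


vect1 = np.array([10, 30, 10, 20])
vect2 = np.array([0.5, 0.1, 0.1, 0.9])
# Assumption: each vector is ranked on its own, not on the pooled values.
print(dense_rank_sketch(vect1), dense_rank_sketch(vect2))
```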

dclustval.cluster.do_cluster_validation(mat_1_dist, mat_2_dist, temp_cell_labels, alpha=0.01, plot_dir='', validation_merge=True)

Performs cluster validation and potentially merges clusters.

Parameters:

mat_1_dist (ndarray) – The first distance matrix.
mat_2_dist (ndarray) – The second distance matrix.
temp_cell_labels (list) – A list of temporary cell labels.
alpha (float, optional) – The significance level, default is 0.01.
plot_dir (str, optional) – The directory to save the plots, default is an empty string.
validation_merge (bool, optional) – Whether to perform cluster merging, default is True.

Returns:

stat_mat (ndarray) – The statistic matrix for each cluster pair.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.
final_labels (list) – The final labels for each cell.

Examples

>>> import numpy as np
>>> from sklearn.metrics.pairwise import euclidean_distances as euc
>>> from dclustval.cluster import do_cluster_validation
>>> np.random.seed(123456)
>>> n_obs = 400
>>> n_features = 2
>>> dist1 = euc(np.random.random(size=(n_obs,n_features)))
>>> dist2 = euc(np.random.random(size=(n_obs,n_features)))
>>> bad_labels = np.array([0 for _ in range(n_obs // 2)]+[1 for _ in range(n_obs // 2)])
>>> stat_mat, p_mat_adj, final_labels = do_cluster_validation(dist1, dist2, bad_labels)

dclustval.cluster.finalize_comp_list(comps_list, p_mat_adj)

Finalizes a list of components by merging component pairs into clusters based on their p-values.

Parameters:

comps_list (list) – A list of components.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

Returns:

final_comp_list – A list of finalized components.

Return type:

list

dclustval.cluster.get_final_labels(temp_cell_labels, sig_mat, p_mat_adj)

Generates the final labels for each cell.

Parameters:

temp_cell_labels (list) – A list of temporary cell labels.
sig_mat (ndarray) – A significance matrix.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

Returns:

final_labels – The final labels for each cell.

Return type:

list

dclustval.cluster.get_merged_clusters(first, second, p)

Creates a network graph and recursively finds and removes highest-weight cliques.

Parameters:

first (ndarray) – The first input vector.
second (ndarray) – The second input vector.
p (ndarray) – A vector of p-values, each corresponding to a pair of elements in the input vectors.

Returns:

final_merged_clusters – A list of all highest-weight cliques removed from the graph.

Return type:

list

dclustval.cluster.get_ordered_list_by_p(comp, p_mat_adj)

Generates a list of merged clusters by ordering component pairs by their p-values.

Parameters:

comp (list) – A list of components.
p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

Returns:

merged_comps_list – A list of merged clusters, with each cluster represented as a list of its component labels.

Return type:

list

dclustval.cluster.get_recursive_cliques(G)

Finds and removes highest-weight cliques recursively from a network graph.

Parameters: G (NetworkX graph) – The input network graph.
Returns: final_out_mergers – A list of all highest-weight cliques removed from the graph.
Return type: list

dclustval.cluster.get_weighted_cliques(G)

Finds and removes the highest-weight clique in a network graph.

Parameters:

G (NetworkX graph) – The input network graph.

Returns:

winner_clique (list) – The highest-weight clique in the input graph.
G (NetworkX graph) – The input graph with the highest-weight clique removed.
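
The find-and-remove procedure described for get_weighted_cliques and get_recursive_cliques can be sketched with NetworkX as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, edge weights are assumed to live in a "weight" attribute, a clique's weight is assumed to be the sum of its edge weights, and "removing" a clique is assumed to mean deleting its nodes.

```python
import networkx as nx


def heaviest_clique_sketch(G, weight="weight"):
    """Hypothetical: return the maximal clique with the largest edge-weight sum."""
    best_clique, best_weight = None, float("-inf")
    for clique in nx.find_cliques(G):  # enumerates maximal cliques
        # size(weight=...) sums the edge weights of the induced subgraph.
        total = G.subgraph(clique).size(weight=weight)
        if total > best_weight:
            best_clique, best_weight = sorted(clique), total
    return best_clique


def recursive_cliques_sketch(G, weight="weight"):
    """Hypothetical: repeatedly take the heaviest clique and delete its nodes."""
    G = G.copy()  # work on a copy so the caller's graph is left untouched
    removed = []
    while G.number_of_nodes() > 0:
        clique = heaviest_clique_sketch(G, weight)
        removed.append(clique)
        G.remove_nodes_from(clique)
    return removed


G = nx.Graph()
G.add_weighted_edges_from(
    [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (2, 3, 0.95), (3, 4, 0.5)]
)
print(recursive_cliques_sketch(G))
```

In this toy graph the triangle 0-1-2 has the largest total weight and is taken first, leaving nodes 3 and 4 to form the second clique. Ties are broken by enumeration order here, which may differ from the library's behaviour.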