dclustval

dclustval.cluster.catelogue_labs(temp_cell_labels)

Catalogs the indices of unique labels in a list.

Parameters:

temp_cell_labels (list) – The list of cell labels.

Returns:

out_dict – A dictionary mapping each unique label to a list of indices at which the label occurs.

Return type:

dict
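
The behaviour described above can be pictured with a minimal sketch. The helper name `catalog_label_indices` is hypothetical and is not part of dclustval; it only illustrates the label-to-indices mapping the description implies, not the library's own code.

```python
from collections import defaultdict


def catalog_label_indices(labels):
    """Map each unique label to the list of positions where it occurs."""
    out_dict = defaultdict(list)
    for idx, label in enumerate(labels):
        # Indices are appended in order, so each list is ascending.
        out_dict[label].append(idx)
    return dict(out_dict)


example = catalog_label_indices(["a", "b", "a", "c", "b"])
```

Here `example` maps `"a"` to `[0, 2]`, `"b"` to `[1, 4]`, and `"c"` to `[3]`.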

dclustval.cluster.comp_to_mat(comp_list)

Converts a list of components into an adjacency matrix.

Parameters:

comp_list (list) – A list of components.

Returns:

out_mat – A matrix that represents the adjacency of the components.

Return type:

ndarray

dclustval.cluster.dense_rank(in_vect)

dclustval.cluster.dense_rank_both(in_vect1, in_vect2)

Performs a dense rank operation on two input vectors.

Parameters:
  • in_vect1 (ndarray) – The first input vector.

  • in_vect2 (ndarray) – The second input vector.

Returns:

  • out_vect1 (ndarray) – The dense-ranked version of the first input vector.

  • out_vect2 (ndarray) – The dense-ranked version of the second input vector.
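
As an illustration of dense ranking across two vectors, here is a minimal sketch. It assumes the two vectors are ranked on one shared scale and that ranks start at 0; the reference above states neither, so treat both as assumptions. The name `dense_rank_pair` is hypothetical and plain lists stand in for ndarrays.

```python
def dense_rank_pair(vect1, vect2):
    """Dense-rank two vectors on a shared scale (assumed behaviour)."""
    # Pool the distinct values from both inputs so equal values in either
    # vector receive the same rank, with no gaps between consecutive ranks.
    pooled = sorted(set(vect1) | set(vect2))
    rank_of = {value: rank for rank, value in enumerate(pooled)}
    return [rank_of[v] for v in vect1], [rank_of[v] for v in vect2]


ranks1, ranks2 = dense_rank_pair([10, 30, 10], [20, 30, 50])
```

With these inputs the pooled distinct values are 10, 20, 30, 50, so `ranks1` is `[0, 2, 0]` and `ranks2` is `[1, 2, 3]`.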

dclustval.cluster.do_cluster_validation(mat_1_dist, mat_2_dist, temp_cell_labels, alpha=0.01, plot_dir='', validation_merge=True)

Performs cluster validation and potentially merges clusters.

Parameters:
  • mat_1_dist (ndarray) – The first distance matrix.

  • mat_2_dist (ndarray) – The second distance matrix.

  • temp_cell_labels (list) – A list of temporary cell labels.

  • alpha (float, optional) – The significance level, default is 0.01.

  • plot_dir (str, optional) – The directory to save the plots, default is an empty string.

  • validation_merge (bool, optional) – Whether to perform cluster merging, default is True.

Returns:

  • stat_mat (ndarray) – The statistic matrix for each cluster pair.

  • p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

  • final_labels (list) – The final labels for each cell.

Examples

>>> import numpy as np
>>> from sklearn.metrics.pairwise import euclidean_distances as euc
>>> from dclustval.cluster import do_cluster_validation
>>> np.random.seed(123456)
>>> n_obs = 400
>>> n_features = 2
>>> dist1 = euc(np.random.random(size=(n_obs,n_features)))
>>> dist2 = euc(np.random.random(size=(n_obs,n_features)))
>>> bad_labels = np.array([0 for _ in range(n_obs // 2)] + [1 for _ in range(n_obs // 2)])
>>> stat_mat, p_mat_adj, final_labels = do_cluster_validation(dist1, dist2, bad_labels)

dclustval.cluster.finalize_comp_list(comps_list, p_mat_adj)

Finalizes a list of components by merging component pairs into clusters based on their p-values.

Parameters:
  • comps_list (list) – A list of components.

  • p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

Returns:

final_comp_list – A list of finalized components.

Return type:

list

dclustval.cluster.get_final_labels(temp_cell_labels, sig_mat, p_mat_adj)

Generates the final labels for each cell.

Parameters:
  • temp_cell_labels (list) – A list of temporary cell labels.

  • sig_mat (ndarray) – A significance matrix.

  • p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

Returns:

final_labels – The final labels for each cell.

Return type:

list

dclustval.cluster.get_merged_clusters(first, second, p)

Creates a network graph and recursively finds and removes highest-weight cliques.

Parameters:
  • first (ndarray) – The first input vector.

  • second (ndarray) – The second input vector.

  • p (ndarray) – A vector of p-values, each corresponding to a pair of elements in the input vectors.

Returns:

final_merged_clusters – A list of all highest-weight cliques removed from the graph.

Return type:

list

dclustval.cluster.get_ordered_list_by_p(comp, p_mat_adj)

Generates a list of merged clusters by ordering component pairs by their p-values.

Parameters:
  • comp (list) – A list of components.

  • p_mat_adj (ndarray) – The adjusted p-value matrix for each cluster pair.

Returns:

merged_comps_list – A list of merged clusters, with each cluster represented as a list of its component labels.

Return type:

list
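
The ordering step mentioned above might look like the following sketch. It covers only the sorting of component pairs, not the merging, and it assumes that pairs with the highest adjusted p-value (the least distinguishable clusters) come first; the sort direction is not stated in this reference. The name `order_pairs_by_p` is hypothetical.

```python
from itertools import combinations


def order_pairs_by_p(comp, p_mat_adj):
    """Return all label pairs in `comp`, sorted by p-value (descending)."""
    pairs = combinations(comp, 2)
    # p_mat_adj[i][j] indexing works for nested lists and for 2-D ndarrays.
    return sorted(pairs, key=lambda pair: p_mat_adj[pair[0]][pair[1]], reverse=True)


p_values = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
ordered = order_pairs_by_p([0, 1, 2], p_values)
```

For this matrix the pair `(0, 1)` comes first (p = 0.8), then `(1, 2)` (p = 0.5), then `(0, 2)` (p = 0.2).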

dclustval.cluster.get_recursive_cliques(G)

Finds and removes highest-weight cliques recursively from a network graph.

Parameters:

G (NetworkX graph) – The input network graph.

Returns:

final_out_mergers – A list of all highest-weight cliques removed from the graph.

Return type:

list
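
To make the recursive removal concrete, the sketch below repeatedly picks the heaviest maximal clique and deletes its nodes until the graph is empty. It is an illustration under stated assumptions, not the library's implementation: a clique's weight is taken to be the sum of its edge weights, the graph is a plain dictionary of weighted edges rather than a NetworkX graph, and all function names are hypothetical.

```python
from itertools import combinations


def find_maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch); adj maps node -> neighbour set."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return cliques


def clique_weight(clique, weights):
    """Sum of edge weights inside the clique (assumed definition of weight)."""
    return sum(weights[frozenset(pair)] for pair in combinations(clique, 2))


def recursive_cliques(edges):
    """edges maps (u, v) tuples to weights; returns the cliques in removal order."""
    weights = {frozenset(pair): w for pair, w in edges.items()}
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    removed = []
    while adj:
        best = max(find_maximal_cliques(adj), key=lambda c: clique_weight(c, weights))
        removed.append(sorted(best))
        # Drop the winning clique's nodes and any edges that touched them.
        adj = {n: nbrs - best for n, nbrs in adj.items() if n not in best}
    return removed


toy_edges = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7, (2, 3): 0.6, (3, 4): 0.5}
merged = recursive_cliques(toy_edges)
```

In the toy graph the triangle 0-1-2 is removed first, which leaves only the edge between nodes 3 and 4 for the second round.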

dclustval.cluster.get_weighted_cliques(G)

Finds and removes the highest-weight clique in a network graph.

Parameters:

G (NetworkX graph) – The input network graph.

Returns:

  • winner_clique (list) – The highest-weight clique in the input graph.

  • G (NetworkX graph) – The input graph with the highest-weight clique removed.