largeVis embedding does not utilize n.cores argument #129

rrydbirk · 2022-09-19T05:36:15Z

Running embedGraph(method = "largeVis", n.cores > 1) only induces estimation of embedding using 1 core.

The text was updated successfully, but these errors were encountered:

evanbiederstedt · 2022-09-19T17:50:08Z

Hi @rrydbirk

Have you tried setting the n.cores in the Conos object?

i.e. con <- Conos$new(panel.preprocessed, n.cores>1)

conos/R/conclass.R

Line 47 in 61d0004

#' @field n.cores number of cores

That's fed here:

conos/R/conclass.R

Line 703 in 61d0004

    
           self$embedding <- embedGraphUmap(self$graph, verbose=verbose, return.all=FALSE, n.cores=self$n.cores, target.dims=target.dims, ...)

self$embedding <- embedGraphUmap(self$graph, verbose=verbose, return.all=FALSE, 
    n.cores=self$n.cores, target.dims=target.dims, ...)

You'll see most of the methods in the Conos class expect that multiple cores will be accessible via that constructor, i.e. check the file here for self$n.cores

I guess the revised logic to allow users to modify the n.cores is to add this to each of the class methods....I could do that.

evanbiederstedt · 2022-09-19T18:07:56Z

I guess the revised logic to allow users to modify the n.cores is to add this to each of the class methods....I could do that.

On second thought, this logic is too contorted.

We don't explicitly mention the n.cores arguments for the functions; we only do this for the Conos class.

If users want to change the number of cores, it's straightforward to do:

## n.cores = 1
 con <- Conos$new(panel.preprocessed, n.cores=1)
 print(con)

## outputs below
<Conos>
 Public:
   addSamples: function (x, replace = FALSE, verbose = FALSE) 
   buildGraph: function (k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, 
   clone: function (deep = FALSE) 
   clusters: list
   correctGenes: function (genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, 
   embedding: NULL
   embeddings: list
   embedGraph: function (method = "largeVis", embedding.name = method, M = 1, 
   expression.adj: list
   findCommunities: function (method = leiden.community, min.group.size = 0, name = NULL, 
   getClusterCountMatrices: function (clustering = NULL, groups = NULL, common.genes = TRUE, 
   getDatasetPerCell: function () 
   getDifferentialGenes: function (clustering = NULL, groups = NULL, z.threshold = 3, 
   getJointCountMatrix: function (raw = FALSE) 
   graph: NULL
   initialize: function (x, ..., n.cores = parallel::detectCores(logical = FALSE), 
   misc: list
   n.cores: 1
   override.conos.plot.theme: FALSE
   pairs: list
   plotClusterStability: function (clustering = NULL, what = "all") 
   plotGraph: function (color.by = "cluster", clustering = NULL, embedding = NULL, 
   plotPanel: function (clustering = NULL, groups = NULL, colors = NULL, gene = NULL, 
   propagateLabels: function (labels, method = "diffusion", ...) 
   samples: list
 Private:
   adjustTheme: function (theme) 
   getEmbedding: function (embedding) 
   updatePairs: function (space = "PCA", data.type = "counts", ncomps = 50, n.odgenes = 1000, 
   ```
   
   ```
## n.cores = 20
con <- Conos$new(panel.preprocessed, n.cores=1)
print(con)

## outputs below
<Conos>
Public:
  addSamples: function (x, replace = FALSE, verbose = FALSE) 
  buildGraph: function (k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, 
  clone: function (deep = FALSE) 
  clusters: list
  correctGenes: function (genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, 
  embedding: NULL
  embeddings: list
  embedGraph: function (method = "largeVis", embedding.name = method, M = 1, 
  expression.adj: list
  findCommunities: function (method = leiden.community, min.group.size = 0, name = NULL, 
  getClusterCountMatrices: function (clustering = NULL, groups = NULL, common.genes = TRUE, 
  getDatasetPerCell: function () 
  getDifferentialGenes: function (clustering = NULL, groups = NULL, z.threshold = 3, 
  getJointCountMatrix: function (raw = FALSE) 
  graph: NULL
  initialize: function (x, ..., n.cores = parallel::detectCores(logical = FALSE), 
  misc: list
  n.cores: 20
  override.conos.plot.theme: FALSE
  pairs: list
  plotClusterStability: function (clustering = NULL, what = "all") 
  plotGraph: function (color.by = "cluster", clustering = NULL, embedding = NULL, 
  plotPanel: function (clustering = NULL, groups = NULL, colors = NULL, gene = NULL, 
  propagateLabels: function (labels, method = "diffusion", ...) 
  samples: list
Private:
  adjustTheme: function (theme) 
  getEmbedding: function (embedding) 
  updatePairs: function (space = "PCA", data.type = "counts", ncomps = 50, n.odgenes = 1000,

Please check if this works @rrydbirk. I could try to clarify in the documents.

rrydbirk · 2022-09-20T07:00:05Z

Hi @evanbiederstedt
I set n.cores=100 as default. UMAP embedding uses this, but largeVis doesn't (I track this using htop). I only saw this after I started to embed large graphs and therefore became inpatient.

As I see it, the n.cores parameter should be passed:
From $embedGraph():

if (method == 'largeVis') {
        wij <- as_adj(self$graph,attr='weight')
        if(!is.na(perplexity)) {
          wij <- buildWijMatrix(wij,perplexity=perplexity, threads=self$n.cores)
        }
        coords <- projectKNNs(wij = wij, dim=target.dims, verbose = verbose,sgd_batches = sgd_batches,gamma=gamma, M=M, seed=seed, alpha=alpha, rho=1, **threads=self$n.cores**)
        colnames(coords) <- V(self$graph)$name
        self$embedding <- t(coords)
        embedding.result <- self$embedding
      }

From projectKNNs():

  coords <- sgd(coords,
                targets_i = is,
                sources_j = js,
                ps = wij@p,
                weights = wij@x,
                alpha = as.double(alpha),
                gamma = as.double(gamma),
                M = as.integer(M),
                rho = as.double(rho),
                n_samples = sgd_batches,
                momentum = momentum,
                useDegree = as.logical(useDegree),
                seed = seed,
                **threads = threads**,
                verbose = as.logical(verbose))

From sgd():
.Call('_conos_sgd', PACKAGE = 'conos', coords, targets_i, sources_j, ps, weights, gamma, rho, n_samples, M, alpha, momentum, useDegree, seed, **threads**, verbose)

From largeVis.cpp:

rma::mat sgd(arma::mat& coords,
              arma::ivec& targets_i, // vary randomly
              arma::ivec& sources_j, // ordered
              arma::ivec& ps, // N+1 length vector of indices to start of each row j in vector is
              arma::vec& weights, // w{ij}
              const double& gamma,
              const double& rho,
              const arma::uword& n_samples,
              const int& M,
              const double& alpha,
              const Rcpp::Nullable<Rcpp::NumericVector> momentum,
              const bool& useDegree,
              const Rcpp::Nullable<Rcpp::NumericVector> seed,
              **const Rcpp::Nullable<Rcpp::IntegerVector> threads**,
              const bool verbose)

I'm not that experienced with C++ code, so I lose track of the threads parameter hereafter. Could you doublecheck if it's working for you?

evanbiederstedt · 2022-09-20T12:53:37Z

Long story short, I'll need to play with the OpenMP directives here:

https://github.com/kharchenkolab/conos/blob/main/src/largeVis.cpp#L210-L280

The parallelization is here:

#ifdef _OPENMP
#pragma omp parallel for schedule(static) 
#endif
	for (iterationtype eIdx = 0; eIdx < barrier; eIdx += batchSize) if (progress.increment(batchSize)) {
		(*v)(eIdx, batchSize);
	}
#ifdef _OPENMP
#pragma omp barrier
#endif
	for (iterationtype eIdx = barrier; eIdx < n_samples; eIdx += batchSize) if (progress.increment(batchSize)) (*v)(eIdx, batchSize);
	delete v;

evanbiederstedt · 2022-09-29T04:03:25Z

I resolved this with some modifications to the openmp directives, i.e.

	Progress progress(n_samples, verbose);
	const unsigned int batchSize = 8192;
	const iterationtype barrier = (n_samples * .99 < n_samples - coords.n_cols) ? n_samples * .99 : n_samples - coords.n_cols;

#ifdef _OPENMP
#pragma omp barrier
 	if (threads.isNotNull()) {
		int nthreads = IntegerVector(threads)[0];
		if (nthreads > 0) {
			omp_set_num_threads(nthreads);
		}
	}	

        iterationtype eIdx;
#pragma omp parallel for private(eIdx) schedule(static) 
	for (eIdx = 0; eIdx < barrier; eIdx += batchSize) if (progress.increment(batchSize)) {
		(*v)(eIdx, batchSize);
	}
#else 
	for (eIdx = 0; eIdx < barrier; eIdx += batchSize) if (progress.increment(batchSize)) {
		(*v)(eIdx, batchSize)
	}
#endif

#ifdef _OPENMP
#pragma omp barrier
#endif

	for (iterationtype eIdx = barrier; eIdx < n_samples; eIdx += batchSize) if (progress.increment(batchSize)) (*v)(eIdx, batchSize);

	delete v;
	return coords;
};

I'll include in the next version release

rrydbirk added the invalid This doesn't seem right label Sep 19, 2022

evanbiederstedt removed the invalid This doesn't seem right label Sep 19, 2022

evanbiederstedt closed this as completed Sep 19, 2022

rrydbirk reopened this Sep 20, 2022

evanbiederstedt closed this as completed Sep 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

largeVis embedding does not utilize n.cores argument #129

largeVis embedding does not utilize n.cores argument #129

rrydbirk commented Sep 19, 2022

evanbiederstedt commented Sep 19, 2022 •

edited

Loading

evanbiederstedt commented Sep 19, 2022 •

edited

Loading

rrydbirk commented Sep 20, 2022

evanbiederstedt commented Sep 20, 2022

evanbiederstedt commented Sep 29, 2022 •

edited

Loading

largeVis embedding does not utilize n.cores argument #129

largeVis embedding does not utilize n.cores argument #129

Comments

rrydbirk commented Sep 19, 2022

evanbiederstedt commented Sep 19, 2022 • edited Loading

evanbiederstedt commented Sep 19, 2022 • edited Loading

rrydbirk commented Sep 20, 2022

evanbiederstedt commented Sep 20, 2022

evanbiederstedt commented Sep 29, 2022 • edited Loading

evanbiederstedt commented Sep 19, 2022 •

edited

Loading

evanbiederstedt commented Sep 19, 2022 •

edited

Loading

evanbiederstedt commented Sep 29, 2022 •

edited

Loading