Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

largeVis embedding does not utilize n.cores argument #129

Closed
rrydbirk opened this issue Sep 19, 2022 · 5 comments
Closed

largeVis embedding does not utilize n.cores argument #129

rrydbirk opened this issue Sep 19, 2022 · 5 comments

Comments

@rrydbirk
Copy link

Running embedGraph(method = "largeVis", n.cores > 1) only induces estimation of embedding using 1 core.

@rrydbirk rrydbirk added the invalid This doesn't seem right label Sep 19, 2022
@evanbiederstedt evanbiederstedt removed the invalid This doesn't seem right label Sep 19, 2022
@evanbiederstedt
Copy link
Collaborator

evanbiederstedt commented Sep 19, 2022

Hi @rrydbirk

Have you tried setting the n.cores in the Conos object?

i.e. con <- Conos$new(panel.preprocessed, n.cores>1)

#' @field n.cores number of cores

That's fed here:

self$embedding <- embedGraphUmap(self$graph, verbose=verbose, return.all=FALSE, n.cores=self$n.cores, target.dims=target.dims, ...)

self$embedding <- embedGraphUmap(self$graph, verbose=verbose, return.all=FALSE, 
    n.cores=self$n.cores, target.dims=target.dims, ...)

You'll see most of the methods in the Conos class expect that multiple cores will be accessible via that constructor, i.e. check the file here for self$n.cores

I guess the revised logic to allow users to modify the n.cores is to add this to each of the class methods....I could do that.

@evanbiederstedt
Copy link
Collaborator

evanbiederstedt commented Sep 19, 2022

I guess the revised logic to allow users to modify the n.cores is to add this to each of the class methods....I could do that.

On second thought, this logic is too contorted.

We don't explicitly mention the n.cores arguments for the functions; we only do this for the Conos class.

If users want to change the number of cores, it's straightforward to do:

## n.cores = 1
 con <- Conos$new(panel.preprocessed, n.cores=1)
 print(con)
## outputs below
<Conos>
 Public:
   addSamples: function (x, replace = FALSE, verbose = FALSE) 
   buildGraph: function (k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, 
   clone: function (deep = FALSE) 
   clusters: list
   correctGenes: function (genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, 
   embedding: NULL
   embeddings: list
   embedGraph: function (method = "largeVis", embedding.name = method, M = 1, 
   expression.adj: list
   findCommunities: function (method = leiden.community, min.group.size = 0, name = NULL, 
   getClusterCountMatrices: function (clustering = NULL, groups = NULL, common.genes = TRUE, 
   getDatasetPerCell: function () 
   getDifferentialGenes: function (clustering = NULL, groups = NULL, z.threshold = 3, 
   getJointCountMatrix: function (raw = FALSE) 
   graph: NULL
   initialize: function (x, ..., n.cores = parallel::detectCores(logical = FALSE), 
   misc: list
   n.cores: 1
   override.conos.plot.theme: FALSE
   pairs: list
   plotClusterStability: function (clustering = NULL, what = "all") 
   plotGraph: function (color.by = "cluster", clustering = NULL, embedding = NULL, 
   plotPanel: function (clustering = NULL, groups = NULL, colors = NULL, gene = NULL, 
   propagateLabels: function (labels, method = "diffusion", ...) 
   samples: list
 Private:
   adjustTheme: function (theme) 
   getEmbedding: function (embedding) 
   updatePairs: function (space = "PCA", data.type = "counts", ncomps = 50, n.odgenes = 1000, 
   ```
   
   ```
## n.cores = 20
con <- Conos$new(panel.preprocessed, n.cores=1)
print(con)
## outputs below
<Conos>
Public:
  addSamples: function (x, replace = FALSE, verbose = FALSE) 
  buildGraph: function (k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, 
  clone: function (deep = FALSE) 
  clusters: list
  correctGenes: function (genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, 
  embedding: NULL
  embeddings: list
  embedGraph: function (method = "largeVis", embedding.name = method, M = 1, 
  expression.adj: list
  findCommunities: function (method = leiden.community, min.group.size = 0, name = NULL, 
  getClusterCountMatrices: function (clustering = NULL, groups = NULL, common.genes = TRUE, 
  getDatasetPerCell: function () 
  getDifferentialGenes: function (clustering = NULL, groups = NULL, z.threshold = 3, 
  getJointCountMatrix: function (raw = FALSE) 
  graph: NULL
  initialize: function (x, ..., n.cores = parallel::detectCores(logical = FALSE), 
  misc: list
  n.cores: 20
  override.conos.plot.theme: FALSE
  pairs: list
  plotClusterStability: function (clustering = NULL, what = "all") 
  plotGraph: function (color.by = "cluster", clustering = NULL, embedding = NULL, 
  plotPanel: function (clustering = NULL, groups = NULL, colors = NULL, gene = NULL, 
  propagateLabels: function (labels, method = "diffusion", ...) 
  samples: list
Private:
  adjustTheme: function (theme) 
  getEmbedding: function (embedding) 
  updatePairs: function (space = "PCA", data.type = "counts", ncomps = 50, n.odgenes = 1000, 

Please check if this works @rrydbirk. I could try to clarify in the documents.

@rrydbirk
Copy link
Author

Hi @evanbiederstedt
I set n.cores=100 as default. UMAP embedding uses this, but largeVis doesn't (I track this using htop). I only saw this after I started to embed large graphs and therefore became inpatient.

As I see it, the n.cores parameter should be passed:
From $embedGraph():

if (method == 'largeVis') {
        wij <- as_adj(self$graph,attr='weight')
        if(!is.na(perplexity)) {
          wij <- buildWijMatrix(wij,perplexity=perplexity, threads=self$n.cores)
        }
        coords <- projectKNNs(wij = wij, dim=target.dims, verbose = verbose,sgd_batches = sgd_batches,gamma=gamma, M=M, seed=seed, alpha=alpha, rho=1, **threads=self$n.cores**)
        colnames(coords) <- V(self$graph)$name
        self$embedding <- t(coords)
        embedding.result <- self$embedding
      }

From projectKNNs():

  coords <- sgd(coords,
                targets_i = is,
                sources_j = js,
                ps = wij@p,
                weights = wij@x,
                alpha = as.double(alpha),
                gamma = as.double(gamma),
                M = as.integer(M),
                rho = as.double(rho),
                n_samples = sgd_batches,
                momentum = momentum,
                useDegree = as.logical(useDegree),
                seed = seed,
                **threads = threads**,
                verbose = as.logical(verbose))

From sgd():
.Call('_conos_sgd', PACKAGE = 'conos', coords, targets_i, sources_j, ps, weights, gamma, rho, n_samples, M, alpha, momentum, useDegree, seed, **threads**, verbose)

From largeVis.cpp:

rma::mat sgd(arma::mat& coords,
              arma::ivec& targets_i, // vary randomly
              arma::ivec& sources_j, // ordered
              arma::ivec& ps, // N+1 length vector of indices to start of each row j in vector is
              arma::vec& weights, // w{ij}
              const double& gamma,
              const double& rho,
              const arma::uword& n_samples,
              const int& M,
              const double& alpha,
              const Rcpp::Nullable<Rcpp::NumericVector> momentum,
              const bool& useDegree,
              const Rcpp::Nullable<Rcpp::NumericVector> seed,
              **const Rcpp::Nullable<Rcpp::IntegerVector> threads**,
              const bool verbose)

I'm not that experienced with C++ code, so I lose track of the threads parameter hereafter. Could you doublecheck if it's working for you?

@rrydbirk rrydbirk reopened this Sep 20, 2022
@evanbiederstedt
Copy link
Collaborator

Long story short, I'll need to play with the OpenMP directives here:

https://github.com/kharchenkolab/conos/blob/main/src/largeVis.cpp#L210-L280

The parallelization is here:

#ifdef _OPENMP
#pragma omp parallel for schedule(static) 
#endif
	for (iterationtype eIdx = 0; eIdx < barrier; eIdx += batchSize) if (progress.increment(batchSize)) {
		(*v)(eIdx, batchSize);
	}
#ifdef _OPENMP
#pragma omp barrier
#endif
	for (iterationtype eIdx = barrier; eIdx < n_samples; eIdx += batchSize) if (progress.increment(batchSize)) (*v)(eIdx, batchSize);
	delete v;

@evanbiederstedt
Copy link
Collaborator

evanbiederstedt commented Sep 29, 2022

I resolved this with some modifications to the openmp directives, i.e.

	Progress progress(n_samples, verbose);
	const unsigned int batchSize = 8192;
	const iterationtype barrier = (n_samples * .99 < n_samples - coords.n_cols) ? n_samples * .99 : n_samples - coords.n_cols;

#ifdef _OPENMP
#pragma omp barrier
 	if (threads.isNotNull()) {
		int nthreads = IntegerVector(threads)[0];
		if (nthreads > 0) {
			omp_set_num_threads(nthreads);
		}
	}	

        iterationtype eIdx;
#pragma omp parallel for private(eIdx) schedule(static) 
	for (eIdx = 0; eIdx < barrier; eIdx += batchSize) if (progress.increment(batchSize)) {
		(*v)(eIdx, batchSize);
	}
#else 
	for (eIdx = 0; eIdx < barrier; eIdx += batchSize) if (progress.increment(batchSize)) {
		(*v)(eIdx, batchSize)
	}
#endif

#ifdef _OPENMP
#pragma omp barrier
#endif

	for (iterationtype eIdx = barrier; eIdx < n_samples; eIdx += batchSize) if (progress.increment(batchSize)) (*v)(eIdx, batchSize);

	delete v;
	return coords;
};

I'll include in the next version release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants