Commit

Patch up issues with vignettes.
AnthonyChristidis committed Sep 20, 2024
1 parent 049d921 commit 3612d67
Showing 5 changed files with 20 additions and 25 deletions.
22 changes: 10 additions & 12 deletions README.md
````diff
@@ -16,7 +16,7 @@ To install the release version of the package from Bioconductor use the followin
 BiocManager::install("scDiagnostics")
 ```
 
-NOTE: you will need the [**BiocManager**](https://cran.r-project.org/web/packages/BiocManager/index.html) package to install from GitHub.
+NOTE: you will need the [**BiocManager**](https://cran.r-project.org/web/packages/BiocManager/index.html) package to install from Bioconductor.
 
 To build the package vignettes upon installation use:
 
@@ -31,35 +31,33 @@ BiocManager::install("scDiagnostics",
 To install the development version of the package from GitHub use the following command:
 
 ``` r
-remotes::install_github("ccb-hms/scDiagnostics")
+BiocManager::install("ccb-hms/scDiagnostics")
 ```
 
-NOTE: you will need the [**remotes**](https://cran.r-project.org/web/packages/remotes/index.html) package to install from GitHub.
-
 To build the package vignettes upon installation use:
 
 ``` r
-remotes::install_github("ccb-hms/scDiagnostics",
-build_vignettes = TRUE,
-dependencies = TRUE)
+BiocManager::install("ccb-hms/scDiagnostics",
+build_vignettes = TRUE,
+dependencies = TRUE)
 ```
 
 # Usage
 
-You may browse the [**scDiagnostics website**](https://ccb-hms.github.io/scDiagnostics/) website for an overview of the functionality of the package. The complete documentation of each available function in `scDiagnostics`, which includes implementation details and working examples, is available in the [**reference tab**](https://ccb-hms.github.io/scDiagnostics/reference/index.html).
+You may browse the [**scDiagnostics website**](https://ccb-hms.github.io/scDiagnostics/) website for an overview of the functionality of the package. The individual documentation of each available function in `scDiagnostics`, which includes usage details and executable examples, is available in the [**reference tab**](https://ccb-hms.github.io/scDiagnostics/reference/index.html).
 
 ## Key Features
 
 To get an overview of the main functionality of the scDiagnostics package, refer to the [**Getting Started with scDiagnostics**](https://ccb-hms.github.io/scDiagnostics/articles/scDiagnostics.html) vignette. The links below direct you to vignettes that explore functions organized around a common theme or purpose, highlighting how they interact or complement each other in specific contexts.
 
-- [**Visualization of Cell Type Annotations**](https://ccb-hms.github.io/scDiagnostics/articles/VisualizationTools.html): Provides graphical representations of cell type annotations from both the query and reference datasets. This visualization enables a direct comparison of how cell types are distributed and annotated in each dataset, highlighting any significant differences or similarities in the cell type classification across datasets.
+- [**Visualization of Cell Type Annotations**](https://ccb-hms.github.io/scDiagnostics/articles/VisualizationTools.html): Illustrates the distributions of cell type annotations of the query and reference dataset, allowing the user to identify potential differences in the cell type composition between datasets.
 
-- [**Evaluation of QC and Annotation Scores**](https://ccb-hms.github.io/scDiagnostics/articles/QCandAnnotationScores.html): Assesses the quality control (QC) metrics and annotation scores for cells in the query and reference datasets. This involves evaluating various QC parameters and comparing annotation scores to ensure consistency and accuracy. By analyzing these scores, researchers can identify potential issues in data quality and annotation reliability, facilitating improvements in data preprocessing and annotation accuracy. This evaluation helps in ensuring that both datasets meet the necessary quality standards for robust and reliable cell type analysis.
+- [**Evaluation of QC and Annotation Scores**](https://ccb-hms.github.io/scDiagnostics/articles/QCandAnnotationScores.html): Provides functionality for assessing the impact of frequently used QC criteria on the cell type annotation confidence, allowing the user to identify systematic relationships between QC metrics and the predicted cell type categories.
 
-- [**Evaluation of Dataset and Marker Gene Alignment**](https://ccb-hms.github.io/scDiagnostics/articles/DatasetAlignment.html): Evaluates the alignment of datasets and marker gene expressions between the query and reference datasets. This often involves projecting query data onto principal components derived from the reference dataset and comparing these projections. In addition, by examining how well the marker genes align across datasets, researchers can identify misalignments or deviations that may indicate batch effects or inconsistencies in cell type annotations, facilitating more accurate dataset integration and interpretation.
+- [**Evaluation of Dataset and Marker Gene Alignment**](https://ccb-hms.github.io/scDiagnostics/articles/DatasetAlignment.html): Provides functionality for assessing dataset alignment through quantitative comparison of query-to-reference projections in reduced dimension space. Additional functionality for assessing marker gene expression across datasets allows the user to identify potential misalignments between reference and query on the level of individual genes.
 
 - [**Statistical Measures to Assess Dataset Alignment**](https://ccb-hms.github.io/scDiagnostics/articles/StatisticalMeasures.html): Utilizes statistical tests and metrics to quantitatively assess the alignment between the query and reference datasets. Key measures include p-values from Hotelling's T-squared test and Cramer's V statistic, which evaluate the degree of similarity or dissimilarity in cell type distributions and principal component projections. These statistical assessments help in determining the robustness of dataset alignment and highlight any significant differences that may impact the reliability of cell type annotations.
 
-- [**Detection of Annotation Anomalies**](https://ccb-hms.github.io/scDiagnostics/articles/AnnotationAnomalies.html): Focuses on identifying inconsistencies or anomalies in cell type annotations between the query and reference datasets. This involves comparing expert-generated annotations with those produced by automated methods. By highlighting potential errors or discrepancies, this feature aids in refining and improving the accuracy of cell type classifications, ensuring that annotation results are reliable and accurate.
+- [**Detection of Annotation Anomalies**](https://ccb-hms.github.io/scDiagnostics/articles/AnnotationAnomalies.html): Focuses on identifying inconsistencies or anomalies in cell type annotations between the query and reference datasets through comparison of expert annotations with annotations derived from automated methods. By highlighting discrepancies that could be indicative of potential errors, this feature aids in refining and improving the accuracy and reliability of cell type classifications.
 
 - [**Analysis of Distances Between Specific Cells and Cell Populations**](https://ccb-hms.github.io/scDiagnostics/articles/CellDistancesDiagnostics.html): Calculates distances or similarities between individual cells and predefined cell types in both the query and reference datasets. This analysis helps determine how closely each cell in the query dataset matches the cell types defined in the reference dataset. By providing insights into cell type classification accuracy and identifying potential mismatches, this functionality supports more precise annotation and aids in detecting areas that may require further investigation or refinement.
````
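The README's feature list mentions p-values from Hotelling's T-squared test for comparing query and reference principal component projections. For readers unfamiliar with the statistic, a textbook two-sample version in base R is sketched below. The `hotelling_t2()` helper and the simulated matrices are hypothetical, and the package's own implementation may differ in details (for example, how it loops over cell types). The statistic is T² = n1·n2/(n1 + n2) · dᵀ S_p⁻¹ d, where d is the difference of the column means and S_p the pooled covariance; it is converted to an F statistic with (p, n1 + n2 − p − 1) degrees of freedom.

```r
# Hypothetical helper: textbook two-sample Hotelling's T-squared test.
# x, y: numeric matrices (cells x principal components) with the same columns.
hotelling_t2 <- function(x, y) {
    n1 <- nrow(x)
    n2 <- nrow(y)
    p <- ncol(x)
    delta <- colMeans(x) - colMeans(y)
    # Pooled covariance matrix of the two samples
    S_pooled <- ((n1 - 1) * cov(x) + (n2 - 1) * cov(y)) / (n1 + n2 - 2)
    t2 <- (n1 * n2 / (n1 + n2)) * sum(delta * solve(S_pooled, delta))
    # Exact F transformation under multivariate normality and equal covariances
    f_stat <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    p_value <- pf(f_stat, df1 = p, df2 = n1 + n2 - p - 1, lower.tail = FALSE)
    list(t2 = t2, f_stat = f_stat, p_value = p_value)
}

# Simulated PC scores: one query drawn like the reference, one shifted query.
set.seed(3)
ref_pcs <- matrix(rnorm(100 * 3), ncol = 3)
same <- matrix(rnorm(80 * 3), ncol = 3)
shifted <- matrix(rnorm(80 * 3, mean = 1), ncol = 3)

res_same <- hotelling_t2(same, ref_pcs)
res_shift <- hotelling_t2(shifted, ref_pcs)
```

With a single column the statistic reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.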
2 changes: 1 addition & 1 deletion vignettes/AnnotationAnomalies.Rmd
````diff
@@ -99,7 +99,7 @@ The function also provides detailed visualizations and statistical outputs to he
 
 ### Parameters
 
-The function takes a `r `BiocStyle::Biocpkg("SingleCellExperiment")` object as `reference_data` and trains an isolation forest model on the reference PCA-projected data, with an optional `query_data` for projecting onto this PCA space for anomaly detection. You can specify cell type annotations through `ref_cell_type_col` and `query_cell_type_col`, and limit the analysis to certain cell types using the `cell_types` parameter. The function allows you to select specific principal components to use to train the isolation forest via `pc_subset`, adjust the number of trees with `n_tree`, and set an `anomaly_threshold` for classifying anomalies.
+The function takes a `r BiocStyle::Biocpkg("SingleCellExperiment")` object as `reference_data` and trains an isolation forest model on the reference PCA-projected data, with an optional `query_data` for projecting onto this PCA space for anomaly detection. You can specify cell type annotations through `ref_cell_type_col` and `query_cell_type_col`, and limit the analysis to certain cell types using the `cell_types` parameter. The function allows you to select specific principal components to use to train the isolation forest via `pc_subset`, adjust the number of trees with `n_tree`, and set an `anomaly_threshold` for classifying anomalies.
 
 
 ### Return Value
````
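The Parameters paragraph describes training an isolation forest on reference PCA scores and scoring query cells projected into the same space. As a rough illustration only, the base-R sketch below implements that idea from scratch. It is not the scDiagnostics implementation: the helper names (`avg_path_length()`, `grow_itree()`, `path_length()`, `iforest_scores()`), the toy data, and the 0.6 cut-off are hypothetical. Scores follow the standard isolation-forest form 2^(−E[h(x)] / c(psi)), so values closer to 1 indicate cells that are isolated with fewer random splits.

```r
# Minimal isolation-forest sketch in base R (hypothetical helpers).

# c(n): average path length of an unsuccessful binary-search-tree lookup,
# used to normalise path lengths.
avg_path_length <- function(n) {
    if (n > 2) 2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    else if (n == 2) 1
    else 0
}

# Grow one isolation tree with random axis-aligned splits.
grow_itree <- function(X, depth, max_depth) {
    n <- nrow(X)
    if (depth >= max_depth || n <= 1) return(list(size = n))
    j <- sample.int(ncol(X), 1)
    rng <- range(X[, j])
    if (rng[1] == rng[2]) return(list(size = n))  # constant column: stop here
    split <- runif(1, rng[1], rng[2])
    go_left <- X[, j] < split
    list(feature = j, split = split,
         left = grow_itree(X[go_left, , drop = FALSE], depth + 1, max_depth),
         right = grow_itree(X[!go_left, , drop = FALSE], depth + 1, max_depth))
}

# Depth at which point x lands, plus c(size) for leaves holding several points.
path_length <- function(x, node, depth = 0) {
    if (is.null(node$feature)) return(depth + avg_path_length(node$size))
    if (x[node$feature] < node$split) path_length(x, node$left, depth + 1)
    else path_length(x, node$right, depth + 1)
}

# Train on the rows of 'train'; return anomaly scores in (0, 1) for rows of 'test'.
iforest_scores <- function(train, test, n_tree = 100, sample_size = 256) {
    psi <- min(sample_size, nrow(train))  # assumes at least 2 training rows
    max_depth <- ceiling(log2(psi))
    trees <- lapply(seq_len(n_tree), function(i) {
        idx <- sample.int(nrow(train), psi)
        grow_itree(train[idx, , drop = FALSE], 0, max_depth)
    })
    mean_depth <- apply(test, 1, function(x)
        mean(vapply(trees, function(tr) path_length(x, tr), numeric(1))))
    2^(-mean_depth / avg_path_length(psi))
}

# Toy data: 200 reference cells; 25 query cells, the last 5 shifted away.
set.seed(0)
ref <- matrix(rnorm(200 * 20), ncol = 20)
query <- rbind(matrix(rnorm(20 * 20), ncol = 20),
               matrix(rnorm(5 * 20, mean = 6), ncol = 20))
pca <- prcomp(ref)
pc_subset <- 1:5
ref_pcs <- pca$x[, pc_subset]
query_pcs <- predict(pca, newdata = query)[, pc_subset]  # project onto reference PCs

scores <- iforest_scores(ref_pcs, query_pcs, n_tree = 100)
anomaly_threshold <- 0.6  # arbitrary illustrative cut-off
is_anomaly <- scores > anomaly_threshold
```

Training only on reference scores and then scoring projected query cells mirrors the reference/query split described above; the sub-sample size and depth limit follow common isolation-forest defaults rather than anything stated in the vignette.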
3 changes: 2 additions & 1 deletion vignettes/CellDistancesDiagnostics.Rmd
````diff
@@ -91,7 +91,8 @@ distance_data <- calculateCellDistances(
 ```
 
 In the code above:
-- `query_data` and reference_data: These are `r `BiocStyle::Biocpkg("SingleCellExperiment")` objects containing the respective datasets for analysis.
+
+- `query_data` and reference_data: These are `r BiocStyle::Biocpkg("SingleCellExperiment")` objects containing the respective datasets for analysis.
 - `query_cell_type_col` and ref_cell_type_col: These arguments specify the columns in the `colData` of each dataset that contain cell type annotations.
 - `pc_subset`: Specifies which principal components (1 to 10) are used to compute distances. PCA is applied for dimensionality reduction before calculating distances.
 
````
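The bullets describe `calculateCellDistances()` computing distances in a reference PCA space restricted to `pc_subset`. The base-R sketch below shows one way such query-to-reference distances can be computed and grouped by reference cell type. It does not call scDiagnostics; the toy matrices, the `cross_dist()` helper, and the choice of Euclidean distance are assumptions made for illustration.

```r
# Hypothetical toy data: cells x genes matrices and reference cell type labels.
set.seed(1)
ref_expr <- matrix(rnorm(60 * 30), nrow = 60)
query_expr <- matrix(rnorm(10 * 30), nrow = 10)
ref_labels <- rep(c("CD4", "CD8", "B"), each = 20)

# PCA fitted on the reference, PCs 1 to 10 kept, query projected into that space.
pc_subset <- 1:10
pca <- prcomp(ref_expr)
ref_pcs <- pca$x[, pc_subset]
query_pcs <- predict(pca, newdata = query_expr)[, pc_subset]

# Euclidean distance between every row of a and every row of b, using
# |a - b|^2 = |a|^2 + |b|^2 - 2 a.b (clamped at 0 to absorb rounding error).
cross_dist <- function(a, b) {
    sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
    sqrt(pmax(sq, 0))
}
d <- cross_dist(query_pcs, ref_pcs)  # 10 query cells x 60 reference cells

# One distance matrix per reference cell type.
dist_by_type <- lapply(split(seq_len(nrow(ref_pcs)), ref_labels),
                       function(idx) d[, idx, drop = FALSE])

# Example summary: median distance from each query cell to each cell type.
median_dist <- sapply(dist_by_type, function(m) apply(m, 1, median))
```

A query cell whose smallest median distance is to a type other than its assigned label would be a natural candidate for closer inspection.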
4 changes: 2 additions & 2 deletions vignettes/DatasetMarkerGeneAlignment.Rmd
````diff
@@ -101,7 +101,7 @@ data("query_data")
 set.seed(0)
 ```
 
-Some functions in the vignette are designed to work with `r `BiocStyle::Biocpkg("SingleCellExperiment")` objects that contain data from only one cell type. We will create separate `r `BiocStyle::Biocpkg("SingleCellExperiment")` objects that only CD4 cells, to ensure compatibility with these functions.
+Some functions in the vignette are designed to work with `r BiocStyle::Biocpkg("SingleCellExperiment")` objects that contain data from only one cell type. We will create separate `r BiocStyle::Biocpkg("SingleCellExperiment")` objects that only CD4 cells, to ensure compatibility with these functions.
 ```{r, message=FALSE, fig.show='hide'}
 # Load library
 library(scran)
@@ -232,7 +232,7 @@ The `plotPairwiseDistancesDensity()` function is designed to calculate and visua
 
 ### Functionality
 
-The function operates on `r `BiocStyle::Biocpkg("SingleCellExperiment")` objects, which are commonly used to store single-cell data, including expression matrices and associated metadata. Users specify the cell types of interest in both the query and reference datasets, and the function computes either the distances or correlation coefficients between these cells.
+The function operates on `r BiocStyle::Biocpkg("SingleCellExperiment")` objects, which are commonly used to store single-cell data, including expression matrices and associated metadata. Users specify the cell types of interest in both the query and reference datasets, and the function computes either the distances or correlation coefficients between these cells.
 
 When principal component analysis (PCA) is applied, the function projects the expression data into a lower-dimensional PCA space, which can be specified by the user. This allows for a more focused analysis of the major sources of variation in the data. Alternatively, if no dimensionality reduction is desired, the function can directly use the expression data for computation.
 
````
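The Functionality paragraph describes computing distances or correlation coefficients between cells of a chosen type, optionally after projecting into PCA space. The base-R sketch below illustrates the correlation variant. It is not the scDiagnostics code: the toy data, the `pairwise_cor()` helper, and the three comparison labels are illustrative assumptions. Both datasets are subset to CD4 cells, projected onto the reference PCs, and the within- and between-dataset Pearson correlations are collected so their densities can be compared.

```r
# Hypothetical toy data: cells x genes matrices with cell type labels.
set.seed(2)
n_genes <- 50
ref_expr <- matrix(rnorm(40 * n_genes), nrow = 40)
query_expr <- matrix(rnorm(30 * n_genes), nrow = 30)
ref_type <- rep(c("CD4", "CD8"), each = 20)
query_type <- rep(c("CD4", "CD8"), each = 15)

# Keep only CD4 cells, mirroring the single-cell-type subsets in the vignette.
ref_cd4 <- ref_expr[ref_type == "CD4", , drop = FALSE]
query_cd4 <- query_expr[query_type == "CD4", , drop = FALSE]

# Optional dimensionality reduction: PCA on the reference, query projected onto it.
pcs <- 1:5
pca <- prcomp(ref_cd4)
ref_pcs <- pca$x[, pcs]
query_pcs <- predict(pca, newdata = query_cd4)[, pcs]

# Pearson correlation between cells (cor() correlates columns, hence t()).
# One matrix: each unordered pair once. Two matrices: all cross pairs.
pairwise_cor <- function(a, b = NULL) {
    if (is.null(b)) {
        m <- cor(t(a))
        m[upper.tri(m)]
    } else {
        as.vector(cor(t(a), t(b)))
    }
}

values <- list(
    "Reference vs Reference" = pairwise_cor(ref_pcs),
    "Query vs Query"         = pairwise_cor(query_pcs),
    "Query vs Reference"     = pairwise_cor(query_pcs, ref_pcs)
)
densities <- lapply(values, density)  # kernel density estimate per comparison
# plot(densities[["Query vs Reference"]]) would draw one of the curves.
```

Passing `ref_cd4` and `query_cd4` instead of the PC scores gives the variant without dimensionality reduction.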
14 changes: 5 additions & 9 deletions vignettes/scDiagnostics.Rmd
````diff
@@ -65,7 +65,7 @@ To install the release version of the package from Bioconductor use the followin
 BiocManager::install("scDiagnostics")
 ```
 
-NOTE: you will need the `r BiocStyle::CRANpkg("BiocManager")` package to install from GitHub.
+NOTE: you will need the `r BiocStyle::CRANpkg("BiocManager")` package to install from Bioconductor
 
 To build the package vignettes upon installation use:
 
@@ -82,19 +82,15 @@ To install the development version of the package from Github, use the
 following command:
 
 ```{r, eval = FALSE, fig.show='hide'}
-remotes::install_github("ccb-hms/scDiagnostics")
+BiocManager::install("ccb-hms/scDiagnostics")
 ```
 
-NOTE: you will need the
-`r BiocStyle::Biocpkg("remotes")`
-package to install from GitHub.
-
 To build the package vignettes upon installation use:
 
 ```{r, eval=FALSE, fig.show='hide'}
-remotes::install_github("ccb-hms/scDiagnostics",
-build_vignettes = TRUE,
-dependencies = TRUE)
+BiocManager::install("ccb-hms/scDiagnostics",
+build_vignettes = TRUE,
+dependencies = TRUE)
 ```
 
 Once you have installed the package, you can load it with the following code:
````
