Aggregating functions and NAs #33

Open
TuomasBorman opened this issue Nov 11, 2024 · 2 comments

Comments

@TuomasBorman

Hi,

Functions like mean(), sum(), and others typically include an na.rm parameter, which allows you to control whether NA values should be excluded from calculations. However, it appears that aggregation functions in scuttle do not provide an option for excluding NA values. While one could convert NA values to 0, this approach is not appropriate for calculating the mean.
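
To make the point about the mean concrete, here is a minimal base R sketch. The vector `x` is a made-up toy example, not data from scuttle; it only illustrates why zero-filling is harmless for sums but distorts means.

```r
# Toy values for one feature across three cells; one value is missing.
x <- c(NA, 4, 6)

mean(x)                # NA: the missing value propagates
mean(x, na.rm = TRUE)  # 5: average of the two observed values

# Replacing NA with 0 keeps the missing cell in the denominator.
x0 <- replace(x, is.na(x), 0)
mean(x0)               # 10/3, pulled toward zero

# For sums, zero-filling and na.rm = TRUE give the same answer.
sum(x0) == sum(x, na.rm = TRUE)
```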

Could an na.rm option be added to handle this case?


library(scuttle)
sce <- mockSCE()[1:5, 1:2]
ids <- c("A", "B", "A", "C", "D")
assay(sce)[1, ] <- NA
sumCountsAcrossFeatures(assay(sce), ids)
#>   Cell_001 Cell_002
#> A       NA       NA
#> B      111       98
#> C      260      233
#> D       64       70
summarizeAssayByGroup(sce, c("A", "A")) |> assay()
#>               A
#> Gene_0001    NA
#> Gene_0002 104.5
#> Gene_0003   5.0
#> Gene_0004 246.5
#> Gene_0005  67.0
@LTLA
Owner

LTLA commented Nov 12, 2024

Hm... it's very unusual to have NA in a single-cell count matrix. In fact, I don't think I've ever seen an NA in any assay matrix for any technology I've worked on.

I suppose it's possible to add an na.rm= option, but it's not a scenario that I've ever encountered, so I'm afraid it's not a high priority for me at the moment. If you want to give it a crack, I'm happy to consider a PR. Otherwise, you might consider some more expedient solutions:

  • Just filter out rows with NAs, if they are going to be all-NA rows.
  • Use DelayedArray::rowsum, which does have a na.rm= option.
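
A rough sketch of both workarounds, under some assumptions: `mat` is a made-up toy matrix standing in for `assay(sce)`, and the second workaround uses base R's `rowsum()` (which has a documented `na.rm=` argument) rather than the DelayedArray method mentioned above, so treat it as an illustration of the idea and not as the exact call suggested.

```r
# Toy 5 x 2 matrix standing in for assay(sce); the values are made up.
mat <- matrix(c(NA, 111, 5, 260, 64,
                NA,  98, 7, 233, 70),
              ncol = 2,
              dimnames = list(paste0("Gene_", 1:5),
                              c("Cell_001", "Cell_002")))
ids <- c("A", "B", "A", "C", "D")

# Workaround 1: drop rows that contain any NA, then aggregate.
keep <- rowSums(is.na(mat)) == 0
summed_filtered <- rowsum(mat[keep, , drop = FALSE], ids[keep])

# Workaround 2: keep all rows and let base::rowsum() skip NAs.
summed_narm <- rowsum(mat, ids, na.rm = TRUE)

summed_filtered
summed_narm
```

In this toy case the two results agree, because the only NAs sit in one all-NA row. Note that with `na.rm = TRUE`, a group whose values are all NA sums to 0 and does not return NA, which may or may not be what you want.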

@TuomasBorman
Author

NAs might occur, for instance, when datasets are merged. I think an na.rm option would be a good addition. I can create a PR; I will get back to you this week.
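
As a hypothetical illustration of the merging case (the matrices `a` and `b` and their gene and cell names are invented for this sketch), combining two datasets over the union of their features leaves NAs wherever a feature was not measured in one of them:

```r
# Two toy datasets measured on overlapping but different feature sets.
a <- matrix(1:4, nrow = 2,
            dimnames = list(c("Gene_1", "Gene_2"), c("Cell_A1", "Cell_A2")))
b <- matrix(5:8, nrow = 2,
            dimnames = list(c("Gene_2", "Gene_3"), c("Cell_B1", "Cell_B2")))

# Merge over the union of features; unmatched features become NA rows.
genes <- union(rownames(a), rownames(b))
merged <- cbind(a[match(genes, rownames(a)), ],
                b[match(genes, rownames(b)), ])
rownames(merged) <- genes

merged
rowMeans(merged, na.rm = TRUE)  # per-feature mean over observed cells only
```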
