Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load STARsolo EM output into Seurat: element type 'float' not recognized #1397

Open
MarcusLCC opened this issue Nov 4, 2021 · 4 comments
Open

Comments

@MarcusLCC
Copy link

MarcusLCC commented Nov 4, 2021

Hi Alex,

Thanks for developing such an amazing tool.
I run STARsolo on the single cell RNAseq data.
Since I want to have more insights into two similar genes, I added --soloMultiMappers Uniform PropUnique EM Rescue to see how it goes.

However, when I tried to load the result into Seurat, I found out that the 'float' in the UniqueAndMult-EM.mtx.gz output by STARsolo might not be supported, since the error showed

> seurat = Read10X("../starsolo_out/ControlASolo.out/Gene/raw/EM/")
Error in readMM(file = matrix.loc) : element type 'float' not recognized

I tried with the new ReadSTARsolo function, but still get the same result.

> control_em = ReadSTARsolo("../starsolo_out/ControlASolo.out/Gene/raw/EM")
Error in readMM(file = all.files[["expression matrix"]]) : 
  element type 'float' not recognized

And here is the top rows in the file UniqueAndMult-EM.mtx.gz

%%MatrixMarket matrix coordinate float general
%
25110 6794880 18644153
9793 8 1
9979 16 1
5029 23 1
5031 23 1
6323 23 1
9780 23 1
11862 23 1
25101 23 1
21038 33 1
22510 50 1
5029 55 1
6308 55 1
5027 62 1
5028 71 0.5
5031 71 0.5
5029 73 1.99219
5032 73 0.0078125
13923 73 1
22727 73 1.99219
22729 73 0.0078125

The Barcodes.stat is:

                                        nNoAdapter              0
                                            nNoUMI              0
                                             nNoCB              0
                                            nNinCB             41
                                           nNinUMI        1408113
                                   nUMIhomopolymer          93279
                                          nTooMany              0
                                          nNoMatch        7936486
                               nMismatchesInMultCB              0
                                       nExactMatch      272044289
                                    nMismatchOneWL        3242653
                                 nMismatchToMultWL       13643072

And the command I use to run STARsolo is:

        STAR \
        --genomeDir ${star_ref} \
        --readFilesIn ${fastq_input_dir}/${s}_*_R2_*.fastq.gz ${fastq_input_dir}/${s}_*_R1_*.fastq.gz \
        --readFilesCommand zcat \
        --soloType CB_UMI_Simple \
        --soloCBwhitelist ${barcode_whitelist_3Pv3} \
        --soloUMIlen 12 \
        --clipAdapterType CellRanger4 --outFilterScoreMin 30 \
        --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
        --soloUMIfiltering - \
        --soloUMIdedup 1MM_CR \
        --soloMultiMappers Uniform PropUnique EM Rescue \
        --outFileNamePrefix ${out_dir}/${s} \
        --runThreadN ${thread}

I wonder if the STARsolo result with --soloMultiMappers parameters is compatible with Seurat? Is there a way I can get through this error?

Much appreciated,

Marcus

@alexdobin
Copy link
Owner

Hi Marcus,

the multimappers are included probabilistically, that's why the "counts" become non-integer.
Ideally, the downstream software should import real numbers, since at some point the counts are normalized and become real numbers. It would be good to ask Seurat developers for such a feature.
In the meantime you can simply round the values to integers, e.g.
awk '{if (NR>3) $3=int($3+0.5); print}' matrix.mtx > matrix_rounded.mtx
Note, that you will also need to perform cell filtering (calling) on the rounded matrix, since at the moment multimap options only output raw matrix.

Cheers
Alex

@davhum
Copy link

davhum commented Nov 13, 2021

Timely thread as had similar issue today.
Small adjustment to Alex's short term solution which also amends mtx header:

awk '{if (NR>3) $3=int($3+0.5); print}' matrix.mtx | sed 's/float/integer/' > matrix_rounded.mtx

Cheers,
D

@bugneville
Copy link

bugneville commented Nov 14, 2021

I brought this up in #1369 -- you appear to be able to simply change the header at the top of the file from float to real with no obvious negative impact?:
change
%%MatrixMarket matrix coordinate float general
to
%%MatrixMarket matrix coordinate real general

This way, you do not need to round the inputs.

@alexdobin
Copy link
Owner

Hi David, @bugneville

thanks for the correction. The matrix market format indeed specifies "real" not "float" label.
I will fix it in the next release.

Cheers
Alex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants