Feature request (Alevin): add rhapsody barcode mode #628
Just in case it helps, I've written a script to splice out the cell barcode linker sequences and shift them to just before the polyA. In the process, it also corrects the cell barcode and linker regions to within a Hamming distance of 2. All operations assume there are no indels: https://gitlab.com/gringer/bioinfscripts/-/blob/master/synthSquish.pl [usual disclaimers apply: I cannot guarantee that this works; use at your own risk]. This script could be used as a stop-gap measure to pre-process Rhapsody reads for use with Alevin via the undocumented custom length settings [--end 5, --barcodeLength 27, --umiLength 8].
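As a rough illustration of the splice-and-shift idea (not a port of the Perl script), here is a minimal Python sketch. It assumes the original long-format R1 layout of CLS1 (9 nt), linker L1 (12 nt), CLS2 (9 nt), linker L2 (13 nt), CLS3 (9 nt) and UMI (8 nt) ahead of the polyT, with no indels; the function name `squish_long_r1` and the `LONG_LAYOUT` table are hypothetical. After squishing, the 27-nt barcode occupies positions 1-27 and the UMI positions 28-35 (1-based), matching the 27/8 length settings above.

```python
# Assumed 0-based, half-open slices for the original (long) Rhapsody R1 layout.
# These positions are an assumption for illustration, not taken from the script.
LONG_LAYOUT = {
    "cls1": (0, 9),
    "l1": (9, 21),
    "cls2": (21, 30),
    "l2": (30, 43),
    "cls3": (43, 52),
    "umi": (52, 60),
}


def squish_long_r1(seq: str) -> str:
    """Hypothetical helper: move linker bases after the UMI.

    Output layout: CLS1+CLS2+CLS3 (27 nt), UMI (8 nt), L1, L2, then the
    remainder of the read (polyT and beyond). Read length is preserved.
    No error correction is done here.
    """
    if len(seq) < 60:
        raise ValueError("read too short for the assumed 60-nt barcode/UMI region")
    part = {name: seq[start:end] for name, (start, end) in LONG_LAYOUT.items()}
    barcode = part["cls1"] + part["cls2"] + part["cls3"]
    return barcode + part["umi"] + part["l1"] + part["l2"] + seq[60:]
```

Because only slicing is involved, the same function can be applied to the quality string so that bases and qualities stay paired.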
After discovering the alternative geometry format, I see that unmodified Rhapsody reads should have the following settings:
There's a bit of a challenge regarding error correction of the cell barcodes: they should be corrected in batches of 9 nucleotides, with each batch assigned to one of 96 clusters of the most commonly seen sequences.
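A minimal sketch of segment-wise correction, assuming a squished 27-nt barcode and one whitelist per 9-nt segment (for example, the 96 most common sequences seen for that segment). Each segment is snapped to the unique closest whitelist entry within Hamming distance 2, and the barcode is rejected if any segment is ambiguous or too far from every entry. The function names and the tie-handling rule are my own assumptions, not the exact procedure from the Rhapsody manual or the script.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; assumes equal lengths (no indels)."""
    return sum(x != y for x, y in zip(a, b))


def correct_segment(seg: str, whitelist: Sequence[str], max_dist: int = 2) -> Optional[str]:
    """Return the unique closest whitelist entry within max_dist, else None."""
    best, best_dist, tied = None, max_dist + 1, False
    for cand in whitelist:
        d = hamming(seg, cand)
        if d < best_dist:
            best, best_dist, tied = cand, d, False
        elif d == best_dist:
            tied = True  # another entry is equally close: ambiguous
    if best is None or tied:
        return None
    return best


def correct_barcode(barcode: str, whitelists: Sequence[Sequence[str]],
                    seg_len: int = 9) -> Optional[str]:
    """Correct a barcode as independent seg_len-nt segments, one whitelist each."""
    if len(barcode) != seg_len * len(whitelists):
        return None
    fixed = []
    for i, wl in enumerate(whitelists):
        hit = correct_segment(barcode[i * seg_len:(i + 1) * seg_len], wl)
        if hit is None:
            return None  # any uncorrectable segment rejects the whole barcode
        fixed.append(hit)
    return "".join(fixed)
```

Correcting each 9-nt segment separately keeps the search to 3 x 96 comparisons per read rather than comparing against every possible 27-nt combination.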
Experimental BD Rhapsody Support based on COMBINE-lab/salmon#628 (comment)
Rhapsody has introduced a new, shorter cell barcode specification to work with 51bp R1, which looks like this:
The linker sequences are as follows:
L1:
In other words,
--umi-geometry '1[36-43]' --bc-geometry '1[1-9,14-22,27-35]' --read-geometry '2[1-end]'
However...
[update] In order to remove the need for Lambda spike-ins on Illumina runs, Rhapsody has included a 0-3 bp cell barcode prefix, where either nothing, or
This means that the regions defined in the geometry specification above can start up to 3 bp later than their expected positions. I've updated my barcode squishing script (here) to account for this. The script identifies the cell barcode regions, corrects the cell barcode sequences according to the Rhapsody Bioinformatics manual, and then shifts the linker sequences to after the UMI region, i.e.:
[The prefix sequence is discarded]
After using this script to pre-process R1 (with either the old or the new cell barcode format; both use 9x9x9 cell barcodes), the following geometry can be used:
--umi-geometry '1[28-35]' --bc-geometry '1[1-27]' --read-geometry '2[1-end]'
I've attached files containing the 96 barcodes from each region from my most recent Rhapsody single cell sequencing run (with 51 bp R1 reads). These were collected by processing reads 2M-12M from R1 of one of our files and choosing the most abundant sequences:
cell_barcodes_BC1.txt
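Collecting the most abundant sequences per region, as described above, could be sketched like this. It assumes barcodes that have already been squished into three consecutive 9-nt segments; the function name `build_whitelists` and the choice to skip segments containing `N` are hypothetical, not necessarily what was done to produce the attached files.

```python
from collections import Counter
from typing import Iterable, List


def build_whitelists(barcodes: Iterable[str], n_segments: int = 3,
                     seg_len: int = 9, top_n: int = 96) -> List[List[str]]:
    """Return, for each barcode segment, its top_n most frequent sequences."""
    counters = [Counter() for _ in range(n_segments)]
    for bc in barcodes:
        if len(bc) < n_segments * seg_len:
            continue  # too short to contain every segment
        for i, counter in enumerate(counters):
            seg = bc[i * seg_len:(i + 1) * seg_len]
            if "N" not in seg:  # hypothetical choice: ignore uncalled bases
                counter[seg] += 1
    # most_common orders by descending count
    return [[seq for seq, _ in c.most_common(top_n)] for c in counters]
```

With the default `top_n=96`, this mirrors the 96-barcodes-per-region lists attached above.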
It looks like #734 would allow this barcode method to be specified directly:
Long original read geometry:
There has been a small amount of discussion about the BD Rhapsody barcode/sequence format (e.g. see here), but it would be great if support for this format could be integrated into Alevin itself.
BD has produced a Single Cell Genomics Bioinformatics Handbook, which has the following information about the R1 read structure on page 14:
In other words...
R2 reads are transcript-only; they are expected to match a transcript's forward strand, with matches starting within the first five nucleotides, and should not match PhiX174.
To Reproduce
Using Salmon v1.4.0, installed via Debian / apt:
Expected behavior
Desktop
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux bullseye/sid
Release: unstable
Codename: sid
Additional context
This is my first run with BD Rhapsody data (and with our own single cell data, for that matter). We're currently using Seven Bridges to generate gene count tables, but I don't like the black-box nature of that service. I'm much more comfortable when I know what's going on under the hood and can tweak things when I notice oddness.