ID in all.cell.annotation.meta.txt does not correspond to Barcodes in GEO .h5 files #6
Comments
Hi Jeremy, we used Seurat to integrate the samples. Because the same barcode can occur in different samples, Seurat removed the _1 from the original barcode and added a serial suffix based on the order in which the samples were integrated (HC1 HC2 HC3 HC4 M1 M2 M3 S2 S1 S3 S4 S5 S6). That is why the S2 barcodes end with _8. To match the barcodes, remove the suffix from both the metadata ID and the barcode in the original data (giving ID_new), then use ID_new together with sample to match the data.
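The matching steps described in this comment can be sketched in pandas as follows. This is a minimal sketch, not the authors' code: the metadata column names `ID` and `sample`, and the helper names `strip_suffix` and `match_cells`, are assumptions made for illustration.

```python
import pandas as pd

def strip_suffix(barcodes: pd.Series) -> pd.Series:
    """Drop a trailing _<n> or -<n> so both sides share the bare barcode (ID_new)."""
    return barcodes.astype(str).str.replace(r"[-_]\d+$", "", regex=True)

def match_cells(meta: pd.DataFrame, raw_barcodes, sample: str) -> pd.DataFrame:
    """Return the metadata rows of `sample` whose bare barcode occurs in `raw_barcodes`.

    Assumes `meta` has columns "ID" (suffixed barcode) and "sample" (hypothetical names).
    """
    meta = meta.assign(ID_new=strip_suffix(meta["ID"]))
    raw = pd.DataFrame({"barcode": list(raw_barcodes)})
    raw["ID_new"] = strip_suffix(raw["barcode"])
    raw["sample"] = sample
    # Match on the bare barcode AND the sample, since barcodes repeat across samples.
    return meta.merge(raw, on=["ID_new", "sample"], how="inner")

# Toy data with made-up barcodes, only to show the call shape.
toy_meta = pd.DataFrame({
    "ID": ["AAACCTGAGAAACCAT_8", "AAACCTGAGAAACCAT_1", "TTTGTCATCTGTCAAG_8"],
    "sample": ["S2", "HC1", "S2"],
})
toy_raw = ["AAACCTGAGAAACCAT-1", "GGGGGGGGGGGGGGGG-1"]
matched = match_cells(toy_meta, toy_raw, "S2")
```

Merging on both keys keeps an HC1 cell from being matched to an S2 barcode that happens to have the same sequence.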
@polojacky I would like to double-check with you on the mapping from samples to suffix (batch) added to the cell barcodes. From your
However, after re-quantifying the raw FASTQ files with kb-python, I found that the smallest batches are batches 1 and 2 instead.
The mapping from SRR run IDs to the batches is based on this file
This is obviously a mismatch. For example, how come your
The metadata in all.cell.annotation.meta.txt is correct. Sample quality was good for the healthy controls in our previous experiment, so batches 1 and 2 have many cells (8000+). For M3, sample quality was not as good and we harvested fewer cells in our experiment. We used cellranger to quantify gene expression; I don't know whether there is some discrepancy between the two tools.
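To compare batch sizes between the two quantifications, the number of cells per suffix can be tabulated from the metadata IDs. The sketch below assumes that every ID ends in `_<n>` and that suffix n maps to the n-th sample in the integration order quoted earlier in this thread; the function name `cells_per_batch` is hypothetical.

```python
import pandas as pd

# Integration order quoted in this thread; suffix _1 is assumed to be HC1, _8 to be S2.
SAMPLE_ORDER = ["HC1", "HC2", "HC3", "HC4", "M1", "M2", "M3",
                "S2", "S1", "S3", "S4", "S5", "S6"]

def cells_per_batch(ids) -> pd.DataFrame:
    """Count cells per numeric suffix and label each suffix with its assumed sample."""
    # Assumes every ID ends in _<n>; an ID without a suffix would fail at astype(int).
    suffix = pd.Series(list(ids), dtype=str).str.extract(r"_(\d+)$")[0].astype(int)
    counts = suffix.value_counts().sort_index()
    return pd.DataFrame({
        "batch": counts.index.to_numpy(),
        "sample": [SAMPLE_ORDER[b - 1] for b in counts.index],
        "n_cells": counts.to_numpy(),
    })

# Toy IDs with made-up barcodes, only to show the call shape.
toy_counts = cells_per_batch(["AAAC_1", "CCCG_1", "TTTG_8"])
```

Running the same tabulation on the kb-python output would show whether the small and large batches line up under this assumed mapping.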
Hi folks,
I am interested in analyzing the single-cell expression of the epithelial cells from your severe and healthy controls, but the ID in all.cell.annotation.meta.txt does not appear to correspond to the barcodes in the data files on GEO. For example:
<class 'str'> genome
Note that there is an underscore with the metadata ID, but a hyphen with the S2 barcodes.
However, even if I switch the underscore in the metadata IDs to a hyphen, I am still unable to align the IDs with the barcodes from the S2 dataset for all but 45 barcodes:
(3716, 65813)
Note that the number of s2 barcodes is much smaller than the number of all barcodes, which is as it should be. However, when I check to see how many s2 raw columns there are, there is barely any overlap:
45
When I read the HC4 datafile I had a similar problem:
(737280, 65813)
Note that in this case the number of HC4 barcodes dwarfs the number of all barcodes, but there still isn't much of an overlap:
8454
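The overlap checks above can be reproduced with a small helper that counts matches under three normalizations: exact, underscore switched to hyphen, and trailing suffix stripped from both sides. This is a sketch with a hypothetical function name (`overlap_report`) and made-up toy barcodes, not the code used for the numbers reported here.

```python
import re

def bare(barcode: str) -> str:
    """Remove a trailing -<n> or _<n> from a barcode."""
    return re.sub(r"[-_]\d+$", "", barcode)

def overlap_report(meta_ids, raw_barcodes) -> dict:
    """Count shared barcodes between metadata IDs and raw barcodes in three ways."""
    meta_ids, raw = set(meta_ids), set(raw_barcodes)
    return {
        # IDs compared exactly as stored.
        "exact": len(meta_ids & raw),
        # Metadata underscore replaced by a hyphen, suffix number left unchanged.
        "underscore_to_hyphen": len({m.replace("_", "-") for m in meta_ids} & raw),
        # Suffix removed on both sides, leaving only the nucleotide barcode.
        "suffix_stripped": len({bare(m) for m in meta_ids} & {bare(r) for r in raw}),
    }

# Toy example: metadata carries batch suffixes, raw barcodes all end in -1.
report = overlap_report(
    ["AAAC_8", "CCCG_8", "TTTG_1"],
    ["AAAC-1", "CCCG-1", "TTTG-1", "GGGG-1"],
)
```

In the toy data only the ID whose suffix happens to be 1 matches after the hyphen switch, while all three match once the suffix is stripped. Stripping alone ignores which sample a cell came from, so the metadata should first be restricted to the sample being compared.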
Please advise.