Error when starting analysis #18

Closed
diabatem opened this issue Oct 12, 2018 · 13 comments

Comments

@diabatem

Hello, I am attempting to use Enrich2 for barcode counting, and every time I start the analysis I get the same error message: "Enrich2 encountered an error: 'No object named /main/barcodes/counts in the file'". I do not know how to eliminate this error. I have also tried updating the program, but that did not get rid of the problem.

@afrubin
Member

afrubin commented Oct 13, 2018

Make sure you're not performing the analysis in a directory that already has Enrich2 output files in it, such as from an analysis attempt that failed. The program tries to read from existing HDF5 files if present, and if those don't contain the required information (such as the /main/barcodes/counts data frame) you would see an error like this.
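One way to check for leftover output before re-running is to search the output directory for HDF5 files. This is a minimal sketch using only the standard library; the function name `find_existing_hdf5` and the example path are hypothetical, and it assumes Enrich2 output files use the `.h5` extension, as the `_lib.h5` files mentioned in this thread do.

```python
from pathlib import Path


def find_existing_hdf5(output_dir):
    """Return a sorted list of .h5 files found anywhere under output_dir.

    Returns an empty list if the directory does not exist.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.h5"))


if __name__ == "__main__":
    # Hypothetical path: substitute your own output directory.
    for path in find_existing_hdf5("/path/to/output"):
        print("Existing HDF5 file:", path)
```

If this prints any files from an earlier failed attempt, moving or deleting them before re-running should rule out stale data as the cause.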

@diabatem
Author

The output directory does not contain any output files. I also switched to a completely different output directory and still received the same error message.

@afrubin
Member

afrubin commented Oct 13, 2018

It sounds like there is some mismatch between your configuration and the FASTQ files you are processing. Can you post your config file and a sample of your data?

@diabatem
Author

Here is the JSON config file. The FASTQ files are too large to upload in this comment.
ctermBRCA1 copy.zip

@afrubin
Member

afrubin commented Oct 15, 2018

I see that your minimum barcode count is set to 4000, which is pretty high. It's possible that you are unintentionally discarding all of your barcodes because you don't have sufficient sequencing depth. Try setting the minimum to something much smaller and re-running to see if that fixes the problem.
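To illustrate how a high minimum count can discard everything, here is a toy sketch in plain Python. It is not Enrich2's own code: the barcode sequences and read numbers are made up, the helper `filter_by_min_count` is hypothetical, and treating the threshold as inclusive is an assumption.

```python
from collections import Counter


def filter_by_min_count(counts, min_count):
    """Keep only barcodes seen at least min_count times (inclusive threshold assumed)."""
    return {barcode: n for barcode, n in counts.items() if n >= min_count}


# Made-up reads standing in for a shallow sequencing run.
reads = ["ACGT"] * 120 + ["TTGA"] * 45 + ["GGCC"] * 3
counts = Counter(reads)

print(filter_by_min_count(counts, 4000))  # {}
print(filter_by_min_count(counts, 50))    # {'ACGT': 120}
```

With the threshold at 4000 nothing survives, which would leave no barcodes for any later step to work with.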

@diabatem
Author

I reduced the minimum barcode count to 50 and the error still persists.

@afrubin
Member

afrubin commented Oct 15, 2018

It's possible that the barcodes are being counted but that there's an issue with trimming or otherwise mapping them to your barcode map.

What is the name of the file being processed when Enrich2 throws the error? Does it end with _lib.h5?

If so, please try running the following code, either as a standalone script or in an interactive shell after substituting the correct file path:

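import pandas as pd

# Open read-only so that a mistyped path raises an error
# instead of silently creating a new, empty HDF5 file.
with pd.HDFStore("/path/to/seqlib/hdf5/file.h5", mode="r") as store:
    counts = store["/raw/barcodes/counts"]

print("Counted {} unique barcodes.".format(len(counts)))
print("")
print("Ten most abundant barcodes:")
print(counts[:10])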
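If the lookup itself fails, it can help to see which tables the file actually contains. The following is a hedged sketch: `missing_tables` and `check_seqlib_file` are hypothetical helper names, and the two required keys are simply the ones mentioned in this thread. `HDFStore.keys()` is a real pandas method that lists the stored object paths.

```python
def missing_tables(available_keys,
                   required_keys=("/raw/barcodes/counts", "/main/barcodes/counts")):
    """Return the required keys that are absent from available_keys, in order."""
    available = set(available_keys)
    return [key for key in required_keys if key not in available]


def check_seqlib_file(path):
    """Open an HDF5 file read-only and report which expected tables are missing."""
    import pandas as pd  # imported here so missing_tables works without pandas

    with pd.HDFStore(path, mode="r") as store:
        return missing_tables(store.keys())
```

A result containing only `/main/barcodes/counts` would mean the raw counts were written but the filtered counts were not. An empty file would report both keys as missing.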

@diabatem
Author

The program creates all the normal files .h5 and this is what I see before the error appears.
screen shot 2018-10-16 at 12 03 37 pm

When I ran the standalone script, this is the error I was given.
screen shot 2018-10-16 at 12 17 55 pm

@afrubin
Member

afrubin commented Oct 17, 2018

Please run the standalone script again on the HDF5 file ending with _lib.h5 and let me know what happens.

@diabatem
Author

I ran the code on the _lib.h5 files and got the same error.
screen shot 2018-10-17 at 12 25 00 pm

@afrubin
Member

afrubin commented Oct 18, 2018

Let me explain what is happening in this log that you posted:

> The program creates all the normal files .h5 and this is what I see before the error appears.
> screen shot 2018-10-16 at 12 03 37 pm

Enrich2 first creates empty HDF5 files for each part of the analysis (SeqLib, Selection, and Experiment). Then, it starts processing them one by one. It starts with the first Selection b_rep1 and processes the first time point 4-2brca1-.

We can see from the log that the program successfully counts the raw barcodes for the 4-brca1- time point, finding 2866779 total barcodes and storing them in /raw/barcodes/counts.

Then, it filters these barcodes by including only barcodes that are present in the barcode-variant map file and stores these in /main/barcodes/counts. This step is unsuccessful, so the /main/barcodes/counts entry is not created, and the program crashes when trying to access it.
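The filtering step can be pictured with a small sketch in plain Python. This is an illustration of the idea only, not Enrich2's internal code; the function `filter_to_mapped`, the barcode sequences, and the variant labels are all made up.

```python
def filter_to_mapped(raw_counts, barcode_map):
    """Keep only the counted barcodes that appear as keys in the barcode-variant map."""
    return {bc: n for bc, n in raw_counts.items() if bc in barcode_map}


# Made-up data: the counted barcodes carry two extra leading bases,
# so none of them match the sequences in the map.
raw_counts = {"GGACGTAC": 310, "GGTTGACA": 275}
barcode_map = {"ACGTAC": "variant_1", "TTGACA": "variant_2"}

main_counts = filter_to_mapped(raw_counts, barcode_map)
print(main_counts)  # {}
```

When nothing matches, the filtered result is empty, which corresponds to the case described here where there is nothing to store for the main counts.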

The standalone script output you provided comes from one of the other _lib.h5 files, which contains no counts because the program crashed before processing it. Please run the script again on the file that, according to the log, contains counts, and send that output.

I suspect that your barcode-variant map does not actually contain the same barcode sequences that Enrich2 is counting. This is usually caused by incorrect read trimming, so please double-check that.
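A rough way to test for a trimming problem is to measure how many counted barcodes appear in the map, and then see whether shifting the start position recovers matches. This is a sketch under stated assumptions: both helper functions are hypothetical, the sequences are made up, and all barcodes in the map are assumed to have the same length.

```python
def overlap_fraction(counted, mapped):
    """Fraction of distinct counted barcodes that also appear in the map."""
    counted = set(counted)
    if not counted:
        return 0.0
    return len(counted & set(mapped)) / len(counted)


def hits_by_offset(counted, mapped, max_offset=4):
    """For each start offset, count barcodes whose window of map length is in the map."""
    mapped = set(mapped)
    lengths = {len(bc) for bc in mapped}
    if len(lengths) != 1:
        raise ValueError("expected all map barcodes to have the same length")
    length = lengths.pop()
    return {
        offset: sum(1 for bc in counted if bc[offset:offset + length] in mapped)
        for offset in range(max_offset + 1)
    }


# Made-up example: counted barcodes have two untrimmed leading bases.
mapped = {"ACGTAC", "TTGACA"}
counted = ["GGACGTACTT", "GGTTGACAAA", "GGCCCCCCCC"]

print(overlap_fraction(counted, mapped))  # 0.0
hits = hits_by_offset(counted, mapped)
print(max(hits, key=hits.get))  # 2
```

An overlap of zero combined with many hits at a nonzero offset suggests the reads need to be trimmed by that many bases at the start before counting.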

@diabatem
Author

This is the error for the first rep. Your suspicion was correct: the barcode-variant map does not have the same barcode sequences that Enrich2 is counting.
screen shot 2018-10-18 at 11 20 19 am

@afrubin
Member

afrubin commented Oct 18, 2018

I think that is the wrong screenshot, but regardless I'm glad that we were able to get to the bottom of this. I've added a new enhancement issue (#19) to improve the program's behavior in this situation.

afrubin closed this as completed on Jul 17, 2019