IndexError: list index out of range #31

Closed
cmonger opened this issue Sep 15, 2016 · 11 comments

cmonger commented Sep 15, 2016

Hi all,

I am getting the following error message when trying to run DCC for use with FUCHS (with the suggested parameters):

DCC 0.4.4 started
Output folder ./ already exists, reusing
Temporary folder _tmp_DCC/ already exists, reusing
32 CPU cores available, using 2
Please make sure that the read pairs have been mapped both, combined and on a per mate basis
Collecting chimera information from mates-separate mapping
Traceback (most recent call last):
  File "/mnt/work/craig/.local/bin/DCC", line 9, in <module>
    load_entry_point('DCC==0.4.4', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 215, in main
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 489, in fixall
  File "build/bdist.linux-x86_64/egg/DCC/fix2chimera.py", line 80, in fixchimerics
  File "build/bdist.linux-x86_64/egg/DCC/fix2chimera.py", line 51, in fixmate2
IndexError: list index out of range

This seems to be similar to issue #7.

The command line used was:
~/.local/bin/DCC samplesheet.txt -mt1 mate1.txt -mt2 mate2.txt -D -R ../FUCHS/repeatsUCSC.gtf -an /mnt/work/index/cGriseus/ucsc_C_griseus_v1.0/ucsc_allmRNA_genenames.gtf -Pi -F -M -Nr 5 6 -fg -G -A /mnt/work/index/cGriseus/ucsc_C_griseus_v1.0/criGri1.fa

I have also tried calling the main script directly with:

python ../FUCHS/DCC-0.4.4/DCC/main.py samplesheet.txt -mt1 mate1.txt -mt2 mate2.txt -D -R ../FUCHS/repeatsUCSC.gtf -an /mnt/work/index/cGriseus/ucsc_C_griseus_v1.0/ucsc_allmRNA_genenames.gtf  -Pi -F -M -Nr 5 6 -fg -G -A /mnt/work/index/cGriseus/ucsc_C_griseus_v1.0/criGri1.fa
DCC 0.4.4 started

and get the same error.

I am running the software in a Python 2.7.10 environment with the following packages:

DCC==0.4.4
HTSeq==0.6.1
numpy==1.11.1
pandas==0.18.1
pysam==0.9.1.4
python-dateutil==2.5.3
pytz==2016.6.1
six==1.10.0
tjakobi added the bug label Sep 15, 2016
tjakobi added this to the DCC version 0.4.5 milestone Sep 15, 2016
tjakobi self-assigned this Sep 15, 2016
tjakobi (Contributor) commented Sep 15, 2016

Thanks for the feedback, @cmonger. I'll have a look at the error as soon as possible.

tjakobi (Contributor) commented Sep 27, 2016

I have obtained an internal data set that produces the same error and will now try to fix it.

tjakobi (Contributor) commented Sep 28, 2016

Dear @cmonger, could you please check out the latest commit (89762f5) and try again? The commit adds a check on the junction files, which were the cause of the problem with our in-house data set.

cmonger (Author) commented Sep 29, 2016

git show
commit b64079b3a4d4964876c156fabd3a582c058a60c7
Merge: 89762f5 cb12793
Author: Tobias Jakobi <tobias.jakobi@med.uni-heidelberg.de>
Date:   Wed Sep 28 16:16:01 2016 +0200

    Merge branch 'master' of github.com:dieterich-lab/DCC

Using the command:

~/.local/bin/DCC samplesheet.txt -mt1 mate1.txt -mt2 mate2.txt -D -R ../FUCHS/repeatsUCSC.gtf -an /mnt/work/index/cGriseus/ucsc_C_griseus_v1.0/ucsc_allmRNA_genenames.gtf -Pi -F -M -Nr 5 6 -fg -G -A /mnt/work/index/cGriseus/ucsc_C_griseus_v1.0/criGri1.fa

I get the same error:

Output folder ./ already exists, reusing
Temporary folder _tmp_DCC/ already exists, reusing
DCC 0.4.4 started
32 CPU cores available, using 2
Please make sure that the read pairs have been mapped both, combined and on a per mate basis
Collecting chimera information from mates-separate mapping
Traceback (most recent call last):
  File "/mnt/work/craig/.local/bin/DCC", line 9, in <module>
    load_entry_point('DCC==0.4.4', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 218, in main
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 492, in fixall
  File "build/bdist.linux-x86_64/egg/DCC/fix2chimera.py", line 80, in fixchimerics
  File "build/bdist.linux-x86_64/egg/DCC/fix2chimera.py", line 51, in fixmate2
IndexError: list index out of range

I also inspected each chimeric junction file to make sure none of them was corrupt and that the paths specified in the samplesheet and mate files were correct.

tjakobi (Contributor) commented Sep 29, 2016

Thank you very much for your response. I'll look further into this issue and try to provide a patch soon.

tjakobi (Contributor) commented Sep 29, 2016

Would you mind checking out commit c7f822b? It contains a simple check that prints the offending line if parsing fails. This should help track down the error.
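A check of this kind can be sketched as follows. This is a minimal illustration, not DCC's actual code: the helper name `parse_junction_line` is hypothetical, and the 14-field threshold is an assumption based on the field count reported later in this thread for STAR's Chimeric.out.junction files. The warning text mirrors the messages DCC printed once the check was in place.

```python
import sys

# Assumption: a valid Chimeric.out.junction record has 14 tab-separated
# fields, as reported in this thread; DCC's real parser may differ.
EXPECTED_FIELDS = 14


def parse_junction_line(line, filename, line_no):
    """Hypothetical helper: return the fields of one junction record,
    or print the offending line and return None if it is too short."""
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    if len(fields) < EXPECTED_FIELDS:
        # Report the problem instead of failing later with an IndexError.
        sys.stderr.write("WARNING: File %s, line %d does not contain all features.\n"
                         % (filename, line_no))
        sys.stderr.write("WARNING: %s is probably corrupt.\n" % filename)
        sys.stderr.write("WARNING: Offending line: %s\n" % stripped)
        return None
    return fields


if __name__ == "__main__":
    # A file path (as found in a mate list) instead of a junction record
    # triggers the warning rather than an exception.
    parse_junction_line("/path/to/sample_2Chimeric.out.junction\n", "mate2.txt", 1)
```

Returning `None` lets the caller decide whether to skip the line or abort, while the printed line shows immediately what the parser was actually given.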

tjakobi added the master label Sep 29, 2016
cmonger (Author) commented Sep 29, 2016

Apart from the changed line numbers in the traceback, I still get the same error:

Output folder ./ already exists, reusing
Temporary folder _tmp_DCC/ already exists, reusing
DCC 0.4.4 started
32 CPU cores available, using 2
Please make sure that the read pairs have been mapped both, combined and on a per mate basis
Collecting chimera information from mates-separate mapping
Traceback (most recent call last):
  File "/mnt/work/craig/.local/bin/DCC", line 9, in <module>
    load_entry_point('DCC==0.4.4', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 220, in main
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 494, in fixall
  File "build/bdist.linux-x86_64/egg/DCC/fix2chimera.py", line 91, in fixchimerics
  File "build/bdist.linux-x86_64/egg/DCC/fix2chimera.py", line 62, in fixmate2
IndexError: list index out of range

tjakobi (Contributor) commented Sep 29, 2016

Argh! Maybe it makes more sense if I check the correct variable... Please try again with commit 2155483. I'm sorry for the unnecessary run.

cmonger (Author) commented Sep 29, 2016

WARNING: File mate2.txt, line 1 does not contain all features.
WARNING: mate2.txt is probably corrupt.
WARNING: Offending line: /mnt/work/craig/FUCHS/analysis/37DegreesRep1/mate2/37DegreesRep1_2Chimeric.out.junction

I am unsure why this error is occurring, as the file does have 14 fields! I will tar up the chimeric junction files etc. so you can investigate further. I'll be in touch by email shortly with a download link.

tjakobi (Contributor) commented Sep 29, 2016

Hehe. After looking more closely at the command line you supplied, I spotted the error:

-mt1 mate1.txt -mt2 mate2.txt should be -mt1 @mate1.txt -mt2 @mate2.txt. With the @ prefix, Python reads the file and passes its lines directly to DCC as a list of arguments. In your case, DCC tried to parse the mate file itself as a junction file, which, of course, fails.
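The @ behaviour matches what Python's standard argparse module offers through its fromfile_prefix_chars option: an argument starting with the prefix is replaced by the lines of the named file, one argument per line. The sketch below assumes DCC's parser is configured this way (consistent with the explanation above, but not checked against DCC's source); the dest name mate1_files and the junction file names are hypothetical.

```python
import argparse
import os
import tempfile

# Assumption: DCC enables '@file' expansion via argparse's fromfile_prefix_chars.
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
# Hypothetical dest name; only the -mt1 flag itself comes from the thread.
parser.add_argument("-mt1", dest="mate1_files", nargs="+")


def demo():
    """Parse the same list file with and without the '@' prefix."""
    tmpdir = tempfile.mkdtemp()
    listing = os.path.join(tmpdir, "mate1.txt")
    with open(listing, "w") as fh:
        # One junction file path per line, as in a DCC mate list.
        fh.write("sampleA_1Chimeric.out.junction\n")
        fh.write("sampleB_1Chimeric.out.junction\n")

    with_at = parser.parse_args(["-mt1", "@" + listing])
    without_at = parser.parse_args(["-mt1", listing])
    return with_at.mate1_files, without_at.mate1_files


if __name__ == "__main__":
    expanded, literal = demo()
    print(expanded)  # the two junction file paths read from mate1.txt
    print(literal)   # a single entry: the path of mate1.txt itself
```

Without the prefix, the program receives the list file's own path and opens it as if it were a junction file. Each line of that file holds a single path rather than a tab-separated record, which fits the "does not contain all features" warning reported in this thread.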

I'll think about a way to catch this invalid command line before the main program starts.
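One possible approach is a heuristic pre-flight check, sketched here under stated assumptions: the helper names are hypothetical, lines starting with # are treated as comments, and a file whose first data line has only one tab-separated column is assumed to be a list of file names rather than a junction file. DCC's actual fix may differ.

```python
import os


def looks_like_file_list(path):
    """Hypothetical heuristic: True if the first data line of `path` has a
    single column (a file name) instead of a tab-separated junction record."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank and comment lines
            return len(line.split("\t")) == 1
    return False  # empty file: nothing to judge


def find_unexpanded_lists(paths):
    """Return the arguments that look like mate lists passed without '@'."""
    return [p for p in paths if os.path.isfile(p) and looks_like_file_list(p)]
```

A program using argparse could, for example, call parser.error() with a hint such as "did you mean @mate1.txt?" whenever find_unexpanded_lists returns a non-empty list, so the mistake is reported before any processing starts.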

cmonger (Author) commented Sep 29, 2016

Haha, of course it was something silly. I had not come across this syntax before and assumed the @ was just notation marking where the user should put their file name!

Apologies for the unnecessary testing! I look forward to seeing the results.

tjakobi closed this as completed Sep 30, 2016