'dict_ref_structure' is not defined #63

Closed
ScatF opened this issue Jul 11, 2019 · 4 comments

@ScatF
ScatF commented Jul 11, 2019

Hi,

I'm using human transcriptome ONT data and want to simulate reads. I first hit a 'DivisionByZero' error, so I am currently running the 2.3-beta version of NanoSim. I reached the simulation stage and ran into this error:

  File "/NanoSim/src/simulator.py", line 1198, in <module>
    main()
  File "/NanoSim/src/simulator.py", line 1192, in main
    simulation(args.mode, out, dna_type, perfect, kmer_bias, max_len, min_len, None, None, model_ir)
  File "/NanoSim/src/simulator.py", line 722, in simulation
    new_read, new_read_name = extract_read(dna_type, middle_ref)
  File "/NanoSim/src/simulator.py", line 772, in extract_read
    key = random.choice(seq_len.keys())
  File "/miniconda3/lib/python3.7/random.py", line 262, in choice
    return seq[i]
TypeError: 'dict_keys' object is not subscriptable 

I identified the problem: with Python 3.7, seq_len.keys() returns a dict_keys view object, not a list, so it cannot be indexed. We can easily fix the problem by changing
key = random.choice(seq_len.keys())
into
key = random.choice(list(seq_len.keys()))
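A minimal sketch of why the one-line change works. The dictionary contents and the helper name `pick_random_key` are made up for illustration; only `seq_len` and the `random.choice` call come from the traceback above.

```python
import random

# Hypothetical stand-in for NanoSim's seq_len (transcript name -> length).
seq_len = {"tx1": 1200, "tx2": 850, "tx3": 3020}

# In Python 3, dict.keys() returns a view that supports len() but not indexing,
# so random.choice() fails when it tries seq[i].
try:
    random.choice(seq_len.keys())
    raised_type_error = False
except TypeError:
    raised_type_error = True


def pick_random_key(mapping):
    """Return a random key; list() makes the keys indexable on Python 2 and 3."""
    return random.choice(list(mapping))


picked = pick_random_key(seq_len)
```

Copying the keys into a list on every call costs O(n); if the function is called in a tight loop, building the list once outside the loop would avoid the repeated copy.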

Now I get this error:

  File "./simulator.py", line 1198, in <module>
    main()
  File "./simulator.py", line 1192, in main
    simulation(args.mode, out, dna_type, perfect, kmer_bias, max_len, min_len, None, None, model_ir)
  File "./simulator.py", line 754, in simulation
    simulation_aligned_transcriptome(model_ir, out_reads, out_error, kmer_bias, per)
  File "./simulator.py", line 499, in simulation_aligned_transcriptome
    if ref_trx in dict_ref_structure:
NameError: name 'dict_ref_structure' is not defined

I don't understand why: 'dict_ref_structure' is defined earlier, as a global and at line 321.
Can you help me?

@SaberHQ
Collaborator

SaberHQ commented Jul 11, 2019

Hi @StanislasF

Thanks for bringing this up. I will update the script to wrap seq_len.keys() in list(), as you suggested, so that it is compatible with Python 3.7.

As for your second question, as you said, that dictionary is defined earlier in the script. Would you mind posting the exact commands you are running, with all the input arguments you use? Thanks.

@ScatF
Author

ScatF commented Jul 12, 2019

Yes, sure. I'm running Python 3.7.3, HTSeq 0.11.2, numpy 1.16.4, pybedtools 0.8.0, pysam 0.15.2, scipy 1.3.0, scikit-learn 0.21.2, and genometools 1.2.1 (you didn't mention genometools in your requirements file, but I needed it). All my references come from the Ensembl database.

I ran it step by step:

cd /MYPATH/localLib/NanoSim-2.3-beta/src/

./read_analysis.py transcriptome -i /MYPATH/Data/ONTseq/fasta/raw/T7_3moins_10102018.clean.fa -rt /MYPATH/data/Homo_sapiens.GRCh38.cdna.all.fa  -annot /MYPATH/data/Homo_sapiens.GRCh38.97.chr.gtf -o /MYPATH/workspace/NanoSim/NanoSim_$temps/training -t 4 --no_intron_retention

./read_analysis.py quantify -i /MYPATH/Data/ONTseq/fasta/raw/T7_3moins_10102018.clean.fa -rt /MYPATH/data/Homo_sapiens.GRCh38.cdna.all.fa  -o /MYPATH/workspace/NanoSim/NanoSim_$temps/expression -t 4  

./simulator.py transcriptome -rt /MYPATH/data/Homo_sapiens.GRCh38.cdna.all.fa -e /MYPATH/workspace/NanoSim/NanoSim_$temps/expression_abundance.tsv -c /MYPATH/workspace/NanoSim/NanoSim_$temps/training -o /MYPATH/workspace/NanoSim/NanoSim_$temps/simulated -max 10000 --no_model_ir -rg /MYPATH/data/Homo_sapiens.GRCh38.dna.primary_assembly.fa

All the outputs are fine except for the simulation, which gives me only unaligned reads. It breaks after 'start simulation of random reads' and gives me the 'dict_keys' error.

@SaberHQ
Collaborator

SaberHQ commented Jul 12, 2019

Dear @StanislasF

Thanks for providing more info. It turned out to be a bug in how the input requirements were handled, and I have fixed it now. The reference genome and the annotation file are not necessary unless you want to model intron retention events as well. The bug was that the dictionary you mentioned was not created when the --no_model_ir option was used.
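For readers wondering how a name can be declared global and still raise NameError, here is a minimal sketch of that failure mode. It is not NanoSim's actual code: the function names, the `ref_structure` variable, and the data are hypothetical, and it only assumes that the dictionary was assigned inside a branch that the option skipped.

```python
def load_inputs(model_ir):
    """Hypothetical loader: the dictionary is only assigned when model_ir is True."""
    global ref_structure
    if model_ir:
        ref_structure = {"tx1": [("exon", 0, 100)]}


def load_inputs_fixed(model_ir):
    """Same loader, but the name is always bound, so later lookups are safe."""
    global ref_structure
    ref_structure = {}
    if model_ir:
        ref_structure = {"tx1": [("exon", 0, 100)]}


def simulate(ref_trx):
    # Looking up a module-level name that was never assigned raises NameError.
    return ref_trx in ref_structure


# A 'global' statement declares the name but does not create it.
load_inputs(model_ir=False)
try:
    simulate("tx1")
    failed = False
except NameError:
    failed = True

# With an unconditional default, the lookup simply finds nothing.
load_inputs_fixed(model_ir=False)
found = simulate("tx1")
```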

It is now fixed. I also improved the speed a lot and added a new option as well. So please check the new pre-release here: https://github.com/bcgsc/NanoSim/releases/tag/v2.4-beta

Let me know if I can provide more help. Please feel free to contact me if you have any questions.

@ScatF
Author

ScatF commented Jul 15, 2019

Dear @SaberHQ

You did a great job with the new pre-release. I tried it with and without the --no_model_ir option and found only one small issue: for the --uracile option in simulator.py you use the maketrans function. In Python 3, this function is no longer in the string module; it is a static method of the built-in str type. So you cannot import it, but you can call it directly as str.maketrans.

Thank you for your help!
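A minimal sketch of the Python 3 form. The table and helper names are hypothetical, and it assumes the option's job is to replace T with U in the output sequence, which this thread does not spell out.

```python
# Python 2 style, which fails on Python 3 with ImportError:
#     from string import maketrans
# Python 3: maketrans is a static method of the built-in str type.
DNA_TO_RNA = str.maketrans("Tt", "Uu")


def to_rna(seq):
    """Return seq with every T/t replaced by U/u; other characters are unchanged."""
    return seq.translate(DNA_TO_RNA)


rna = to_rna("ACGTTacgt")
```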

@ScatF ScatF closed this as completed Jul 15, 2019