
rxlr_motifs or rxlr_venn_workflow? #37

Closed
Neato-Nick opened this issue Jun 16, 2021 · 14 comments

@Neato-Nick
Contributor

Neato-Nick commented Jun 16, 2021

Hi, I'm exploring Galaxy and just got my own instance spun up.

In the public Tool Shed, I found rxlr_venn_workflow. Does it do the same things as rxlr_motifs.py in this repository, or are the outputs somehow different?

@peterjc
Owner

peterjc commented Jun 16, 2021

The workflow calls the RXLR tool and other tools to plot a Venn diagram. It was more a proof of principle for how we might share workflows on the Galaxy Tool Shed than something very practical.

You probably want just the RXLR tool.

@Neato-Nick
Contributor Author

Neato-Nick commented Jun 16, 2021

Thanks. The venn_workflow is taking a while to install through the Tool Shed, so I manually installed these tools in parallel. Running my data, I got an error. I can post it as a separate issue if you prefer, but I was hoping it was a minor thing you see all the time:

File "~/opt/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/7de64c8b258d/tmhmm_and_signalp/tools/protein_analysis/rxlr_motifs.py", line 273
    print "%s for %i sequences:" % (mode

Edit: This was actually an issue with the Tool Shed version, when running Whisson.

@peterjc
Owner

peterjc commented Jun 16, 2021

I suspect that's a Python 2 bit of code being run under Python 3, although I'd like to see more of the error context to be sure. The RXLR tool has been updated to work under Python 3, but the workflow is most likely requesting an older Python 2 only version of the tool.

Tricky.

The workflow ought to be fine if you install the latest version of the RXLR tool. But perhaps I should update the workflow...
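For anyone hitting the same traceback: in Python 3, print is a function, so a Python 2 print "..." % (...) statement is a syntax error. Adding from __future__ import print_function at the top of a module lets the function form work under Python 2.6+ too. A minimal sketch of the Python 3 form; the original line is truncated above, so mode and count here are hypothetical stand-ins rather than the script's actual variables:

```python
def summary_line(mode, count):
    """Format the per-mode summary; `mode` and `count` are hypothetical names."""
    return "%s for %i sequences:" % (mode, count)


# Python 2 statement form (a SyntaxError under Python 3):
#     print "%s for %i sequences:" % (mode, count)
# Function-call form: valid in Python 3, and in Python 2 for a single
# argument (or with `from __future__ import print_function`).
print(summary_line("Whisson2007", 42))
```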

@Neato-Nick
Contributor Author

Neato-Nick commented Jun 16, 2021

Interesting, I think I was running an older version of the RXLR script. I updated it and all of its dependencies from the Tool Shed to the most current versions. Now I get a different error, stemming from SignalP; I guess that's progress. It looks less like a Python 2 vs 3 error, but I could be wrong. Is there a way to force all Python scripts called by rxlr_motifs.py to run under Python 2.7?

Traceback (most recent call last):
  File "~/opt/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/a19b3ded8f33/tmhmm_and_signalp/tools/protein_analysis/signalp3.py", line 175, in <module>
     n=FASTA_CHUNK, truncate=truncate, max_len=MAX_LEN)
  File "~/opt/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/a19b3ded8f33/tmhmm_and_signalp/tools/protein_analysis/seq_analysis_utils.py", line 125, in split_fasta
     records.append(iterator.next())
AttributeError: 'generator' object has no attribute 'next'
Error 256 from SignalP:
python /~/opt/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/a19b3ded8f33/tmhmm_and_signalp/tools/protein_analysis/signalp3.py euk 0 1 /~/opt/galaxy/database/objects/d/4/6/dataset_d46651f7-b4d1-45d0-a057-e8978f37ee77.dat.fasta.tmp ~/opt/galaxy/database/objects/d/4/6/dataset_d46651f7-b4d1-45d0-a057-e8978f37ee77.dat.tabular.tmp

@peterjc
Owner

peterjc commented Jun 17, 2021

Progress indeed. Sadly, another Python 2 to 3 pain point; this one was fixed 3 years ago in 85915a5.

So again, hopefully all you need to do is update the signalp wrapper?

I'm unsure if there is a hack to specify Python 2.7, but I really don't want to go that route, since the current versions of these wrappers should all work under Python 3.
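The AttributeError above comes from Python 3 generators no longer having a .next() method; the built-in next() function (available since Python 2.6) works under both versions. A minimal sketch of collecting up to n records that way; take_records is a hypothetical helper, not necessarily the exact change made to split_fasta in 85915a5:

```python
def take_records(iterator, n):
    """Collect up to n items from an iterator, stopping early if it runs out."""
    records = []
    while len(records) < n:
        try:
            # Python 2 only:        records.append(iterator.next())
            # Python 2.6+ and 3:    the built-in next()
            records.append(next(iterator))
        except StopIteration:
            break  # iterator exhausted; return the partial batch
    return records


gen = (i * i for i in range(5))
print(take_records(gen, 3))  # [0, 1, 4]
print(take_records(gen, 3))  # [9, 16]
```

itertools.islice(iterator, n) would do the same job in one line; the explicit loop just makes the next() call visible.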

@Neato-Nick
Contributor Author

Ah, indeed: I see the tool ID in my error is toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/rxlr_motifs/0.0.14, and the commit you linked brought it up to 0.0.16.
And yet, my install shows I should be at signalp3 0.0.19.
I think I'm having Galaxy learning-curve issues.

@peterjc
Owner

peterjc commented Jun 17, 2021

Whoops. I hadn't updated the main Tool Shed since 2017-09-21, so it didn't have the Python 3 fix for the "next" problem you ran into. Done now.

My apologies. There was a time when these wrappers were getting lots of small fixes, so pushing a Tool Shed update after every change was overkill, but my Galaxy work has tailed off since then.

@Neato-Nick
Contributor Author

Thank you! I got past the Python errors. Now the problem is actually SignalP itself: the dreaded "error running HOW".

Very strange: I can run the SignalP test data fine (~/opt/signalp-3.0/test/test.seq), but with my own proteins I get the "error running HOW".

@Neato-Nick
Contributor Author

So, this does not help your development project, but all three models are working for me now! I'm using the script from this repo rather than the Galaxy Tool Shed, and I'm running it from the command line rather than through Galaxy, like so:
python2.7 rxlr_motifs.py ~/opt/signalp-3.0/test/test5.seq 2 Win2007 test5.win.out
Even though I called Python 2.7 directly, I had to fix one more Python error near line 111 of rxlr_motifs.py, but I just followed this solution and it was no sweat.

I cannot reproduce the SignalP errors. Even running SignalP directly from the directory I installed it in, I was getting that HOW error. But, magically, it works when my proteins go through your script's SignalP call.

@Neato-Nick
Contributor Author

Neato-Nick commented Jun 17, 2021

Last thing: can I ask about the Whisson output? I assumed that union(hmm+re)=Y, but it's not quite adding up. Here are my numbers:
Y = 290, hmm = 251, neither = 149423, re = 27
Edit: Oh, is it union(hmm+re)=Y, and then hmm labels are genes only found with hmm and re are genes only found with regex?

@peterjc
Copy link
Owner

peterjc commented Jun 17, 2021

Could you expand on what you had to change in rxlr_motifs.py about StopIteration? A pull request would be even better of course.

I never did get to the bottom of "error running HOW", any insights are welcome on #24.

As to the Whisson output, I think you've got it now. Adding those four numbers should match the total sequence count.
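A minimal sketch of that four-way split, assuming (as the thread implies; the tool's own code isn't quoted here) that Y means both the HMM and the regex matched, hmm and re mean only that one method matched, and neither means no match. Because the labels don't overlap, the counts must sum to the number of input sequences. whisson_label is a hypothetical helper, not the tool's function:

```python
from collections import Counter


def whisson_label(hmm_hit, re_hit):
    """Assign exactly one mutually exclusive label per sequence (assumed labels)."""
    if hmm_hit and re_hit:
        return "Y"
    if hmm_hit:
        return "hmm"
    if re_hit:
        return "re"
    return "neither"


# Counts reported earlier in this thread.
reported = Counter({"Y": 290, "hmm": 251, "re": 27, "neither": 149423})
total = sum(reported.values())
print(total)  # 149991, should equal the number of input sequences
# Sequences found by either method = Y + hmm-only + re-only.
print(reported["Y"] + reported["hmm"] + reported["re"])  # 568
```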

@Neato-Nick
Contributor Author

Neato-Nick commented Jun 17, 2021

I'm glad you asked me to expand: I went back and found it was actually in seq_analysis_utils.py (originally line 111), not in rxlr_motifs.py. Here's what mine looks like now; lines 111-114 are what I added:

105         if max_len and len(seq) > max_len:
106             raise ValueError(
107                 "Sequence %s is length %i, max length %i"
108                 % (title.split()[0], len(seq), max_len)
109             )
110         #yield title, seq
111         try:
112             yield title, seq
113         except StopIteration:
114             return
115 #    raise StopIteration
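Some background on why this helps, as far as I can tell: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is turned into "RuntimeError: generator raised StopIteration", so a trailing raise StopIteration like the one commented out on line 115 now crashes instead of ending the generator. A plain return (or just falling off the end) is the portable way to finish, and the try/except around yield should not normally be needed. A minimal sketch with hypothetical generator names:

```python
def old_style(items):
    """Python 2 idiom: ends the generator by raising StopIteration."""
    for item in items:
        yield item
    raise StopIteration  # RuntimeError under Python 3.7+ (PEP 479)


def new_style(items):
    """Portable idiom: simply return (or fall off the end)."""
    for item in items:
        yield item
    return


print(list(new_style(["a", "b"])))  # ['a', 'b']
try:
    list(old_style(["a", "b"]))
except RuntimeError as err:
    print("RuntimeError:", err)  # RuntimeError: generator raised StopIteration
```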

Part of me wonders if the "error running HOW" is related to file size or number of sequences. Their test5.seq works, an input set of my proteins reduced by the regex & HMM searches works, and so does a further-reduced set split up into tasks as you described in #24. You first thought it was related to your temp files, but ruled out 'user error' in how you split them up. Maybe it is the number of sequences per input file?
That being said, I also tried running all this in WSL on my Windows machine, and my SignalP install was giving me "error running HOW" even on their test.seq, which is a single sequence, so it could be some combination of environment and input file.
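One way to test the sequences-per-file idea is to split the input into FASTA files with a fixed number of records each and run SignalP on each piece. A minimal pure-Python sketch; read_fasta, chunk_fasta and the chunk_NNN.fasta naming are hypothetical, it only handles plain ">"-header FASTA, and it loads all records into memory:

```python
import os
import tempfile


def read_fasta(handle):
    """Yield (title, sequence) pairs from a plain FASTA text handle."""
    title, seq_lines = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(seq_lines)
            title, seq_lines = line[1:], []
        elif title is not None:
            seq_lines.append(line.strip())
    if title is not None:
        yield title, "".join(seq_lines)


def chunk_fasta(in_path, out_dir, n):
    """Write records into files of at most n records each; return their paths."""
    with open(in_path) as handle:
        records = list(read_fasta(handle))
    paths = []
    for start in range(0, len(records), n):
        path = os.path.join(out_dir, "chunk_%03i.fasta" % (start // n))
        with open(path, "w") as out:
            for title, seq in records[start:start + n]:
                out.write(">%s\n%s\n" % (title, seq))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "proteins.fasta")
        with open(src, "w") as f:
            f.write(">p1\nMKV\n>p2\nMRL\nLR\n>p3\nMAA\n")
        print(len(chunk_fasta(src, tmp, 2)))  # 2
```

Varying n (say 1, 10, 100, 1000) and noting which chunk sizes trigger the error would narrow things down without changing SignalP itself.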

Edit: I made a pull request implementing the code I pasted above: #38

@peterjc
Copy link
Owner

peterjc commented Jun 17, 2021

Thanks for #38, hopefully that's the Python 3 stuff dealt with.

I doubt we'll solve #24 and the "error running HOW" today :(

Can we close this issue, or do you think the workflow needs updating?

@Neato-Nick
Contributor Author

Nope, we can close it! Thank you for all the responses. I doubt I would've gotten through the Python stuff if you had asked me to open a new issue for each new error. I really appreciate you working closely with me.

peterjc closed this as completed Jun 17, 2021