Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MSmetaEnhancer #199

Merged
merged 18 commits into from
Jan 11, 2022
Merged

Add MSmetaEnhancer #199

merged 18 commits into from
Jan 11, 2022

Conversation

xtrojak
Copy link
Contributor

@xtrojak xtrojak commented Dec 2, 2021

This PR adds MSmetaEnhancer. It is possible to select which conversion jobs will executed. The list of jobs is in a macros files generated dynamically from current version of MSMetaEnhancer (#197). It has to be updated upon updating this wrapper as described in readme file. The jobs can be specified in particular order (the performance of tool depends on that, #198) and/or in a bulk of jobs to be executed once the ordered ones are finished.

Close #176.
Close #198.
Close #197.
Close #200.

tools/msmetaenhancer/msmetaenhancer.xml Outdated Show resolved Hide resolved
tools/msmetaenhancer/msmetaenhancer.xml Outdated Show resolved Hide resolved
tools/msmetaenhancer/msmetaenhancer.xml Outdated Show resolved Hide resolved
tools/msmetaenhancer/test-data/sample.msp Outdated Show resolved Hide resolved
@xtrojak xtrojak marked this pull request as draft December 6, 2021 08:43
@xtrojak
Copy link
Contributor Author

xtrojak commented Dec 6, 2021

After discussions with @martenson and @hechth, we decided to proceed as following:

@xtrojak xtrojak marked this pull request as ready for review December 6, 2021 12:24
@hechth
Copy link
Member

hechth commented Dec 9, 2021

I'm currently trying the tool and it seems like it doesn't actually start any jobs and just terminates straight away.

@martenson
Copy link
Member

@hechth any errors in the logs? The job finishes for me.

Copy link
Member

@martenson martenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tool would benefit from some more clarity in its interface. E.g. the most basic usage option probably is "do not select any jobs, just let MSME run everything" -- but that option is more or less hidden atm even though it is mentioned in the end of the help text.

I am also confused why the results are different for "no conversion jobs selected" and "all conversion jobs selected as a part of the random selector". I'd expect them be the same but they are consistently different.

For UX I'd consider a two-modes-tool with "simple" having no parameters other than the input file, and "specify jobs" offering the interface to alter the order/jobs.

tools/msmetaenhancer/msmetaenhancer.xml Outdated Show resolved Hide resolved
tools/msmetaenhancer/msmetaenhancer.xml Outdated Show resolved Hide resolved
@hechth
Copy link
Member

hechth commented Dec 9, 2021

The job finishes but just doesn't do anything and the input file doesn't change.

@hechth
Copy link
Member

hechth commented Dec 9, 2021

I think the tool would benefit from some more clarity in its interface. E.g. the most basic usage option probably is "do not select any jobs, just let MSME run everything" -- but that option is more or less hidden atm even though it is mentioned in the end of the help text.

I am also confused why the results are different for "no conversion jobs selected" and "all conversion jobs selected as a part of the random selector". I'd expect them be the same but they are consistently different.

For UX I'd consider a two-modes-tool with "simple" having no parameters other than the input file, and "specify jobs" offering the interface to alter the order/jobs.

For me the UI is completely fine and understandable and makes sense as it is to be honest.

@xtrojak
Copy link
Contributor Author

xtrojak commented Dec 9, 2021

The job finishes but just doesn't do anything and the input file doesn't change.

There was really an issue in order of arguments in job_options macro. Should be fixed in 24ae942.

@xtrojak
Copy link
Contributor Author

xtrojak commented Dec 9, 2021

I think the tool would benefit from some more clarity in its interface. E.g. the most basic usage option probably is "do not select any jobs, just let MSME run everything" -- but that option is more or less hidden atm even though it is mentioned in the end of the help text.

I suppose we can improve this in the future if the tool will not be intuitive enough to the users.

I am also confused why the results are different for "no conversion jobs selected" and "all conversion jobs selected as a part of the random selector". I'd expect them be the same but they are consistently different.

This should be covered in version of MSME 0.1.1 (changed in da914e9). At least I'm not able to reproduce it. Please provide sample .msp file if this issue is still lingering.

@hechth
Copy link
Member

hechth commented Dec 10, 2021

I found another bug when running only a single job (Pubchem: name -> inchikey)

Traceback (most recent call last):
  File "/m2b/home/hechth/git/recetox/galaxytools/tools/msmetaenhancer/msmetaenhancer_wrapper.py", line 47, in <module>
    main(argv=sys.argv[1:])
  File "/m2b/home/hechth/git/recetox/galaxytools/tools/msmetaenhancer/msmetaenhancer_wrapper.py", line 31, in main
    asyncio.run(app.annotate_spectra(services, jobs))
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/app.py", line 83, in annotate_spectra
    jobs = convert_to_jobs(jobs)
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/libs/utils/Job.py", line 28, in convert_to_jobs
    return [Job(data) for data in jobs]
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/libs/utils/Job.py", line 28, in <listcomp>
    return [Job(data) for data in jobs]
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/libs/utils/Job.py", line 6, in __init__
    self.source, self.target, self.service = data
ValueError: too many values to unpack (expected 3)

- and avoid using "jobs" in other than Galaxy context
@hechth
Copy link
Member

hechth commented Dec 10, 2021

Also, when running CTS: name -> inchikey, the job tuns for about 1sec and then terminates as green but no data is added - is CTS currently down?

@martenson
Copy link
Member

martenson commented Dec 10, 2021

I can confirm what helge reported above (ValueError: too many values to unpack (expected 3))

Additionally I've found a discrepancy with how the select field for the conversions without order behaves (tldr I think it does not like the format of value in the <option> tags).

Once you select an option from the offered list it is supposed to disappear from the list, hence if you select all of them the list should offer you nothing more. This is not the case with how it works for this tool and I assume that is because the select2 implementation has trouble comparing string values like this: value="('name', 'inchikey', 'PubChem')". I suggest we adjust the values here to preserve the original UX behavior, otherwise this could be confusing to users (and allows duplicates -- screenshot below).This adjustment may require additional mapping in our wrapper.

Galaxy

@xtrojak
Copy link
Contributor Author

xtrojak commented Dec 13, 2021

I found another bug when running only a single job (Pubchem: name -> inchikey)

Traceback (most recent call last):
  File "/m2b/home/hechth/git/recetox/galaxytools/tools/msmetaenhancer/msmetaenhancer_wrapper.py", line 47, in <module>
    main(argv=sys.argv[1:])
  File "/m2b/home/hechth/git/recetox/galaxytools/tools/msmetaenhancer/msmetaenhancer_wrapper.py", line 31, in main
    asyncio.run(app.annotate_spectra(services, jobs))
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/app.py", line 83, in annotate_spectra
    jobs = convert_to_jobs(jobs)
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/libs/utils/Job.py", line 28, in convert_to_jobs
    return [Job(data) for data in jobs]
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/libs/utils/Job.py", line 28, in <listcomp>
    return [Job(data) for data in jobs]
  File "/home/hechth/miniconda3/envs/__msmetaenhancer@0.1.1/lib/python3.8/site-packages/MSMetaEnhancer/libs/utils/Job.py", line 6, in __init__
    self.source, self.target, self.service = data
ValueError: too many values to unpack (expected 3)

Thank you for noticing! Fixed in 906c767.

@xtrojak
Copy link
Contributor Author

xtrojak commented Dec 13, 2021

Additionally I've found a discrepancy with how the select field for the conversions without order behaves (tldr I think it does not like the format of value in the <option> tags).

Once you select an option from the offered list it is supposed to disappear from the list, hence if you select all of them the list should offer you nothing more. This is not the case with how it works for this tool and I assume that is because the select2 implementation has trouble comparing string values like this: value="('name', 'inchikey', 'PubChem')". I suggest we adjust the values here to preserve the original UX behavior, otherwise this could be confusing to users (and allows duplicates -- screenshot below).This adjustment may require additional mapping in our wrapper.

Yes the problem was indeed in the format of values. I changed the format to a simpler string in 906c767 and now it behaves as expected.

@xtrojak
Copy link
Contributor Author

xtrojak commented Dec 13, 2021

Also, when running CTS: name -> inchikey, the job tuns for about 1sec and then terminates as green but no data is added - is CTS currently down?

CTS is running, but there are some new issues. This particular problem was already detected and mentioned here. It is btw a reason why tests for MSMetaEnhancer are currently failing.

@martenson
Copy link
Member

Output output_file:  different than expected, difference (using diff):
( /tmp/tmpzccq9rgksample_out.msp v. /tmp/tmpouwehnqesample_out.msp )
--- local_file
+++ history_data
@@ -4,10 +4,10 @@
 CASNO: 1333-74-0
 ID: 1
 COMMENT: NIST MS# 245692, Seq# M1
+INCHIKEY: UFHFLCQGNIYNRP-UHFFFAOYSA-N
 INCHI: InChI=1S/H2/h1H
-INCHIKEY: UFHFLCQGNIYNRP-UHFFFAOYSA-N
-IUPAC_NAME: molecular hydrogen
-SMILES: [HH]
+IUPAC_NAME: hydrogen monohydride
+SMILES: [H][H]
 NUM PEAKS: 2
 1.0	20.98
 2.0	999.0
@@ -18,9 +18,9 @@
 CASNO: 7782-39-0
 ID: 2
 COMMENT: NIST MS# 61316, Seq# M2
+INCHIKEY: UFHFLCQGNIYNRP-VVKOMZTBSA-N
 INCHI: InChI=1S/H2/h1H/i1+1D
-INCHIKEY: UFHFLCQGNIYNRP-VVKOMZTBSA-N
-SMILES: [HH]
+SMILES: [2H][2H]
********
*SNIP *
********
 13.0	85.92
@@ -67,10 +65,9 @@
 CASNO: 74-82-8
 ID: 5
 COMMENT: Any=100 ; NIST MS# 18809, Seq# R27
+INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 INCHI: InChI=1S/CH4/h1H4
-INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 IUPAC_NAME: methane
-SMILES: C
 NUM PEAKS: 6
 12.0	7.99
 13.0	28.97
@@ -85,10 +82,9 @@
 CASNO: 74-82-8
 ID: 6
 COMMENT: Any=100 ; NIST MS# 423924, Seq# R28
+INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 INCHI: InChI=1S/CH4/h1H4
-INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 IUPAC_NAME: methane
-SMILES: C
 NUM PEAKS: 6
 12.0	25.98
 13.0	74.93

@xtrojak
Copy link
Contributor Author

xtrojak commented Jan 7, 2022

Output output_file:  different than expected, difference (using diff):
( /tmp/tmpzccq9rgksample_out.msp v. /tmp/tmpouwehnqesample_out.msp )
--- local_file
+++ history_data
@@ -4,10 +4,10 @@
 CASNO: 1333-74-0
 ID: 1
 COMMENT: NIST MS# 245692, Seq# M1
+INCHIKEY: UFHFLCQGNIYNRP-UHFFFAOYSA-N
 INCHI: InChI=1S/H2/h1H
-INCHIKEY: UFHFLCQGNIYNRP-UHFFFAOYSA-N
-IUPAC_NAME: molecular hydrogen
-SMILES: [HH]
+IUPAC_NAME: hydrogen monohydride
+SMILES: [H][H]
 NUM PEAKS: 2
 1.0	20.98
 2.0	999.0
@@ -18,9 +18,9 @@
 CASNO: 7782-39-0
 ID: 2
 COMMENT: NIST MS# 61316, Seq# M2
+INCHIKEY: UFHFLCQGNIYNRP-VVKOMZTBSA-N
 INCHI: InChI=1S/H2/h1H/i1+1D
-INCHIKEY: UFHFLCQGNIYNRP-VVKOMZTBSA-N
-SMILES: [HH]
+SMILES: [2H][2H]
********
*SNIP *
********
 13.0	85.92
@@ -67,10 +65,9 @@
 CASNO: 74-82-8
 ID: 5
 COMMENT: Any=100 ; NIST MS# 18809, Seq# R27
+INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 INCHI: InChI=1S/CH4/h1H4
-INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 IUPAC_NAME: methane
-SMILES: C
 NUM PEAKS: 6
 12.0	7.99
 13.0	28.97
@@ -85,10 +82,9 @@
 CASNO: 74-82-8
 ID: 6
 COMMENT: Any=100 ; NIST MS# 423924, Seq# R28
+INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 INCHI: InChI=1S/CH4/h1H4
-INCHIKEY: VNWKTOKETHGBQD-UHFFFAOYSA-N
 IUPAC_NAME: methane
-SMILES: C
 NUM PEAKS: 6
 12.0	25.98
 13.0	74.93

This is caused by recent changes in MSME regarding values validation. Could be easily fixed by updating the expected file.

However, we agreed that this tool cannot reproduce consistent results among different runs. All tests checking for data content were removed from MSME. Do we want to remove this test also from Galaxy wrapper? We probably can't rely on them, although they use quite simple molecules where metadata is expected to be rather consistent.

@martenson
Copy link
Member

@xtrojak I guess you need to make a call whether the test is useful or not. My position is that if it breaks ~once a year, it could still be useful, if it is every month then the maintenance burden is too high.

@hechth
Copy link
Member

hechth commented Jan 7, 2022

I'd suggest staying with it for now and when we see that it becomes too maintenance-intensive we remove it.

@hechth
Copy link
Member

hechth commented Jan 10, 2022

Looks good to me, I think this can be merged. Thanks for the great work!

@xtrojak
Copy link
Contributor Author

xtrojak commented Jan 10, 2022

Alright, I have updated the test output file for now, we will see how it goes in the future.

@hechth hechth merged commit 2c9c75f into RECETOX:master Jan 11, 2022
@martenson
Copy link
Member

Well done @xtrojak. The tool is now on UMSA here: https://umsa.cerit-sc.cz/root?tool_id=testtoolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.1.2+galaxy0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants