MSP as export format for MassBank-data #132

Closed
sneumann opened this issue Jul 28, 2020 · 18 comments

Comments

@sneumann
Member

Hi, in addition to #31 and #32 we can think about *.msp
as an export format for the releases. @meier-rene has code
up his sleeve that exports MB records to some flavour of *.msp files.
Yours, Steffen

@tsufz
Member

tsufz commented Jul 29, 2020

Really appreciated!

@schymane
Member

Somewhat related ... @adelenelai shared this earlier today
https://www.researchsquare.com/article/rs-44215/v1

At first glance they do not seem to support MSP yet, but maybe that could be an extension?

@tsufz
Member

tsufz commented Jul 29, 2020

Yep, they are part of NFDI4Chem.

@meowcat

meowcat commented Jul 30, 2020

I usually use http://prime.psc.riken.jp/compms/others/main.html#Massbank2msp (usually followed by the LIB2Nist to convert it to a NIST library). This works quite well with the appropriate settings. But possibly @meier-rene has something better?

@sneumann
Member Author

IIRC Massbank2msp is a Windows program, so it would not work nicely in a continuous integration pipeline. Correct?

@tsufz
Member

tsufz commented Jul 30, 2020

GitHub should provide a Windows-based runner... However, writing our own parser might be the nicer way. The format is super simple, and this should be doable in a few hours:

NAME: Mellein; LC-ESI-ITFT; MS2; CE
PRECURSORMZ: 179.0697
PRECURSORTYPE: [M+H]+
INSTRUMENTTYPE: LC-ESI-ITFT
INSTRUMENT: Q-Exactive Orbitrap Thermo Scientific
Authors: Justin B. Renaud, Mark W. Sumarah, Agriculture and Agri-Food Canada
License: CC BY-SA
SMILES: CC1CC2=C(C(=CC=C2)O)C(=O)O1
INCHI: InChI=1S/C10H10O3/c1-6-5-7-3-2-4-8(11)9(7)10(12)13-6/h2-4,6,11H,5H2,1H3
COLLISIONENERGY: 10(NCE)
FORMULA: C10H10O3
RETENTIONTIME: 3.44
IONMODE: Positive
MASSBANKACCESSION: AC000001
Links: INCHIKEY KWILGNNWGSNMPA-UHFFFAOYSA-N; CAS 17397-85-2; PUBCHEM CID; CHEMSPIDER 26529; KNAPSACK C00000550; COMPTOX DTXSID60891794;
Comment: PrecursorMz=179.0697, PrecursorType=[M+H]+, InstrumentType=LC-ESI-ITFT, CE=10(NCE)
Num Peaks: 5
133.0648 21905.33203125
151.0754 9239.8974609375
155.9743 10980.8896484375
161.0597 96508.4375
179.0703 72563.875
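
To illustrate how little is needed to read a record like the one above, here is a minimal sketch in Python. It assumes every header line is `KEY: value`, that the `Num Peaks` line is the last header, and that each peak line is `m/z intensity` separated by whitespace. The function name `parse_msp_record` is made up for this example, and the record is an abbreviated copy of the one above; this is not the converter discussed in this thread.

```python
from typing import Dict, List, Tuple


def parse_msp_record(text: str) -> Tuple[Dict[str, str], List[Tuple[float, float]]]:
    """Parse one MSP record into (metadata, peaks).

    Assumption: header keys are treated case-insensitively and stored
    in upper case; everything after "Num Peaks" is a peak line.
    """
    metadata: Dict[str, str] = {}
    peaks: List[Tuple[float, float]] = []
    in_peaks = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_peaks:
            # Only the first two columns (m/z, intensity) are used.
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        key = key.strip().upper()
        metadata[key] = value.strip()
        if key == "NUM PEAKS":
            in_peaks = True
    return metadata, peaks


record = """\
NAME: Mellein; LC-ESI-ITFT; MS2; CE
PRECURSORMZ: 179.0697
PRECURSORTYPE: [M+H]+
MASSBANKACCESSION: AC000001
Num Peaks: 5
133.0648 21905.33203125
151.0754 9239.8974609375
155.9743 10980.8896484375
161.0597 96508.4375
179.0703 72563.875
"""

metadata, peaks = parse_msp_record(record)
print(metadata["MASSBANKACCESSION"], len(peaks))
```

A real reader would also need to split a file into records (typically on blank lines) and cope with the differences between MSP dialects.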

@tsufz
Member

tsufz commented Jul 30, 2020

The RIKEN tool evidently does not run in command-line mode, so there is no chance of automating it. Hence, an offline workaround using a runner is not possible. :-(

@meowcat

meowcat commented Jul 30, 2020

At one point I started (with @michaelwitting and @Treutler) work on a converter that uses schema templates, so it would read/write arbitrary formats by just adding new templates. I still find the idea neat but we never really materialized the thing. A simple MSP writer is easier to do, of course.
https://github.com/meowcat/MSnio
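
For comparison, a simple MSP writer could look roughly like the sketch below (Python, purely illustrative). It assumes the metadata has already been extracted into a mapping and the peaks into `(m/z, intensity)` pairs, and that records are separated by a blank line. The name `format_msp_record` is hypothetical and not part of MSnio or any MassBank code.

```python
from typing import Iterable, List, Mapping, Tuple


def format_msp_record(
    metadata: Mapping[str, str],
    peaks: Iterable[Tuple[float, float]],
) -> str:
    """Render one record as MSP text, terminated by a blank line."""
    peak_list = list(peaks)
    # Header fields are written in the order given by the mapping.
    lines: List[str] = [f"{key}: {value}" for key, value in metadata.items()]
    # The peak count must precede the peak list.
    lines.append(f"Num Peaks: {len(peak_list)}")
    lines.extend(f"{mz} {intensity}" for mz, intensity in peak_list)
    return "\n".join(lines) + "\n\n"


text = format_msp_record(
    {"NAME": "Mellein", "PRECURSORMZ": "179.0697"},
    [(133.0648, 21905.33203125), (161.0597, 96508.4375)],
)
print(text, end="")
```

The hard part is not the writing itself but deciding which source fields map to which MSP keys, which is where a template approach would help.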

@sneumann
Member Author

Another initiative (which can still use the schema template idea) is
https://github.com/rformassspectrometry/Spectra/ which can then
use different MsBackEnds https://github.com/sneumann/MsBackendMsp

Also note that we have started to document parts of the MassBank environment
in https://github.com/MassBank/MassBank-documentation/ so they are available
via the MassBank web frontend: https://msbi.ipb-halle.de/MassBank/
Then it is possible to have updated documentation without re-deploying the web app :-)

Yours, Steffen

@tsufz
Member

tsufz commented Jul 30, 2020

Really appreciated :-)

Best,
Tobias

@michaelwitting
Contributor

@meowcat Indeed, this is something we should follow up on. Maybe the Spectra package is a good opportunity to revive it. I really like the backends and the Spectra objects. They are very easy to construct. I have written a backend for MassBank records: https://github.com/michaelwitting/MsBackendMassbank. If I find time soonish I will implement a writer for MassBank records.
The MSnio package could be used to convert the namespace from one backend to another.

@tsufz
Member

tsufz commented Jul 30, 2020

Nice, but we need a Java backend to integrate somewhere in CI, for example in the validator. We parse the records for the checks anyway. Some additional parsing and export to *.msp, plus some downstream processing to compile meaningful *.msp collections, might be possible. @meier-rene, what do you think?

@sneumann
Member Author

@meier-rene has now worked on an export for MSP that will appear in our releases. Yours, Steffen

@tsufz
Member

tsufz commented Nov 25, 2020

As of today, this has appeared! Many thanks to @meier-rene for your efforts!

https://github.com/MassBank/MassBank-data/releases/tag/2020.11

Time to close this issue.

@tsufz tsufz closed this as completed Nov 25, 2020
@tsufz
Member

tsufz commented Nov 25, 2020

Reopen... It is a pity that the original record number is not included, and we should give some credit to the authors. Both could be written to the comment field.

@tsufz tsufz reopened this Nov 25, 2020
@meier-rene
Collaborator

I will include this. Thanks for the comment. This converter to MSP is pretty new, so I expect some more issues. Please don't hesitate to report them.

@meowcat

meowcat commented Nov 25, 2020

Yep. First, thanks a ton! Second, the original accession ID would be really great.

Then finally, not essential but worth a thought, the MSP format used by NIST (which I consider "the reference") uses a mixture of CamelCase (PrecursorMZ) and Capitalized_snake_case (Ion_mode). This is from an obfuscated version of an original NIST record:

Name: Obfuscamycin-methyl
Notes: Consensus spectrum; Nreps=9/9; Mz_diff=-0.2ppm; Vial_ID=3432; Metabolite_2016_01_29_ID=65432; micromol/L in water/acetonitrile/formic acid (50/50/0.1)
Ion_mode: P
Instrument: Thermo Finnigan Elite Orbitrap
Instrument_type: HCD
Ionization: ESI
Collision_energy: NCE=15% 11eV
Collision_gas: N2
Sample_inlet: direct flow injection
Spectrum_type: MS2
Precursor_type: [M+H]+
PrecursorMZ: 433.0658
InChIKey: WTZUASDHJKHKFA-UHFFFAOYSA-N
Synon: (The first synon could/should be the IUPAC name)
Formula: C54H13ClF3NO4
MW: 432
ExactMass: 432.058521
CASNO: 12345
NISTNO: 34567
ID: 8765432
Comment: NIST Mass Spectrometry Data Center
Num peaks: 3
123.222 654.12 "C13ClF3H10NO2=p-C3H1O2/0.1ppm;C34ClH11NO4=p-H3F3/-7.6ppm 8/8"
341.0294 77.40 "C11ClF3H10NO3=p-CH4O/-0.5ppm 8/8"
398.0558 999.00 "p/0.0ppm 8/8"

In the more distant future, we could also include the peak annotations.
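
As a rough idea of what reading such annotated peak lines involves, here is a small Python sketch. It assumes a peak line is `m/z intensity` optionally followed by one double-quoted annotation, as in the obfuscated record above. The pattern `PEAK_LINE` and the function `parse_peak_line` are made-up names, and the annotation is returned as an uninterpreted string.

```python
import re
from typing import Optional, Tuple

# m/z, intensity, and an optional trailing annotation in double quotes.
PEAK_LINE = re.compile(r'^\s*(\S+)\s+(\S+)(?:\s+"(.*)")?\s*$')


def parse_peak_line(line: str) -> Tuple[float, float, Optional[str]]:
    """Split one peak line into (m/z, intensity, annotation or None)."""
    match = PEAK_LINE.match(line)
    if match is None:
        raise ValueError(f"not a peak line: {line!r}")
    mz, intensity, annotation = match.groups()
    return float(mz), float(intensity), annotation


print(parse_peak_line('398.0558 999.00 "p/0.0ppm 8/8"'))
print(parse_peak_line("133.0648 21905.33203125"))
```

Lines without an annotation, like those in the RIKEN-style export, go through the same function and simply yield `None` for the third element.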

We once compiled correspondence between record format fields here, maybe that helps:
https://docs.google.com/spreadsheets/d/1IGgtA6FL5lt9ajDSb0WR90qQfHtebDtcfyjX86tC_JQ/edit#gid=0
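
To make the naming difference concrete, here is a hedged Python sketch that renames header keys from the all-caps style of the release export to the NIST-style spelling. The table `RIKEN_TO_NIST` is inferred only from the two example records in this thread, not from the spreadsheet or from the actual converter, and matching names does not guarantee matching semantics (for instance, the instrument type values differ between the two examples). The `N` code for negative mode is an assumption; only `P` appears above.

```python
from typing import Dict, Mapping

# Key correspondence read off the two example records in this thread.
RIKEN_TO_NIST: Dict[str, str] = {
    "NAME": "Name",
    "PRECURSORMZ": "PrecursorMZ",
    "PRECURSORTYPE": "Precursor_type",
    "INSTRUMENT": "Instrument",
    "INSTRUMENTTYPE": "Instrument_type",
    "COLLISIONENERGY": "Collision_energy",
    "FORMULA": "Formula",
    "IONMODE": "Ion_mode",
}

# "P" is taken from the NIST example; "N" is an assumption.
ION_MODE_CODES: Dict[str, str] = {"positive": "P", "negative": "N"}


def to_nist_keys(metadata: Mapping[str, str]) -> Dict[str, str]:
    """Rename known keys; unknown keys are passed through unchanged."""
    converted: Dict[str, str] = {}
    for key, value in metadata.items():
        nist_key = RIKEN_TO_NIST.get(key.upper(), key)
        if nist_key == "Ion_mode":
            value = ION_MODE_CODES.get(value.lower(), value)
        converted[nist_key] = value
    return converted


print(to_nist_keys({"PRECURSORMZ": "179.0697", "IONMODE": "Positive"}))
```

Keys without a known counterpart (such as `License` or `Links`) are kept as they are here; a real converter would have to decide whether to drop them or fold them into a comment field.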

@meier-rene
Collaborator

@tsufz Your issue is solved. I uploaded a RIKEN msp version which includes a DB id.

@meowcat The first msp I uploaded was not in the NIST format, but in the RIKEN format. I chose this because it's more widely used at our place, but I don't want to argue about which one is the reference or more useful. Our converter can also create the NIST dialect of msp, and that's why I also uploaded a NIST format msp. Future releases will include both versions, but I will not implement a third version. 😄 Your record format cheat sheet will become useful in the future. Thank you for sharing! I would appreciate it if it stayed accessible.

6 participants