import sys
import os
import glob

from math import isclose
from rdkit.Chem import MolFromSmiles
from rdkit.Chem.rdMolDescriptors import CalcExactMolWt, CalcMolFormula

MF_PATTERN = "CH$FORMULA:"
EXACT_MASS_PATTERN = "CH$EXACT_MASS:"
SMILES_PATTERN = "CH$SMILES:"

if __name__ == "__main__":
    # Directory containing the RIKEN spectra files
    idir = sys.argv[1]

    # Iterate over all ms-files in the directory
    for msfn in sorted(glob.glob(os.path.join(idir, "*.txt"))):
        with open(msfn, "r") as msfile:
            # Read information from file: Molecular Formula, Exact Mass and SMILES
            line = msfile.readline().strip()
            while line:
                # Extract molecular formula
                if line.startswith(MF_PATTERN):
                    mf_file = line[(len(MF_PATTERN) + 1):]
                # Extract exact mass
                elif line.startswith(EXACT_MASS_PATTERN):
                    exact_mass_file = float(line[(len(EXACT_MASS_PATTERN) + 1):])
                # Extract SMILES
                elif line.startswith(SMILES_PATTERN):
                    smiles_file = line[(len(SMILES_PATTERN) + 1):]
                line = msfile.readline().strip()

        # We skip molecules that are intrinsically charged, as those might
        # not be correctly handled by rdkit
        if mf_file.endswith("+"):
            continue

        # Calculate Molecular Formula and Exact Mass from the given SMILES and compare
        mol = MolFromSmiles(smiles_file)
        mf_smi = CalcMolFormula(mol)
        exact_mass_smi = CalcExactMolWt(mol)

        if mf_smi != mf_file:
            print("%s: MF (ms-file vs. rdkit) '%s' - '%s'" % (os.path.basename(msfn), mf_file, mf_smi))

        if not isclose(exact_mass_file, exact_mass_smi, abs_tol=1e-3):
            print("%s: Exact Mass (ms-file vs. rdkit) %f - %f = %f" % (
                os.path.basename(msfn), exact_mass_file, exact_mass_smi,
                exact_mass_file - exact_mass_smi))

Hi,
I stumbled into an issue with the RIKEN/PR3* spectra. It seems that the exact mass is not calculated correctly. Let's look at the following example: PR302491.txt
If I use an online tool to calculate the exact mass from the molecular formula, I get 596.174125 (diff ~0.35). I also calculated the exact mass using RDKit directly from the SMILES and get 596.1741203239999 (diff ~0.35). When I look up the compound in PubChem (searched by InChIKey), I get 596.17412.
Actually, the molecular weight in PubChem is pretty close to the exact mass reported in the spectra file: 596.5 vs. 596.538.
I attached a Python script that compares the reported exact mass with the one calculated using RDKit. I ran it on the RIKEN spectra files with an absolute tolerance of 0.001. Only the PR3*.txt files seem to be affected.
I believe the files need curation.
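
To illustrate why a molecular weight stored in the exact-mass field would produce a gap of this kind, here is a minimal sketch that computes both quantities from a formula. It assumes a small hand-entered table of rounded atomic masses, and it uses glucose (C6H12O6) purely as a stand-in, not the compound in PR302491.txt. The helper names `parse_formula` and `mass` are made up for this example.

```python
import re

# Rounded values for illustration only (assumption: four elements suffice here).
# Monoisotopic mass = mass of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
# Average atomic weight = abundance-weighted mean over the natural isotopes.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Split a simple formula such as 'C6H12O6' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses from the given table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


formula = "C6H12O6"  # glucose, stand-in example only
exact = mass(formula, MONOISOTOPIC)  # what CH$EXACT_MASS should hold
weight = mass(formula, AVERAGE)      # molecular weight (average mass)
print("exact mass: %.5f, molecular weight: %.3f, diff: %.3f"
      % (exact, weight, weight - exact))
```

The gap between the two values grows with the number of atoms, so a difference of a few tenths of a dalton for a compound near 596 Da would be consistent with the molecular weight having been written in place of the exact mass.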
Best regards,
Eric