Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spectrum feature generator #178

Draft
wants to merge 41 commits into
base: main
Choose a base branch
from
Draft

Conversation

ArthurDeclercq
Copy link
Collaborator

No description provided.

@RalfG RalfG added this to the v3.2.0 milestone Aug 20, 2024
@RalfG RalfG added the feature new feature label Aug 20, 2024
(psm_list["qvalue"] <= 0.01)
& (psm_list["rank"] <= max_rank)
& (~psm_list["is_decoy"])
& ([metadata.get("original_psm", True) for metadata in psm_list["metadata"]])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it might be quite inefficient, however I'm not sure if it can be improved significantly, given that original_psm is in the metadata dict. Maybe keeping it a series instead of a list might be better. Or adding it to the dataframe.

Comment on lines 121 to 124
if original_matched_ions_pct > matched_ions[i]:
keep[i] = False
else:
keep[i] = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if original_matched_ions_pct > matched_ions[i]:
keep[i] = False
else:
keep[i] = True
keep[i] = original_matched_ions_pct <= matched_ions[i]

Comment on lines 108 to 111
if "matched_ions_pct" in psm_list[0].rescoring_features:
matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
else:
return psm_list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if "matched_ions_pct" in psm_list[0].rescoring_features:
matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
else:
return psm_list
if "matched_ions_pct" not in psm_list[0].rescoring_features:
return psm_list
else:
matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]



class MS2FeatureGenerator(FeatureGeneratorBase):
"""DeepLC retention time-based feature generator."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this docstring should be updated?

}
except AttributeError:
raise ParseSpectrumError(
"Could not parse spectrum IDs using ´spectrum_id_pattern´. Please make sure that there is a capturing in the pattern."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a capture group with "a capturing"?

Comment on lines 309 to 319
for peak in annotated_spectrum:

for fragment in peak.annotation:

ion_type = infer_fragment_identity(fragment)

if ion_type == 'b':
b_intensities.append(peak.intensity)
if ion_type == 'y':
y_intensities.append(peak.intensity)
return b_intensities, y_intensities
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for peak in annotated_spectrum:
for fragment in peak.annotation:
ion_type = infer_fragment_identity(fragment)
if ion_type == 'b':
b_intensities.append(peak.intensity)
if ion_type == 'y':
y_intensities.append(peak.intensity)
return b_intensities, y_intensities
for peak in annotated_spectrum:
for fragment in peak.annotation:
ion_type = infer_fragment_identity(fragment)
if ion_type == 'b':
b_intensities.append(peak.intensity)
elif ion_type == 'y':
y_intensities.append(peak.intensity)
return b_intensities, y_intensities

return annotated_spectrum.spectrum


def factorial(n):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use a custom function instead of math.factorial?

if spectrum_filepath.suffix.lower() == ".mzml":
return mzml.PreIndexedMzML(str(spectrum_filepath))
elif spectrum_filepath.suffix.lower() == ".mgf":
return mgf.IndexedMGF(str(spectrum_filepath))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to avoid failing silently and add an else and raise an e.g. NotImplementedError or ValueError.

paretje and others added 8 commits January 14, 2025 17:22
…rator

* origin/main:
  Use np.fromiter for generators
  Implement requested changes (.copy; use generators)
  Urgent fix in im2deep.py
  require ms2rescore-rs version with file type check
  move file type check to ms2rescore_rs
  Pin DeepLC version to <3.1, avoiding calibration bug
  Refactor parsing of spectrum data: - Clearer logging when parsing precursor info from spectrum files - Always check if PSMs match with spectra based on observed precursor m/z (if available in PSM list) - Always raise error if not all PSMs can be found in spectrum file(s), before MS²PIP - Provide example PSM IDs from both PSM and spectrum file when matching fails. - Move all code to parse_spectra
rustyms 0.9.0a3 requires python 3.11, while we support 3.9.
As mumble hasn't been published on pypi yet, use a git dependency for
now.
@paretje paretje force-pushed the spectrum-feature-generator branch from 597544b to aee8ec7 Compare January 21, 2025 15:13
@paretje
Copy link
Collaborator

paretje commented Jan 22, 2025

It might be worth considering setting these by default when running mumble through ms2rescore, given that these are kind of required for it to work, so at least it should be documented:

[ms2rescore.psm_generator.mumble]
keep_original = true
generate_modified_decoys = true

im2deep.utils was introduced in 0.3.0.
compomics/IM2Deep@0a4bc9d
@paretje
Copy link
Collaborator

paretje commented Jan 29, 2025

I had a quick look at the python 3.9 failure: this is because of mumble, which requires python 3.10. I think we could probably lower the requirement in mumble, as I don't think we actually use any new features, unless some of the dependencies do. And otherwise, we could of course just consider dropping python 3.9 for ms2rescore.

For the record, I used poetry to get a more useful error from the dependency resolver (adding the version to pyproject.toml, it seems dynamic version fields aren't supported):

The current project's supported Python range (>=3.9) is not compatible with some of the required packages Python requirement:
  - mumble-mod requires Python >=3.10, so it will not be satisfied for Python >=3.9,<3.10

Because ms2rescore depends on mumble-mod (0.2.0) @ git+https://github.com/compomics/mumble.git@114ad7d which requires Python >=3.10, version solving failed.

  * Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties

    For mumble-mod, a possible solution would be to set the `python` property to ">=3.10"

    https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
    https://python-poetry.org/docs/dependency-specification/#using-environment-markers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants