Spectrum feature generator #178

Draft

ArthurDeclercq wants to merge 41 commits into main from spectrum-feature-generator

Collaborator

ArthurDeclercq commented Aug 16, 2024

No description provided.

ArthurDeclercq and others added 17 commits

February 24, 2024 15:48


          initial commit

fdceeba


          finalize ms2 feature generation

5374ed8


          add rustyms

60207a3


          remove exit statement fixed IM required value

ae39844


          change logger.info to debug

9b98c4d


          added profile decorator to get timings for functions

5e45756


          removed profile as standard rescore debug statement

304777c


          added new basic features

95ee475


          fixes for ms2 feature generator, removed multiprocessing

73f4573


          return empty list on parsing error with rustyms, removed multiprocessing

947233e


          add deeplc_calibration psm set

24ce565


          Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore …

114b006

…into spectrum-feature-generator


          remove unused import

33c38b0


          Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore …

40425c7

…into spectrum-feature-generator


          Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore …

b810b8c

…into spectrum-feature-generator


          Merge tag 'main' of https://github.com/compomics/ms2rescore into spec…

69b5d1a

…trum-feature-generator


          Merge pull request #177 from compomics/main

6e2d102

pull main in spectrum-feature-generator

RalfG added this to the v3.2.0 milestone

RalfG added the feature label

ArthurDeclercq and others added 11 commits

August 21, 2024 13:25


          integrate mumble into ms2branch

11fdc51


          Merge remote-tracking branch 'origin/main' into spectrum-feature-gene…

3140c44

…rator


          temp removal of sage features before rescoring

883169a


          Merge branch 'main' of https://github.com/compomics/ms2rescore into s…

97865e7

…pectrum-feature-generator


          remove psm_file features when rescoring with mumble

da39ae8


          linting

37fff28


          add hyperscore calculation

e8b59f3


          calibration fixes

c51cd34


          changes for mumble implementation

295e37f


          change openms peptide formatting

909860d


          add mumble psm filtering functionality

c5902c2

ArthurDeclercq and others added 4 commits

November 22, 2024 13:36


          Merge branch 'spectrum-feature-generator' of https://github.com/compo…

6eaceb2

…mics/ms2rescore into spectrum-feature-generator


          remove pyopenms dependency for hyperscore calculation

5ce55f5


          fix spectrum_id accession

986c5f6


          Merge branch 'spectrum-feature-generator' of https://github.com/compo…

bbecf6a

…mics/ms2rescore into spectrum-feature-generator

paretje reviewed

ms2rescore/core.py

+                      (psm_list["qvalue"] <= 0.01)
+                      & (psm_list["rank"] <= max_rank)
+                      & (~psm_list["is_decoy"])
+                      & ([metadata.get("original_psm", True) for metadata in psm_list["metadata"]])

Collaborator

paretje Jan 6, 2025

This seems like it could be quite inefficient, although I'm not sure it can be improved significantly, given that original_psm lives in the metadata dict. Keeping it as a series instead of a list might be better, or adding it to the dataframe.
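A minimal sketch of the "keep it as an array" idea, under stated assumptions: the PSMs below are plain dictionaries standing in for the real PSM list, and `build_mask`, `max_rank`, and `fdr` are hypothetical names, not ms2rescore code. It extracts the `original_psm` flag once into a typed NumPy array with `np.fromiter`, so it combines elementwise with the other masks.

```python
import numpy as np

# Hypothetical stand-in for a PSM list: one dict per PSM.
psms = [
    {"qvalue": 0.001, "rank": 1, "is_decoy": False, "metadata": {}},
    {"qvalue": 0.001, "rank": 1, "is_decoy": False, "metadata": {"original_psm": False}},
    {"qvalue": 0.050, "rank": 1, "is_decoy": False, "metadata": {"original_psm": True}},
    {"qvalue": 0.001, "rank": 2, "is_decoy": True, "metadata": {}},
]


def build_mask(psms, max_rank=1, fdr=0.01):
    """Combine the per-PSM filters into a single boolean array."""
    n = len(psms)
    qvalue = np.fromiter((p["qvalue"] for p in psms), dtype=float, count=n)
    rank = np.fromiter((p["rank"] for p in psms), dtype=int, count=n)
    is_decoy = np.fromiter((p["is_decoy"] for p in psms), dtype=bool, count=n)
    # Read the metadata flag once, as a typed array rather than a Python list.
    original = np.fromiter(
        (p["metadata"].get("original_psm", True) for p in psms), dtype=bool, count=n
    )
    return (qvalue <= fdr) & (rank <= max_rank) & ~is_decoy & original


mask = build_mask(psms)
```

The dictionary lookup per PSM is still needed, so this mainly avoids the intermediate list and the implicit list-to-array conversion; storing the flag as its own column would remove the lookup as well.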

ms2rescore/utils.py Outdated

Comment on lines 121 to 124

+                          if original_matched_ions_pct > matched_ions[i]:
+                              keep[i] = False
+                          else:
+                              keep[i] = True

Collaborator

paretje Jan 6, 2025

Suggested change

-                          if original_matched_ions_pct > matched_ions[i]:
-                              keep[i] = False
-                          else:
-                              keep[i] = True
+                          keep[i] = original_matched_ions_pct <= matched_ions[i]
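As a quick check of this simplification, the sketch below compares the branching form with the direct comparison. The function names are hypothetical and only illustrate the two variants.

```python
def keep_verbose(original_pct, matched_pct):
    """Branching form, as in the code under review."""
    if original_pct > matched_pct:
        return False
    else:
        return True


def keep_direct(original_pct, matched_pct):
    """Direct form, as in the suggested change."""
    return original_pct <= matched_pct


# The two forms agree for ordinary numeric values.
pairs = [(0.2, 0.5), (0.5, 0.5), (0.8, 0.5)]
results = [(keep_verbose(a, b), keep_direct(a, b)) for a, b in pairs]
```

One caveat: if either value is NaN, every comparison is false, so the branching form yields `True` while the direct form yields `False`. This only matters if `matched_ions_pct` can be NaN.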

ms2rescore/utils.py Outdated

Comment on lines 108 to 111

+                  if "matched_ions_pct" in psm_list[0].rescoring_features:
+                      matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
+                  else:
+                      return psm_list

Collaborator

paretje Jan 6, 2025

Suggested change

-                  if "matched_ions_pct" in psm_list[0].rescoring_features:
-                      matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
-                  else:
-                      return psm_list
+                  if "matched_ions_pct" not in psm_list[0].rescoring_features:
+                      return psm_list
+                  else:
+                      matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]

ms2rescore/feature_generators/ms2.py



		class MS2FeatureGenerator(FeatureGeneratorBase):
		"""DeepLC retention time-based feature generator."""

Collaborator

paretje Jan 6, 2025

I guess this docstring should be updated?

ms2rescore/feature_generators/ms2.py

+                          }
+                      except AttributeError:
+                          raise ParseSpectrumError(
+                              "Could not parse spectrum IDs using ´spectrum_id_pattern´. Please make sure that there is a capturing in the pattern."

Collaborator

paretje Jan 6, 2025

By "a capturing", do you mean a capture group?
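For context, a small sketch of how a capture group behaves in Python's `re` module. The pattern, the spectrum title, and the `extract_id` helper are hypothetical. When a pattern does not match, `re.match` returns `None` and calling `.group(1)` on it raises `AttributeError`; when the pattern matches but has no capture group, `.group(1)` raises `IndexError` instead.

```python
import re

# Hypothetical pattern; the parentheses form the capture group holding the ID.
spectrum_id_pattern = r"^controllerType=0 controllerNumber=1 scan=(\d+)$"


def extract_id(title, pattern):
    """Return the first capture group, or raise a descriptive error."""
    compiled = re.compile(pattern)
    if compiled.groups < 1:
        raise ValueError(f"Pattern {pattern!r} has no capture group.")
    match = compiled.match(title)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} does not match {title!r}.")
    return match.group(1)


scan = extract_id("controllerType=0 controllerNumber=1 scan=42", spectrum_id_pattern)
```

Checking `compiled.groups` up front lets the error message distinguish a pattern without a capture group from a pattern that simply does not match.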

ms2rescore/feature_generators/ms2.py Outdated

Comment on lines 309 to 319

+                  for peak in annotated_spectrum:
+                      for fragment in peak.annotation:
+                          ion_type = infer_fragment_identity(fragment)
+                          if ion_type == 'b':
+                              b_intensities.append(peak.intensity)
+                          if ion_type == 'y':
+                              y_intensities.append(peak.intensity)
+                  return b_intensities, y_intensities

Collaborator

paretje Jan 6, 2025

Suggested change

-                  for peak in annotated_spectrum:
-                      for fragment in peak.annotation:
-                          ion_type = infer_fragment_identity(fragment)
-                          if ion_type == 'b':
-                              b_intensities.append(peak.intensity)
-                          if ion_type == 'y':
-                              y_intensities.append(peak.intensity)
-                  return b_intensities, y_intensities
+                  for peak in annotated_spectrum:
+                      for fragment in peak.annotation:
+                          ion_type = infer_fragment_identity(fragment)
+                          if ion_type == 'b':
+                              b_intensities.append(peak.intensity)
+                          elif ion_type == 'y':
+                              y_intensities.append(peak.intensity)
+                  return b_intensities, y_intensities

ms2rescore/feature_generators/ms2.py

		return annotated_spectrum.spectrum


		def factorial(n):

Collaborator

paretje Jan 6, 2025

Any reason to use a custom function instead of math.factorial?
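To illustrate the point, the sketch below compares a hand-written loop with the standard library. The body of the custom `factorial` is not shown in this thread, so the loop here is only an assumed stand-in for it.

```python
import math


def factorial(n):
    """Hand-written factorial; an assumed stand-in for the helper under review."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


# Both give the same values for non-negative integers.
same = all(factorial(n) == math.factorial(n) for n in range(20))
```

`math.factorial` additionally raises `ValueError` for negative input, whereas the loop above silently returns 1.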

ms2rescore/feature_generators/ms2.py Outdated

+                      if spectrum_filepath.suffix.lower() == ".mzml":
+                          return mzml.PreIndexedMzML(str(spectrum_filepath))
+                      elif spectrum_filepath.suffix.lower() == ".mgf":
+                          return mgf.IndexedMGF(str(spectrum_filepath))

Collaborator

paretje Jan 6, 2025

It might be better to avoid failing silently: add an else branch that raises, e.g., a NotImplementedError or ValueError.
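A minimal sketch of that suggestion, with assumptions: `select_reader` is a hypothetical helper, and the returned strings are placeholders for the pyteomics reader objects constructed in the diff above.

```python
from pathlib import Path


def select_reader(spectrum_filepath):
    """Pick a reader based on the file suffix, failing loudly on unknown types."""
    suffix = Path(spectrum_filepath).suffix.lower()
    if suffix == ".mzml":
        return "mzml"  # placeholder for mzml.PreIndexedMzML(str(spectrum_filepath))
    elif suffix == ".mgf":
        return "mgf"  # placeholder for mgf.IndexedMGF(str(spectrum_filepath))
    else:
        # Without this branch the function would implicitly return None.
        raise ValueError(
            f"Unsupported spectrum file type '{suffix}' for {spectrum_filepath}"
        )


reader = select_reader("run01.mzML")
```

Without the final branch, an unsupported file would produce `None` and the failure would only surface later, at the first attempt to use the reader.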

paretje and others added 8 commits

January 14, 2025 17:22


          Merge remote-tracking branch 'origin/main' into spectrum-feature-gene…

6fd6053

…rator

* origin/main:
  Use np.fromiter for generators
  Implement requested changes (.copy; use generators)
  Urgent fix in im2deep.py
  require ms2rescore-rs version with file type check
  move file type check to ms2rescore_rs
  Pin DeepLC version to <3.1, avoiding calibration bug
  Refactor parsing of spectrum data:
    - Clearer logging when parsing precursor info from spectrum files
    - Always check if PSMs match with spectra based on observed precursor m/z (if available in PSM list)
    - Always raise error if not all PSMs can be found in spectrum file(s), before MS²PIP
    - Provide example PSM IDs from both PSM and spectrum file when matching fails.
    - Move all code to parse_spectra


          remove unused imports

5333e46


          remove unused import in deeplc feature generator

dd2259f


          add rustyms dependency

d24ef30


          drop rustyms requirement to 0.8.3

21cafc7

rustyms 0.9.0a3 requires python 3.11, while we support 3.9.


          mumble related changes

ca9da7d


          add mumble

c5b6eb0

As mumble hasn't been published on pypi yet, use a git dependency for
now.


          update mumble to use user cache dir

aee8ec7

paretje force-pushed the spectrum-feature-generator branch from 597544b to aee8ec7

January 21, 2025 15:13

Collaborator

paretje commented Jan 22, 2025

It might be worth setting these by default when running mumble through ms2rescore, since they are more or less required for it to work. At the very least, this should be documented:

[ms2rescore.psm_generator.mumble]
keep_original = true
generate_modified_decoys = true
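One possible way to apply such defaults, sketched under assumptions: `MUMBLE_DEFAULTS` and `with_mumble_defaults` are hypothetical names, and the dictionary stands in for the parsed `[ms2rescore.psm_generator.mumble]` table. Explicit user values take precedence over the defaults.

```python
# Hypothetical defaults; the keys mirror the TOML options above.
MUMBLE_DEFAULTS = {"keep_original": True, "generate_modified_decoys": True}


def with_mumble_defaults(user_config):
    """Return the mumble settings with defaults filled in; user values win."""
    return {**MUMBLE_DEFAULTS, **user_config}


settings = with_mumble_defaults({"keep_original": False})
```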


          bump im2deep dependency

7ce56c2

im2deep.utils was introduced in 0.3.0.
compomics/IM2Deep@0a4bc9d

Collaborator

paretje commented Jan 29, 2025

I had a quick look at the Python 3.9 failure: it is caused by mumble, which requires Python 3.10. We could probably lower the requirement in mumble, as I don't think we actually use any newer features, unless some of its dependencies do. Otherwise, we could of course consider dropping Python 3.9 support in ms2rescore.

For the record, I used poetry to get a more useful error from the dependency resolver (after adding the version to pyproject.toml, since dynamic version fields don't seem to be supported):

The current project's supported Python range (>=3.9) is not compatible with some of the required packages Python requirement:
  - mumble-mod requires Python >=3.10, so it will not be satisfied for Python >=3.9,<3.10

Because ms2rescore depends on mumble-mod (0.2.0) @ git+https://github.com/compomics/mumble.git@114ad7d which requires Python >=3.10, version solving failed.

  * Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties

    For mumble-mod, a possible solution would be to set the `python` property to ">=3.10"

    https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
    https://python-poetry.org/docs/dependency-specification/#using-environment-markers

Labels

feature