Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to strip metadata when parsing PO files #141

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

maennchen
Copy link
Member

@maennchen maennchen commented Sep 5, 2024

Intended to be used in Gettext.Backend to speed up compiler and minimize memory usage when meta information is not needed.

Uses roughly a third of memory and is two times faster than including all meta information.

Performance Test

#!/usr/bin/env elixir

Mix.install([
  {:expo, path: "."},
  {:large_translations_catalogue,
   github: "jshmrtn/hygeia",
   branch: "main",
   app: false,
   compile: false,
   depth: 1,
   sparse: "priv/gettext"},
  {:benchee, "~> 1.3"}
])

require Logger

catalogue_base_path =
  Path.join([Mix.install_project_dir(), "deps", "large_translations_catalogue", "priv", "gettext"])

Logger.info("Preparing .mo files")

po_files = Path.wildcard(Path.join([catalogue_base_path, "*", "LC_MESSAGES", "*.po"]))

for path <- po_files do
  mo_path = Path.rootname(path, ".po") <> ".mo"
  Mix.Task.rerun("expo.msgfmt", [path, "--output-file", mo_path])
end

mo_files = Path.wildcard(Path.join([catalogue_base_path, "*", "LC_MESSAGES", "*.mo"]))

Logger.info("Performance Test Expo Parse")

Benchee.run(
  [
    expo_po_parse: fn ->
      for file <- po_files, do: Expo.PO.parse_file!(file)
    end,
    expo_po_strip_meta_parse: fn ->
      for file <- po_files, do: Expo.PO.parse_file!(file, strip_meta: true)
    end,
    expo_mo_parse: fn ->
      for file <- mo_files, do: Expo.MO.parse_file!(file)
    end
  ],
  warmup: 5,
  time: 10,
  memory_time: 2,
  parallel: System.schedulers_online()
)

Results:

13:48:54.184 [info] Preparing .mo files

13:48:54.236 [info] Performance Test Expo Parse
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.76 GB
Elixir 1.17.2
Erlang 27.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 5 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 8
inputs: none specified
Estimated total run time: 51 s

Benchmarking expo_po_parse ...
Benchmarking expo_po_strip_meta_parse ...
Benchmarking expo_mo_parse ...
Calculating statistics...
Formatting results...

Name                               ips        average  deviation         median         99th %
expo_mo_parse                    56.77       17.62 ms    ±31.16%       16.07 ms       35.65 ms
expo_po_strip_meta_parse         14.12       70.81 ms    ±14.25%       69.67 ms      102.19 ms
expo_po_parse                     6.54      152.92 ms    ±11.27%      151.50 ms      199.97 ms

Comparison: 
expo_mo_parse                    56.77
expo_po_strip_meta_parse         14.12 - 4.02x slower +53.19 ms
expo_po_parse                     6.54 - 8.68x slower +135.31 ms

Memory usage statistics:

Name                             average  deviation         median         99th %
expo_mo_parse                    4.05 MB     ±0.00%        4.05 MB        4.05 MB
expo_po_strip_meta_parse        11.20 MB     ±0.00%       11.20 MB       11.20 MB
expo_po_parse                   33.67 MB     ±0.00%       33.67 MB       33.67 MB

Comparison: 
expo_mo_parse                    4.05 MB
expo_po_strip_meta_parse        11.20 MB - 2.77x memory usage +7.15 MB
expo_po_parse                   33.67 MB - 8.32x memory usage +29.62 MB

@maennchen maennchen added the enhancement New feature or request label Sep 5, 2024
@maennchen maennchen self-assigned this Sep 5, 2024
@coveralls
Copy link

coveralls commented Sep 5, 2024

Pull Request Test Coverage Report for Build ff379f76a243351fcb49ffa3076f519e522cc105-PR-141

Details

  • 17 of 18 (94.44%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 97.936%

Changes Missing Coverage Covered Lines Changed/Added Lines %
lib/expo/po/tokenizer.ex 15 16 93.75%
Totals Coverage Status
Change from base Build deb65ba9cbda69768ded2dcfac751dd6f8046828: -0.2%
Covered Lines: 427
Relevant Lines: 436

💛 - Coveralls

@whatyouhide
Copy link
Contributor

@maennchen so what would be a use case where meta information is not needed? Runtime translations? I thought we were still discussing if that's something we want.

@maennchen
Copy link
Member Author

@whatyouhide We need all meta information when we're extracting translations to merge correctly. However when we compile the backend, we don't need any comments etc.

I should choose a better name than runtime, especially in the current context it is misleading. Maybe messages_only or something. Ideas?

@whatyouhide
Copy link
Contributor

@maennchen something like "strip meta" maybe? IIRC that's what we use in the various Elixir compiler stuff.

@maennchen maennchen force-pushed the jm/po_parser_mode_runtime branch from a3b2977 to 9961bb9 Compare September 5, 2024 13:47
@maennchen maennchen changed the title Introduce PO Parser mode runtime Introduce PO Parser Metadata Stripping Sep 5, 2024
@maennchen
Copy link
Member Author

@whatyouhide Changed to strip_meta 👍

@maennchen maennchen force-pushed the jm/po_parser_mode_runtime branch from 9961bb9 to 38f7236 Compare September 5, 2024 13:50
Copy link
Contributor

@whatyouhide whatyouhide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments but this looks great.

lib/expo/po.ex Outdated Show resolved Hide resolved
lib/expo/po.ex Outdated Show resolved Hide resolved
lib/expo/po/parser.ex Outdated Show resolved Hide resolved
lib/expo/po/tokenizer.ex Outdated Show resolved Hide resolved
@maennchen maennchen force-pushed the jm/po_parser_mode_runtime branch from da5d93b to ff379f7 Compare September 6, 2024 10:24
@maennchen
Copy link
Member Author

maennchen commented Sep 9, 2024

@whatyouhide Are we good to merge?

@whatyouhide whatyouhide changed the title Introduce PO Parser Metadata Stripping Add option to strip metadata when parsing PO files Sep 10, 2024
@whatyouhide whatyouhide merged commit bd94b21 into main Sep 10, 2024
8 checks passed
@whatyouhide whatyouhide deleted the jm/po_parser_mode_runtime branch September 10, 2024 06:33
@whatyouhide
Copy link
Contributor

Fantastic @maennchen 💟

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants