Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reader: unzip and untar data without checking the filename #2

Merged
merged 1 commit into from
Feb 25, 2020
Merged

reader: unzip and untar data without checking the filename #2

merged 1 commit into from
Feb 25, 2020

Conversation

s-stepien
Copy link
Contributor

Recently, MaxMind has changed the URL from which the free database can
be downloaded. In the new URL schema, the suffix is given as a URL
parameter, so we need to handle it correctly during unpacking.

Signed-off-by: Sławomir Stępień sst@poczta.fm

@s-stepien s-stepien requested a review from mneudert February 24, 2020 19:53
@s-stepien
Copy link
Contributor Author

Well, this is not the best thing we can do. It would be better for parse the URL, but would be super nice to check the headers (if this is even possible) and detect gnuzip and tar type data. I will take a look at better approach.

Since filename can be also an URL, it's more wise to make decision based
on data and not the filename.

This commit will try to unzip and untar the data, and if it fails it
will pass the data down the pipe.

Signed-off-by: Sławomir Stępień <sst@poczta.fm>
@s-stepien s-stepien changed the title reader: handle MaxMind URL file extensions reader: unzip and untar data without checking the filename Feb 25, 2020
@s-stepien
Copy link
Contributor Author

Commit updated. I think this is better approach.

Copy link
Member

@mneudert mneudert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ 💙 💚 💜 💛

@mneudert mneudert merged commit d9dbfd9 into elixir-geolix:master Feb 25, 2020
@mneudert
Copy link
Member

Just out of curiosity I ran a benchmark with the changes:

Benchee.run(
  %{
    "read" => fn -> Geolix.Adapter.MMDB2.Reader.read_database("GeoLite2-City.mmdb") end
  },
  formatters: [{Benchee.Formatters.Console, comparison: false}],
  warmup: 2,
  time: 10
)

Following values where output using an already unpacked City database (50-somewhat MB):

# untar/unzip with suffix check
Name           ips        average  deviation         median         99th %
read        135.48        7.38 ms    ±12.45%        7.21 ms        8.61 ms

# always untar/unzip
Name           ips        average  deviation         median         99th %
read         36.83       27.15 ms    ±11.83%       26.63 ms       45.35 ms

"Future versions™" might provide a more fine grained control on what to run for reading the file but as far as I am concerned these times they should not matter much right now. And it at allows to use the new download locations 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants