Fix URL encoded filenames #104

Closed
wants to merge 7 commits into from
10 changes: 10 additions & 0 deletions CHANGELOG.rst
@@ -3,6 +3,16 @@
 Changelog
 =========

+2.4.1 (2021-12-23)
+------------------
+
+- Fix URL decoding of filenames.
+  `#104 <https://github.com/hynek/doc2dash/pull/104>`_
+
+
+----
+
+
 2.4.0 (2021-11-16)
 ------------------

5 changes: 4 additions & 1 deletion src/doc2dash/parsers/utils.py
@@ -3,6 +3,7 @@
 import errno
 import logging
 import os
+import urllib.parse

 from collections import defaultdict

@@ -108,7 +109,9 @@ def patch_anchors(parser, show_progressbar):

     def patch_files(files):
         for fname, entries in files:
-            full_path = os.path.join(parser.doc_path, fname)
+            full_path = os.path.join(
+                parser.doc_path, urllib.parse.unquote(fname)
+            )
             with codecs.open(full_path, mode="r", encoding="utf-8") as fp:
                 soup = BeautifulSoup(fp, "html.parser")
                 for entry in entries:
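
To illustrate the decoding step in the change above, here is a minimal, self-contained sketch of what `urllib.parse.unquote` does to a percent-encoded filename before it is joined onto a directory. The helper name `decoded_doc_path` and the `docs` directory are hypothetical and not part of doc2dash. The sketch imports `urllib.parse` explicitly, because a bare `import urllib` is not guaranteed to load the `parse` submodule.

```python
import os
import urllib.parse


def decoded_doc_path(doc_path, fname):
    """Join doc_path with the percent-decoded fname (hypothetical helper)."""
    return os.path.join(doc_path, urllib.parse.unquote(fname))


# A space encoded as %20 is decoded back to a literal space,
# so the result matches the file name as it exists on disk.
print(decoded_doc_path("docs", "foo%20bar.html"))

# Names without percent escapes pass through unchanged.
print(decoded_doc_path("docs", "bar.html"))
```

Decoding only at the point where the file is opened leaves the stored entry paths untouched, which appears to be the approach this diff takes.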
25 changes: 25 additions & 0 deletions tests/parsers/test_utils.py
@@ -45,6 +45,14 @@ def entries():
     ]


+@pytest.fixture
+def entries_url_format():
+    return [
+        ParserEntry(name="foo", type="Method", path="foo%20bar.html#foo"),
+        ParserEntry(name="qux", type="Class", path="foo%20bar.html"),
Owner

I don't think there's a point to making this a fixture? I think it can be safely inlined.

Contributor Author

I could just add a new entry to the entries fixture, but I wanted a separate test just for checking URL-encoded filenames. I see now that it can be done without a new fixture. Give me a sec.

+    ]
+
+
 class TestPatchTOCAnchors:
     @pytest.mark.parametrize("progressbar", [True, False])
     def test_with_empty_db(self, progressbar):
@@ -71,6 +79,23 @@ def test_single_entry(self, monkeypatch, tmpdir, entries):
             TOCEntry(name="foo", type="Method", anchor="foo")
         ] == parser._patched_entries

+    def test_single_entry_url_format(
+        self, monkeypatch, tmpdir, entries_url_format
+    ):
mofeing marked this conversation as resolved.
+        """
+        URL-encoded filenames are decoded before being added.
+        """
+        foo = tmpdir.mkdir("foo")
+        foo.join("foo bar.html").write("docs!")
+        parser = FakeParser(doc_path=str(foo))
+        toc = patch_anchors(parser, show_progressbar=False)
+        for e in entries_url_format:
+            toc.send(e)
+        toc.close()
+        assert [
+            TOCEntry(name="foo", type="Method", anchor="foo")
Owner

Wait, doesn't this mean that you've lost a TOCEntry somewhere?

Contributor Author

I'm lost on writing a test. I thought I could just add a new entry to the entries fixture, copy the test_single_entry test in test_utils.py, and change it so it checks that all the entries have been correctly parsed. But I don't understand what is happening in here.

Owner

No worries, I’ll have a look after Xmas. Happy holidays and thanks for your work so far!

Contributor Author

Thanks, and happy holidays to you too!

+        ] == parser._patched_entries
+
     def test_complains(self, entries, tmpdir):
         """
         If patching fails, a debug message is logged.
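
On the review question of whether a TOCEntry was lost, the `toc.send(e)` and `toc.close()` calls in the test suggest a generator-based collector. The sketch below is not doc2dash's implementation. It is a hypothetical, self-contained model (`collect_anchors`, the namedtuple stand-ins, and the `patched` dict are all invented for illustration) that assumes entries whose path has no `#` fragment are skipped because there is no anchor to patch. If that assumption holds, two entries sent in would yield one TOC entry out, which would be consistent with `test_single_entry` also expecting a single `TOCEntry`.

```python
from collections import defaultdict, namedtuple
from urllib.parse import unquote

# Hypothetical stand-ins for the types used in the tests.
ParserEntry = namedtuple("ParserEntry", "name type path")
TOCEntry = namedtuple("TOCEntry", "name type anchor")


def collect_anchors(patched):
    """Receive entries via send() and store the grouped result on close()."""
    files = defaultdict(list)
    try:
        while True:
            entry = yield
            fname, sep, anchor = entry.path.partition("#")
            if not sep:
                # Assumption: without a fragment there is no anchor to patch.
                continue
            files[unquote(fname)].append(
                TOCEntry(entry.name, entry.type, anchor)
            )
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield; flush here.
        patched.update(files)


patched = {}
toc = collect_anchors(patched)
next(toc)  # prime the generator so it is paused at the first yield
toc.send(ParserEntry("foo", "Method", "foo%20bar.html#foo"))
toc.send(ParserEntry("qux", "Class", "foo%20bar.html"))
toc.close()
print(patched)
```

In this model the `qux` entry is dropped on purpose rather than lost, and the decoded file name `foo bar.html` is the key under which the remaining anchor is grouped.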