Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Exception Handling in _get_yaml_data_and_path(...) #392

Merged
merged 4 commits into from
Jul 4, 2024

Conversation

martinrieder
Copy link
Contributor

@martinrieder martinrieder commented Jun 20, 2024

Add ValueError and OSError where e.errno is None to Exception Handling of _get_yaml_data_and_path()

This PR is amending fixes for the following issues based on #250 and #318.

# if inp is a long YAML string, Pathlib will raise OSError [errno.ENAMETOOLONG]
# in Windows, ValueError or OSError [errno.EINVAL or None] also might be raised
# when trying to expand and resolve it as a path (depending on the Python version)
# Catch these specific errors, but raise any others

master contains #250 from d7d7854: Consolidate wireviz.parse() to handle Path, str and Dict as input

if type(e) is OSError and e.errno != ENAMETOOLONG:

release/v0.4.1-rc contains #318 at 6f9007f: Use portable OS error codes so program doesn't crash

if type(e) is OSError and e.errno not in (EINVAL, ENAMETOOLONG):

Due to the following #246 (comment), this is a bugfix only, waiting to be resolved by #251.

All comments have been addressed inside #251 or are no longer relevant after 251 is applied

References

@martinrieder martinrieder changed the base branch from master to release/v0.4.1-rc June 20, 2024 13:20
@martinrieder martinrieder changed the title Bugfix on #391 OSError where e.errno is None Bugfix on #391 OSError where e.errno is None Jun 20, 2024
@martinrieder martinrieder mentioned this pull request Jun 20, 2024
@martinrieder
Copy link
Contributor Author

martinrieder commented Jun 20, 2024

I can dig a little deeper to find the root cause, because the docs do not mention that errno could be none. This behavior might be specific to Python 3.12, which is not officially supported by WireViz (yet).

        try:
            yaml_path = Path(inp).expanduser().resolve(strict=True)
            # if no FileNotFoundError exception happens, get file contents
            yaml_str = open_file_read(yaml_path).read()
        except (FileNotFoundError, OSError) as e:
            # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
            # (in Windows, it seems OSError [errno.EINVAL] might be raised in some cases)
            # when trying to expand and resolve it as a path.
            # Catch this error, but raise any others
            from errno import EINVAL, ENAMETOOLONG

            if type(e) is OSError and e.errno not in (EINVAL, ENAMETOOLONG):
                raise e
            # file does not exist; assume inp is a YAML string
            yaml_str = inp
            yaml_path = None

The exception asks for both, FileNotFoundError and OSError. Since one is a subclass of the other, errno should actually map to ENOENT.

PS : I am taking a peek at jaraco/path as possible drop-in replacement, although still proposinc this PR only as a Bugfix!

@martinrieder
Copy link
Contributor Author

@RedshiftVelocities, I noticed that a similar issue has been found by @kvid in #318 (review) a year ago. Could you please check this again?

It seems that behavior changes with different Python versions. The referred one uses ValueError in 3.9, while it is now reported as OSError in 3.12.

@kvid kvid linked an issue Jun 22, 2024 that may be closed by this pull request
@kvid kvid mentioned this pull request Jun 22, 2024
25 tasks
@kvid
Copy link
Collaborator

kvid commented Jun 26, 2024

@martinrieder wrote:

I noticed that a similar issue has been found by @kvid in #318 (review) a year ago. Could you please check this again?

It seems that behavior changes with different Python versions. The referred one uses ValueError in 3.9, while it is now reported as OSError in 3.12.

I can confirm that I still get ValueError: _getfinalpathname: path too long for Windows when trying WireViz on input YAML with 80000 character dummy string in Python 3.10.4 at Windows 10. It's not a very likely use case, but demonstrates a possible exception when trying to use random strings as a path. This edge case is resolved by including ValueError in the list of exceptions to catch.

Update: Tested again at a different PC with Python 3.9.13 with the current branch of PR #365 (without this PR merged in) with the valid a-80k.yml as input:

WireViz 0.4.1-dev
Input file:   _test\a-80k.yml
Output file:  _test\a-80k.[html|png|svg|tsv]
Traceback (most recent call last):
  File "C:\Repos\WireViz\src\wireviz\wv_cli.py", line 153, in <module>
    wireviz()
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Repos\WireViz\src\wireviz\wv_cli.py", line 141, in wireviz
    wv.parse(
  File "C:\Repos\WireViz\src\wireviz\..\wireviz\wireviz.py", line 90, in parse
    yaml_data, yaml_file = _get_yaml_data_and_path(inp)
  File "C:\Repos\WireViz\src\wireviz\..\wireviz\wireviz.py", line 409, in _get_yaml_data_and_path
    yaml_path = Path(inp).expanduser().resolve(strict=True)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\pathlib.py", line 1215, in resolve
    s = self._flavour.resolve(self, strict=strict)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\pathlib.py", line 210, in resolve
    return self._ext_to_normal(_getfinalpathname(s))
ValueError: _getfinalpathname: path too long for Windows

@martinrieder
Copy link
Contributor Author

@kvid what does errno evaluate to in that case? Which version of Pathlib are you running?

BTW: I do not consider running a huge YAML file to be that unusual. If a library of connectors is contained in the templates section, file size can increase quickly.

@kvid
Copy link
Collaborator

kvid commented Jun 27, 2024

what does errno evaluate to in that case?

An OSError exception has errno, but ValueError has not.

Which version of Pathlib are you running?

pathlib is a system library included in (and identified by) the Python 3.10.4 installation in my case. System libraries don't have separate version numbers AFAIK.

BTW: I do not consider running a huge YAML file to be that unusual. If a library of connectors is contained in the templates section, file size can increase quickly.

I agree to that, but my 80000 character test input was only a single long string, and that is a bit unusual. However, I just now added my 80000 character as one of the attributes of a connector in a valid WireViz YAML file, and the same ValueError was raised, so you are probably right it also can happen with a huge template section.

I suggest you include ValueError like this:

        except (FileNotFoundError, OSError, ValueError) as e:

@martinrieder
Copy link
Contributor Author

I suggest you include ValueError like this:

        except (FileNotFoundError, OSError, ValueError) as e:

Additionally, the if clause needs to be adapted, so that the exception is also raised.

My proposal to add some "band-aid" the fix:

  • If inp contains newline characters, treat it as YAML
  • Else pass it to Pathlib and catch its exceptions
  • Raise any unexpected exceptions
  • Fall back to YAML parsing on FileNotFoundError etc.

What do you think about treating all multi-line strings as YAML?

Besides bugfixing, would printing an error message if it is neither an existing file nor valid YAML provide a more user-friendly behavior instead of simply raising an exception?

@kvid
Copy link
Collaborator

kvid commented Jun 27, 2024

I suggest you include ValueError like this:

        except (FileNotFoundError, OSError, ValueError) as e:

Additionally, the if clause needs to be adapted, so that the exception is also raised.

No, the if only needs to raise other OSError exceptions than the identified errno values.

My proposal to add some "band-aid" the fix:

  • If inp contains newline characters, treat it as YAML

As I wrote a year ago in point 4 of #318 (review) that you referred to above, this is exactly what is already implemented in #251.

Catching more exceptions here is just a quick band-aid to avoid most errors until the more thorough fix in #251.

  • Else pass it to Pathlib and catch its exceptions
  • Raise any unexpected exceptions

No expected exceptions to catch after the #251 changes, so they are all raised normally.

  • Fall back to YAML parsing on FileNotFoundError etc.

No special catching of these exceptions are needed.

What do you think about treating all multi-line strings as YAML?

I agree, and #251 is the first to be merged after releasing the collections of quick-fixes.

Besides bugfixing, would printing an error message if it is neither an existing file nor valid YAML provide a more user-friendly behavior instead of simply raising an exception?

See #207

@martinrieder
Copy link
Contributor Author

Okay then I will keep things simple and it shall look like this:

try:
    yaml_path = Path(inp).expanduser().resolve(strict=True)
    # if no FileNotFoundError exception happens, get file contents
    yaml_str = open_file_read(yaml_path).read()
except (FileNotFoundError, OSError, ValueError) as e:   # adding ValueError per #318 
    # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
    # (in Windows, it seems OSError [errno.EINVAL] might be raised in some cases)
    # when trying to expand and resolve it as a path.
    # Catch this error, but raise any others
    from errno import EINVAL, ENAMETOOLONG

    if type(e) is OSError and e.errno not in (EINVAL, ENAMETOOLONG, None): # adding None fixes #391
        raise e
    # file does not exist; assume inp is a YAML string
    yaml_str = inp
    yaml_path = None

@kvid
Copy link
Collaborator

kvid commented Jun 28, 2024

@formatc1702 seems to prefer not referring to issues or PRs in the source code - see #264 (comment)

Referring to them in the body of commit messages is OK and often useful, though.

I therefore suggest removing the two trailing comments including issue references. Instead I suggest expanding the comment block to explain the latest additions like this:

    # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
    # (in Windows, it seems ValueError or OSError [errno.EINVAL or None]
    # also might be raised in some cases - depending on the Python version)
    # when trying to expand and resolve it as a path.
    # Catch these errors, but raise any others

@martinrieder
Copy link
Contributor Author

martinrieder commented Jun 28, 2024

Thanks. I will update the PR accordingly.

I discovered an additional OS-specific bug with expanduser() that arises when the file name contains a tilde ~ character. Should I include a fix for this as well, even though it is a separate issue?

@kvid
Copy link
Collaborator

kvid commented Jun 28, 2024

@martinrieder wrote:

Thanks. I will update the PR accordingly.

I discovered an additional OS-specific bug with expanduser() that arises when the file name contains a tilde ~ character. Should I include a fix for this as well, even though it is a separate issue?

If the fix is similar to what is already discussed (catching an extra Exception class or OSError.errno value), then I see no harm in including also that fix in this PR - just update the PR title and description to cover all sub-issues.

Please use the imperative mood in your commit subject lines as recommended in the contribution guidelines and I think the PR title should follow the same recommendation as for commit subject lines, because a squash commit will normally use the PR title as subject line.

@martinrieder martinrieder changed the title Bugfix on #391 OSError where e.errno is None Add ValueError and OSError where e.errno is None in Exception Handling Jun 28, 2024
@martinrieder martinrieder changed the title Add ValueError and OSError where e.errno is None in Exception Handling Add ValueError and OSError where e.errno is None to Exception Handling of _get_yaml_data_and_path(...) Jun 28, 2024
@kvid
Copy link
Collaborator

kvid commented Jun 28, 2024

I'm sorry for being unclear. When I wrote "update the PR title and description to cover all sub-issues", I didn't mean all details should be in the title. Please read all the commit recommendations at the page I linked to above. The subject line should be a short summary, preferably limited to 50 characters and never more than 72. Use the body for details.

@martinrieder martinrieder changed the title Add ValueError and OSError where e.errno is None to Exception Handling of _get_yaml_data_and_path(...) Fix Exception Handling in _get_yaml_data_and_path(...) Jun 29, 2024
@formatc1702 formatc1702 requested a review from kvid July 1, 2024 17:32
@formatc1702
Copy link
Collaborator

Thanks for digging into these more obscure issues despite my limited availability. It seems like a very minor code change with multiple positive effects, good job :) I assume it's ready for merge but have requested review just to be sure.

@martinrieder
Copy link
Contributor Author

Sorry, I did not get to commit the discussed changes over the weekend. I am doing that now and will keep you updated.

@kvid
Copy link
Collaborator

kvid commented Jul 1, 2024

@martinrieder wrote:

[...]
I discovered an additional OS-specific bug with expanduser() that arises when the file name contains a tilde ~ character. Should I include a fix for this as well, even though it is a separate issue?

Didn't you include a fix for this? I can't see any more code additions than what we discussed before you mentioned this last issue.

Or does the fix involve huge code changes that is clearly out of scope for this PR?

@martinrieder
Copy link
Contributor Author

No, I am working on another machine today and had some trouble trying reproduce the issue here. I even reinstalled Python and noticed that the installer asked me to remove MAX_PATH limit. The following link might give a hint on the issue you found in #318, which indicates that it is also depending on System configuration:
https://docs.python.org/3.11/using/windows.html#removing-the-max-path-limitation

When a file name starts with ~, the path gets expanded to C:\Users\. I am looking into finding a simple fix for this at the moment:

C:\Users\Martin\Git\WireViz\test>wireviz "~demo01.yml"

WireViz 0.4.1-dev
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Scripts\wireviz.exe\__main__.py", line 7, in <module>
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\Git\WireViz\src\wireviz\wv_cli.py", line 122, in wireviz
    raise Exception(f"File does not exist:\n{file}")
Exception: File does not exist:
C:\Users\demo01.yml

@martinrieder
Copy link
Contributor Author

@kvid as you can see from the output above, the exception is raised in wv_cli.py. It is unrelated to the issues discussed in this PR. Feel free to review and merge.

Regarding this other issue, I was able to trace it back to the argument handling of the Click library, which passes file to the wireviz() function:

@click.argument("file", nargs=-1)

Calling wireviz ".\~demo01.yml" evaluates to '.\\~demo01.yml',
while wireviz "~demo01.yml" evaluates to 'C:\\Users\\demo01.yml'

@kvid
Copy link
Collaborator

kvid commented Jul 1, 2024

At Linux it's normal to evaluate a path with a leading ~Martin to the home folder of a user named Martin, and it seems Python tries to make this work in a similar fashion at Windows. Se also https://docs.python.org/3/library/os.path.html#os.path.expanduser
and https://docs.python.org/3/library/pathlib.html#pathlib.Path.expanduser

If I understand your description correctly, it seems to work as designed. You get an exception because there is no user named demo01.yml at your PC, and hence no home folder named "C:\\Users\\demo01.yml".

Note also that if the leading ~ isn't followed by something that could be a username, the leading ~ is evaluated to the home folder of the current user.

@martinrieder
Copy link
Contributor Author

martinrieder commented Jul 2, 2024

@kvid thanks for explaining. It is of course unlikely for users to choose file names like this. Many Windows apps, especially MS Office create these as temporary files. They would typically not carry a .yaml file extension though. I do not intend to create a bug fix for this.

I would still like to criticize the fact that file names are evaluated in two different parts of the code. This could be the source of many more bugs like the one I discovered. Once there is the chance to do more than just bugfixing, this should be reworked.

Anyways, I am not planning to add more commits to this PR. I tested it locally (on Windows) and if the automated tests are also passing (on Linux) then it should be okay to merge this.

Copy link
Collaborator

@kvid kvid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • build_examples.py runs without errors, and produces the expected results (diffs on version numbers and image absoulute paths are normal).
  • My a-80k.yml from Fix Exception Handling in _get_yaml_data_and_path(...) #392 (comment) is processed without errors, and produces the expected result.
  • I'm not able to verify the fix for Help with OS Error #391 because I'm not able to reproduce the reported error with YMLFILE.txt from Help with OS Error #391 (I've tried with Python 3.9.13 and 3.12.4 because 3.12.3 is no longer available from Microsoft Store). I get no errors both before and after changes in this PR.
  • I suggest a simple squash and merge with a summary of the two resolved issues in the resulting commit body.

src/wireviz/wireviz.py Outdated Show resolved Hide resolved
Clarify the changes in wireviz#392

Co-authored-by: kvid <kvid@users.noreply.github.com>
@kvid kvid merged commit a04ddfd into wireviz:release/v0.4.1-rc Jul 4, 2024
2 checks passed
kvid pushed a commit that referenced this pull request Jul 5, 2024
In Windows might OSError(errno = None) be raised instead of the already
catched exceptions in some cases (depending on the Python version)

Fixes #391
kvid pushed a commit that referenced this pull request Jul 5, 2024
In Windows might ValueError be raised instead of the already
catched exceptions in some cases (depending on the Python version)

Fixes point 2 of #318 (review)
kvid added a commit that referenced this pull request Jul 5, 2024
Clarify all exceptions catched, including changes in #392

Co-authored-by: kvid <kvid@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Help with OS Error
3 participants