-
-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
networkutil: Add new download_file utility function #423
networkutil: Add new download_file utility function #423
Conversation
Based on discussion in #421 the following changes may be desirable:
I will push the changes to catch more errors because I think this is just general safety, but I'll wait for further review before implementing any other significant changes (unless I spot something egregious that I want to fix). |
In 78328f6, I put all the catches in the one I also moved the call to |
Thanks!
Ah, that's good. I wasn't aware that GitLab doesn't send the content length.
There is one thing that occurred to me, not related to your changes but to the code in general. We calculate the download progress using the following function. We do it by dividing download_progress = int(min(current_chunk / chunk_count * 98.0, 98.0)) If, for some reason, chunk_count = int(file_size / buffer_size) It might make sense to change that line: chunk_count = int(file_size / buffer_size) or 1
I wonder how likely such a case is. But yes, I would actually prefer if it only prints the warning an then continues.
Great!
I haven't yet seen anything that could break, but still good. We can migrate them slowly once we know it works perfectly.
The current behavior is that ProtonUp-Qt crashes and shows a warning. That warning dialog might be helpful if there is a permissions error.
Looks fine to me.
Yes, that is great. Thanks.
That is good to hear 😄
I think that one might be useful. To keep the changes low we could a hack like this: if not progress_callback:
progress_callback = lambda _: None |
Ah you're right! But sadly I don't think adding an
I can make that change no problem :-)
Good point, I think giving an "error code" so to speak is a better approach than This would allow us to catch and create dialogs for specific errors in the function. We would have to wrap the function call in a
I always learn new things when I open PRs here, I had no idea you could create a blank lambda like that, but that works perfectly. The
The PRs, they are a-flowin'. On this note, I haven't forgotten about migrating ctmods to use the other network functions, the remaining ones are just a bit tricky but I do have local proof-of-concepts of moving all ctmods over. Just to clarify that I don't intend to leave that work half-finished nor this work to move other ctmods into using this util function :-) |
Made the changes to make the I also pushed the change to simply warn that the progress bar may not display properly if the retrieved filesize is 0. Finally I also pushed a commit to ensure
The remaining thing to think about is how to return more fine-grained error messages from the function. |
Great, thanks.
I think we have three options:
The advantage of 3 and 4 is that the program will continue even if errors are not being handled. But I do not think this is a wanted behavior because we cannot work with a file that wasn't downloaded successfully. If we go with 2, the question is which errors we should use. We can use Python standard errors for that with a custom message. We need something for Internet-related errors (connection/endpoint/timeout errors) and local errors (permission, path does not exist, ...). Not sure if we should also handle generic errors such as "expected int, received str" because this kind of error should be found by the CI/unit testing and not happen sporadic when the user uses ProtonUp-Qt. |
Hmm, I do think 2 is the way to go ultimately but, do we really not want ProtonUp-Qt to continue if a download fails? If we can't download a compatibility tool, should it really crash? To go back to the thing that sparked this:
The problem we want to solve here is to catch and bubble errors up to whatever calls With Option 2 proposed, we would catch the many errors that could happen and return a new error for fine-grained control from the caller. For example with all of the network problems we could hit we could simply return some kind of custom I am not sure what the best pattern is in Python for writing error handling paths (or to be honest, in most languages 😅).
I agree, I think these kinds of errors are out of scope here. We want to focus on catching issues with something going wrong "externally" i.e. from downloads and when writing. So I think we should focus on catching the errors in try:
# Try to get the data for the file we want
try:
response: requests.Response = requests.get(url, stream=stream)
except (OSError, requests.ConnectionError, requests.Timeout) as e:
print(f'Error: Failed to make request to URL {url}, cannot complete download! Reason: {e}')
raise pupgui2.NetworkError # Custom error that we create |
We definitively should catch the errors at a higher level then, show a message box and continue operation normally.
We could just do something like this, but I don't think that is very Pythonic. Throwing a less fine grained error and handling it at a higher level seems like a better solution. def just_return_error():
try:
raise OSError("An error occured.")
except Exception as e:
return e
# Option 2
def throw_new_error():
""" Throws: RuntimeError """
try:
raise OSError("An error occured.")
except Exception as e:
raise RuntimeError("Some error related to the OS occured.") I kinda like how Rust and Go handle this.
The alternative, which I feel makes a bit more sense when we want to use Python runtime errors is to not actually handle errors inside the |
I'm not too familiar with Go, but Rust's approach to error handling is very interesting. I have been playing around with Rust in my own time and I do quite like it 😄 (inb4 there are 500 discussions opened about rewriting ProtonUp-Qt in Rust)
Yeah, it seems a bit silly to make a signal specifically for showing a messagebox. My experience handling exceptions is very sparse, apologies if this is a bit silly but, should we remove all of the try/except blocks which return I guess it isn't wholly necessary to keep the try/excepts that we can't handle in Are there any good practices for noting in docstrings / elsewhere what exceptions this function might raise? In case a caller wants to more specifically handle each exception (we can always look at the code, but still :-)) |
b3bc62d
to
9dd4884
Compare
I pushed a change to raise the exceptions in Also rebased the branch on top of main specifically to pull in the newer dependencies and the tests. Sorry if this is spawning another discussion but should I include a test for |
Great.
There isn't really a signle standard for this and we aren't following any of the few conventions either.
Feel free to add it in this request if you want to.
You may take a look at https://requests-mock.readthedocs.io/en/latest/ |
Okay, wired up the error dialogs for Luxtorpeda and DXVK. They now look like this, I forced the download to fail by setting the URL to something invalid when calling Might tweak the dialog title very slightly to note a download error or something but I think it's good to have this kind of notification! I will take a stab at adding a test for the EDIT: Another potential library of interest is |
Pushed the changes to make the error dialog strings translatable, and it looks like the dialog is still working fine :-) I originally made these separate variables, but figured it was fine to just change them to be inline in the function parameters but with newlines. As for the test for this function, I'm still looking into it, trying to learn some of the best practices for PyTest, how to use things like fixtures, the |
Thanks. I've added a few commits:
Okay, I will merge this now as it seems to work fine. We can concentrate on the testing in a later PR. Remarks:
|
This is something that I've seen discussed around in general around testing, what is the "ideal" test coverage? The only consistent thing I have seen is that 100% coverage doesn't have to be the goal, because there's no need to end up writing tests just to tick a box. Tests are there to give you confidence in the code you write and help spot/avoid defects. I think this is something best decided on a case-by-case basis. I don't know if there's a good way to outline fully what new features may and may not require a test. On one hand something trivial may not require a test, but on the other hand if it's so trivial it may be straightforward to write a test for. I don't have a good answer :-) But maybe overtime we'll figure out a good pattern! |
Overview
This PR adds a new utility function,
download_file
, which takes the repeated download code in ctmods and turns it into a generic function.This function lives in a new file,
networkutil.py
, and in future we can refactor and move the other network-related functions in here. Instead of further pollutingutil.py
I figured I'd make a new file, which should also hopefully reduce the load when refactoring and moving the other methods for fetching from GitHub/GitLab into this file. Doing this should also aid in writing tests!Background
Each ctmod has to download the respective archive from its API. In almost every ctmods case (or maybe actually for all of them) the code is duplicated and does the same thing:
65536
bytes at a time, when downloading we only load65536
bytes into memory at a time when downloading.stream
the request, we can do this lazily and overall save on memory.65536
bytes at a time)True
if download succeeded,False
otherwiseThis is a repeated 25 lines or so across each ctmod. Each time we create a new ctmod this is copied and pasted. If we ever had to adjust this we would have to potentially touch quite a few ctmods. A base ctmod class may resolve this somewhat but in the meantime it puts complexity for a generic action inside of a ctmod. There is a better way!
The Better Way
(Okay, "better way" is an exaggeration 😄)
The tl;dr is that we now have a util function that can download files. It is a close copy of each ctmod's
__download
method but has more catches for handling errors, and also has better logic for grabbing the file size by allowing us to pass a known file size, so we can consistently work out the download progress, even ifContent-Length
is blank, without needing to resort tolen(response.content)
which defeats the purpose of streaming the file and stalls the function call. We should have the file size head of time in most cases as when usingfetch_project_release_data
one of the fields we try to parse from the assets list is the filesize. GitHub and GitLab should have thesize
in its data, but GitLab is the only one that doesn't haveContent-Length
in the response headers because it usesTransfer-Encoding: Chunked
. So instead of usinglen(response.content)
we can use the knownsize
to work out the download progress. This also means this new util function fixes the currently broken DXVK Async progress bar.Instead of repeating all of this logic in each ctmod, we can make a util function for this. Downloading files is a very generic action we would generally want to use the same implementation, at least when it comes to ctmods.
So that's what this PR does. I re-wrote the ctmod download function by hand, added in some extra try/except checks, and even did a little extra spice I'll talk about in a second.
The function takes the following parameters:
progress_callback
which is a function we can call from inside our download function to send the download progress__set_download_progress_percent
method on each ctmod that handles emitting to the progress bar! This is how we update the progress from inside this function.download_cancelled
property, which is aProperty
class specific to Qt. We use this to manage if the download was cancelled on-the-fly (i.e. if ProtonUp-Qt was closed)Property
is specific to Qt, which makes this function far less generic than I'd like. The only alternative I found was to use some kind of async signals that Python has, but then that becomes a bit too generic. So I think usingProperty
was the path of least resistance.buffer_size
which defaults to the same as what most ctmods use,65536
stream
boolean given torequests.get
to know whether tostream
the request or not. This defaults toTrue
, which is what the current ctmod download method uses.known_size
which is the size of the file we want to download, if we know it ahead of time we can pass it in meaning we don't have to rely onContent-Length
which may not always be available, and we may not always be able to get the file size (meaning we won't be able to calculate the download progress)0
which means we fall back to trying to figure out the file size ourselves.A lot of this is pretty standard, but being able to specify
stream
andknown_size
might seem strange, so I'll explain. This is the "extra spice" I was talking about!Getting the File Size to Calculate Download Progress
Currently when we download files, we pretty much always use
stream=True
. This won't load the whole request to my knowledge, saving on memory. We then get the file size by taking it from the response'sContent-Length
header. But, this header is not always available. It should always be given when making a request to GitHub, but GitLab does not send it.The reason GitLab does not send this header is because it uses a
Transfer-Encoding
type ofChunked
. This basically downloads the files in chunks already I guess. But this means we can't get the file size! To work around this in #302, I simply usedlen(response.content)
based on guidance in this StackOverflow answer. But this has two problems:stream=True
as it loads the entire response.In other words, right now, if we don't have
Content-Type
, we can't set the download progress properly.However, we aren't defeated yet! When we grab the
asset
we want to download from GitHub or GitLab usingfetch_project_release_data
we can get back thesize
from the release asset. This should be available on release API responses for both GitHub and GitLab, but we only need it for GitLab because we have no other way of getting the file size without resorting tolen(response.content)
, which is undesirable for the reasons above.So when we call our new util function, it will do the following to figure out the filesize:
known_size
above 0, we will use this.Content-Length
header valueContent-Length
header:len(response.content)
to get the file size, but ONLY if we are NOT streaming the response.stream=True
.stream=False
when calling the util function.0
, print an error andreturn False
because if we don't have the file size we can't report the download progress, resulting in a broken progress bar (and we are probably using the function wrong in such a scenario).This is why we can pass whether or not to
stream
the response, and why we can pass a known file size to the function. If we already have the size when calling the function, there is no need to fetch it in the function, we should prefer our known size. Similarly if we know we won't getContent-Length
back in our response headers (such as if we know we're making a GitLab API call without a known file size ahead of time) then we can passstream=False
so that we load the request response upfront when making the call.And because we can pass
known_size
to this util function, this means our GitLab calls now have fixed progress bars! DXVK Async is the only GitLab call we make right now, and it has a broken progress bar because we have to uselen(response.content)
. This results in the progress bar going to 1% and then when the download finishes it goes to 99% when the function ends. With this PR, the progress bar works correctly.The explanation may be a bit long-winded but I hope it helps illustrate what the function does similarly and differently, why I made certain choices, and the benefits of those choices. It basically allows us to set the file size ahead of time if we know it instead of relying on the response which may not always give us the results we want, and for ctmods we should usually have this in the
data
dict anyway.Implementation
Now that explanation of how it works is out of the way, here's how the function is implemented in this PR.
I decided to only refactor the
__download
call in two ctmods for now: DXVK (includes DXVK Async) and Luxtorpeda (includes Boxtron and Roberta). The rationale here is that we modify a lower-risk ctmod for each launcher; DXVK for Lutris and Heroic, Luxtorpeda for Steam.Modifying these ctmods acts as a "proof of concept" for this util function and to serve as a base implementation. I think I did something similar with the other network functions for refactoring how we fetch the release information for each ctmod.
With this change the DXVK and Luxtorpeda ctmods look a lot cleaner in my opinion. The majority of the complexity is now stripped out and replaced behind generic functions. The only remaining complexity in these ctmods really is the boilerplate ctmod stuff and the extraction logic probably being better suited to go behind an
__extract_tool
method in each ctmod. But it is overall much cleaner now in my opinion!Concerns
I am not sure if we should fail if we cannot get the file size. Right now we fall over in that case by returning
False
but let me know if we should just print a warning and continue!Also, I am unsure if we should have any catches around writing to the file in case it fails. We have this brand new function so we might as well try to make it as safe as possible. We already have some new catches in place.
And I am also concerned that the function may not be as generic as it could be. Please let me know if there is anything I can improve with it!
Remaining work
I think the remaining thing is to add in the ability to take a request's session, i.e. each ctmod's
self.rs
. But I'll let this be reviewed in its current state first 😄Apart from that, we can either add other ctmods in this PR or do them in follow-up PRs. I would prefer to do them in follow-up PRs personally but I understand wanting to get them all out of the way in this PR!
I hope this is seen as a welcome change. I have an interest in improving things like this around the codebase where possible, and doing it in a maintainable way. Breaking things into generic functions that we can use around the codebase as well as eventually moving things into separate files (such as breaking up
util.py
a bit more), is something I hope can bring "architectural" improvements to the codebase 😄ProtonUp-Qt is my reference when I start a personal Python project so any enhancements on the technical side might help others that do the same. And it might also help others get into contributing if we can break apart complexity into smaller re-usable chunks. Plus, this PR actually fixes a small bug, so it has some user-facing impact too!
As usual all feedback is welcome. Thanks!
P.S. The diff for this is more into the additions than deletions, but as this gets extended to more ctmods, the lines removed will outweigh the lines added. The initial additions cost comes from writing the function, which is a bit large.