-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix HumanSizeWithPrecision invalid scientific notation #45
Open
kmirzavaziri
wants to merge
1
commit into
docker:master
Choose a base branch
from
kmirzavaziri:44-fix-human-size-with-precision
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+90
−8
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused, what is the precision int supposed to signify? I can't understand how "0.009GB" is a correct expectation for 9.5MB with a precision of 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, the precision of 1 here means having at most one digit (summing up left and right side of the period), which is impossible to render for any number greater than or equal to 10. I personally prefer producing an error for precisions of 2 or less, but it would be a breaking change. (The reason it works with precision 3 or higher is that we want to render the maximum number of 999 which is three digits. Because: for 1000 or higher, we would raise the unit to the higher level).
For more clarification: For example rendering the value of
12.123456
with precision 3 must result in12.1
and not12.123
if I comprehend the intention of the function correctly, and as go formatting works.So I thought we must try to return meaningful values for precision 3 or higher, and disregard lower precisions. I am not sure what would be the most suitable way to handle lower precisions but I am willing to discuss and implement the result that reflects the mindset of docker.
Note: We can have somehow meaningful return values with precision 2, because any number with three digits (>= 100), can be converted to the higher unit occupying the maximum of 2 digits. But it is impossible to have a correct return value for some numbers with precision 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to look as well; not all of this was ever properly "documented", and some of the behavior most definitely got into the codebase "organically" (parts of this code at least originated from the moby/moby ("docker engine") codebase.
I know that long (long) time ago I opened a ticket related to output that confused me, which was because different number of decimals were used in the output which, combined with confusion about decimal separators (
.
vs,
differing between locales) at least confused "me"; moby/moby#9977 (comment)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Searching within the
docker
andmoby
orgs here on GitHub:moby/containerd
only seems to useCustomSize()
in one spot;docker
org we seem to be usingHumanSizeWithPrecision()
in:cli
;compose
;docker-ce
;registry-cli
CustomSize()
only in thecs-libnetwork
repo.Not a ton of occurrences, so I'd be happy if we could come together on a better definition for this func and make the required changes wherever they may be needed.
If we agree to continue using the word "precision" to represent the total number of digits in the size representation, as it seems to currently be, instead of the number of digits after the decimal point (which I think would be a much better way of both defining and using the func) then I think we should adjust how the function works so that the output is at least consistent for all precision values used.
All values should be approximated anyway, so if we need to approximate 9.5*MB with a "precision" of 1, then I think the result should be the closest representation using a single digit, so for example 9MB instead of 0.009GB.
From what I've managed to find searching around, we never use precision values of 1 or 2, so I wouldn't worry too much about the "validity" of the output given that it's an approximation. We can always write better documentation on the functions so things are a bit clearer for the users, or outright make 3 the minimum precision value.
eg To be consistent, I think these tests should pass
My personal preference would be to fix the definition and use actual number of digits after the decimal point as the meaning of
precision
, maybe by writing a new func and adjusting existing code that uses the old func to use the new func before removing it entirely.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comprehensive description.
I believe that configuring the total number of digits (as opposed to the number of digits after the decimal point) when calling the function, would make a better functionality, since callers of the function mostly care about the maximum length of the return value (for example rendering a size to put it in a table in docker cli).
Although I strongly agree that the conventional definition of 'precision' typically refers to the number of digits after the decimal point, but Golang standard library uses this word for "width" definition in an exceptional case.
However I do agree that
precision
is not a good name and it is better to use a name such aswidth
(or even better), but this also results in breaking the function nameHumanSizeWith"Precision"
.Also if we are sure that no one calls the function using precision 2 or 3, My personal opinion is that it is better to return empty string or behave in a similar way signaling an error (with proper documentation). An error, or a totally wrong answer makes it easier to spot and avoid using such a precision, than a misleading one (such as returning
"9MB"
for the input offloat64(501*MB)
) which might potentially result in hours of confusion and debugging.I am concerned about modifying the function signature and behavior, as I am unsure about the extent to which we should prioritize compatibility with the go-units library. Do we only care to update
docker
andmoby
orgs usages, or we prefer not to break any other random user of this module as well? Understanding the policy would aid me in making more informed decisions. And I am open to adding an improved function, updating caller projects gradually, and eventually removing the old one, if feasible and preferred. (As it would make it more consistent across all projects)Anyway, I'm new to the open source and Docker community, so I'd appreciate your guidance. If you have any thoughts or know of community conventions I may not be aware of, please feel free to share. I'm open to adapting and aligning with best practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @krissetto, I'm looking forward to a final decision. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kmirzavaziri!
I don't think I'll be able to give a final decision here as I'm also new to Docker and I don't know how we manage these kinds of situations, but I'll leave my personal opinion and some observations here, waiting to be corrected by the powers that be ;)
TL;DR
I would think about creating a new function with more consistent behavior and naming, migrating our internal codebase to use it and marking the old func as deprecated somehow, but leaving it available in case any external codebases are using it. I don't think the "width" of the response should be part of the func signature, but only the precision (and maybe an optional maximum desired unit of size? eg
TB
). Handling the actual display size of the output seems to me to be a responsibility of whatever client decides to consume this library, and not something we should be dealing with here.Some observations
I don't really like how we're using the concept of precision currently, as it goes against most of the stdlib fmt use-cases (which consider both width - min number of unicode runes - and precision - chars to the right of the decimal point). What we're actually using is the concept of "significant digits", which Go seems to handle in a peculiar way when using the
%.*g
syntax.Take a look at the following test case we have:
assertEquals(t, "0.009GB", WowWithPrecision(float64(9.5*MB), 1))
Here the result ends up being 0.009GB, as 9 is the precision of one (as in 1 significant digit). This is still somewhat ok as it's still human-readable, but I think we should adjust our logic to leave the representation as 9.5MB, which is shorter in width, more human-readable, and actually has a "precision" of 1 (the kind of precision we like).
Checking this test case instead:
assertEquals(t, "1e+04YB", HumanSizeWithPrecision(float64(10000000000000*PB), 1))
I think we have more of a proble. Using the
%.*g
syntax, Go is converting the output to scientific notation in order to keep 1 significant digit, the 4 at the very end. My guess is that 1e+0.. is not considered as significant because it's part of the standard scientific notation, but I'm sure most-all of us would agree that it's not exactly human readable.∞
would probably be better after a certain file size 😂If we were to use
%.*f
instead, the output would become10000.0YB
. 1 digit of (in this case unnecessary) precision, but a relatively "wide" total number since we currently max out our units at YB, and thus cannot divide any further.In the first test example, using
9.5MB
, the func would return 0.0GB using%.*f
and a precision of 1, since we are still doing one too many divisions and don't have enough precision available to represent the value at all.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krissetto Thank you for sharing the observations. I completely agree with your perspective on the situation. The only concern I have regarding the use of "precision" instead of "significant digits", is that it may pose challenges for callers in controlling the length of strings, potentially making it more difficult to address issues like this one.
However, it's worth noting that we still cannot guarantee the maximum length, at least not for
precision < 3
or for sizes over999YB
(unless we return inaccurate responses such as-
or∞
). Also there are some workarounds for the callers. And since they cannot rely on the implementation of this function anyway, they must control the length themselves.Anyway, I am happy in both ways, and believe each can have their own pros and cons. @thaJeztah, based on the context and our conversion, a final decision would be appreciated.