Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow float sizes for 32bit support #1444

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

come-nc
Copy link
Contributor

@come-nc come-nc commented Jan 24, 2023

For 32bit support in Nextcloud we are typing sizes as int|float as PHP uses float for bigint.

Would you consider merging this change?

@codecov
Copy link

codecov bot commented Jan 24, 2023

Codecov Report

Merging #1444 (301ac98) into master (5684c75) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #1444   +/-   ##
=========================================
  Coverage     97.33%   97.33%           
  Complexity     2830     2830           
=========================================
  Files           176      176           
  Lines          9419     9419           
=========================================
  Hits           9168     9168           
  Misses          251      251           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@phil-davis
Copy link
Contributor

When a file size gets bigger than PHP_INT_MAX (e.g. about 2GB on 32-bit systems) then the float representation is going to have some maximum precision. On 32-bit systems, does PHP actually use some 64-bit floating-point binary representation? (that would end up across 2 32-bit registers...) Or does it squeeze down into a 32-bit floating point representation like https://en.wikipedia.org/wiki/Single-precision_floating-point_format ?

And so as the file size gets bigger, it gradually becomes more granular. Something roughly like:

  • from 2GB to 4GB the size will effectively be to an even number of bytes (an odd number of bytes will not have an exact representation)
  • from 4GB to 8GB the representable sizes will be multiples of 4
  • from 8GB to 16GB the representable sizes will be multiples of 8
  • and so on

And so getSize() will return a number that, when "expanded" as a big int, may be a few bytes more or less than the real size.

How is that likely to impact on clients that want to:

  • read the whole file (maybe they do not care much, and just keep reading until EOF is reached)
  • read parts of the file (video players that "jump around" getting parts of the file (they might do something weird at the very end of playing the video)

@phil-davis
Copy link
Contributor

@come-nc do you also want/need to expand the return type of implementations such as
https://github.com/sabre-io/dav/blob/master/lib/DAV/SimpleFile.php getSize()
https://github.com/sabre-io/dav/blob/master/lib/DAV/FS/File.php getSize()

or do you just need IFile to declare the possibility of returning a float from getSize() ?

@staabm
Copy link
Member

staabm commented Jan 30, 2023

may I ask about the use-case? I guess you need this more precise return type for static analysis tools or similar?

@come-nc
Copy link
Contributor Author

come-nc commented Jan 30, 2023

When a file size gets bigger than PHP_INT_MAX (e.g. about 2GB on 32-bit systems) then the float representation is going to have some maximum precision. On 32-bit systems, does PHP actually use some 64-bit floating-point binary representation? (that would end up across 2 32-bit registers...) Or does it squeeze down into a 32-bit floating point representation like https://en.wikipedia.org/wiki/Single-precision_floating-point_format ?

I do not know the details on that, I only know what happens on PHP side of typing, sorry.

And so getSize() will return a number that, when "expanded" as a big int, may be a few bytes more or less than the real size.

How is that likely to impact on clients that want to:

  • read the whole file (maybe they do not care much, and just keep reading until EOF is reached)
  • read parts of the file (video players that "jump around" getting parts of the file (they might do something weird at the very end of playing the video)

This is already the case, getSize is returning a float for these files on 32bits, it’s just that static analysis believe it’s an int because of the docblock.

@come-nc do you also want/need to expand the return type of implementations such as https://github.com/sabre-io/dav/blob/master/lib/DAV/SimpleFile.php getSize() https://github.com/sabre-io/dav/blob/master/lib/DAV/FS/File.php getSize()

or do you just need IFile to declare the possibility of returning a float from getSize() ?

For Nextcloud we only use IFile from what I can see.
But I think it’s better to align File and SimpleFile as well as long as sabre is not officially dropping support for 32bits, it should advertise that sizes can be floats.

may I ask about the use-case? I guess you need this more precise return type for static analysis tools or similar?

Yes. This change allow static analysis tools such as psalm to warn us about places where we are making the assumption that size is an int and that will break on 32bit installations. Then we can either fix this code to support float or add a check and error out when it’s a float and document a known limitation on 32bit.

For Nextcloud sizes are mostly used for quota checks and storage space use estimation, so sizes being a few bit off on huge files should not be a problem. And is better than a crash anyway.

@staabm
Copy link
Member

staabm commented Jan 30, 2023

For Nextcloud sizes are mostly used for quota checks and storage space use estimation, so sizes being a few bit off on huge files should not be a problem. And is better than a crash anyway.

one thing I can think of where this might get problematic would be e.g. a Content-Length header when sent as e.g. float and does therefore no longer match the actual bytes on the filesystem and therefore would result in a corrupt download.

but I guess these cases would get obvious when we have this more precise return type, right?

@come-nc
Copy link
Contributor Author

come-nc commented Jan 30, 2023

For Nextcloud sizes are mostly used for quota checks and storage space use estimation, so sizes being a few bit off on huge files should not be a problem. And is better than a crash anyway.

one thing I can think of where this might get problematic would be e.g. a Content-Length header when sent as e.g. float and does therefore no longer match the actual bytes on the filesystem and therefore would result in a corrupt download.

but I guess these cases would get obvious when we have this more precise return type, right?

Not sure, the typing will tell you it’s a float but not that it’s imprecise. It all depends on how you are building the header I suppose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants