Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: behaviour of readFile and writeFile with file descriptors #23433

Closed
thefourtheye opened this issue Oct 11, 2018 · 45 comments
Closed

fs: behaviour of readFile and writeFile with file descriptors #23433

thefourtheye opened this issue Oct 11, 2018 · 45 comments
Labels
discuss Issues opened for discussions and feedbacks. fs Issues and PRs related to the fs subsystem / file system.

Comments

@thefourtheye
Copy link
Contributor

thefourtheye commented Oct 11, 2018

  • Version: v11.0.0-pre
  • Platform: MacOS, Ubuntu (I have tried in these two operating systems personally)
  • Subsystem: fs

This problem has been discussed multiple times without reaching any conclusion. That is why opening a separate issue to discuss.

Documented Expectations

As per our docs

readFile is defined as

Asynchronously reads the entire contents of a file.

and writeFile is defined as

Asynchronously writes data to a file, replacing the file if it already exists.

Current State of readFile & writeFile:

When a regular file path is passed:

The file is either read from the beginning or completely overwritten. This complies with the documentation.

When a file descriptor is passed:

  1. readFile
    The file is read from the current position in the file (the current position is maintained by the file descriptor itself).
readFile problem reproduction
'use strict';

const fs = require('fs');
const assert = require('assert');
const buffer = Buffer.alloc(5);

/* Open the file descriptor. */
const fd = fs.openSync(__filename, 'r');

/* Read only the first five characters. */
assert.deepStrictEqual(fs.readSync(fd, buffer, 0, 5), 5);
assert.deepStrictEqual(buffer.toString('UTF-8'), '\'use ');

/* As per the docs, both of these should be the same. */
assert.deepStrictEqual(
fs.readFileSync(fd).toString(),
fs.readFileSync(__filename).toString()
);

  1. writeFile
    Writes from the beginning of the file, without replacing the contents of the file. For example, if the existing content in the file is ABCD and the newly written data is 12, then the actual contents of the file after both the writeFiles is 12CD.
writeFile problem reproduction
'use strict';

const fs = require('fs');
const assert = require('assert');

/* Write only five characters. */
fs.writeFileSync('/tmp/test1', 'Hello');
assert.deepStrictEqual(fs.readFileSync('/tmp/test1').toString(), 'Hello');

/* Write something else. */
fs.writeFileSync('/tmp/test1', 'Sun');
assert.deepStrictEqual(fs.readFileSync('/tmp/test1').toString(), 'Sun');

/* Open the file descriptor. */
const fd = fs.openSync('/tmp/test', 'w');

/* Write only five characters. */
assert.deepStrictEqual(fs.writeSync(fd, 'Hello'), 5);
assert.deepStrictEqual(fs.readFileSync('/tmp/test').toString(), 'Hello');

/* Write something else. */
fs.writeFileSync(fd, 'Sun');
assert.deepStrictEqual(fs.readFileSync('/tmp/test').toString(), 'Sun');

These two behaviours deviate from the documented expectations.

Cases for changing the behaviour to work from the beginning

#9671 (It presents the case with readFile)

Cases for keeping the current behaviour

Its intuitive for people to expect read/write from the current position till the end of the file. (For example, read a header from a file and then decide whether to read the rest of the file or not.)

Proposed courses of action

  1. Land fs: make writeFile consistent with readFile wrt fd #23709, which makes writeFile also to write from the current file position just like readFile reads from the current file position, and be done with it. (semver-major)
  2. Make the readFile and writeFile to work from the beginning of the file. (semver-major)
    a. If the file descriptor is not seekable, an error will be thrown from libuv.
  3. Variant of 2, if the file descriptor doesn't correspond to a regular file (basically a non-seekable file descriptor), throw an error. Otherwise, try to work from the beginning of the file. (semver-major)
  4. Remove fd support, its just a helper function for a not so oftenly used use-case, and there are valid reasons to support both seeking and non-seeking, so whatever we choose someone will need to write their own wrapper to make functions that work as they expect. Let there be an npm package to do this, or several. (semver-major)
  5. Just leave the current behaviour as it is with the doc changes in doc: clarify fd behaviour with {read,write}File #23706.

This is not an exhaustive list. Please feel free to propose more ideas.

PS: This comment by @sam-github summarises my personal expectations from the API.


Name Option 1 Option 2 Option 3 Option 4 Option 5
@thefourtheye X X
@mcollina X X
@addaleax X X
@TimothyGu X
@mhdawson X X
@Trott
@ChALkeR X + warn * X
@rvagg X
@cjihrig X X
@joyeecheung X
@apapirovski X
@Fishrock123 X
Total 9 1 1 2 4

cc @nodejs/fs

Tagging @nodejs/tsc (Might be a bit early, but this has been discussed before. So tagging to get more inputs)

Tagging @seishun @vsemozhetbyt @sam-github @jorangreef @addaleax from the previous discussions.

If needed we can tag the collaborators to get their expectations.


Previous discussions on this topic happened in #9671, #10809, #10853, #13572,
#22554

@thefourtheye thefourtheye added fs Issues and PRs related to the fs subsystem / file system. discuss Issues opened for discussions and feedbacks. labels Oct 11, 2018
@sam-github
Copy link
Contributor

Another option:

  1. Remove fd support, its just a helper function for a not so oftenly used use-case, and there are valid reasons to support both seeking and non-seeking, so whatever we choose someone will need to write their own wrapper to make functions that work as they expect. Let there be an npm package to do this, or several.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 12, 2018

See also #22554 for various behaviour of fs.writeFile() variants (sync / callback / promise ). Promise variant adds even more discrepancy: complete rewriting vs positional rewriting vs appending.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 12, 2018

As for a use case for fs.writeFile(fd). Say, I have a long process with many iterative tasks. After each task, the script writes a current task identifier to a state file, to save a progress and to be able to restore the state. This identifier should be overwritten, not appended as in a log file, so that it can be easily required not by reading and parsing all the huge log file, but via one small fs.readFile() call. Using a path would have an overhead of a constant file reopening. But we cannot use a file descriptor in this case for now.

@thefourtheye
Copy link
Contributor Author

@sam-github I included that option in the OP.

@vsemozhetbyt I have included that issue also in the list of references.


Let's start recording what we think would be the expected behaviour of readFile & writeFile in the table in OP.

@thefourtheye
Copy link
Contributor Author

I checked few other languages to see how they handle file descriptors.

Python

  • Different functions to deal with file descriptors.
  • Positional reads and writes have separate functions.
  • No read all or write all functions with file descriptors.

Other languages which I checked were Ruby, Java, C, and C++. They all provide their own abstractions over file descriptors. Using file descriptors are not recommended, so not a lot of fd based functions are available in them.

@Trott
Copy link
Member

Trott commented Oct 16, 2018

I think step zero should be to update the documentation to specify the current behavior. If we change that behavior, fine, but there should be no reason to not have correct docs while that is being discussed. Is there already a PR open to update the docs? If not, I'm happy to do it or let @thefourtheye or anyone else do it.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 16, 2018

Would it be a bit confusing to state 3 different behaviors in docs? Especially callback/sync fd variant (merging) vs promise fd variant (appending)?

@Trott
Copy link
Member

Trott commented Oct 17, 2018

Would it be a bit confusing to state 3 different behaviors in docs? Especially callback/sync fd variant (merging) vs promise fd variant (appending)?

It will be a bit confusing, but hopefully less confusing than the current situation of having the same behavior but incorrectly documented.

@thefourtheye
Copy link
Contributor Author

thefourtheye commented Oct 17, 2018

@Trott I got a PR up to clarify the behaviour of file descriptors with readFile and writeFile, #23706.

@mcollina
Copy link
Member

mcollina commented Oct 24, 2018

I'm for deprecating readFile(fd) and writeFile(fd), if it does not break on CITGM. If somebody is deailing with file descriptors, I think they can use the barebone methods read() and write().

@Trott Trott added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Oct 24, 2018
@Trott
Copy link
Member

Trott commented Oct 24, 2018

Adding tsc-agenda label per discussion in the TSC meeting today where conclusion was "Get more engagement, particularly from @nodejs/tsc, but leave on agenda in case no consensus is reached."

@Trott
Copy link
Member

Trott commented Oct 24, 2018

I find all four proposed options acceptable (assuming Option 1 involves documenting the current behavior, which given the open PR, it seems like it does).

@Trott
Copy link
Member

Trott commented Oct 24, 2018

(Although for option 3: If we throw, then we shouldn't do anything afterwards, right? So either it's not throwing and it's emitting a warning instead and then trying from the beginning of the file, or it's throwing an error and giving up? Or am I misunderstanding it?)

@mhdawson
Copy link
Member

I'm ok with options 1 and 4, with a slight preference for 1 based on the concern of breaking existing code.

@thefourtheye
Copy link
Contributor Author

@Trott I updated Option 3 now. It either throws or works from the beginning of the file.

@thefourtheye
Copy link
Contributor Author

@mhdawson Option 1 still leaves us with inconsistent behaviour (as readFile and writeFile work from different positions). #23709 (semver-major, so it still breaks existing code 😞) makes them consistent.

@addaleax
Copy link
Member

So … one thing I can potentially see people doing is fs.readFileSync(0) or similar. It’s brittle, because fd 0 has to be non-blocking (i.e. process.stdin should not have been accessed), but it’s something we may want to consider realistic usage?

I’m definitely okay with the current behavior, i.e. option 1.

@thefourtheye
Copy link
Contributor Author

@addaleax Just to confirm, are you okay with the current state as it is, or with #23709?

@addaleax
Copy link
Member

@thefourtheye Oh, yes, I’m a fan of #23709. I just didn’t really see that as an option here, so I thought it falls under option 1? :)

@thefourtheye
Copy link
Contributor Author

@addaleax I updated the option 1 now.

@TimothyGu
Copy link
Member

I’m for option 1 as well.

@mhdawson
Copy link
Member

@thefourtheye Was option 1 updated to include changes from #23709 or did I just miss that the first time I looked at it.

I think there should be an option were we leave functionality as is but update the documentation. That is what I thought 1 was originally.

@thefourtheye
Copy link
Contributor Author

@mhdawson Oh yes, Option 1 was updated after @addaleax's comment. I'll land #23706 tomorrow (provided there are no suggestions by then), which documents the current behaviour. As it is anyway going to be landed, I didn't want to include that in the options. Just to be thorough, I have included that as Option 5 now.

@rvagg
Copy link
Member

rvagg commented Nov 7, 2018

not sure if we're still voting on this one, I weighed in @ #23709, I'd prefer option 1.

@thefourtheye
Copy link
Contributor Author

What I would prefer to see is a variant of (2/3 1) that also emits a warning when an fd with non-zero offset is used (for one transitional major release). Or a doc change.

@ChALkeR This would be tricky, as there is no offset passed by the users. So, if we are to emit warnings, only option is to emit them if file descriptors are used with writeFile.

@ChALkeR
Copy link
Member

ChALkeR commented Nov 7, 2018

@thefourtheye
As I understood, (1) changes the behaviour of .writeFile in case when a file descriptor with a non-zero offset is passed, and (2) changes the behaviour of .readFile when a file descriptor with a non-zero offset is passed.

What I described above is (1) + a warning about behaviour change emitted the first time .writeFile/.writeFileSync is used on a file descriptor with a non-zero offset, at least for several versions.

That can't be from our JS api, but perhaps we could use lseek/_lseek internally and at least emit the warning on supported platforms (if not all platforms have that)?

We shouldn't emit needless warnings -- writing to an fd with zero offset is fine, that behaviour won't change and doesn't need a warning.

@Trott
Copy link
Member

Trott commented Nov 7, 2018

Seems to be consensus around option 1, or at least that's the closest to consensus? Is that correct?

@apapirovski
Copy link
Member

apapirovski commented Nov 7, 2018

+1 on option 1 from my end.

@Fishrock123
Copy link
Contributor

I prefer Option 1.

@rvagg
Copy link
Member

rvagg commented Nov 7, 2018

I'd be OK with going the option 1 + warning route through a single major cycle for a pure option 1. I'm concerned about performance implications though, if we need to reach down in to do an lseek to figure out whether a warning is warranted, can we put that in the path in such a way as to not make reads slower? Since here's a bunch of layers of indirection between writeFile() and the actual operations perhaps this is just going to add too much complication?
Still also fine with just a pure option 1 though, that seems to be consensus here unless anyone else wants to jump to the option 1 + warn bandwagon and put together a PR that implements it?

@thefourtheye
Copy link
Contributor Author

thefourtheye commented Nov 12, 2018

That can't be from our JS api, but perhaps we could use lseek/_lseek internally and at least emit the warning on supported platforms (if not all platforms have that)?

@ChALkeR Seek is not supported by libuv. @bnoordhuis's comment in this thread

lseek() is unpredictable with concurrent readers or writers. The libuv way is to use positional reads and writes

kind of gives a reason why it is not included in libuv. Do you mean, using *seek to get the details from our C++ layer? I believe that would be very difficult to maintain.

We shouldn't emit needless warnings -- writing to an fd with zero offset is fine, that behaviour won't change and doesn't need a warning.

Yes, I agree. However, in this particular case, we don't have a straight forward to decide when to actually warn. Would it be very bad if we warn people once who use these functions with file descriptors?

That said, #23709 is going to write from the current file position always, it doesn't matter what that is. So, if someone is using writeFile with a file descriptor, warning them saying "data is going to be written from the current file position instead of the current behaviour which is to write from position zero", would be okay, right?

@Trott
Copy link
Member

Trott commented Nov 13, 2018

This seems to be progressing to a conclusion. I'm going to remove the tsc-agenda label (since I'm the one who added it). Feel free to re-add it if you think it should stay on or go back on the agenda.

@Trott Trott removed the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Nov 13, 2018
@kghost
Copy link

kghost commented Dec 9, 2018

I think seek is needed #24923

@thefourtheye
Copy link
Contributor Author

@kghost The outcome of this issue is that readFile and writeFile will not seek and read/write from the current file position. You can always read and write from specific positions of a file with fs.read and fs.write functions.

@ChALkeR
Copy link
Member

ChALkeR commented Dec 12, 2018

@thefourtheye Sorry for the delay 😞 .

After further thoughts about this, I withdraw my suggestion — the increased maintaining costs indeed would not pay off here.

@thefourtheye
Copy link
Contributor Author

thefourtheye commented Dec 17, 2018

Out of the 18 TSC members (including @Trott), 12 have expressed their opinions in the table in the OP. With six abstentions and nine votes for Option 1, Option 1 wins. @nodejs/tsc please let me know, if there are any objections here.

Thanks everyone for participating and driving this issue to closure.

thefourtheye added a commit to thefourtheye/io.js that referenced this issue Dec 26, 2018
As per the decision in nodejs#23433,
the `fs.writeFile` will always write from the current position if it
is used with file descriptors. This patch updates it.

Ref: nodejs#23709
addaleax pushed a commit that referenced this issue Jan 7, 2019
As per the decision in #23433,
the `fs.writeFile` will always write from the current position if it
is used with file descriptors. This patch updates it.

Ref: #23709

PR-URL: #25080
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Weijia Wang <starkwang@126.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anto Aravinth <anto.aravinth.cse@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Rod Vagg <rod@vagg.org>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
refack pushed a commit to refack/node that referenced this issue Jan 14, 2019
As per the decision in nodejs#23433,
the `fs.writeFile` will always write from the current position if it
is used with file descriptors. This patch updates it.

Ref: nodejs#23709

PR-URL: nodejs#25080
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Weijia Wang <starkwang@126.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anto Aravinth <anto.aravinth.cse@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Rod Vagg <rod@vagg.org>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues opened for discussions and feedbacks. fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

No branches or pull requests