Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement more detailed streams support #3268

Merged
merged 29 commits into from
Dec 18, 2022
Merged

Conversation

hoodmane
Copy link
Member

@hoodmane hoodmane commented Nov 18, 2022

This adds a carefully designed API for controlling stdin, stdout, and stderr. It changes the default behavior to be a bit more useful, though in doing so introduces some mild backwards incompatibility. In particular:

  1. By default, stdin reads directly from process.stdin in node (as before) and raises an error if in browser (not as before).
  2. By default, stdout writes directly to process.stdout in node (before it called console.log) and calls console.log in browser (as before).
  3. By default, stderr writes directly to process.stderr in node (before it called console.warn) and calls console.warn in browser (as before).
  4. In all three cases, by default isatty(stdin/stdout/stderr) is true in node and false in browser (in the browser it used to be true).
  5. As before, if you pass stdin, stdout, or stderr as arguments to loadPyodide, isatty of the corresponding stream is set to false.

The stdin function is now more flexible: we now correctly handle the case where it returns an ArrayBuffer or ArrayBufferView.

I also added 3 new functions to set streams after Pyodide is loaded which offer additional control:

  • setStdin({stdin?, error?, isatty = false}) -- Sets the stdin function. The stdin function takes no arguments and should return null, undefined, a string, or a buffer. Sets and isatty(stdin) to isatty (by default false). If error is true, set stdin to always raise an EIO error when it is read.
  • setStdout({raw?, batched?, isatty = false}) -- If neither raw nor batched is passed, restore default stdout behavior. If rwa is passed, the raw stdout function receives a byte which it should interpret as a utf8 character code. Sets isatty(stdout) to isatty (by default false). If batched is passed but not raw, it sets a batched stdout function. The stdout function receives a string and should do something with it. In this case it ignores isatty and sets isatty(stdout) to false.
  • setStderr({raw?, batched?, isatty = false}) -- same but with stderr.

Resolves #3112.

Checklists

  • Add a CHANGELOG entry
  • Add / update tests
  • Add new / update outdated documentation
  • Add section to Usage docs

@hoodmane
Copy link
Member Author

@ryanking13 could you review this if you get a chance?

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @hoodmane! I have some minor comments (mostly questions), otherwise LGTM.

src/js/module.ts Show resolved Hide resolved
src/js/streams.ts Outdated Show resolved Hide resolved
src/js/streams.ts Show resolved Hide resolved
src/js/streams.ts Show resolved Hide resolved
Co-authored-by: Gyeongjae Choi <def6488@gmail.com>
@hoodmane
Copy link
Member Author

@rth Would you mind reading my description in the PR and reviewing the names / logic there? I think I will change isatty into a named argument, because then the consuming code will be clearer and it is more future-proof if we later want to add more options.

@rth
Copy link
Member

rth commented Nov 22, 2022

Thanks for working on this @hoodmane !

A couple of comments/questions:

  • do we actually need setDefaultStdout etc? From an API perspective wouldn't it be better to have this as the default behavior of setStdout e.g. when the argument is null. It's still useful to have the logic in a separate function, but it could be private in this case.
  • I haven't worked that much with UNIX streams unlike you, in particular for TTY but I don't understand why we need both the character level and the buffered setters (i.e. setRawStdout and setStdout) while they do not exist e.g. in UNIX and Python. I would have though that where to write (i.e. console in the browser and process in Node) is up to us, but as to whether stdout and stderr is buffered or not is part the spec and we shouldn't change that behavior. But again I don't know much about this.
  • A last open question is how do you generally see this interacting with message handlers. In particular, things like Offer redirection of micropip.install console messages and data micropip#23 ?

In all three cases, by default isatty(stdin/stdout/stderr) is true in node and false in browser (in the browser it used to be true).

On what basis do we decide this? For instance from what I understood https://www.npmjs.com/package/dash-wasm is a real terminal not a JS one like we had, which runs in the browser (see related docs). So in that case the isatty would be True. So it seems to be it's not just about being in the browser or not, no? Edit: though, yes, I guess it's tied to console vs process used as default rather than node / browser.

The last comment would you mind adding a documentation section somewhere under Usage explaining the default stream behavior and how to change it (essentially what you put in the issue description)?

@hoodmane
Copy link
Member Author

would you mind adding a documentation section somewhere under Usage explaining the default stream behavior and how to change it (essentially what you put in the issue description)?

I can do this, yes.

do we actually need setDefaultStdout etc? From an API perspective wouldn't it be better to have this as the default behavior of setStdout e.g. when the argument is null.

Yeah that is a reasonable option. The reason I didn't go this way is that setDefaultStdout sometimes calls setStdout and sometimes calls setRawStdout (depending on IN_NODE). I thought it would be weird if setStdout(null) internally called setRawStdout(write_to_process_stdout). But I don't have a strong opinion.

I would have though that where to write (i.e. console in the browser and process in Node) is up to us, but as to whether stdout and stderr is buffered or not is part the spec and we shouldn't change that behavior.

In order to write to console.log we have to have a line buffer. This is because any write to console.log always acts like it has a new line at the end -- console.log("a string"); and console.log("a string\n"); behave identically and they both add a new line. stdout may also be unbuffered, line buffered, or fully buffered inside of wasm. But when that wasm stdout buffer is flushed, it calls the stdout device's put_char method once for each character. Since we don't want to turn test into

t
e
s
t

we have to collect up the calls to put_char and then output them as a batch to console.log. Even if they were already buffered by someone else in some way.

Another point of setStdout is that it is much easier to implement a function that takes a string than a function that takes a character code for a single character. Consider for instance the example tools/python: we just need to filter out specific messages that we don't want printed. E.g., we want to filter out "warning: no blob constructor, cannot create blobs with mimetypes". This would be annoying if we received the message one character at a time. Another point is that it's the way the stdout argument to loadPyodide currently works so we need it in order to maintain backwards compatibility.

I am adding here setRawStdout which is significantly more powerful since it doesn't have to be line buffered, but it is also harder to use.

A last open question is how do you generally see this interacting with message handlers. In particular, things like pyodide/micropip#23?

Well it's certainly conceptually related but I haven't thought too much about the details. One interesting idea is that we could have the message handlers be file descriptors that we write messages to. Then we could use the same sort of api for setting up the handlers for the file descriptors. I'm not sure that this is a good idea, compared to just having callbacks which could be more expressive. I think it would be useful to allow people to create new input or output streams with handlers using an API similar to setStdout/setStdin. But we could work on that in the future.

@rth
Copy link
Member

rth commented Dec 10, 2022

@antocuni I would be interested in your opinion on this PR, just from the high level API perspective.

The reason I didn't go this way is that setDefaultStdout sometimes calls setStdout and sometimes calls setRawStdout (depending on IN_NODE). I thought it would be weird if setStdout(null) internally called setRawStdout(write_to_process_stdout). But I don't have a strong opinion.

I also don't have a very strong opinion, but I would be +1 to reduce the number of API endpoints this exposes if we can. Also, we really need to document that in the end when setRawStdout is called it would overwrite what was set via setStdout (and the other way around).

Thanks for the explanations @hoodmane! The other thing that is still not very clear to me is what would happen if one sets both setRawStdout and setStdout at the same time: whether one overwrites the other in the end. Anyway we need this explained in the docs, in particular how would one chose between these two methods in practice.

@hoodmane
Copy link
Member Author

if one sets both setRawStdout and setStdout at the same time: whether one overwrites the other in the end.

They overwrite each other yes.

@ryanking13 ryanking13 mentioned this pull request Dec 12, 2022
10 tasks
@rth
Copy link
Member

rth commented Dec 13, 2022

Thanks for this work @hoodmane ! LGTM from my side, but I think it would be good to have a second review.

@hoodmane
Copy link
Member Author

I think I'll go ahead and merge this since @ryanking13 approved it too and it doesn't seem like @antocuni is going to review.

@hoodmane hoodmane merged commit 7422ab3 into pyodide:main Dec 18, 2022
@hoodmane hoodmane deleted the isatty branch December 18, 2022 23:55
@rth
Copy link
Member

rth commented Dec 20, 2022

The readthedocs build is not flaky as far as I can tell. I can reproduce the doc building failure locally and bisection points to this commit. It's just that RTD has an unfortunate tendency not to show status when it fails as it did in the last commit.

@rth rth mentioned this pull request Dec 20, 2022
@ryanking13
Copy link
Member

ryanking13 commented Dec 20, 2022

The RTD is failing because this PR added interface.ts but variables (PyodideInterface, FS, PATH) are not correctly imported in that file so TypeScript is complaining about it. I guess Hood intended to move Pyodide Interface-related stuff into interface.ts but it wasn't actually done in this PR, regarding that interface.ts is not used for now?

@hoodmane
Copy link
Member Author

Oh yeah adding interface.ts was part of an idea that I scrapped, it should be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stdout not ending in newline doesn't trigger stdout callback until newline is printed
3 participants