Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: opening/reading from fifo/chardev files are blocking the thread #14255

Conversation

ysbaddaden
Copy link
Contributor

Disk files are always opened in a blocking way because epoll (for instance) will always say that a file is ready for read or write, but this is only true for regular disk files, and that doesn't apply to pipes and characters devices.

For example opening /dev/tty (a character device) will block the current thread when trying to read until we start typing on the keyboard. IO::FileDescriptor was taking care to check the file type, but File always told it to be blocking.

Worse, trying to open a fifo (pipe) file will block the current thread until another thread or process also opens it, which means that we must determine whether to block before trying to open a file (not make it nonblocking afterwards).

I also added a blocking parameter to File.open and File.new. Specifying a boolean value will always open the file in blocking or nonblocking mode, avoiding a call to stat(2), though it shouldn't be an issue unless an application is opening lots of files.

closes #8152

Disk files are always opened in a blocking way because epoll (for
instance) will always say that a file is ready for reach or write. But
this is only for regular disk files, and that doesn't apply to pipes and
characters devices for example.

Opening `/dev/tty` (a character device) will block the current thread
when trying to read from it until you start typing on the keyboard.
IO::FileDescriptor was taking care to check the file type, but File
always told it to be blocking.

Worse, trying to open a fifo/pipe file will block the current thread
until another thread or process also opens it, which means that we must
determine whether to block _before_ trying to open a file (not change it
afterwards).

I also added a `blocking` parameter to `File.open` and `File.new`.
@Blacksmoke16 Blacksmoke16 added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:concurrency topic:stdlib:files labels Jan 26, 2024
@HertzDevil
Copy link
Contributor

I'm not so sure whether File should handle things other than regular files and, if it indeed should, whether we still need a separate IO::FileDescriptor

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Jan 26, 2024

I'm also conflicted: I'm not fond of having to call stat(2) whenever we open a file to fix an edge case yet they're just files on POSIX systems so it's natural to merely call File.open("/dev/tty").

Maybe we could require to specify blocking: false? That sounds acceptable, a bit weird since Crystal is supposed to be nonblocking by default, but we can document this behavior. That pushes the burden on the developer to notice this and understand the reasons, but still acceptable because it's an edge case.

@straight-shoota
Copy link
Member

Yeah File should be able to handle anything that can be referenced by a path name and has readable content.

IIUC there are three aspects to patch which are all related, but I think they can be implemented largely independently? Can we separate them into different PRs?

  1. Non-blocking read for character devices, sockets and FIFOs
  2. Non-blocking open on FIFO
  3. blocking paramter for File constructors

I understand 1. should be relatively simple to resolve, just avoid overriding the implicit value from IO::FileDescriptor.
Only 2. has added complexity which warrants a deeper discussion about pros and cons.
And 3. seems pretty straightforward as well.

@ysbaddaden
Copy link
Contributor Author

  1. is the problematic thing: it involves making a stat call every time we open a file (we currently don't);
  2. ain't complex, we just need open(2) to set O_NONBLOCK instead of delaying with a pair of fcntl calls (we could spare them);
  3. is indeed straightforward.

Hence my proposal: add a blocking: true param to File.new and File.open and pass it to the Crystal::System::File.open so we can manually fix the issue. We can have another PR for automatic detection of special file types, where we can do some measurements about the impact.

@straight-shoota
Copy link
Member

straight-shoota commented Jan 28, 2024

we just need open(2) to set O_NONBLOCK

Are you sure that's enough? The event loop doesn't handle open calls. So this would just continue execution while the file descriptor is not open yet. I suppose any read or write operation will eventually suspend the fiber until the file descriptor is open. But are we sure that no other operation can cause any harm on a non-blocking file descriptor while it's not open yet?

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Jan 28, 2024

Good point:

O_NONBLOCK
When possible, the file is opened in nonblocking mode. Neither the open(2) nor any subsequent I/O operations on the file descriptor which is returned will cause the calling process to wait.

EINTR
While blocked waiting to complete an open of a slow device (e.g., a FIFO), the call was interrupted by a signal handler.

ENXIO
O_NONBLOCK | O_WRONLY is set, the named file is a FIFO, and no process has the FIFO open for reading.

EWOULDBLOCK
The O_NONBLOCK flag was specified, and an incompatible lease was held on the file, see fnctl(2).

Hum 🤔

That would only happen when we explicitly pass blocking: false. In cases where it's true (default) or nil (check) that would only set O_NONBLOCK after opening the file for special cases, so it doesn't seem problematic for the general case, but it means we must handle EINTR and ENXIO.

Allright, I'll make a couple PR: one with a blocking: true parameter to constructors and another with support for FIFO in open. I fixed the current PR to only expose a blocking argument to constructors that default to true (blocking) but can bet set to false (non-blocking) or nil (detect).

Removes the automatic detection of the file type and blocking behavior
before opening the file as we must support at least EINTR and ENXIO
return values for `open(2)`.

Blocking still defaults to true. Developers may pass nil for auto
detection or false to force the nonblocking flag (after open).

Opening a fifo without another thread/process also trying to open for
read or write will still block the current thread (will be another PR).
@beta-ziliani
Copy link
Member

The interpreter specs are failing. Do you know if that's related @ysbaddaden ?

@straight-shoota
Copy link
Member

Interpreter specs should be fixed by #14287

Starting threads very likely requires support from the interpreter to
create the thread (so it knows about it) to run the interpreted code in.

This also fixes the "can't resume running fiber" exceptions that started
appearing in the wait group pull request, because they were always run
after the thread related specs.
@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Feb 6, 2024

Nope. I cherry-picked the commit from #14287 and this still fails, apparently in flock spec 🤨

  flock
    #flock_exclusive
    #flock_shared
    #flock_shared soft blocking fiber
FATAL: can't resume a running fiber: #<Fiber:0x7f23ae566dc0: main>
  from /home/julien/src/crystal-1.11.2/src/crystal/scheduler.cr:148:7 in 'swapcontext'
  from /home/julien/src/crystal-1.11.2/src/compiler/crystal/interpreter/interpreter.cr:354:7 in 'interpret'
  from /home/julien/src/crystal-1.11.2/src/indexable.cr:574:11 in '->'
  from /home/julien/src/crystal-1.11.2/src/fiber.cr:146:11 in 'run'
  from ???

EDIT: this is because I'm calling Thread.new for the fifo spec 🤦
It only fails in this specific spec because it's the first time we switch to another fiber.

@HertzDevil HertzDevil added kind:feature and removed kind:bug A bug in the code. Does not apply to documentation, specs, etc. labels Feb 20, 2024
@HertzDevil
Copy link
Contributor

Note to self: asynchronous regular file I/O on Windows also depends on this since Windows disallows changing a file handle to the opposite blocking mode after creation

@straight-shoota straight-shoota added this to the 1.12.0 milestone Feb 20, 2024
@straight-shoota straight-shoota merged commit 35b8878 into crystal-lang:master Feb 21, 2024
57 checks passed
@davidmatter
Copy link

davidmatter commented Feb 22, 2024

@ysbaddaden Not sure if this solves my issue with raw mode:

The following example does nothing at all. Am I thinking about this correctly? Seems to me that there is a problem with raw mode and non-blocking opening of /dev/tty.

tty = File.open("/dev/tty", "r+", blocking: false)
puts tty.blocking # => false

tty.raw do
  c = tty.read_char
  puts "got #{c.inspect}"
end

@ysbaddaden ysbaddaden deleted the fix/8152/fifo-and-chardev-files-must-be-nonblocking branch February 22, 2024 14:21
@ysbaddaden
Copy link
Contributor Author

@davidmatter I have no issues in a Linux terminal. Neither with this example or the one from https://forum.crystal-lang.org/t/help-with-channels-fibers/6354 after I add blocking: false and a couple fixes:

CTRL_C = '\u0003'

def read_tty_and_send(channel : Channel(Char))
  File.open("/dev/tty", "r", blocking: false) do |tty|
    while char = tty.raw &.read_char
      if char == CTRL_C
        channel.close
        break
      else
        channel.send char
      end
    end
  end
rescue ex
  puts "Error reading from /dev/tty: #{ex.message}"
end

# Create a channel for Char
keystroke_channel = Channel(Char).new

# Start the tty reading in a separate fiber
spawn read_tty_and_send(keystroke_channel)

# Read from the channel
while char = keystroke_channel.receive?
  print "Received keystroke: #{char.inspect}\r\n"
end

@davidmatter
Copy link

Hey @ysbaddaden thanks for trying it out. So it was my darwin setup all along :) Seems we have an issue only on (my?) macOS.

Chip: M1 Pro
OS: Sonoma 14.2.1

@davidmatter
Copy link

davidmatter commented Feb 22, 2024

The following works perfectly fine. If you can give me a hint where the problem could be in the crystal stdlib, I can take a look.

#include <stdio.h>
#include <termios.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
    int fd = open("/dev/tty", O_RDONLY | O_NONBLOCK);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct termios original_settings, new_settings;

    tcgetattr(fd, &original_settings); // Store original settings
    new_settings = original_settings;

    new_settings.c_lflag &= ~(ICANON | ECHO); 
    new_settings.c_cc[VMIN] = 1; 
    new_settings.c_cc[VTIME] = 0; 

    tcsetattr(fd, TCSANOW, &new_settings); // Apply new settings

    char ch;
    while (1) {
        int num_read = read(fd, &ch, 1);
        if (num_read == 1) {
            printf("Key pressed: %c\n", ch);
        }
    }

    tcsetattr(fd, TCSANOW, &original_settings); 
    close(fd); 

    return 0;
}

@davidmatter
Copy link

This might be the problem, not sure about that, though. Maybe @gdamore knows some things about this.

https://github.com/gdamore/tcell/blob/main/tty_unix.go#L60

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Feb 22, 2024

Yeah, that's probably MacOS acting out. There's nothing wrong with your code, but maybe there's something in tty.raw that MacOS doesn't like.

The difference with the C code is that File in Crystal won't pass O_NONBLOCK to open(2) (we'd need to handle EINTR/ENXIO for open(2) which I'm not sure how to handle, so we set it afterwards with fcntl. The TTY should already be opened by the terminal so it should block (it doesn't on linux) but maybe that's not the case on MacOS?

@davidmatter
Copy link

Alright, this one works :)

fd = LibC.open("/dev/tty", LibC::O_RDONLY | LibC::O_NONBLOCK)

original_settings = LibC::Termios.new

LibC.tcgetattr(fd, pointerof(original_settings))

new_settings = original_settings.dup

new_settings.c_lflag &= ~(LibC::ICANON | LibC::ECHO)

LibC.tcsetattr(fd, LibC::TCSANOW, pointerof(new_settings))

loop do
  ch = ' '
  num_read = LibC.read(fd, pointerof(ch), 1)
  if num_read == 1
    puts "Key pressed: #{ch}"
  end
end

@davidmatter
Copy link

This one does not, I'll try to dig deeper this weekend:

fd = LibC.open("/dev/tty", LibC::O_RDONLY | LibC::O_NONBLOCK)

original_settings = LibC::Termios.new
LibC.tcgetattr(fd, pointerof(original_settings))
new_settings = original_settings.dup
new_settings.c_lflag &= ~(LibC::ICANON | LibC::ECHO)
LibC.tcsetattr(fd, LibC::TCSANOW, pointerof(new_settings))

tty = IO::FileDescriptor.new(fd, blocking: false)

loop do
  b = Bytes.new(1)
  c = tty.read(b)
  puts "Read: #{c.inspect}"
end

@davidmatter
Copy link

Note to self: There seems to be a problem with "Crystal::Scheduler.reschedule" in evented IO. Without the rescheduling it seems to work. Trying to find out why.

 # :nodoc:
  def wait_readable(timeout = @read_timeout, *, raise_if_closed = true, &) : Nil
    readers = @readers.get { Deque(Fiber).new }
    readers << Fiber.current
    add_read_event(timeout)
    # Crystal::Scheduler.reschedule

    if @read_timed_out
      @read_timed_out = false
      yield
    end

    check_open if raise_if_closed
  end

@ysbaddaden
Copy link
Contributor Author

That seems to be the event loop: #reschedule suspends the current fiber until the read event is triggered which will resume the fiber. That should happen when the fd is ready for reading.

Basically, your change is making the read into a busy loop (loop { break if read }).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Reading from FiFo pipe blocks execution of other fibers
6 participants