Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] prevent socket fd leaks to child processes #1465

Merged
merged 6 commits into from
Sep 7, 2020

Conversation

aogun
Copy link
Contributor

@aogun aogun commented Aug 12, 2020

Child processed created with fork share the file descriptors with their parent process, if we fork a child process after startup a srt listener, the child process will hold the socket after we close it in the parent process, then the parent can't listen on the same port again. So I suggest set the SOCK_CLOEXEC flag to prevent socket leaks to child processes.

@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Aug 12, 2020
@maxsharabayko maxsharabayko added this to the v1.5.0 - Sprint 21 milestone Aug 12, 2020
@ethouris
Copy link
Collaborator

You are right with this fix, thanks for finding it. I think this needs some improvements, however.

The definition of SO_CLOEXEC is platform dependent, and although it should suffice to see if this is defined, the alternative simply keeps the invalid implementation. Additionally, if someone attempts to compile it on a newer platform and run a binary on the older platform, the solution might be run and fail.

So what I think needs improvement is:

  1. Check if the call to ::socket succeeded here. If it failed, it might have failed due to SO_CLOEXEC flag, in which case it should retry without this flag.

  2. Both when the call failed and when the flag isn't defined, it should fallback in setting O_CLOEXEC flag on the created socket before it's rewritten to the field.

Might even be that for the sake of portability it would be even easier to rely exclusively on O_CLOEXEC because it is defined for a wider number of platforms and the situation of SO_CLOEXEC available without O_CLOEXEC doesn't exist anyway.

@aogun
Copy link
Contributor Author

aogun commented Aug 13, 2020

@ethouris , thank you for your reply, you are totally right, I made some changes, use O_CLOEXEC instead of SO_CLOEXEC, and create again without O_CLOEXEC if the call failed, is there anything I missed?

@aogun
Copy link
Contributor Author

aogun commented Aug 13, 2020

@ethouris , I found my pull request was base on an old version, what should I do? Should I close this pull request and create an new one base on the lasted version, or just update my fork repository and do the merge?

@ethouris
Copy link
Collaborator

ethouris commented Aug 14, 2020

As for "old version": If Github doesn't report conflicts, don't worry. If it does, got to your repository, update the upstream master branch (pull), then switch to your branch and do "merge" from master.

Your fix is still wrong, unfortunately. The SOCK_CLOEXEC flag can be used with socket call, but the O_CLOEXEC must be used with fcntl, after successful return from socket.

@ethouris
Copy link
Collaborator

Ok, that's much better. For Windows I found also something like:

::SetHandleInformation(hInputWrite, HANDLE_FLAG_INHERIT, 0)

If you have no possibility to check if it works on Windows - don't worry, just please add the above in the comment somewhere and add some "XXX" so that it can catch an eye. We'll fix and test it ourselves later. I need you to add this on your branch because in this case we can simply merge your branch directly.

@aogun
Copy link
Contributor Author

aogun commented Aug 17, 2020

Sorry, I can't check it on Windows, the comment already added, thank you for your patience.

@maxsharabayko maxsharabayko modified the milestones: v1.5.0 - Sprint 21, v1.5.0 - Sprint 22 Aug 25, 2020
@maxsharabayko
Copy link
Collaborator

Hi @aogun
Thank you for your contribution!

I wonder if you have some workflows with forked processes utilizing SRT?
Forking a process with UDP socket used by SRT does not seem like something that could be expected to work properly.
Sharing the same UDP socket between different processes might lead to a case, when an SRT packet designated for one SRT Socket in on process actually arrives at another process, and will effectively be lost.

I am thinking it might be better to forbid such forking in SRT, would that be possible.

@aogun
Copy link
Contributor Author

aogun commented Aug 27, 2020

No, sharing the same UDP socket is not my expect, in my use case, there have a process A which call srt create a socket and bind port X on it, after that at some time A needs call popen to execute command, it will create an sub process and the sub process will hold the udp socket inherit from parent, then you will get failed when you reopen the socket and bind port X in A during subprocess running, add O_CLOEXEC flag will avoid such problem, I suppose forbid process to use popen is not a good idea.

@maxsharabayko
Copy link
Collaborator

Aha, ok, you are not doing fork, you are doing popen, and it also forks the UDP socket handler. 🤔

@maxsharabayko
Copy link
Collaborator

Looks like there is no clean way to prevent a descriptor to be shared with child process.

@aogun
Copy link
Contributor Author

aogun commented Sep 7, 2020

@maxsharabayko Fedora Docs mentioned two ways, the first is set close-on-exec flag, and the doc figured out the downside of this approach is needs set the flag to every descriptors which sub process not needed. The second way is loop over file descriptors and close it before creating a new process image, but there's no safe and efficiency way to loop all the file descriptors.
So, I think the first approach is our best choice since we already knew our socket should closed before exec, and as far as I know, many projects use the first way to create socket and I list some below :

and gstreamer, uWebsocket, etc.

Copy link
Collaborator

@maxsharabayko maxsharabayko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aogun, I agee.

@maxsharabayko
Copy link
Collaborator

Hi @aogun

Looking at the code again, why are #define set_cloexec set_cloexec_ioctl and static int set_cloexec_ioctl(int fd, int set) { under different ifdef directives?

#if defined(_AIX) || \
    defined(__APPLE__) || \
    defined(__DragonFly__) || \
    defined(__FreeBSD__) || \
    defined(__FreeBSD_kernel__) || \
    defined(__linux__) || \
    defined(__OpenBSD__) || \
    defined(__NetBSD__)
#define set_cloexec set_cloexec_ioctl
#else
#define set_cloexec set_cloexec_fcntl
#endif

#if !defined(__CYGWIN__) && !defined(__MSYS__) && !defined(__HAIKU__)
static int set_cloexec_ioctl(int fd, int set) {
    int r;

    do
        r = ioctl(fd, set ? FIOCLEX : FIONCLEX);
    while (r == -1 && errno == EINTR);

    if (r)
        return errno;

    return 0;
}
#endif

Should not the whole section be the following? Or even just defining the set_cloexec.

#ifndef _WIN32

#if defined(_AIX) || \
    defined(__APPLE__) || \
    defined(__DragonFly__) || \
    defined(__FreeBSD__) || \
    defined(__FreeBSD_kernel__) || \
    defined(__linux__) || \
    defined(__OpenBSD__) || \
    defined(__NetBSD__)
#define set_cloexec set_cloexec_ioctl

static int set_cloexec_ioctl(int fd, int set) {
    int r;

    do
        r = ioctl(fd, set ? FIOCLEX : FIONCLEX);
    while (r == -1 && errno == EINTR);

    if (r)
        return errno;

    return 0;
}

#else
#define set_cloexec set_cloexec_fcntl

static int set_cloexec_fcntl(int fd, int set) {
    int flags;
    int r;

    do
        r = fcntl(fd, F_GETFD);
    while (r == -1 && errno == EINTR);

    if (r == -1)
        return errno;

    /* Bail out now if already set/clear. */
    if (!!(r & FD_CLOEXEC) == !!set)
        return 0;

    if (set)
        flags = r | FD_CLOEXEC;
    else
        flags = r & ~FD_CLOEXEC;

    do
        r = fcntl(fd, F_SETFD, flags);
    while (r == -1 && errno == EINTR);

    if (r)
        return errno;

    return 0;
}
#endif
#endif // #ifndef _WIN32

@aogun
Copy link
Contributor Author

aogun commented Sep 22, 2020

you are right, just define set_cloexec would be better, there's no need to define different ifdef.

@maxsharabayko
Copy link
Collaborator

@aogun Thanks for the prompt feedback.
Functions are updated via PR #1562 and merged to the master.

It would be helpful if you could confirm the latest master works as expected on your setup as we are to release SRT v1.4.2 soon.

@aogun
Copy link
Contributor Author

aogun commented Sep 23, 2020

@maxsharabayko Socket leak test case passed if turn on option ENABLE_SOCK_CLOEXEC, failed if ENABLE_SOCK_CLOEXEC turn off, the PR #1562 works good.

@maxsharabayko
Copy link
Collaborator

@aogun Thank you for the confirmation!

@mbakholdina mbakholdina modified the milestones: v1.5.0 - Sprint 22, v1.4.2 Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants