Skip to content

Commit

Permalink
fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Browse files Browse the repository at this point in the history
[in vfs.git#fixes]

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.

Reproducer follows:

#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>

void is_open(int fd)
{
	printf("#%d is %s\n", fd,
		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}

int child(void *unused)
{
	while(1) {
	}
	return 0;
}

int main(void)
{
	char *stack;
	pid_t pid;

	stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	for (int i = 2; i < 128; i++)
	    dup2(0, i);
	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);

	is_open(64);
	printf("dup(0) => %d, expected 64\n", dup(0));

	kill(pid, 9);
	return 0;
}

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  • Loading branch information
Al Viro authored and intel-lab-lkp committed Aug 3, 2024
1 parent defaf1a commit bdbee84
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions fs/file.c
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,20 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
}

copy_fd_bitmaps(new_fdt, old_fdt, open_files);
if (unlikely(max_fds != NR_OPEN_MAX)) {
/*
* if there had been opened descriptors past open_files,
* the last copied word in full_fds_bits might have picked
* stray bits; nothing of that sort happens to open_fds and
* close_on_exec, since there the part that needs to be copied
* will always fall on a word boundary.
*/
unsigned int n = open_files / BITS_PER_LONG;
if (n % BITS_PER_LONG) {
unsigned long mask = BITMAP_LAST_WORD_MASK(n);
new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
}
}

old_fds = old_fdt->fd;
new_fds = new_fdt->fd;
Expand Down

0 comments on commit bdbee84

Please sign in to comment.