Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-107845: Fix symlink handling for tarfile.data_filter #107846

Merged
merged 7 commits into from
Aug 21, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Doc/library/tarfile.rst
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@
For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
keyword argument *preset* to specify the compression level of the file.

For special purposes, there is a second format for *mode*:

Check warning on line 115 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:meth reference target not found: read

Check warning on line 115 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:meth reference target not found: write
``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
object that processes its data as a stream of blocks. No random seeking will
be done on the file. If given, *fileobj* may be any object that has a
Expand Down Expand Up @@ -323,7 +323,7 @@
All following arguments are optional and can be accessed as instance attributes
as well.

*name* is the pathname of the archive. *name* may be a :term:`path-like object`.

Check warning on line 326 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:attr reference target not found: name
It can be omitted if *fileobj* is given.
In this case, the file object's :attr:`name` attribute is used if it exists.

Expand Down Expand Up @@ -358,7 +358,7 @@
*debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
messages). The messages are written to ``sys.stderr``.

*errorlevel* controls how extraction errors are handled,

Check warning on line 361 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:attr reference target not found: ~TarFile.errorlevel
see :attr:`the corresponding attribute <~TarFile.errorlevel>`.

The *encoding* and *errors* arguments define the character encoding to be
Expand Down Expand Up @@ -645,7 +645,7 @@
:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
:meth:`~TarFile.gettarinfo`.

Modifying the objects returned by :meth:`~!TarFile.getmember` or

Check warning on line 648 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:meth reference target not found: !TarFile.getmember

Check warning on line 648 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:meth reference target not found: !TarFile.getmembers

Check warning on line 648 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:meth reference target not found: TarInfo.replace
:meth:`~!TarFile.getmembers` will affect all subsequent
operations on the archive.
For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
Expand Down Expand Up @@ -727,7 +727,7 @@

.. attribute:: TarInfo.type

File type. *type* is usually one of these constants: :const:`REGTYPE`,

Check warning on line 730 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:const reference target not found: REGTYPE

Check warning on line 730 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:const reference target not found: AREGTYPE

Check warning on line 730 in Doc/library/tarfile.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:const reference target not found: LNKTYPE
:const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
:const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
:const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
Expand All @@ -740,6 +740,11 @@
Name of the target file name, which is only present in :class:`TarInfo` objects
of type :const:`LNKTYPE` and :const:`SYMTYPE`.

For symbolic links (``SYMTYPE``), the linkname is relative to the directory
encukou marked this conversation as resolved.
Show resolved Hide resolved
that contains the link.
For hard links (``LNKTYPE``), the linkname is relative to the root of
encukou marked this conversation as resolved.
Show resolved Hide resolved
the archive.


.. attribute:: TarInfo.uid
:type: int
Expand Down
11 changes: 9 additions & 2 deletions Lib/tarfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -742,7 +742,7 @@ def __init__(self, tarinfo):
class AbsoluteLinkError(FilterError):
def __init__(self, tarinfo):
self.tarinfo = tarinfo
super().__init__(f'{tarinfo.name!r} is a symlink to an absolute path')
super().__init__(f'{tarinfo.name!r} is a link to an absolute path')

class LinkOutsideDestinationError(FilterError):
def __init__(self, tarinfo, path):
Expand Down Expand Up @@ -802,7 +802,14 @@ def _get_filtered_attrs(member, dest_path, for_data=True):
if member.islnk() or member.issym():
if os.path.isabs(member.linkname):
raise AbsoluteLinkError(member)
target_path = os.path.realpath(os.path.join(dest_path, member.linkname))
if member.issym():
target_path = os.path.join(dest_path,
os.path.dirname(name),
member.linkname)
else:
target_path = os.path.join(dest_path,
member.linkname)
target_path = os.path.realpath(target_path)
if os.path.commonpath([target_path, dest_path]) != dest_path:
raise LinkOutsideDestinationError(member, target_path)
return new_attrs
Expand Down
120 changes: 113 additions & 7 deletions Lib/test/test_tarfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -3337,10 +3337,12 @@ def __exit__(self, *exc):
self.bio = None

def add(self, name, *, type=None, symlink_to=None, hardlink_to=None,
mode=None, **kwargs):
mode=None, size=None, **kwargs):
"""Add a member to the test archive. Call within `with`."""
name = str(name)
tarinfo = tarfile.TarInfo(name).replace(**kwargs)
if size is not None:
tarinfo.size = size
if mode:
tarinfo.mode = _filemode_to_int(mode)
if symlink_to is not None:
Expand Down Expand Up @@ -3416,7 +3418,8 @@ def check_context(self, tar, filter):
raise self.raised_exception
self.assertEqual(self.expected_paths, set())

def expect_file(self, name, type=None, symlink_to=None, mode=None):
def expect_file(self, name, type=None, symlink_to=None, mode=None,
size=None):
"""Check a single file. See check_context."""
if self.raised_exception:
raise self.raised_exception
Expand Down Expand Up @@ -3445,6 +3448,8 @@ def expect_file(self, name, type=None, symlink_to=None, mode=None):
self.assertTrue(path.is_fifo())
else:
raise NotImplementedError(type)
if size is not None:
self.assertEqual(path.stat().st_size, size)
for parent in path.parents:
self.expected_paths.discard(parent)

Expand Down Expand Up @@ -3491,8 +3496,15 @@ def test_parent_symlink(self):
# Test interplaying symlinks
# Inspired by 'dirsymlink2a' in jwilk/traversal-archives
with ArchiveMaker() as arc:

# links to `.` which is both:
encukou marked this conversation as resolved.
Show resolved Hide resolved
# - the destination directory
# - `current` itself
arc.add('current', symlink_to='.')

# effectively points to ./../
arc.add('parent', symlink_to='current/..')

arc.add('parent/evil')

if os_helper.can_symlink():
Expand Down Expand Up @@ -3534,9 +3546,46 @@ def test_parent_symlink(self):
def test_parent_symlink2(self):
# Test interplaying symlinks
# Inspired by 'dirsymlink2b' in jwilk/traversal-archives

# Posix and Windows have different pathname resolution:
# either symlink or a '..' component resolve first.
# Let's see which we are on.
if os_helper.can_symlink():
testpath = os.path.join(TEMPDIR, 'resolution_test')
os.mkdir(testpath)

# testpath/current links to `.` which is all of:
# - `testpath`
# - `testpath/current`
# - `testpath/current/current`
# - etc.
os.symlink('.', os.path.join(testpath, 'current'))

# we'll test where `testpath/current/../file` ends up
with open(os.path.join(testpath, 'current', '..', 'file'), 'w'):
pass

if os.path.exists(os.path.join(testpath, 'file')):
# Windows collapses 'current\..' to '.' first, leaving
# 'testpath\file'
dotdot_resolves_early = True
elif os.path.exists(os.path.join(testpath, '..', 'file')):
# Posix resolves 'current' to '.' first, leaving
# 'testpath/../file'
dotdot_resolves_early = False
else:
raise AssertionError('Could not determine link resolution')

with ArchiveMaker() as arc:

# `current` links to `.` which is both the destination directory
# and `current` itself
arc.add('current', symlink_to='.')

# `current/parent` is also available as `./parent`,
# and effectively points to `./../`
arc.add('current/parent', symlink_to='..')

arc.add('parent/evil')

with self.check_context(arc.open(), 'fully_trusted'):
Expand All @@ -3550,6 +3599,7 @@ def test_parent_symlink2(self):

with self.check_context(arc.open(), 'tar'):
if os_helper.can_symlink():
# Fail when extracting a file outside destination
self.expect_exception(
tarfile.OutsideDestinationError,
"'parent/evil' would be extracted to "
Expand All @@ -3560,10 +3610,24 @@ def test_parent_symlink2(self):
self.expect_file('parent/evil')

with self.check_context(arc.open(), 'data'):
self.expect_exception(
tarfile.LinkOutsideDestinationError,
"""'current/parent' would link to ['"].*['"], """
+ "which is outside the destination")
if os_helper.can_symlink():
if dotdot_resolves_early:
# Fail when extracting a file outside destination
self.expect_exception(
tarfile.OutsideDestinationError,
"'parent/evil' would be extracted to "
+ """['"].*evil['"], which is outside """
+ "the destination")
else:
# Fail as soon as we have a symlink outside the destination
self.expect_exception(
tarfile.LinkOutsideDestinationError,
"'current/parent' would link to "
+ """['"].*outerdir['"], which is outside """
+ "the destination")
else:
self.expect_file('current/')
self.expect_file('parent/evil')

@symlink_test
def test_absolute_symlink(self):
Expand Down Expand Up @@ -3593,12 +3657,30 @@ def test_absolute_symlink(self):
with self.check_context(arc.open(), 'data'):
self.expect_exception(
tarfile.AbsoluteLinkError,
"'parent' is a symlink to an absolute path")
"'parent' is a link to an absolute path")

def test_absolute_hardlink(self):
# Test hardlink to an absolute path
# Inspired by 'dirsymlink' in jwilk/traversal-archives
encukou marked this conversation as resolved.
Show resolved Hide resolved
with ArchiveMaker() as arc:
arc.add('parent', hardlink_to=self.outerdir / 'foo')

with self.check_context(arc.open(), 'fully_trusted'):
self.expect_exception(KeyError, ".*foo. not found")

with self.check_context(arc.open(), 'tar'):
self.expect_exception(KeyError, ".*foo. not found")

with self.check_context(arc.open(), 'data'):
self.expect_exception(
tarfile.AbsoluteLinkError,
"'parent' is a link to an absolute path")

@symlink_test
def test_sly_relative0(self):
# Inspired by 'relative0' in jwilk/traversal-archives
with ArchiveMaker() as arc:
# points to `../../tmp/moo`
arc.add('../moo', symlink_to='..//tmp/moo')

try:
Expand Down Expand Up @@ -3649,6 +3731,30 @@ def test_sly_relative2(self):
+ """['"].*moo['"], which is outside the """
+ "destination")

@symlink_test
def test_deep_symlink(self):
# Test that symlinks and hardlinks inside a directory
# point to the correct file (`target` of size 3).
# If links aren't supported we get a copy of the file.
with ArchiveMaker() as arc:
arc.add('targetdir/target', size=3)
# a hardlink's linkname is relative to the archive
arc.add('linkdir/hardlink', hardlink_to=os.path.join(
'targetdir', 'target'))
# a symlink's linkname is relative to the link's directory
arc.add('linkdir/symlink', symlink_to=os.path.join(
'..', 'targetdir', 'target'))

for filter in 'tar', 'data', 'fully_trusted':
with self.check_context(arc.open(), filter):
self.expect_file('targetdir/target', size=3)
self.expect_file('linkdir/hardlink', size=3)
if os_helper.can_symlink():
self.expect_file('linkdir/symlink', size=3,
symlink_to='../targetdir/target')
else:
self.expect_file('linkdir/symlink', size=3)

def test_modes(self):
# Test how file modes are extracted
# (Note that the modes are ignored on platforms without working chmod)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
:func:`tarfile.data_filter` now takes the location of symlinks into account
when determining their target, so it will no longer reject some valid
tarballs with ``LinkOutsideDestinationError``.
Loading