Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++][hardening] Use bounded iterators in std::vector and std::string #78929

Merged
merged 9 commits into from
Jul 23, 2024

Conversation

davidben
Copy link
Contributor

@davidben davidben commented Jan 22, 2024

NB: This PR depends on #78876. Ignore the first commit when reviewing, and don't merge it until #78876 is resolved. When/if #78876 lands, I'll clean this up.

This partially restores parity with the old, since removed debug build. We now can re-enable a bunch of the disabled tests. Some things of note:

  • bounded_iter's converting constructor has never worked. It needs a friend declaration to access the other bound_iter instantiation's private fields.

  • The old debug iterators also checked that callers did not try to compare iterators from different objects. bounded_iter does not currently do this, so I've left those disabled. However, I think we probably should add those. See std::copy and others bypass _LIBCPP_ABI_BOUNDED_ITERATORS and other hardened iterators #78771 (comment)

  • The std::vector iterators are bounded up to capacity, not size. This makes for a weaker safety check. This is because the STL promises not to invalidate iterators when appending up to the capacity. Since we cannot retroactively update all the iterators on push_back(), I've instead sized it to the capacity. This is not as good, but at least will stop the iterator from going off the end of the buffer.

    There was also no test for this, so I've added one in the std directory.

  • std::string has two ambiguities to deal with. First, I opted not to size it against the capacity. https://eel.is/c++draft/string.require#4 says iterators are invalidated on an non-const operation. Second, whether the iterator can reach the NUL terminator. The previous debug tests and the special-case in https://eel.is/c++draft/string.access#2 suggest no. If either of these causes widespread problems, I figure we can revisit.

  • resize_and_overwrite.pass.cpp assumed std::string's iterator supported s.begin().base(), but I see no promise of this in the standard. GCC also doesn't support this. I fixed the test to use std::to_address.

  • alignof.compile.pass.cpp's pointer isn't enough of a real pointer. (It needs to satisfy NullablePointer, LegacyRandomAccessIterator, and LegacyContiguousIterator.) __bounded_iter seems to instantiate enough to notice. I've added a few more bits to satisfy it.

Fixes #78805

@davidben davidben requested a review from a team as a code owner January 22, 2024 04:07
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jan 22, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 22, 2024

@llvm/pr-subscribers-libcxx

Author: David Benjamin (davidben)

Changes

NB: This PR depends on #78876. Ignore the first commit when reviewing, and don't merge it until #78876 is resolved. When/if #78876 lands, I'll clean this up.

This partially restores parity with the old, since removed debug build. We now can re-enable a bunch of the disabled tests. Some things of note:

  • bounded_iter's converting constructor has never worked. It needs a friend declaration to access the other bound_iter instantiation's private fields.

  • The old debug iterators also checked that callers did not try to compare iterators from different objects. bounded_iter does not currently do this, so I've left those disabled. However, I think we probably should add those. See std::copy and others bypass _LIBCPP_ABI_BOUNDED_ITERATORS and other hardened iterators #78771 (comment)

  • The std::vector iterators are bounded up to capacity, not size. This makes for a weaker safety check. This is because the STL promises not to invalidate iterators when appending up to the capacity. Since we cannot retroactively update all the iterators on push_back(), I've instead sized it to the capacaity. This is not as good, but at least will stop the iterator from going off the end of the buffer.

    There was also no test for this, so I've added one in the std directory.

  • std::string has two ambiguities to deal with. First, I opted not to size it against the capacity. https://eel.is/c++draft/string.require#4 says iterators are invalided on an non-const operation. Second, whether the iterator can reach the NUL terminator. The previous debug tests and the special-case in https://eel.is/c++draft/string.access#2 suggest no. If either of these causes widespread problems, I figure we can revisit.

  • resize_and_overwrite.pass.cpp assumed std::string's iterator supported s.begin().base(), but I see no promise of this in the standard. GCC also doesn't support this. I fixed the test to use std::to_address.

  • alignof.compile.pass.cpp's pointer isn't enough of a real pointer. (It needs to satisfy NullablePointer, LegacyRandomAccessIterator, and LegacyContiguousIterator.) __bounded_iter seems to instantiate enough to notice. I've added a few more bits to satisfy it.

Fixes #78805


Patch is 51.15 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78929.diff

20 Files Affected:

  • (modified) libcxx/include/__iterator/bounded_iter.h (+33-21)
  • (modified) libcxx/include/string (+25-1)
  • (modified) libcxx/include/vector (+29-1)
  • (modified) libcxx/test/libcxx/containers/sequences/vector/debug.iterator.add.pass.cpp (+18-6)
  • (modified) libcxx/test/libcxx/containers/sequences/vector/debug.iterator.decrement.pass.cpp (+4-4)
  • (modified) libcxx/test/libcxx/containers/sequences/vector/debug.iterator.dereference.pass.cpp (+16-4)
  • (modified) libcxx/test/libcxx/containers/sequences/vector/debug.iterator.increment.pass.cpp (+18-6)
  • (modified) libcxx/test/libcxx/containers/sequences/vector/debug.iterator.index.pass.cpp (+22-4)
  • (modified) libcxx/test/libcxx/containers/views/views.span/debug.iterator-indexing.pass.cpp (+127-48)
  • (modified) libcxx/test/libcxx/iterators/bounded_iter/dereference.pass.cpp (+7-7)
  • (modified) libcxx/test/libcxx/strings/basic.string/alignof.compile.pass.cpp (+9)
  • (modified) libcxx/test/libcxx/strings/basic.string/sizeof.compile.pass.cpp (+9)
  • (modified) libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.add.pass.cpp (+3-3)
  • (modified) libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.decrement.pass.cpp (+3-3)
  • (modified) libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.dereference.pass.cpp (+3-3)
  • (modified) libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.increment.pass.cpp (+3-3)
  • (modified) libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.index.pass.cpp (+5-4)
  • (modified) libcxx/test/libcxx/strings/string.view/string.view.iterators/debug.iterator-indexing.pass.cpp (+127-61)
  • (added) libcxx/test/std/containers/sequences/vector/vector.modifiers/push_back.invalidation.pass.cpp (+43)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/resize_and_overwrite.pass.cpp (+3-2)
diff --git a/libcxx/include/__iterator/bounded_iter.h b/libcxx/include/__iterator/bounded_iter.h
index 2a667648871c4c1..91ebe39d1ba57b6 100644
--- a/libcxx/include/__iterator/bounded_iter.h
+++ b/libcxx/include/__iterator/bounded_iter.h
@@ -31,13 +31,10 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 // Iterator wrapper that carries the valid range it is allowed to access.
 //
 // This is a simple iterator wrapper for contiguous iterators that points
-// within a [begin, end) range and carries these bounds with it. The iterator
-// ensures that it is pointing within that [begin, end) range when it is
-// dereferenced.
-//
-// Arithmetic operations are allowed and the bounds of the resulting iterator
-// are not checked. Hence, it is possible to create an iterator pointing outside
-// its range, but it is not possible to dereference it.
+// within a [begin, end] range and carries these bounds with it. The iterator
+// ensures that it is pointing within [begin, end) range when it is
+// dereferenced. It also ensures that it is never iterated outside of
+// [begin, end].
 template <class _Iterator, class = __enable_if_t< __libcpp_is_contiguous_iterator<_Iterator>::value > >
 struct __bounded_iter {
   using value_type        = typename iterator_traits<_Iterator>::value_type;
@@ -70,18 +67,20 @@ struct __bounded_iter {
 
 private:
   // Create an iterator wrapping the given iterator, and whose bounds are described
-  // by the provided [begin, end) range.
+  // by the provided [begin, end] range.
   //
-  // This constructor does not check whether the resulting iterator is within its bounds.
-  // However, it does check that the provided [begin, end) range is a valid range (that
-  // is, begin <= end).
+  // Except in debug builds, the constructor does not check whether the resulting iterator
+  // is within its bounds. The container is assumed to be self-consistent.
   //
   // Since it is non-standard for iterators to have this constructor, __bounded_iter must
   // be created via `std::__make_bounded_iter`.
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 explicit __bounded_iter(
       _Iterator __current, _Iterator __begin, _Iterator __end)
       : __current_(__current), __begin_(__begin), __end_(__end) {
-    _LIBCPP_ASSERT_INTERNAL(__begin <= __end, "__bounded_iter(current, begin, end): [begin, end) is not a valid range");
+    _LIBCPP_ASSERT_INTERNAL(
+        __begin <= __current, "__bounded_iter(current, begin, end): current and begin are inconsistent");
+    _LIBCPP_ASSERT_INTERNAL(
+        __current <= __end, "__bounded_iter(current, begin, end): current and end are inconsistent");
   }
 
   template <class _It>
@@ -91,21 +90,25 @@ struct __bounded_iter {
   // Dereference and indexing operations.
   //
   // These operations check that the iterator is dereferenceable, that is within [begin, end).
+  // The iterator is always within [begin, end], so we only need to check for end. This is
+  // easier to optimize out. See https://github.com/llvm/llvm-project/issues/78829
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 reference operator*() const _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
-        __in_bounds(__current_), "__bounded_iter::operator*: Attempt to dereference an out-of-range iterator");
+        __current_ != __end_, "__bounded_iter::operator*: Attempt to dereference an iterator at the end");
     return *__current_;
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pointer operator->() const _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
-        __in_bounds(__current_), "__bounded_iter::operator->: Attempt to dereference an out-of-range iterator");
+        __current_ != __end_, "__bounded_iter::operator->: Attempt to dereference an iterator at the end");
     return std::__to_address(__current_);
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 reference operator[](difference_type __n) const _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
-        __in_bounds(__current_ + __n), "__bounded_iter::operator[]: Attempt to index an iterator out-of-range");
+        __n >= __begin_ - __current_, "__bounded_iter::operator[]: Attempt to index an iterator past the start");
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __n < __end_ - __current_, "__bounded_iter::operator[]: Attempt to index an iterator at or past the end");
     return __current_[__n];
   }
 
@@ -114,6 +117,8 @@ struct __bounded_iter {
   // These operations do not check that the resulting iterator is within the bounds, since that
   // would make it impossible to create a past-the-end iterator.
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator++() _NOEXCEPT {
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __current_ != __end_, "__bounded_iter::operator++: Attempt to advance an iterator past the end");
     ++__current_;
     return *this;
   }
@@ -124,6 +129,8 @@ struct __bounded_iter {
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator--() _NOEXCEPT {
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __current_ != __begin_, "__bounded_iter::operator--: Attempt to rewind an iterator past the start");
     --__current_;
     return *this;
   }
@@ -134,6 +141,10 @@ struct __bounded_iter {
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator+=(difference_type __n) _NOEXCEPT {
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __n >= __begin_ - __current_, "__bounded_iter::operator+=: Attempt to rewind an iterator past the start");
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __n <= __end_ - __current_, "__bounded_iter::operator+=: Attempt to advance an iterator past the end");
     __current_ += __n;
     return *this;
   }
@@ -151,6 +162,10 @@ struct __bounded_iter {
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator-=(difference_type __n) _NOEXCEPT {
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __n <= __current_ - __begin_, "__bounded_iter::operator-=: Attempt to rewind an iterator past the start");
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
+        __n >= __current_ - __end_, "__bounded_iter::operator-=: Attempt to advance an iterator past the end");
     __current_ -= __n;
     return *this;
   }
@@ -197,15 +212,12 @@ struct __bounded_iter {
   }
 
 private:
-  // Return whether the given iterator is in the bounds of this __bounded_iter.
-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bool __in_bounds(_Iterator const& __iter) const {
-    return __iter >= __begin_ && __iter < __end_;
-  }
-
   template <class>
   friend struct pointer_traits;
+  template <class, class>
+  friend struct __bounded_iter;
   _Iterator __current_;       // current iterator
-  _Iterator __begin_, __end_; // valid range represented as [begin, end)
+  _Iterator __begin_, __end_; // valid range represented as [begin, end]
 };
 
 template <class _It>
diff --git a/libcxx/include/string b/libcxx/include/string
index e97139206d4fa7c..a3a04d42e18adb2 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -579,6 +579,7 @@ basic_string<char32_t> operator""s( const char32_t *str, size_t len );
 #include <__functional/unary_function.h>
 #include <__fwd/string.h>
 #include <__ios/fpos.h>
+#include <__iterator/bounded_iter.h>
 #include <__iterator/distance.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/reverse_iterator.h>
@@ -736,9 +737,16 @@ public:
                 "[allocator.requirements] states that rebinding an allocator to the same type should result in the "
                 "original allocator");
 
-  // TODO: Implement iterator bounds checking without requiring the global database.
+#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
+  // The pointer must be passed through __wrap_iter because
+  // __alloc_traits::pointer may not be detected as a continguous iterator on
+  // its own.
+  typedef __bounded_iter<__wrap_iter<pointer>> iterator;
+  typedef __bounded_iter<__wrap_iter<const_pointer>> const_iterator;
+#else
   typedef __wrap_iter<pointer> iterator;
   typedef __wrap_iter<const_pointer> const_iterator;
+#endif
   typedef std::reverse_iterator<iterator> reverse_iterator;
   typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
 
@@ -868,11 +876,27 @@ private:
     __init_with_sentinel(std::move(__first), std::move(__last));
   }
 
+#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 iterator __make_iterator(pointer __p) {
+    return __make_bounded_iter(
+        __wrap_iter<pointer>(__p),
+        __wrap_iter<pointer>(__get_pointer()),
+        __wrap_iter<pointer>(__get_pointer() + size()));
+  }
+
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_iterator __make_const_iterator(const_pointer __p) const {
+    return __make_bounded_iter(
+        __wrap_iter<const_pointer>(__p),
+        __wrap_iter<const_pointer>(__get_pointer()),
+        __wrap_iter<const_pointer>(__get_pointer() + size()));
+  }
+#else
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 iterator __make_iterator(pointer __p) { return iterator(__p); }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_iterator __make_const_iterator(const_pointer __p) const {
     return const_iterator(__p);
   }
+#endif
 
 public:
   _LIBCPP_TEMPLATE_DATA_VIS static const size_type npos = -1;
diff --git a/libcxx/include/vector b/libcxx/include/vector
index e9dd57055cb112d..7ce8f957d7d7cb5 100644
--- a/libcxx/include/vector
+++ b/libcxx/include/vector
@@ -326,6 +326,7 @@ template<class T, class charT> requires is-vector-bool-reference<T> // Since C++
 #include <__functional/hash.h>
 #include <__functional/unary_function.h>
 #include <__iterator/advance.h>
+#include <__iterator/bounded_iter.h>
 #include <__iterator/distance.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/reverse_iterator.h>
@@ -399,9 +400,16 @@ public:
   typedef typename __alloc_traits::difference_type difference_type;
   typedef typename __alloc_traits::pointer pointer;
   typedef typename __alloc_traits::const_pointer const_pointer;
-  // TODO: Implement iterator bounds checking without requiring the global database.
+#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
+  // The pointer must be passed through __wrap_iter because
+  // __alloc_traits::pointer may not be detected as a continguous iterator on
+  // its own.
+  typedef __bounded_iter<__wrap_iter<pointer>> iterator;
+  typedef __bounded_iter<__wrap_iter<const_pointer>> const_iterator;
+#else
   typedef __wrap_iter<pointer> iterator;
   typedef __wrap_iter<const_pointer> const_iterator;
+#endif
   typedef std::reverse_iterator<iterator> reverse_iterator;
   typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
 
@@ -796,10 +804,30 @@ private:
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __append(size_type __n);
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __append(size_type __n, const_reference __x);
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI iterator __make_iter(pointer __p) _NOEXCEPT {
+#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
+    // Bound the iterator according to the capacity, rather than the size.
+    // Resizing a vector up to the capacity will not invalidate iterators, so,
+    // Without a way to update all live iterators on resize, we must
+    // conservatively bound the iterator by the capacity rather than the size.
+    return __make_bounded_iter(
+        __wrap_iter<pointer>(__p), __wrap_iter<pointer>(this->__begin_), __wrap_iter<pointer>(this->__end_cap()));
+#else
     return iterator(__p);
+#endif
   }
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_iterator __make_iter(const_pointer __p) const _NOEXCEPT {
+#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
+    // Bound the iterator according to the capacity, rather than the size.
+    // Resizing a vector up to the capacity will not invalidate iterators, so,
+    // Without a way to update all live iterators on resize, we must
+    // conservatively bound the iterator by the capacity rather than the size.
+    return __make_bounded_iter(
+        __wrap_iter<const_pointer>(__p),
+        __wrap_iter<const_pointer>(this->__begin_),
+        __wrap_iter<const_pointer>(this->__end_cap()));
+#else
     return const_iterator(__p);
+#endif
   }
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
   __swap_out_circular_buffer(__split_buffer<value_type, allocator_type&>& __v);
diff --git a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.add.pass.cpp b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.add.pass.cpp
index 42021824ce6aec5..1ccc71c6c1512b3 100644
--- a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.add.pass.cpp
+++ b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.add.pass.cpp
@@ -10,8 +10,8 @@
 
 // Add to iterator out of bounds.
 
-// REQUIRES: has-unix-headers
-// UNSUPPORTED: !libcpp-has-legacy-debug-mode, c++03
+// REQUIRES: has-unix-headers, libcpp-has-abi-bounded-iterators
+// UNSUPPORTED: libcpp-hardening-mode=none, c++03
 
 #include <vector>
 #include <cassert>
@@ -19,27 +19,39 @@
 #include "check_assertion.h"
 #include "min_allocator.h"
 
+template <typename T, typename A>
+void fill_to_capacity(std::vector<T, A>& vec) {
+  // Fill vec up to its capacity. Our bounded iterators currently unable to
+  // catch accesses between size and capacity due to iterator stability
+  // guarantees. This function clears those effects.
+  while (vec.size() < vec.capacity()) {
+    vec.push_back(T());
+  }
+}
+
 int main(int, char**) {
   {
     typedef int T;
     typedef std::vector<T> C;
     C c(1);
+    fill_to_capacity(c);
     C::iterator i = c.begin();
-    i += 1;
+    i += c.size();
     assert(i == c.end());
     i = c.begin();
-    TEST_LIBCPP_ASSERT_FAILURE(i + 2, "Attempted to add/subtract an iterator outside its valid range");
+    TEST_LIBCPP_ASSERT_FAILURE(i + 2, "__bounded_iter::operator+=: Attempt to advance an iterator past the end");
   }
 
   {
     typedef int T;
     typedef std::vector<T, min_allocator<T> > C;
     C c(1);
+    fill_to_capacity(c);
     C::iterator i = c.begin();
-    i += 1;
+    i += c.size();
     assert(i == c.end());
     i = c.begin();
-    TEST_LIBCPP_ASSERT_FAILURE(i + 2, "Attempted to add/subtract an iterator outside its valid range");
+    TEST_LIBCPP_ASSERT_FAILURE(i + 2, "__bounded_iter::operator+=: Attempt to advance an iterator past the end");
   }
 
   return 0;
diff --git a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.decrement.pass.cpp b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.decrement.pass.cpp
index d134527a967e58c..3f6092d9208f8f6 100644
--- a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.decrement.pass.cpp
+++ b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.decrement.pass.cpp
@@ -10,8 +10,8 @@
 
 // Decrement iterator prior to begin.
 
-// REQUIRES: has-unix-headers
-// UNSUPPORTED: !libcpp-has-legacy-debug-mode, c++03
+// REQUIRES: has-unix-headers, libcpp-has-abi-bounded-iterators
+// UNSUPPORTED: libcpp-hardening-mode=none, c++03
 
 #include <vector>
 #include <cassert>
@@ -27,7 +27,7 @@ int main(int, char**) {
     C::iterator i = c.end();
     --i;
     assert(i == c.begin());
-    TEST_LIBCPP_ASSERT_FAILURE(--i, "Attempted to decrement a non-decrementable iterator");
+    TEST_LIBCPP_ASSERT_FAILURE(--i, "__bounded_iter::operator--: Attempt to rewind an iterator past the start");
   }
 
   {
@@ -37,7 +37,7 @@ int main(int, char**) {
     C::iterator i = c.end();
     --i;
     assert(i == c.begin());
-    TEST_LIBCPP_ASSERT_FAILURE(--i, "Attempted to decrement a non-decrementable iterator");
+    TEST_LIBCPP_ASSERT_FAILURE(--i, "__bounded_iter::operator--: Attempt to rewind an iterator past the start");
   }
 
   return 0;
diff --git a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.dereference.pass.cpp b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.dereference.pass.cpp
index 918cdd74b7916c7..a6e652de154893a 100644
--- a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.dereference.pass.cpp
+++ b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.dereference.pass.cpp
@@ -10,29 +10,41 @@
 
 // Dereference non-dereferenceable iterator.
 
-// REQUIRES: has-unix-headers
-// UNSUPPORTED: !libcpp-has-legacy-debug-mode, c++03
+// REQUIRES: has-unix-headers, libcpp-has-abi-bounded-iterators
+// UNSUPPORTED: libcpp-hardening-mode=none, c++03
 
 #include <vector>
 
 #include "check_assertion.h"
 #include "min_allocator.h"
 
+template <typename T, typename A>
+void fill_to_capacity(std::vector<T, A>& vec) {
+  // Fill vec up to its capacity. Our bounded iterators currently unable to
+  // catch accesses between size and capacity due to iterator stability
+  // guarantees. This function clears those effects.
+  while (vec.size() < vec.capacity()) {
+    vec.push_back(T());
+  }
+}
+
 int main(int, char**) {
   {
     typedef int T;
     typedef std::vector<T> C;
     C c(1);
+    fill_to_capacity(c);
     C::iterator i = c.end();
-    TEST_LIBCPP_ASSERT_FAILURE(*i, "Attempted to dereference a non-dereferenceable iterator");
+    TEST_LIBCPP_ASSERT_FAILURE(*i, "__bounded_iter::operator*: Attempt to dereference an iterator at the end");
   }
 
   {
     typedef int T;
     typedef std::vector<T, min_allocator<T> > C;
     C c(1);
+    fill_to_capacity(c);
     C::iterator i = c.end();
-    TEST_LIBCPP_ASSERT_FAILURE(*i, "Attempted to dereference a non-dereferenceable iterator");
+    TEST_LIBCPP_ASSERT_FAILURE(*i, "__bounded_iter::operator*: Attempt to dereference an iterator at the end");
   }
 
   return 0;
diff --git a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.increment.pass.cpp b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.increment.pass.cpp
index d3e4b4ec3143f1b..b059e9607366767 100644
--- a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.increment.pass.cpp
+++ b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.increment.pass.cpp
@@ -10,8 +10,8 @@
 
 // Increment iterator past end.
 
-// REQUIRES: has-unix-headers
-// UNSUPPORTED: !libcpp-has-legacy-debug-mode, c++03
+// REQUIRES: has-unix-headers, libcpp-has-abi-bounded-iterators
+// UNSUPPORTED: libcpp-hardening-mode=none, c++03
 
 #include <vector>
 #include <cassert>
@@ -19,25 +19,37 @@
 #include "check_assertion.h"
 #include "min_allocator.h"
 
+template <typename T, typename A>
+void fill_to_capacity(std::vector<T, A>& vec) {
+  // Fill vec up to its capacity. Our bounded iterators currently unable to
+  // catch accesses between size and capacity due to iterator stability
+  // guarantees. This function clears those effects.
+  while (vec.size() < vec.capacity()) {
+    vec.push_back(T());
+  }
+}
+
 int main(int, char**) {
   {
     typedef int T;
     typedef std::vector<T> C;
     C c(1);
+    fill_to_capacity(c);
     C::iterator i = c.begin();
-    ++i;
+    i += c.size();
     assert(i == c.end());
-    TEST_LIBCPP_ASSERT_FAILURE(++i, "Attempted to increment a non-incrementable iterator");
+    TEST_LIBCPP_ASSERT_FAILURE(++i, "__bounded_iter::operator++: Attempt to advance an iterator past the end");
   }
 
   {
     typedef int T;
     typedef std::vector<T, min_allocator<T> > C;
     C c(1);
+    fill_to_capacity(c);
     C::iterator i = c.begin();
-    ++i;
+    i += c.size();
     assert(i == c.end());
-    TEST_LIBCPP_ASSERT_FAILURE(++i, "Attempted to increment a non-incrementable iterator");
+    TEST_LIBCPP_ASSERT_FAILURE(++i, "__bounded_iter::operator++: Attempt to advance an iterator past the end");
   }
 
   return 0;
diff --git a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.index.pass.cpp b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.index.pass.cpp
index 8e8f6a5dae69d9d..4f3b49b69b3e8d6 100644
--- a/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.index.pass.cpp
+++ b/libcxx/test/libcxx/containers/sequences/vector/debug.iterator.index.pass.cpp
@@ -10,8 +10,8 @@
 
 // Index iterator out of bounds.
 
-// REQUIRES: has-unix-headers
-// UNSUPPORTED: !libcpp-has-legacy-debug-mode, c++03
+// REQUIRES: has-unix-headers, libcpp-has-abi-bounded-iterators
+// UNSUPPORTED: libcpp-hardening-mode=none, c++03
 
 #include <vector>
 #include <cassert>
@@ -19,23 +19,41 @@
 #include "check_assertion.h"
 #include "m...
[truncated]

Copy link

github-actions bot commented Jan 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ldionne ldionne added the hardening Issues related to the hardening effort label Jan 22, 2024
@ldionne ldionne added this to the LLVM 19.0.X Release milestone Jan 22, 2024
@davidben
Copy link
Contributor Author

First, I opted not to size it against the capacity. https://eel.is/c++draft/string.require#4 says iterators are invalided on an non-const operation.

Ran into an interesting case. While this PR doesn't trip it, I looked at adding iterator compatibility checks for #78771 (comment) and that tripped one of libc++'s existing std::string tests:

This test is only valid if we take the position that iterators remain valid when appending up to capacity:

However, if I'm reading the spec correctly, I don't think that is guaranteed. Yet this test is under "std" and not "libcxx". Thoughts? Should we bound the string iterator to the capacity too?

@davidben
Copy link
Contributor Author

davidben commented Feb 9, 2024

Another thing I realized: the trick of using the capacity will slightly impede optimizability of range-for loops. Because that will expand the code into:

auto begin = vec.__begin_;
auto end = vec.__end_;
auto end_cap = vec.__end_cap();
auto iter = begin;
while (iter != end) {
  assert(iter != end_cap);
  f(*iter);
  auto(iter != end_cap);
  iter++;
}

In order for the compiler to delete those assertions, it needs to know that begin <= end <= end_cap, and I think it has no way to learn that a priori. Especially because the vector just stores the pointers, so it doesn't even see pointer arithmetic establish the invariant. Interestingly, the assertions in __bounded_iter's constructor are enough to do that, but

  1. They're disabled by way of _LIBCPP_ASSERT_INTERNAL
  2. We don't actually want a runtime check on that because this is an invariant that std::vector maintains. I.e. no incorrect use of std::vector (should of rampant memory unsafety scribbling over it) will break that invariant.
  3. Although a disabled _LIBCPP_ASSERT_INTERNAL turns into _LIBCPP_ASSUME, _LIBCPP_ASSUME is currently disabled because it breaks optimizations. So we have to either fix that or add _LIBCPP_ASSUME_FOR_REAL on those two invariants. :-)

davidben added a commit to davidben/llvm-project that referenced this pull request May 10, 2024
libc++ turned off _LIBCPP_ASSUME because turning every debug assert into
__builtin_assume tripped
https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609

However, this means we can't use _LIBCPP_ASSUME when there is a clear
optimization intent. See
llvm#78929 (comment)
for discussion of a place where _LIBCPP_ASSUME would be valuable.

Go ahead and fix this now, and adjust the comments. I don't think we
want _LIBCPP_ASSUME in disabled asserts anyway, at least not in any of
the hardened modes. This PR should have no impact on libc++ right
now, since _LIBCPP_ASSUME is currently never called standalone, but I
figure I can do this as a standalone PR since it's pretty
straightforward.
This partially restores parity with the old, since removed debug build.
We now can re-enable a bunch of the disabled tests. Some things of note:

- bounded_iter's converting constructor has never worked. It needs a
  friend declaration to access the other bound_iter instantiation's
  private fields.

- The old debug iterators also checked that callers did not try to
  compare iterators from different objects. bounded_iter does not
  currently do this, so I've left those disabled. However, I think we
  probably should add those. See
  llvm#78771 (comment)

- The std::vector iterators are bounded up to capacity, not size. This
  makes for a weaker safety check. This is because the STL promises not
  to invalidate iterators when appending up to the capacity. Since we
  cannot retroactively update all the iterators on push_back(), I've
  instead sized it to the capacaity. This is not as good, but at least
  will stop the iterator from going off the end of the buffer.

  There was also no test for this, so I've added one in the std
  directory.

- std::string has two ambiguities to deal with. First, I opted not to
  size it against the capacity. https://eel.is/c++draft/string.require#4
  says iterators are invalided on an non-const operation. Second,
  whether the iterator can reach the NUL terminator. The previous debug
  tests and the special-case in https://eel.is/c++draft/string.access#2
  suggest no. If either of these causes widespread problems, I figure we
  can revisit.

- resize_and_overwrite.pass.cpp assumed std::string's iterator supported
  s.begin().base(), but I see no promise of this in the standard. GCC
  also doesn't support this. I fixed the test to use std::to_address.

- alignof.compile.pass.cpp's pointer isn't enough of a real pointer. (It
  needs to satisfy NullablePointer, LegacyRandomAccessIterator, and
  LegacyContiguousIterator.) __bounded_iter seems to instantiate enough
  to notice. I've added a few more bits to satisfy it.

Fixes llvm#78805
@davidben
Copy link
Contributor Author

davidben commented May 20, 2024

NB: This PR depends on #78876. Ignore the first commit when reviewing, and don't merge it until #78876 is resolved. When/if #78876 lands, I'll clean this up.

This is now done. Two remaining open questions, other than generally needing review:

  1. Should string iterators be bounded to size or capacity? [libc++][hardening] Use bounded iterators in std::vector and std::string #78929 (comment)
  2. How to help the compiler optimize std::vector? [libc++][hardening] Use bounded iterators in std::vector and std::string #78929 (comment).

For (1), I'm leaning towards leaving it as-is and seeing it it causes problems. If I'm reading the spec right, we're not obligated to keep that working, and the tighter bounds will prevent more bugs.

For (2), I propose we land #91801 and then, in vector::begin() and vector::end(), add _LIBCPP_ASSUME(size <= capacity). I believe that'll be enough for the compiler to optimize it, though I still need to confirm this.

@var-const, WDYT?

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments but I think the approach is fine. If you are unable to update this before next Tuesday, please let me know and we will apply the review comments. It would be nice to get this in LLVM 19.

libcxx/include/__iterator/bounded_iter.h Show resolved Hide resolved
libcxx/include/string Outdated Show resolved Hide resolved
libcxx/include/string Outdated Show resolved Hide resolved
libcxx/include/string Outdated Show resolved Hide resolved
@@ -786,9 +787,16 @@ public:
"[allocator.requirements] states that rebinding an allocator to the same type should result in the "
"original allocator");

// TODO: Implement iterator bounds checking without requiring the global database.
#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS
#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_STRING

I think we should introduce this under its own ABI macro, and the same for vector. And we should rename the existing macro to explicitly mention the containers they enable. I think that's the only way to pretend that these ABI macros yield a somewhat stable ABI.

In this patch, please introduce _LIBCPP_ABI_BOUNDED_ITERATORS_IN_STRING and _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR and I can handle the related cleanup as a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (leaving the comment open as a reminder that we need to do some further cleanup).

libcxx/include/vector Outdated Show resolved Hide resolved
@davidben
Copy link
Contributor Author

If you are unable to update this before next Tuesday, please let me know and we will apply the review comments.

Alas the IETF meeting is next week so I probably won't have time for a bit. If you've got the cycles, go for it! Thanks!

libcxx/include/vector Outdated Show resolved Hide resolved
libcxx/include/string Outdated Show resolved Hide resolved
- create dedicated ABI macros
  `_LIBCPP_ABI_BOUNDED_ITERATORS_IN_STRING` and
  `_LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR` (and the associated Lit
  features, etc.);
- add a release note and update the hardening documentation;
- add a comment in `<string>` to mention that its bounded iterators use
  the size rather than the capacity;
- reword and slightly expand the comment in `<vector>` explaining the
  use of the capacity rather than the size;
- tweak the comment that justifies the use of `__wrap_iter`;
- in `string`, move the conditional logic from outside to inside the
  functions' scope;
- move the repeated `fill_to_capacity` function to a separate helper
  header and slightly reword the comment;
- make `push_back.invalidation.pass.cpp` a death test and add a comment
  on some limitations of what a bounded iterator can currently detect;
- fix another `small_pointer` helper class to fix compilation.
@var-const
Copy link
Member

If you are unable to update this before next Tuesday, please let me know and we will apply the review comments.

Alas the IETF meeting is next week so I probably won't have time for a bit. If you've got the cycles, go for it! Thanks!

@davidben Awesome, thanks for the heads-up! I have applied most of the review feedback (plus some of the feedback I had). If you have the time, please take a look and let us know if there's anything you'd like to change/push back on; otherwise, I plan to merge by the Tuesday branch cutoff.

@davidben
Copy link
Contributor Author

If you have the time, please take a look and let us know if there's anything you'd like to change/push back on; otherwise, I plan to merge by the Tuesday branch cutoff.

Feedback all looks reasonable to me. Thanks!

@var-const
Copy link
Member

(I think everything I planned to do is uploaded now, I plan to merge before tomorrow's cutoff time if no objections)

Copy link
Member

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidben Thank you so much for working on this! (And my sincere apologies for the very slow review -- I wanted to get to this for a while but unfortunately had my plate very full recently) LGTM.

My biggest concern starting this review was whether accepting this lightweight approach would prevent us from adding heavier iterators in the future that would retain a connection with the container and thus could also catch temporal safety issues, such as invalidation. That would bring the current incarnation of hardening on par with the historical debug mode. However, since we decided to create separate ABI macros for these new iterators, this is no longer an issue. While it's a tradeoff (having several independent ABI configuration options creates many possible combinations, increasing complexity), I believe it's probably an inevitable outcome -- for one, I don't think it's reasonable to force anyone using the current bounded-iterators macro to start using bounded iterators in their vectors and strings, especially with no targeted opt-out possible, but also I think there might be legitimate contexts where someone might want bounded iterators in some, but not all, containers. The ability to choose (in the future, potentially) between more and less lightweight iterator hardening strategies follows quite nicely from this, kind of mirroring the distinction between the fast and extensive modes.

The idea to bound the vector iterator by capacity has grown on me. It is slightly unfortunate that there is a discrepancy between how checking is performed for strings versus vectors, but it seems like a necessary evil -- the only way to reconsolidate would be to make string checks laxer. Checking just the "physical" bounds prevents the worst vulnerabilities, so I believe we're still getting the most valuable part of the checks; I'd also note that IIUC, this is how Asan checks work for contiguous containers that are not specially annotated for the sanitizer.

I think that even without covering iterator invalidation (which, being a temporal safety issue, is much harder to solve in general anyway), this patch provides a noticeable improvement in the amount of checks we provide with little added code complexity, and the fact that the bound iterators are comparatively lightweight can be an advantage for some contexts.

@var-const var-const merged commit bcf9fb9 into llvm:main Jul 23, 2024
58 checks passed
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
…ing (llvm#78929)

~~NB: This PR depends on llvm#78876. Ignore the first commit when reviewing,
and don't merge it until llvm#78876 is resolved. When/if llvm#78876 lands, I'll
clean this up.~~

This partially restores parity with the old, since removed debug build.
We now can re-enable a bunch of the disabled tests. Some things of note:

- `bounded_iter`'s converting constructor has never worked. It needs a
friend declaration to access the other `bound_iter` instantiation's
private fields.

- The old debug iterators also checked that callers did not try to
compare iterators from different objects. `bounded_iter` does not
currently do this, so I've left those disabled. However, I think we
probably should add those. See
llvm#78771 (comment)

- The `std::vector` iterators are bounded up to capacity, not size. This
makes for a weaker safety check. This is because the STL promises not to
invalidate iterators when appending up to the capacity. Since we cannot
retroactively update all the iterators on `push_back()`, I've instead
sized it to the capacity. This is not as good, but at least will stop
the iterator from going off the end of the buffer.

There was also no test for this, so I've added one in the `std`
directory.

- `std::string` has two ambiguities to deal with. First, I opted not to
size it against the capacity. https://eel.is/c++draft/string.require#4
says iterators are invalidated on an non-const operation. Second,
whether the iterator can reach the NUL terminator. The previous debug
tests and the special-case in https://eel.is/c++draft/string.access#2
suggest no. If either of these causes widespread problems, I figure we
can revisit.

- `resize_and_overwrite.pass.cpp` assumed `std::string`'s iterator
supported `s.begin().base()`, but I see no promise of this in the
standard. GCC also doesn't support this. I fixed the test to use
`std::to_address`.

- `alignof.compile.pass.cpp`'s pointer isn't enough of a real pointer.
(It needs to satisfy `NullablePointer`, `LegacyRandomAccessIterator`,
and `LegacyContiguousIterator`.) `__bounded_iter` seems to instantiate
enough to notice. I've added a few more bits to satisfy it.

Fixes llvm#78805
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…ing (#78929)

~~NB: This PR depends on #78876. Ignore the first commit when reviewing,
and don't merge it until #78876 is resolved. When/if #78876 lands, I'll
clean this up.~~

This partially restores parity with the old, since removed debug build.
We now can re-enable a bunch of the disabled tests. Some things of note:

- `bounded_iter`'s converting constructor has never worked. It needs a
friend declaration to access the other `bound_iter` instantiation's
private fields.

- The old debug iterators also checked that callers did not try to
compare iterators from different objects. `bounded_iter` does not
currently do this, so I've left those disabled. However, I think we
probably should add those. See
#78771 (comment)

- The `std::vector` iterators are bounded up to capacity, not size. This
makes for a weaker safety check. This is because the STL promises not to
invalidate iterators when appending up to the capacity. Since we cannot
retroactively update all the iterators on `push_back()`, I've instead
sized it to the capacity. This is not as good, but at least will stop
the iterator from going off the end of the buffer.

There was also no test for this, so I've added one in the `std`
directory.

- `std::string` has two ambiguities to deal with. First, I opted not to
size it against the capacity. https://eel.is/c++draft/string.require#4
says iterators are invalidated on an non-const operation. Second,
whether the iterator can reach the NUL terminator. The previous debug
tests and the special-case in https://eel.is/c++draft/string.access#2
suggest no. If either of these causes widespread problems, I figure we
can revisit.

- `resize_and_overwrite.pass.cpp` assumed `std::string`'s iterator
supported `s.begin().base()`, but I see no promise of this in the
standard. GCC also doesn't support this. I fixed the test to use
`std::to_address`.

- `alignof.compile.pass.cpp`'s pointer isn't enough of a real pointer.
(It needs to satisfy `NullablePointer`, `LegacyRandomAccessIterator`,
and `LegacyContiguousIterator`.) `__bounded_iter` seems to instantiate
enough to notice. I've added a few more bits to satisfy it.

Fixes #78805
davidben added a commit to davidben/llvm-project that referenced this pull request Sep 17, 2024
libc++ turned off _LIBCPP_ASSUME because turning every debug assert into
__builtin_assume tripped
https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609

However, this means we can't use _LIBCPP_ASSUME when there is a clear
optimization intent. See
llvm#78929 (comment)
for discussion of a place where _LIBCPP_ASSUME would be valuable.

Go ahead and fix this now, and adjust the comments. I don't think we
want _LIBCPP_ASSUME in disabled asserts anyway, at least not in any of
the hardened modes. This PR should have no impact on libc++ right
now, since _LIBCPP_ASSUME is currently never called standalone, but I
figure I can do this as a standalone PR since it's pretty
straightforward.
ldionne pushed a commit that referenced this pull request Sep 17, 2024
libc++ turned off _LIBCPP_ASSUME because turning every debug assert into
__builtin_assume tripped [1]. However, this means we can't use _LIBCPP_ASSUME
when there is a clear optimization intent. See [2] for discussion of a place 
where _LIBCPP_ASSUME would be valuable.

This patch fixes this by not undefining the definition of _LIBCPP_ASSUME and
making sure that we don't attempt to `_LIBCPP_ASSSUME` every assertion in
the library.

[1]: https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609
[2]: #78929 (comment)
@davidben davidben deleted the bounded-iter-vec-str branch September 17, 2024 18:09
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
libc++ turned off _LIBCPP_ASSUME because turning every debug assert into
__builtin_assume tripped [1]. However, this means we can't use _LIBCPP_ASSUME
when there is a clear optimization intent. See [2] for discussion of a place 
where _LIBCPP_ASSUME would be valuable.

This patch fixes this by not undefining the definition of _LIBCPP_ASSUME and
making sure that we don't attempt to `_LIBCPP_ASSSUME` every assertion in
the library.

[1]: https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609
[2]: llvm#78929 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hardening Issues related to the hardening effort libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_LIBCPP_ABI_BOUNDED_ITERATORS support for std::string and std::vector
4 participants