Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASan][libc++] std::basic_string annotations #72677

Merged
merged 10 commits into from
Dec 13, 2023

Conversation

AdvenamTacet
Copy link
Member

@AdvenamTacet AdvenamTacet commented Nov 17, 2023

This commit introduces basic annotations for std::basic_string, mirroring the approach used in std::vector and std::deque. Initially, only long strings with the default allocator will be annotated. Short strings (SSO - short string optimization) and strings with non-default allocators will be annotated in the near future, with separate commits dedicated to enabling them. The process will be similar to the workflow employed for enabling annotations in std::deque.

Please note: these annotations function effectively only when libc++ and libc++abi dylibs are instrumented (with ASan). This aligns with the prevailing behavior of Memory Sanitizer.

To avoid breaking everything, this commit also appends _LIBCPP_INSTRUMENTED_WITH_ASAN to __config_site whenever libc++ is compiled with ASan. If this macro is not defined, string annotations are not enabled. However, linking a binary that does not annotate strings with a dynamic library that annotates strings, is not permitted.

Originally proposed here: https://reviews.llvm.org/D132769

Related patches on Phabricator:

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in std::vector and std::deque collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in std::deque, or between the std::basic_string's size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the std::equals function. This function was taking iterators (iter1_begin, iter1_end, iter2_begin) to perform the comparison, using a custom comparison function. When the iter1 object exceeded the length of iter2, an out-of-bounds read could occur on the iter2 object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

This Pull Request introduces basic annotations for std::basic_string. Long strings exhibit structural similarities to std::vector and will be annotated accordingly. Short strings are already implemented, but will be turned on separately in a forthcoming commit. Look at a comment below to read about SSO issues at current moment.

Due to the functionality introduced in D132522, the __sanitizer_annotate_contiguous_container function now offers compatibility with all allocators. However, enabling this support will be done in a subsequent commit. For the time being, only strings with the default allocator will be annotated.

If you have any questions, please email:

@AdvenamTacet AdvenamTacet added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 17, 2023
@AdvenamTacet AdvenamTacet self-assigned this Nov 17, 2023
@AdvenamTacet AdvenamTacet requested a review from a team as a code owner November 17, 2023 16:58
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 17, 2023

@llvm/pr-subscribers-libcxxabi

@llvm/pr-subscribers-libcxx

Author: Tacet (AdvenamTacet)

Changes

This commit turns on basic annotations for std::basic_string, similar to those in std::vector and std::deque.

Notice: those annotations work if and only if libc++ and libc++abi are instrumented (with ASan) as well!

Originally proposed here: https://reviews.llvm.org/D132769

Related patches:
Turning on annotations for short strings: https://reviews.llvm.org/D147680

Turning on annotations for all allocators: https://reviews.llvm.org/D146214

This revision is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in std::vector, to std::string and std::deque collections. These changes allow ASan to detect cases when the instrumented program accesses memory which is internally allocated by the collection but is still not in-use (accesses before or after the stored elements for std::deque, or between the size and capacity bounds for std::string).

The motivation for the research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison function). When object iter1 was longer than iter2, read out-of-bounds on iter2 could happen. Container sanitization would detect it.

This Pull Request adds annotations for std::basic_string. Long string is very similar to std::vector, and therefore works well with __sanitizer_annotate_contiguous_container from LLVM 15 and therefore annotations works if that implementation is compiled with previous LLVM. However, only the standard allocator is supported.

As D132522 extended possibilities of __sanitizer_annotate_contiguous_container, now annotations may be supported with all allocators, but that support will be added in a next PR. Only strings with default allocator are annotated with that commit.

If you have any questions, please email:

  • advenam.tacet@trailofbits.com
  • disconnect3d@trailofbits.com

Patch is 129.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72677.diff

83 Files Affected:

  • (modified) libcxx/include/string (+225-57)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/capacity.pass.cpp (+1)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/clear.pass.cpp (+7)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/reserve.pass.cpp (+3)
  • (added) libcxx/test/std/strings/basic.string/string.capacity/reserve_size.asan.pass.cpp (+81)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/reserve_size.pass.cpp (+1)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/resize_and_overwrite.pass.cpp (+4)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/resize_size.pass.cpp (+10-5)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/resize_size_char.pass.cpp (+19-5)
  • (modified) libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp (+8)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/T_size_size.pass.cpp (+17-13)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/alloc.pass.cpp (+17-13)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/brace_assignment.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/char_assignment.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/copy.pass.cpp (+4)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/copy_alloc.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/copy_assignment.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/default.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/initializer_list.pass.cpp (+9)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/initializer_list_assignment.pass.cpp (+10-1)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/iter_alloc.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/iter_alloc_deduction.pass.cpp (+1)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/move.pass.cpp (+4)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/move_alloc.pass.cpp (+4)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/move_assignment.pass.cpp (+5)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/pointer_alloc.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/pointer_assignment.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/pointer_size_alloc.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/size_char_alloc.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/string_view.pass.cpp (+5)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/string_view_assignment.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.cons/substr.pass.cpp (+35-27)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/initializer_list.pass.cpp (+3-1)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/iterator.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer_size.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/push_back.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/size_char.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/string.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/string_size_size.pass.cpp (+9-5)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/T_size_size.pass.cpp (+9-5)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/initializer_list.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/iterator.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/pointer.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/pointer_size.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/size_char.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/string.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_assign/string_size_size.pass.cpp (+9-5)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_copy/copy.pass.cpp (+7-3)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_erase/iter.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_erase/iter_iter.pass.cpp (+2)
  • (added) libcxx/test/std/strings/basic.string/string.modifiers/string_erase/pop_back.asan.pass.cpp (+65)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_erase/pop_back.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_erase/size_size.pass.cpp (+12-8)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/iter_char.pass.cpp (+13-9)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/iter_initializer_list.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/iter_iter_iter.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/iter_size_char.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer.pass.cpp (+11-7)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp (+11-7)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_size_char.pass.cpp (+11-7)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_string.pass.cpp (+11-7)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_string_size_size.pass.cpp (+11-7)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_op_plus_equal/char.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_op_plus_equal/initializer_list.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_op_plus_equal/pointer.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_op_plus_equal/string.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/iter_iter_string.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_T_size_size.pass.cpp (+10-7)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_pointer.pass.cpp (+15-10)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_pointer_size.pass.cpp (+15-10)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_size_char.pass.cpp (+14-10)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_string.pass.cpp (+14-10)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_string_size_size.pass.cpp (+14-10)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_string_view.pass.cpp (+14-10)
  • (added) libcxx/test/std/strings/basic.string/string.modifiers/string_swap/swap.asan.pass.cpp (+102)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_swap/swap.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.nonmembers/string.special/swap.pass.cpp (+3)
  • (modified) libcxx/test/std/strings/basic.string/string.nonmembers/string_op+/char_string.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.nonmembers/string_op+/string_char.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.nonmembers/string_op+/string_pointer.pass.cpp (+2)
  • (modified) libcxx/test/std/strings/basic.string/string.nonmembers/string_op+/string_string.pass.cpp (+2)
  • (modified) libcxx/test/support/asan_testing.h (+50-12)
diff --git a/libcxx/include/string b/libcxx/include/string
index 9c2efac8006bd72..9362b1bcd16fa19 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -645,6 +645,16 @@ basic_string<char32_t> operator""s( const char32_t *str, size_t len );
 _LIBCPP_PUSH_MACROS
 #include <__undef_macros>
 
+#ifndef _LIBCPP_HAS_NO_ASAN
+#  define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS __attribute__((__no_sanitize__("address")))
+// Sometimes string access own memory, which should not be accessed by different parts of program
+// (eg. not-in-use bytes of buffer in short string) and are poisoned by ASan.
+// This macro turns off instrumentation in a function, so memory accesses which normally would crash
+// work as expected.
+#else
+#  define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
+#endif
+#define _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED false
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
@@ -702,6 +712,9 @@ struct __init_with_sentinel_tag {};
 template<class _CharT, class _Traits, class _Allocator>
 class basic_string
 {
+private:
+  using __default_allocator_type = allocator<_CharT>;
+
 public:
     typedef basic_string                                 __self;
     typedef basic_string_view<_CharT, _Traits>           __self_view;
@@ -888,34 +901,46 @@ public:
 #endif
       : __r_(__value_init_tag(), __a) {}
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const basic_string& __str)
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string(const basic_string& __str)
       : __r_(__default_init_tag(), __alloc_traits::select_on_container_copy_construction(__str.__alloc())) {
     if (!__str.__is_long())
+    {
       __r_.first() = __str.__r_.first();
+      __annotate_new(__get_short_size());
+    }
     else
       __init_copy_ctor_external(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
   }
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const basic_string& __str, const allocator_type& __a)
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string(const basic_string& __str, const allocator_type& __a)
       : __r_(__default_init_tag(), __a) {
     if (!__str.__is_long())
+    {
       __r_.first() = __str.__r_.first();
+      __annotate_new(__get_short_size());
+    }
     else
       __init_copy_ctor_external(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
   }
 
 #ifndef _LIBCPP_CXX03_LANG
-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(basic_string&& __str)
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+  basic_string(basic_string&& __str)
 #  if _LIBCPP_STD_VER <= 14
       _NOEXCEPT_(is_nothrow_move_constructible<allocator_type>::value)
 #  else
       _NOEXCEPT
 #  endif
-      : __r_(std::move(__str.__r_)) {
+      // Turning off ASan instrumentation for variable initialization with _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
+      // is inconsistent and that initialization may be annotated.
+      // Therefore, to copy __str memory, we have to unpoison it first (if object is poisoned and not external buffer,
+      // so short string case).
+      : __r_( ( (__str.__is_long() ? 0 : (__str.__annotate_delete(), 0)), std::move(__str.__r_)) ) {
     __str.__r_.first() = __rep();
   }
 
-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(basic_string&& __str, const allocator_type& __a)
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
+  basic_string(basic_string&& __str, const allocator_type& __a)
       : __r_(__default_init_tag(), __a) {
     if (__str.__is_long() && __a != __str.__alloc()) // copy, not move
       __init(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
@@ -923,6 +948,9 @@ public:
       if (__libcpp_is_constant_evaluated())
         __r_.first() = __rep();
       __r_.first() = __str.__r_.first();
+      if (!__str.__is_long()) {
+        __str.__annotate_delete();
+      }
       __str.__r_.first() = __rep();
     }
   }
@@ -965,11 +993,11 @@ public:
   }
 
 #if _LIBCPP_STD_VER >= 23
-  _LIBCPP_HIDE_FROM_ABI constexpr
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS constexpr
   basic_string(basic_string&& __str, size_type __pos, const _Allocator& __alloc = _Allocator())
       : basic_string(std::move(__str), __pos, npos, __alloc) {}
 
-  _LIBCPP_HIDE_FROM_ABI constexpr
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS constexpr
   basic_string(basic_string&& __str, size_type __pos, size_type __n, const _Allocator& __alloc = _Allocator())
       : __r_(__default_init_tag(), __alloc) {
     if (__pos > __str.size())
@@ -1081,6 +1109,7 @@ public:
 #endif // _LIBCPP_CXX03_LANG
 
   inline _LIBCPP_CONSTEXPR_SINCE_CXX20 ~basic_string() {
+    __annotate_delete();
     if (__is_long())
       __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
   }
@@ -1098,8 +1127,8 @@ public:
     }
 
 #ifndef _LIBCPP_CXX03_LANG
-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& operator=(basic_string&& __str)
-      _NOEXCEPT_((__noexcept_move_assign_container<_Allocator, __alloc_traits>::value)) {
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string&
+    operator=(basic_string&& __str) _NOEXCEPT_((__noexcept_move_assign_container<_Allocator, __alloc_traits>::value)) {
     __move_assign(__str, integral_constant<bool, __alloc_traits::propagate_on_container_move_assignment::value>());
     return *this;
   }
@@ -1112,7 +1141,7 @@ public:
 #if _LIBCPP_STD_VER >= 23
     basic_string& operator=(nullptr_t) = delete;
 #endif
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& operator=(value_type __c);
+    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string& operator=(value_type __c);
 
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
     iterator begin() _NOEXCEPT
@@ -1329,7 +1358,7 @@ public:
   }
 
 #if _LIBCPP_STD_VER >= 20
-  _LIBCPP_HIDE_FROM_ABI constexpr
+  _LIBCPP_HIDE_FROM_ABI constexpr _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
   void __move_assign(basic_string&& __str, size_type __pos, size_type __len) {
     // Pilfer the allocation from __str.
     _LIBCPP_ASSERT_INTERNAL(__alloc() == __str.__alloc(), "__move_assign called with wrong allocator");
@@ -1345,7 +1374,7 @@ public:
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
     basic_string& assign(const basic_string& __str) { return *this = __str; }
 #ifndef _LIBCPP_CXX03_LANG
-    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
     basic_string& assign(basic_string&& __str)
         _NOEXCEPT_((__noexcept_move_assign_container<_Allocator, __alloc_traits>::value))
         {*this = std::move(__str); return *this;}
@@ -1736,7 +1765,7 @@ private:
 
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __shrink_or_extend(size_type __target_capacity);
 
-    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
     bool __is_long() const _NOEXCEPT {
         if (__libcpp_is_constant_evaluated() && __builtin_constant_p(__r_.first().__l.__is_long_)) {
             return __r_.first().__l.__is_long_;
@@ -1776,6 +1805,7 @@ private:
         value_type* __p;
         if (__cap - __sz >= __n)
         {
+          __annotate_increase(__n);
             __p = std::__to_address(__get_pointer());
             size_type __n_move = __sz - __ip;
             if (__n_move != 0)
@@ -1802,7 +1832,7 @@ private:
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 allocator_type& __alloc() _NOEXCEPT { return __r_.second(); }
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR const allocator_type& __alloc() const _NOEXCEPT { return __r_.second(); }
 
-    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
     void __set_short_size(size_type __s) _NOEXCEPT {
         _LIBCPP_ASSERT_INTERNAL(
             __s < __min_cap, "__s should never be greater than or equal to the short string capacity");
@@ -1810,7 +1840,7 @@ private:
         __r_.first().__s.__is_long_ = false;
     }
 
-    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
     size_type __get_short_size() const _NOEXCEPT {
         _LIBCPP_ASSERT_INTERNAL(
             !__r_.first().__s.__is_long_, "String has to be short when trying to get the short size");
@@ -1860,6 +1890,45 @@ private:
     const_pointer __get_pointer() const _NOEXCEPT
         {return __is_long() ? __get_long_pointer() : __get_short_pointer();}
 
+    // The following functions are no-ops outside of AddressSanitizer mode.
+#ifndef _LIBCPP_HAS_NO_ASAN
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_contiguous_container(
+        const void* __old_mid, const void* __new_mid) const {
+        const void* __begin = data();
+        const void* __end = data() + capacity() + 1;
+        if (!__libcpp_is_constant_evaluated() && __begin != nullptr && is_same<allocator_type, __default_allocator_type>::value)
+          __sanitizer_annotate_contiguous_container(__begin, __end, __old_mid, __new_mid);
+    }
+#else
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
+    __annotate_contiguous_container(const void*, const void*) const {}
+#endif
+
+    // ASan: short string is poisoned if and only if this function returns true.
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __asan_short_string_is_annotated() const _NOEXCEPT {
+      return _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED && !__libcpp_is_constant_evaluated();
+    }
+
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_new(size_type __current_size) const _NOEXCEPT {
+      if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+        __annotate_contiguous_container(data() + capacity() + 1, data() + __current_size + 1);
+    }
+
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_delete() const _NOEXCEPT {
+      if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+        __annotate_contiguous_container(data() + size() + 1, data() + capacity() + 1);
+    }
+
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_increase(size_type __n) const _NOEXCEPT {
+      if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+        __annotate_contiguous_container(data() + size() + 1, data() + size() + 1 + __n);
+    }
+
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_shrink(size_type __old_size) const _NOEXCEPT {
+      if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+        __annotate_contiguous_container(data() + __old_size + 1, data() + size() + 1);
+    }
+
     template <size_type __a> static
         _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
         size_type __align_it(size_type __s) _NOEXCEPT
@@ -1962,6 +2031,7 @@ private:
                 }
                 else
                 {
+                    __annotate_delete();
                     allocator_type __a = __str.__alloc();
                     auto __allocation = std::__allocate_at_least(__a, __str.__get_long_cap());
                     __begin_lifetime(__allocation.ptr, __allocation.count);
@@ -1971,6 +2041,7 @@ private:
                     __set_long_pointer(__allocation.ptr);
                     __set_long_cap(__allocation.count);
                     __set_long_size(__str.size());
+                    __annotate_new(__get_long_size());
                 }
             }
         }
@@ -2018,18 +2089,28 @@ private:
 
     // Assigns the value in __s, guaranteed to be __n < __min_cap in length.
     inline _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_short(const value_type* __s, size_type __n) {
+      size_type __old_size = size();
+      if (__n > __old_size)
+        __annotate_increase(__n - __old_size);
       pointer __p = __is_long()
                         ? (__set_long_size(__n), __get_long_pointer())
                         : (__set_short_size(__n), __get_short_pointer());
       traits_type::move(std::__to_address(__p), __s, __n);
       traits_type::assign(__p[__n], value_type());
+      if (__old_size > __n)
+        __annotate_shrink(__old_size);
       return *this;
     }
 
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
     basic_string& __null_terminate_at(value_type* __p, size_type __newsz) {
+      size_type __old_size = size();
+      if (__newsz > __old_size)
+        __annotate_increase(__newsz - __old_size);
       __set_size(__newsz);
       traits_type::assign(__p[__newsz], value_type());
+      if (__old_size > __newsz)
+        __annotate_shrink(__old_size);
       return *this;
     }
 
@@ -2136,6 +2217,7 @@ void basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s,
     }
     traits_type::copy(std::__to_address(__p), __s, __sz);
     traits_type::assign(__p[__sz], value_type());
+    __annotate_new(__sz);
 }
 
 template <class _CharT, class _Traits, class _Allocator>
@@ -2164,6 +2246,7 @@ basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_ty
     }
     traits_type::copy(std::__to_address(__p), __s, __sz);
     traits_type::assign(__p[__sz], value_type());
+    __annotate_new(__sz);
 }
 
 template <class _CharT, class _Traits, class _Allocator>
@@ -2188,6 +2271,7 @@ void basic_string<_CharT, _Traits, _Allocator>::__init_copy_ctor_external(
     __set_long_size(__sz);
   }
   traits_type::copy(std::__to_address(__p), __s, __sz + 1);
+  __annotate_new(__sz);
 }
 
 template <class _CharT, class _Traits, class _Allocator>
@@ -2217,6 +2301,7 @@ basic_string<_CharT, _Traits, _Allocator>::__init(size_type __n, value_type __c)
     }
     traits_type::assign(std::__to_address(__p), __n, __c);
     traits_type::assign(__p[__n], value_type());
+    __annotate_new(__n);
 }
 
 template <class _CharT, class _Traits, class _Allocator>
@@ -2243,6 +2328,7 @@ void basic_string<_CharT, _Traits, _Allocator>::__init_with_sentinel(_InputItera
     }
     catch (...)
     {
+        __annotate_delete();
         if (__is_long())
             __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
         throw;
@@ -2298,11 +2384,13 @@ void basic_string<_CharT, _Traits, _Allocator>::__init_with_size(
     }
     catch (...)
     {
+        __annotate_delete();
         if (__is_long())
             __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
         throw;
     }
 #endif  // _LIBCPP_HAS_NO_EXCEPTIONS
+    __annotate_new(__sz);
 }
 
 template <class _CharT, class _Traits, class _Allocator>
@@ -2319,6 +2407,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_and_replace
     size_type __cap = __old_cap < __ms / 2 - __alignment ?
                           __recommend(std::max(__old_cap + __delta_cap, 2 * __old_cap)) :
                           __ms - 1;
+    __annotate_delete();
     auto __allocation = std::__allocate_at_least(__alloc(), __cap + 1);
     pointer __p = __allocation.ptr;
     __begin_lifetime(__p, __allocation.count);
@@ -2338,6 +2427,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_and_replace
     __old_sz = __n_copy + __n_add + __sec_cp_sz;
     __set_long_size(__old_sz);
     traits_type::assign(__p[__old_sz], value_type());
+    __annotate_new(__old_cap + __delta_cap);
 }
 
 // __grow_by is deprecated because it does not set the size. It may not update the size when the size is changed, and it
@@ -2360,6 +2450,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by(size_type __old_cap, size_t
     size_type __cap = __old_cap < __ms / 2 - __alignment ?
                           __recommend(std::max(__old_cap + __delta_cap, 2 * __old_cap)) :
                           __ms - 1;
+    __annotate_delete();
     auto __allocation = std::__allocate_at_least(__alloc(), __cap + 1);
     pointer __p = __allocation.ptr;
     __begin_lifetime(__p, __allocation.count);
@@ -2402,10 +2493,15 @@ basic_string<_CharT, _Traits, _Allocator>::__assign_no_alias(
     const value_type* __s, size_type __n) {
   size_type __cap = __is_short ? static_cast<size_type>(__min_cap) : __get_long_cap();
   if (__n < __cap) {
+    size_type __old_size = __is_short ? __get_short_size() : __get_long_size();
+    if (__n > __old_size)
+      __annotate_increase(__n - __old_size);
     pointer __p = __is_short ? __get_short_pointer() : __get_long_pointer();
     __is_short ? __set_short_size(__n) : __set_long_size(__n);
     traits_type::copy(std::__to_address(__p), __s, __n);
     traits_type::assign(__p[__n], value_type());
+    if (__old_size > __n)
+      __annotate_shrink(__old_size);
   } else {
     size_type __sz = __is_short ? __get_short_size() : __get_long_size();
     __grow_by_and_replace(__cap - 1, __n - __cap + 1, __sz, 0, __sz, __n, __s);
@@ -2420,6 +2516,9 @@ basic_string<_CharT, _Traits, _Allocator>::__assign_external(
     const value_type* __s, size_type __n) {
   size_type __cap = capacity();
   if (__cap >= __n) {
+    size_type __old_size = size();
+    if (__n > __old_size)
+      __annotate_increase(__n - __old_size);
     value_type* __p = std::__to_address(__get_pointer());
     traits_type::move(__p, __s, __n);
     return __null_terminate_at(__p, __n);
@@ -2447,11 +2546,15 @@ basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::assign(size_type __n, value_type __c)
 {
     size_type __cap = capacity();
+    size_type __old_size = size();
     if (__cap < __n)
     {
         size_type __sz = size();
         __grow_by_without_replace(__cap, __n - __cap, __sz, 0, __sz);
+        __annotate_increase(__n);
     }
+    else if(__n > __old_size)
+        __annotate_increase(__n - __old_size);
     value_type* __p = std::__to_address(__get_pointer());
     traits_type::assign(__p, __n, __c);
     return __null_terminate_at(__p, __n);
@@ -2462,24 +2565,26 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::operator=(value_type __c)
 {
-    pointer __p;
-    if (__is_long())
-    {
-        __p = __get_long_pointer();
-        __set_long_size(1);
-    }
-    else
-    {
-        __p = __get_short_pointer();
-        __set_short_size(1);
-    }
-    traits_type::assign(*__p, __c);
-    traits_type::assign(*++__p, value_type());
-    return *this;
+  pointer __p;
+  size_type __old_size = size();
+  if (__old_size == 0)
+    __annotate_increase(1);
+  if (__is_long()) {
+    __p = __get_long_pointer();
+    __set_long_size(1);
+  } else {
+    __p = __get_short_pointer();
+    __set_short_size(1);
+  }
+  traits_type::assign(*__p, __c);
+  traits_type::assign(*++__p, value_type());
+  if (__old_size > 1)
+    __annotate_shrink(__old_size);
+  return *this;
 }
 
 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20
+_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::operator=(const basic_string& __str)
 {
@@ -2487,7 +2592,12 @@ basic_string<_CharT, _Traits, _Allocator>::operator=(const basic_string& __str)
     __copy_assign_alloc(__str);
     if (!__is_long()) {
       if (!__str.__is_long()) {
+        size_type __old_size = __get_short_size();
+        if (__get_short_size() < __str.__get_short_size())
+          __annotate_increase(__str.__get_short_size() - __get_short_size());
         __r_.first() = __str.__r_.first();
+        if (__old_size > __get_short_size())
+          __annotate_shrink(__old_size);
       } else {
         return __assign_no_alias<true>(__str.data(), __str.size());
       }
@@ -2513,7 +2623,7 @@ basic_string<_CharT, _Traits, _Allocator>::__move_assign(basic_string& __str, fa
 }
 
 template <class _CharT, class _Traits, class _Allocator>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX20
+inline _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
 void
 basic_string<_CharT, _Traits, _Allocator>::__move_assign(basic_string& __str, true_t...
[truncated]

Copy link

github-actions bot commented Nov 17, 2023

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

@AdvenamTacet
Copy link
Member Author

AdvenamTacet commented Nov 17, 2023

@ldionne Given that the requirement for sanitized libc++ and libc++abi is already an established practice (Memory Sanitizer), could we proceed with merging this branch after addressing all code review concerns, prior to developing a unified approach for shipping instrumented libraries?

Edit: especially that Instrumented libc++ RFC is going into direction of fairly limited changes compared to original plan. And maintaining this patch outside of upstream takes a lot of time
due to fact how many files/functions are modified across the codebase.

@AdvenamTacet AdvenamTacet added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. and removed libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. labels Nov 17, 2023
@ldionne
Copy link
Member

ldionne commented Nov 22, 2023

^ Yes, I think that's a reasonable approach. So we'll assume that libc++ is built with support for asan + container annotations if a user enables these container annotations in their code? Does this mean that -fsanitize=address will stop working out of the box with a non-asan libc++.dylib? That would be a problem. If one has to opt-in to the container stuff that requires ABI synchrony, that's a different story.

@AdvenamTacet
Copy link
Member Author

AdvenamTacet commented Nov 23, 2023

Thank you!

Does this mean that -fsanitize=address will stop working out of the box with a non-asan libc++.dylib?

Now yes, but adding a macro to opt-in or opt-out is trivial, we already have the #if in place, it's enough to add a condition (&& defined(ANNOTATE_STRING_OR_STH)).

I'm in favor of opt-out macro, instead of opt-in. I'm aware of problems resulting from this decision, but using sanitized dylib should be the standard anyway and we should push in that direction.

We can always change opt-out to opt-in before branching, if we get feedback suggesting it.
Making it initially opt-out gives us more feedback now and the experience with previous containers shows that it's very important.

@ldionne let me know what version you prefer and I will add it.

PS
Ofc, not-annotated binary won't work with annotated libc++.dylib as well.

@AdvenamTacet
Copy link
Member Author

AdvenamTacet commented Nov 23, 2023

@philnik777 since your last review at Phabricator, except removing all changes from the file listing ABI functions (as requested on Phabricator), I was doing only maintenance (pull --rebase etc.).

If you see nothing else to fix or only minor things, could you greenlight the review?

@AdvenamTacet
Copy link
Member Author

@ldionne I'd appreciate a review of that PR, please let me know if you have time to look at it anytime soon.

Copy link
Member

@EricWF EricWF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please write specific tests for your change.

I don't think tacking on an assertion to every existing test is the right way to go. I understand the appeal, but I think specific tests written expressly to test this change is the right way to go.

In the past, we've allowed tests to be added like this, but it's resulted in tests that run too long and tests which don't test a single thing.

@AdvenamTacet
Copy link
Member Author

Hey @EricWF , that way of testing ASan is already established in std::deque and std::vector.
Overhead for tests runtime exists only when those tests are compiled with ASan (therefore, only the generic-asan build) plus the overhead for safe_allocator tests (basically min_allocator with cleaning memory).

We did the same thing with std::deque and certain bugs remained undetected before upstreaming, what suggests that ASan tests cannot be less extensive.

Creating a separate set of tests is not feasible, as it would basically require copy-pasting all tests we have, as almost every function has to be carefully tested. And if a new function is added/overloaded in future, tests should be added in two places.

To address performance concerns when running tests with safe_allocator, I can replace min_allocator tests with safe_allocator, as the latter tests everything what the former and extends beyond it (instead of duplicating tests by adding same ones with the safe_allocator, I can replace all instances of min_allocator with safe_allocator wherever safe_allocator is used).
If we don't want two tests, I'm in favor of replacing min_allocator with safe_allocators, but another approach is to guard safe_allocator tests by an ifdef to ensure execution only when compiled with AddressSanitizer.

What do you think?

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As-is, landing this patch will break most users who use -fsanitize=address and use libc++ in their code. Indeed, vendors ship a single version of libc++.dylib right now and that version is not instrumented, which means that the extern template instantiations in libc++.dylib will cause problems.

I'm worried that this will be a significant regression. Ideally, we'd address the issue by having a proper plan for shipping an instrumented library and having the driver select it based on -fsanitize=address. But barring that, we'll need to either

  1. make the string annotations opt-in, or
  2. disable the extern template instantiation declarations under asan
  3. only enable these annotations when the dylib was built with support for ASAN

(1) basically negates the benefit of this patch since almost nobody will turn it on
(2) is still not correct because of ODR violations, but it should make most basic use cases work. It would still be a temporary solution.

I think (3) is probably what makes the most sense, since that wouldn't be a temporary solution. It would only mean that most platforms don't get this support by default because the dylib isn't built with instrumentation, but the minute we ship a library that does support that, everything would still work out of the box.

This could be done by encoding whether ASAN container annotations are supported in the __config_site, and that would then become a configuration-time property of the library.

@ldionne
Copy link
Member

ldionne commented Dec 5, 2023

If we encode the fact that libc++ was built with instrumentation in __config_site, a user should be able to pass -fsanitize=address while building against a library that isn't built with annotations, and the container annotations would be disabled. And if a user uses a dylib that is built with instrumentation and doesn't pass -fsanitize=address, I'm not sure what would happen but we should make it work.

Basically the idea would be to make these container annotations distinct from whether the user is compiling with -fsanitize=address, and in the future we should ensure that we ship an instrumented version of libc++ at all times, and then make -fsanitize=address select that instrumented version.

@AdvenamTacet
Copy link
Member Author

AdvenamTacet commented Dec 5, 2023

And if a user uses a dylib that is built with instrumentation and doesn't pass -fsanitize=address, I'm not sure what would happen but we should make it work.

I don't think we can make it work, linking is going to fail in that case. It already cannot work, string annotations have no impact on it. compiler-rt ASan functions used in dylib are missing during linking without the ASan flag.

  1. disable the extern template instantiation declarations under asan
    [...]
    (2) is still not correct because of ODR violations, but it should make most basic use cases work. It would still be a temporary solution.

While I agree, it's far from perfect, we already assume that ASan has no stable ABI. An advantage of this solution is that all users will now have annotated strings.

But (3) sounds good, if we just have a makro in __config_site, it's future-proof and easy to implement in libc++.
Biggest downside is fact that users who don't compile libc++ with ASan already, they have to be aware of that change to benefit from it, until we figure out how we can automatically choose a dylib based on -fsanitize=address flag.

Edit: I implemented 3, see a relevant commit.

@AdvenamTacet AdvenamTacet force-pushed the string-annotations-base branch 2 times, most recently from 14ee035 to 543b283 Compare December 6, 2023 08:08
@AdvenamTacet AdvenamTacet requested a review from a team as a code owner December 6, 2023 08:08
@llvmbot llvmbot added the libc++abi libc++abi C++ Runtime Library. Not libc++. label Dec 6, 2023
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Dec 19, 2023
This function replaces a call to `__move_assign` (internal function) with two calls to public member functions (`resize` and `erase`). The order of call is chosen for the best performance.

This change is required to turn on ASan string annotations for short strings (Short String Optimization - SSO).

The `std::basic_string` class's `void __move_assign(basic_string&& __str, size_type __pos, size_type __len)` function operates on uninitialized strings, where it is reasonable to assume that the memory is not poisoned. However, in `sstream` this function is applied to existing strings that may already have poisoned memory.

String ASan annotations turned on here: llvm#72677
AdvenamTacet pushed a commit that referenced this pull request Jan 8, 2024
This function replaces a call to `__move_assign` (internal function)
with two calls to public member functions (`resize` and `erase`). The
order of calls is chosen for the best performance.

This change is required to [turn on ASan string annotations for short
strings](#75882) (Short String
Optimization - SSO).

The `std::basic_string` class's `void __move_assign(basic_string&&
__str, size_type __pos, size_type __len)` function operates on
uninitialized strings, where it is reasonable to assume that the memory
is not poisoned. However, in `sstream` this function is applied to
existing strings that already have poisoned memory.

String ASan annotations turned on here:
#72677
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 11, 2024
This commit turns on ASan annotations in `std::basic_string` for all allocators by default.

Originally suggested here: https://reviews.llvm.org/D146214

This commit is part of our efforts to support container annotations with (almost) every allocator.
Annotating `std::basic_string` with default allocator is implemented in llvm#72677.

Support in ASan API exests since llvm@dd1b7b7.
This patch removes the check in std::basic_string annotation member function (__annotate_contiguous_container) to support different allocators.

You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 11, 2024
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7.
You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit that referenced this pull request Jan 13, 2024
…5845)

This commit turns on ASan annotations in `std::basic_string` for all
allocators by default.

Originally suggested here: https://reviews.llvm.org/D146214

String annotations added here:
#72677

This commit is part of our efforts to support container annotations with
(almost) every allocator. Annotating `std::basic_string` with default
allocator is implemented in
#72677.

Additionally it removes `__begin != nullptr` because `data()` should
never return a nullptr.

Support in ASan API exists since
1c5ad6d.
This patch removes the check in std::basic_string annotation member
function (__annotate_contiguous_container) to support different
allocators.

You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

The motivation for a research and those changes was a bug, found by
Trail of Bits, in a real code where an out-of-bounds read could happen
as two strings were compared via a call to `std::equal` that took
`iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom
comparison function). When object `iter1` was longer than `iter2`, read
out-of-bounds on `iter2` could happen. Container sanitization would
detect it.

If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 17, 2024
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7.
You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit that referenced this pull request Jan 18, 2024
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
#72677

Requires to pass CI without fails:
- #75845
- #75858

Annotating `std::basic_string` with default allocator is implemented in
#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in
llvm#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
llvm@dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 22, 2024
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in
llvm#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
llvm@dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit that referenced this pull request Jan 23, 2024
Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
#72677

Requires to pass CI without fails:
- #75845
- #75858

Annotating `std::basic_string` with default allocator is implemented in
#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 26, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
)

This function replaces a call to `__move_assign` (internal function)
with two calls to public member functions (`resize` and `erase`). The
order of calls is chosen for the best performance.

This change is required to [turn on ASan string annotations for short
strings](llvm#75882) (Short String
Optimization - SSO).

The `std::basic_string` class's `void __move_assign(basic_string&&
__str, size_type __pos, size_type __len)` function operates on
uninitialized strings, where it is reasonable to assume that the memory
is not poisoned. However, in `sstream` this function is applied to
existing strings that already have poisoned memory.

String ASan annotations turned on here:
llvm#72677
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…vm#75845)

This commit turns on ASan annotations in `std::basic_string` for all
allocators by default.

Originally suggested here: https://reviews.llvm.org/D146214

String annotations added here:
llvm#72677

This commit is part of our efforts to support container annotations with
(almost) every allocator. Annotating `std::basic_string` with default
allocator is implemented in
llvm#72677.

Additionally it removes `__begin != nullptr` because `data()` should
never return a nullptr.

Support in ASan API exists since
llvm@1c5ad6d.
This patch removes the check in std::basic_string annotation member
function (__annotate_contiguous_container) to support different
allocators.

You can turn off annotations for a specific allocator based on changes
from
llvm@2fa1bec.

The motivation for a research and those changes was a bug, found by
Trail of Bits, in a real code where an out-of-bounds read could happen
as two strings were compared via a call to `std::equal` that took
`iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom
comparison function). When object `iter1` was longer than `iter2`, read
out-of-bounds on `iter2` could happen. Container sanitization would
detect it.

If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 1, 2024
Extend `std::basic_string` tests to cover more buffer situations and
length in general, particularly non-SSO cases after SSO test cases
(changing buffers). This commit is a side effect of working on tests for
ASan annotations.

Related PR: llvm/llvm-project#72677

NOKEYCHECK=True
GitOrigin-RevId: c77cdbac9b121611121adf5806a99aff4812a40c
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 1, 2024
This commit introduces basic annotations for `std::basic_string`,
mirroring the approach used in `std::vector` and `std::deque`.
Initially, only long strings with the default allocator will be
annotated. Short strings (_SSO - short string optimization_) and strings
with non-default allocators will be annotated in the near future, with
separate commits dedicated to enabling them. The process will be similar
to the workflow employed for enabling annotations in `std::deque`.

**Please note**: these annotations function effectively only when libc++
and libc++abi dylibs are instrumented (with ASan). This aligns with the
prevailing behavior of Memory Sanitizer.

To avoid breaking everything, this commit also appends
`_LIBCPP_INSTRUMENTED_WITH_ASAN` to `__config_site` whenever libc++ is
compiled with ASan. If this macro is not defined, string annotations are
not enabled. However, linking a binary that does **not** annotate
strings with a dynamic library that annotates strings, is not permitted.

Originally proposed here: https://reviews.llvm.org/D132769

Related patches on Phabricator:
- Turning on annotations for short strings:
https://reviews.llvm.org/D147680
- Turning on annotations for all allocators:
https://reviews.llvm.org/D146214

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

This Pull Request introduces basic annotations for `std::basic_string`.
Long strings exhibit structural similarities to `std::vector` and will
be annotated accordingly. Short strings are already implemented, but
will be turned on separately in a forthcoming commit. Look at [a
comment](llvm/llvm-project#72677 (comment))
below to read about SSO issues at current moment.

Due to the functionality introduced in
[D132522](llvm/llvm-project@dd1b7b7),
the `__sanitizer_annotate_contiguous_container` function now offers
compatibility with all allocators. However, enabling this support will
be done in a subsequent commit. For the time being, only strings with
the default allocator will be annotated.

If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com

NOKEYCHECK=True
GitOrigin-RevId: 9ed20568e7de53dce85f1631d7d8c1415e7930ae
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 1, 2024
This function replaces a call to `__move_assign` (internal function)
with two calls to public member functions (`resize` and `erase`). The
order of calls is chosen for the best performance.

This change is required to [turn on ASan string annotations for short
strings](llvm/llvm-project#75882) (Short String
Optimization - SSO).

The `std::basic_string` class's `void __move_assign(basic_string&&
__str, size_type __pos, size_type __len)` function operates on
uninitialized strings, where it is reasonable to assume that the memory
is not poisoned. However, in `sstream` this function is applied to
existing strings that already have poisoned memory.

String ASan annotations turned on here:
llvm/llvm-project#72677

NOKEYCHECK=True
GitOrigin-RevId: 5351ded68d579921a61b26a34e36046c22f668bd
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 1, 2024
…5845)

This commit turns on ASan annotations in `std::basic_string` for all
allocators by default.

Originally suggested here: https://reviews.llvm.org/D146214

String annotations added here:
llvm/llvm-project#72677

This commit is part of our efforts to support container annotations with
(almost) every allocator. Annotating `std::basic_string` with default
allocator is implemented in
llvm/llvm-project#72677.

Additionally it removes `__begin != nullptr` because `data()` should
never return a nullptr.

Support in ASan API exists since
llvm/llvm-project@1c5ad6d.
This patch removes the check in std::basic_string annotation member
function (__annotate_contiguous_container) to support different
allocators.

You can turn off annotations for a specific allocator based on changes
from
llvm/llvm-project@2fa1bec.

The motivation for a research and those changes was a bug, found by
Trail of Bits, in a real code where an out-of-bounds read could happen
as two strings were compared via a call to `std::equal` that took
`iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom
comparison function). When object `iter1` was longer than `iter2`, read
out-of-bounds on `iter2` could happen. Container sanitization would
detect it.

If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com

NOKEYCHECK=True
GitOrigin-RevId: 60ac394dc9ed617f802b33c3b9ac8881ca6a940c
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 1, 2024
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
llvm/llvm-project#72677

Requires to pass CI without fails:
- llvm/llvm-project#75845
- llvm/llvm-project#75858

Annotating `std::basic_string` with default allocator is implemented in
llvm/llvm-project#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
llvm/llvm-project@dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
llvm/llvm-project@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com

NOKEYCHECK=True
GitOrigin-RevId: d06fb0b29c7030497e0e6411cf256cabd71940c2
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 7, 2024
Originally merged here: llvm/llvm-project#75882
Reverted here: llvm/llvm-project#78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- llvm/llvm-project#79065
- llvm/llvm-project#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
llvm/llvm-project#72677

Requires to pass CI without fails:
- llvm/llvm-project#75845
- llvm/llvm-project#75858

Annotating `std::basic_string` with default allocator is implemented in
llvm/llvm-project#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
llvm/llvm-project@dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
llvm/llvm-project@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com

NOKEYCHECK=True
GitOrigin-RevId: cb528ec5e6331ce207c7b835d7ab963bd5e13af7
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request May 5, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request May 5, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request May 5, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
AdvenamTacet pushed a commit that referenced this pull request May 7, 2024
This pull request is the third iteration aiming to integrate short
string annotations. This commit includes:
- Enabling basic_string annotations for short strings.
- Setting a value of `__trivially_relocatable` in `std::basic_string` to
`false_type` when compiling with ASan (nothing changes when compiling
without ASan). Short string annotations make `std::basic_string` to not
be trivially relocatable, because memory has to be unpoisoned.
- Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two
functions.
- Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent
problematic stack optimizations (the macro modifies code behavior only
when compiling with ASan).

Previously we had issues with compiler optimization, which we understand
thanks to @vitalybuka. This commit also addresses smaller changes in
short string, since previous upstream attempts.

Problematic optimization was loading two values in code similar to:
```
__is_long() ? __get_long_size() : __get_short_size();
```
We aim to resolve it with the volatile wrapper.

This commit is built on top of two previous attempts which descriptions
are below.

Additionally, in the meantime, annotations were updated (but it
shouldn't have any impact on anything):
- #79292

---

Previous PR: #79049
Reverted:
a16f81f

Previous description:

Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
#72677

Requires to pass CI without fails:
- #75845
- #75858

Annotating `std::basic_string` with default allocator is implemented in
#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++abi libc++abi C++ Runtime Library. Not libc++. libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instrument std::string with ASAN contiguous container annotations.
5 participants