Suppress a redundant hardening check in basic_string_view::substr #91804

davidben · 2024-05-10T20:33:39Z

This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.

Fixes llvm#91634. This could alternatively be done with an _LIBCPP_ASSUME, after llvm#91801 lands, but would also require llvm#91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.

llvmbot · 2024-05-10T20:34:10Z

@llvm/pr-subscribers-libcxx

Author: David Benjamin (davidben)

Changes

Fixes #91634.

This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.

Full diff: https://github.com/llvm/llvm-project/pull/91804.diff

1 Files Affected:

(modified) libcxx/include/string_view (+14-1)

diff --git a/libcxx/include/string_view b/libcxx/include/string_view
index 5c4bec742bafa..2c9703cedfd2a 100644
--- a/libcxx/include/string_view
+++ b/libcxx/include/string_view
@@ -449,8 +449,11 @@ public:
   }
 
   _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI basic_string_view substr(size_type __pos = 0, size_type __n = npos) const {
+    // Use the `__assume_valid` form of the constructor to avoid an unnecessary check. Any substring of a view is a
+    // valid view. In particular, `size()` is known to be smaller than `numeric_limits<difference_type>::max()`, so the
+    // new size is also smaller. See also https://github.com/llvm/llvm-project/issues/91634.
     return __pos > size() ? (__throw_out_of_range("string_view::substr"), basic_string_view())
-                          : basic_string_view(data() + __pos, std::min(__n, size() - __pos));
+                          : basic_string_view(data() + __pos, std::min(__n, size() - __pos), __assume_valid{});
   }
 
   _LIBCPP_CONSTEXPR_SINCE_CXX14 int compare(basic_string_view __sv) const _NOEXCEPT {
@@ -675,6 +678,16 @@ public:
 #endif
 
 private:
+  struct __assume_valid {};
+
+  // This is the same as the pointer and length constructor, but it does not include additionally hardening checks. It
+  // is intended for use within the class, when the class invariants already imply the resulting object is valid. The
+  // compiler usually cannot apply this optimization itself, because it does not know class invariants.
+  _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI
+  basic_string_view(const _CharT* __s, size_type __len, __assume_valid) _NOEXCEPT
+      : __data_(__s),
+        __size_(__len) {}
+
   const value_type* __data_;
   size_type __size_;
 };

mordante

Did you benchmark the difference in performance?

IMO adding additional private constructors to avoid hardening tests feels like the wrong direction since it adds complexity to the code.

davidben · 2024-05-16T17:30:20Z

I'm happy to put together a microbenchmark that will definitely show this (I mean substr is barely any instructions!), but I don't think that's a great use of time. This should be looked at in the context of getting C++ to a bounds-safe place, libc++ hardening, and -Wunsafe-buffer-usage. substr (respectively, subspan) is the bounds-safe version of pointer arithmetic in string_view (respectively, span).

In order for C++ projects to replace the unsafe patterns with safe ones with minimal handwringing, we need to eliminate the redundant checks so that there is budget for the non-redundant ones. The hardening check here is redundant by way of a type invariant. The problem is simply that the compiler cannot infer type invariants. We can do this with _LIBCPP_ASSUME instead, but as noted, it requires #91801 and #91619. A private constructor gives more reliable performance across compilers, so I went with that one.

mordante · 2024-05-21T16:55:44Z

I don't disagree that the hardening check is redundant in this case.
The safety checks were sold as so little overhead it could be used without issues in production.
My concern is with adding code complexity and the precedent it sets.
I expect there are redundant checks in other places too, do we want to eliminate all of them?

davidben · 2024-05-21T17:26:54Z

I expect there are redundant checks in other places too, do we want to eliminate all of them?

Yes, we want to eliminate all redundant safety checks. Whether we should in each instance is a engineering tradeoff between performance and risk.

The point of safety checks isn't to check just because. It should generally be a goal that, when a safety check is truly redundant (i.e. one can prove that it is always true) that it is eliminated. Such a check is pure overhead, and there are a lot of such instances. Most code is mostly correct, and so most naively-applied safety checks are redundant.

That doesn't mean we don't safety checks. The key word here is "most". Humans are fallible and, as we scale out to large C++ programs using the STL, there will be failed preconditions. When those preconditions fail even once, the results are catastrophic for users. We need safety checks.

However, this balance of extremes means there are complex tradeoffs here. Safety checks are not expensive but, in aggregate and in hot paths, the costs of safety checks can add up. That is why it is important that most redundant ones be eliminated, so that projects have the performance budget for the ones that are either not redundant, or at least not obviously so. The best way to eliminate safety checks is via automatic compiler optimizations. That is one piece of code that we need to get right once, and then a whole class of redundant checks go away. Because usually the redundancy is locally obvious. E.g.

   std::optional<T> whatever = ...;
   if (whatever) {
     whatever->do_thing();
  }

It is totally fine that operator-> has a safety check because the compiler can easily see the check is redundant and delete it.

Other times, however, that is not possible. When that happens, we have a choice to make:

Improve the compiler so that it does see it.
Find another way to structure the code such that the compiler can see it.
Rely on fallible humans to assert that the invariant holds (i.e. take on a risk) and use an unsafe pattern that suppresses the check.
Live with this check and take on the performance cost.

Now, the invariant here is such that neither (1) nor (2) is not currently viable. Our language does not have the tools to capture that size_ < PTRDIFF_MAX. We have to pick between (3) and (4). This is an engineering tradeoff. We can choose to take on slightly more risk for better performance, or we can choose to lose performance and reduce the risk.

In these cases, ideally the balance would tilt towards (4) as much as possible. The consequences of getting these wrong as so extreme that we generally don't want to take on that risk. But, for this function and for this check, I think (3) is the clear winner:

First, we're within the STL, and in string_view::substr no less. This is a function we can expect to be called throughout C++ code, within parsers and whatnot to peel off substrings of your bigger structure. Performances costs here scale.

Second, we must look at the nature of the risk. This is not a safety check for bugs in the calling code. (If it were, I would pretty much always argue for (4).) This is a safety check for bugs in libc++. While that is indeed possible, I think it's quite obvious why the precondition holds here. Moreover, that safety check is to catch things like -1 accidentally making its way into a string_view constructor and wrapping around to SIZE_MAX. That's not a concern here. Therefore the risk we take on is minimal.

Finally, we look at the second-order effects. Because this is specifically string_view::substr, avoiding the performance hit relieves some performance budget in our callers for them to then spend on safer constructs elsewhere, where the tradeoff between (3) and (4) may be less clear. That is overall a safety-positive outcome.

ldionne · 2024-06-25T15:37:47Z

FWIW I think this makes sense. I wouldn't want to start doing this all over the place though, as a mere maintenance concern.

davidben · 2024-06-25T16:43:35Z

I wouldn't want to start doing this all over the place though, as a mere maintenance concern.

Agreed, yeah. If we find ourselves wanting to do this a lot, we probably need to start investing in better tools, like a smarter compiler or whatever else we need.

davidben · 2024-06-25T16:46:05Z

better tools

E.g., if we had infinite time to burn on compiler smarts, we could make some kind of size_t [[clang::integer_bounds(0, PTRDIFF_MAX)]]. Combine that with a fix for #91619 and I think the compiler could figure that out. But we don't have that right now. :-)

var-const · 2024-07-23T00:37:58Z

@davidben FWIW, I fully agree with your writeup -- we should try to avoid redundant overhead since this frees up our performance budget for other checks and makes enabling the feature more palatable. I also agree that very small inefficiencies can add up. While there is certainly a tradeoff involved (I'm concerned less about the extra code but more about the fact that hardening concerns affect the normal code flow), I think it's a reasonable pattern to use internal functions within the implementation that assume that invariants hold (and I'm pretty sure we do use that pattern in some existing parts of our code).

@mordante Do you still have concerns about this patch? Similar to @davidben, I also feel that while a microbenchmark would be easy enough to produce, this difference will only really show up in a real application; even then I'll acknowledge that it's unlikely to be huge, but IMO little inefficiencies can easily add up, and it can be difficult to debug when there is no single stand-out culprit. I feel we should prevent that from happening in the first place.

var-const

@davidben LGTM, thanks!

I had some minor nits about one of the comments, and since I know you're not available this week from the other thread, I just went ahead and applied the change -- please let me know if you have any objections!

mordante

I'm still not too thrilled about this patch, but not too unhappy to keep blocking it.
There is one review comment to address before landing.

libcxx/include/string_view

…1804) Summary: Fixes #91634. This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251551

Suppress a redundant hardening check in basic_string_view::substr

f71a3d9

Fixes llvm#91634. This could alternatively be done with an _LIBCPP_ASSUME, after llvm#91801 lands, but would also require llvm#91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.

davidben requested a review from a team as a code owner May 10, 2024 20:33

llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label May 10, 2024

mordante requested changes May 16, 2024

View reviewed changes

ldionne assigned var-const Jun 25, 2024

var-const added 2 commits July 22, 2024 17:43

Slightly tweak a comment

350b01e

Don't use list initialization to hopefully fix the C++03 mode

aafa186

var-const added the hardening Issues related to the hardening effort label Jul 23, 2024

Merge branch 'main' into libcxx-string-view-assume-valid

cb85b24

var-const approved these changes Jul 23, 2024

View reviewed changes

mordante approved these changes Jul 23, 2024

View reviewed changes

libcxx/include/string_view Outdated Show resolved Hide resolved

Feedback

15e7af7

var-const merged commit 795a47f into llvm:main Jul 23, 2024
9 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suppress a redundant hardening check in basic_string_view::substr #91804

Suppress a redundant hardening check in basic_string_view::substr #91804

davidben commented May 10, 2024

llvmbot commented May 10, 2024

mordante left a comment

davidben commented May 16, 2024

mordante commented May 21, 2024

davidben commented May 21, 2024 •

edited

Loading

ldionne commented Jun 25, 2024

davidben commented Jun 25, 2024

davidben commented Jun 25, 2024

var-const commented Jul 23, 2024

var-const left a comment

mordante left a comment

Suppress a redundant hardening check in basic_string_view::substr #91804

Suppress a redundant hardening check in basic_string_view::substr #91804

Conversation

davidben commented May 10, 2024

llvmbot commented May 10, 2024

mordante left a comment

Choose a reason for hiding this comment

davidben commented May 16, 2024

mordante commented May 21, 2024

davidben commented May 21, 2024 • edited Loading

ldionne commented Jun 25, 2024

davidben commented Jun 25, 2024

davidben commented Jun 25, 2024

var-const commented Jul 23, 2024

var-const left a comment

Choose a reason for hiding this comment

mordante left a comment

Choose a reason for hiding this comment

davidben commented May 21, 2024 •

edited

Loading