Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress a redundant hardening check in basic_string_view::substr #91804

Merged
merged 5 commits into from
Jul 23, 2024

Conversation

davidben
Copy link
Contributor

Fixes #91634.

This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.

Fixes llvm#91634.

This could alternatively be done with an _LIBCPP_ASSUME, after
llvm#91801 lands, but would also
require llvm#91619 be fixed
first. Given the dependencies, it seemed simplest to just make a private
ctor.
@davidben davidben requested a review from a team as a code owner May 10, 2024 20:33
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label May 10, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 10, 2024

@llvm/pr-subscribers-libcxx

Author: David Benjamin (davidben)

Changes

Fixes #91634.

This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.


Full diff: https://github.com/llvm/llvm-project/pull/91804.diff

1 Files Affected:

  • (modified) libcxx/include/string_view (+14-1)
diff --git a/libcxx/include/string_view b/libcxx/include/string_view
index 5c4bec742bafa..2c9703cedfd2a 100644
--- a/libcxx/include/string_view
+++ b/libcxx/include/string_view
@@ -449,8 +449,11 @@ public:
   }
 
   _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI basic_string_view substr(size_type __pos = 0, size_type __n = npos) const {
+    // Use the `__assume_valid` form of the constructor to avoid an unnecessary check. Any substring of a view is a
+    // valid view. In particular, `size()` is known to be smaller than `numeric_limits<difference_type>::max()`, so the
+    // new size is also smaller. See also https://github.com/llvm/llvm-project/issues/91634.
     return __pos > size() ? (__throw_out_of_range("string_view::substr"), basic_string_view())
-                          : basic_string_view(data() + __pos, std::min(__n, size() - __pos));
+                          : basic_string_view(data() + __pos, std::min(__n, size() - __pos), __assume_valid{});
   }
 
   _LIBCPP_CONSTEXPR_SINCE_CXX14 int compare(basic_string_view __sv) const _NOEXCEPT {
@@ -675,6 +678,16 @@ public:
 #endif
 
 private:
+  struct __assume_valid {};
+
+  // This is the same as the pointer and length constructor, but it does not include additionally hardening checks. It
+  // is intended for use within the class, when the class invariants already imply the resulting object is valid. The
+  // compiler usually cannot apply this optimization itself, because it does not know class invariants.
+  _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI
+  basic_string_view(const _CharT* __s, size_type __len, __assume_valid) _NOEXCEPT
+      : __data_(__s),
+        __size_(__len) {}
+
   const value_type* __data_;
   size_type __size_;
 };

Copy link
Member

@mordante mordante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you benchmark the difference in performance?

IMO adding additional private constructors to avoid hardening tests feels like the wrong direction since it adds complexity to the code.

@davidben
Copy link
Contributor Author

I'm happy to put together a microbenchmark that will definitely show this (I mean substr is barely any instructions!), but I don't think that's a great use of time. This should be looked at in the context of getting C++ to a bounds-safe place, libc++ hardening, and -Wunsafe-buffer-usage. substr (respectively, subspan) is the bounds-safe version of pointer arithmetic in string_view (respectively, span).

In order for C++ projects to replace the unsafe patterns with safe ones with minimal handwringing, we need to eliminate the redundant checks so that there is budget for the non-redundant ones. The hardening check here is redundant by way of a type invariant. The problem is simply that the compiler cannot infer type invariants. We can do this with _LIBCPP_ASSUME instead, but as noted, it requires #91801 and #91619. A private constructor gives more reliable performance across compilers, so I went with that one.

@mordante
Copy link
Member

I don't disagree that the hardening check is redundant in this case.
The safety checks were sold as so little overhead it could be used without issues in production.
My concern is with adding code complexity and the precedent it sets.
I expect there are redundant checks in other places too, do we want to eliminate all of them?

@davidben
Copy link
Contributor Author

davidben commented May 21, 2024

I expect there are redundant checks in other places too, do we want to eliminate all of them?

Yes, we want to eliminate all redundant safety checks. Whether we should in each instance is a engineering tradeoff between performance and risk.

The point of safety checks isn't to check just because. It should generally be a goal that, when a safety check is truly redundant (i.e. one can prove that it is always true) that it is eliminated. Such a check is pure overhead, and there are a lot of such instances. Most code is mostly correct, and so most naively-applied safety checks are redundant.

That doesn't mean we don't safety checks. The key word here is "most". Humans are fallible and, as we scale out to large C++ programs using the STL, there will be failed preconditions. When those preconditions fail even once, the results are catastrophic for users. We need safety checks.

However, this balance of extremes means there are complex tradeoffs here. Safety checks are not expensive but, in aggregate and in hot paths, the costs of safety checks can add up. That is why it is important that most redundant ones be eliminated, so that projects have the performance budget for the ones that are either not redundant, or at least not obviously so. The best way to eliminate safety checks is via automatic compiler optimizations. That is one piece of code that we need to get right once, and then a whole class of redundant checks go away. Because usually the redundancy is locally obvious. E.g.

   std::optional<T> whatever = ...;
   if (whatever) {
     whatever->do_thing();
  }

It is totally fine that operator-> has a safety check because the compiler can easily see the check is redundant and delete it.

Other times, however, that is not possible. When that happens, we have a choice to make:

  1. Improve the compiler so that it does see it.
  2. Find another way to structure the code such that the compiler can see it.
  3. Rely on fallible humans to assert that the invariant holds (i.e. take on a risk) and use an unsafe pattern that suppresses the check.
  4. Live with this check and take on the performance cost.

Now, the invariant here is such that neither (1) nor (2) is not currently viable. Our language does not have the tools to capture that size_ < PTRDIFF_MAX. We have to pick between (3) and (4). This is an engineering tradeoff. We can choose to take on slightly more risk for better performance, or we can choose to lose performance and reduce the risk.

In these cases, ideally the balance would tilt towards (4) as much as possible. The consequences of getting these wrong as so extreme that we generally don't want to take on that risk. But, for this function and for this check, I think (3) is the clear winner:

First, we're within the STL, and in string_view::substr no less. This is a function we can expect to be called throughout C++ code, within parsers and whatnot to peel off substrings of your bigger structure. Performances costs here scale.

Second, we must look at the nature of the risk. This is not a safety check for bugs in the calling code. (If it were, I would pretty much always argue for (4).) This is a safety check for bugs in libc++. While that is indeed possible, I think it's quite obvious why the precondition holds here. Moreover, that safety check is to catch things like -1 accidentally making its way into a string_view constructor and wrapping around to SIZE_MAX. That's not a concern here. Therefore the risk we take on is minimal.

Finally, we look at the second-order effects. Because this is specifically string_view::substr, avoiding the performance hit relieves some performance budget in our callers for them to then spend on safer constructs elsewhere, where the tradeoff between (3) and (4) may be less clear. That is overall a safety-positive outcome.

@ldionne
Copy link
Member

ldionne commented Jun 25, 2024

FWIW I think this makes sense. I wouldn't want to start doing this all over the place though, as a mere maintenance concern.

@davidben
Copy link
Contributor Author

I wouldn't want to start doing this all over the place though, as a mere maintenance concern.

Agreed, yeah. If we find ourselves wanting to do this a lot, we probably need to start investing in better tools, like a smarter compiler or whatever else we need.

@davidben
Copy link
Contributor Author

better tools

E.g., if we had infinite time to burn on compiler smarts, we could make some kind of size_t [[clang::integer_bounds(0, PTRDIFF_MAX)]]. Combine that with a fix for #91619 and I think the compiler could figure that out. But we don't have that right now. :-)

@var-const
Copy link
Member

@davidben FWIW, I fully agree with your writeup -- we should try to avoid redundant overhead since this frees up our performance budget for other checks and makes enabling the feature more palatable. I also agree that very small inefficiencies can add up. While there is certainly a tradeoff involved (I'm concerned less about the extra code but more about the fact that hardening concerns affect the normal code flow), I think it's a reasonable pattern to use internal functions within the implementation that assume that invariants hold (and I'm pretty sure we do use that pattern in some existing parts of our code).

@mordante Do you still have concerns about this patch? Similar to @davidben, I also feel that while a microbenchmark would be easy enough to produce, this difference will only really show up in a real application; even then I'll acknowledge that it's unlikely to be huge, but IMO little inefficiencies can easily add up, and it can be difficult to debug when there is no single stand-out culprit. I feel we should prevent that from happening in the first place.

@var-const var-const added the hardening Issues related to the hardening effort label Jul 23, 2024
Copy link
Member

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidben LGTM, thanks!

I had some minor nits about one of the comments, and since I know you're not available this week from the other thread, I just went ahead and applied the change -- please let me know if you have any objections!

Copy link
Member

@mordante mordante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not too thrilled about this patch, but not too unhappy to keep blocking it.
There is one review comment to address before landing.

libcxx/include/string_view Outdated Show resolved Hide resolved
@var-const var-const merged commit 795a47f into llvm:main Jul 23, 2024
9 of 11 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…1804)

Summary:
Fixes #91634.

This could alternatively be done with an _LIBCPP_ASSUME, after
#91801 lands, but would also
require #91619 be fixed
first. Given the dependencies, it seemed simplest to just make a private
ctor.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251551
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hardening Issues related to the hardening effort libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std::string_view::substr triggers unnecessary ctor safety checks in libc++ safe mode
5 participants