-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suppress a redundant hardening check in basic_string_view::substr #91804
Suppress a redundant hardening check in basic_string_view::substr #91804
Conversation
Fixes llvm#91634. This could alternatively be done with an _LIBCPP_ASSUME, after llvm#91801 lands, but would also require llvm#91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.
@llvm/pr-subscribers-libcxx Author: David Benjamin (davidben) ChangesFixes #91634. This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor. Full diff: https://github.com/llvm/llvm-project/pull/91804.diff 1 Files Affected:
diff --git a/libcxx/include/string_view b/libcxx/include/string_view
index 5c4bec742bafa..2c9703cedfd2a 100644
--- a/libcxx/include/string_view
+++ b/libcxx/include/string_view
@@ -449,8 +449,11 @@ public:
}
_LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI basic_string_view substr(size_type __pos = 0, size_type __n = npos) const {
+ // Use the `__assume_valid` form of the constructor to avoid an unnecessary check. Any substring of a view is a
+ // valid view. In particular, `size()` is known to be smaller than `numeric_limits<difference_type>::max()`, so the
+ // new size is also smaller. See also https://github.com/llvm/llvm-project/issues/91634.
return __pos > size() ? (__throw_out_of_range("string_view::substr"), basic_string_view())
- : basic_string_view(data() + __pos, std::min(__n, size() - __pos));
+ : basic_string_view(data() + __pos, std::min(__n, size() - __pos), __assume_valid{});
}
_LIBCPP_CONSTEXPR_SINCE_CXX14 int compare(basic_string_view __sv) const _NOEXCEPT {
@@ -675,6 +678,16 @@ public:
#endif
private:
+ struct __assume_valid {};
+
+ // This is the same as the pointer and length constructor, but it does not include additionally hardening checks. It
+ // is intended for use within the class, when the class invariants already imply the resulting object is valid. The
+ // compiler usually cannot apply this optimization itself, because it does not know class invariants.
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI
+ basic_string_view(const _CharT* __s, size_type __len, __assume_valid) _NOEXCEPT
+ : __data_(__s),
+ __size_(__len) {}
+
const value_type* __data_;
size_type __size_;
};
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you benchmark the difference in performance?
IMO adding additional private constructors to avoid hardening tests feels like the wrong direction since it adds complexity to the code.
I'm happy to put together a microbenchmark that will definitely show this (I mean In order for C++ projects to replace the unsafe patterns with safe ones with minimal handwringing, we need to eliminate the redundant checks so that there is budget for the non-redundant ones. The hardening check here is redundant by way of a type invariant. The problem is simply that the compiler cannot infer type invariants. We can do this with |
I don't disagree that the hardening check is redundant in this case. |
Yes, we want to eliminate all redundant safety checks. Whether we should in each instance is a engineering tradeoff between performance and risk. The point of safety checks isn't to check just because. It should generally be a goal that, when a safety check is truly redundant (i.e. one can prove that it is always true) that it is eliminated. Such a check is pure overhead, and there are a lot of such instances. Most code is mostly correct, and so most naively-applied safety checks are redundant. That doesn't mean we don't safety checks. The key word here is "most". Humans are fallible and, as we scale out to large C++ programs using the STL, there will be failed preconditions. When those preconditions fail even once, the results are catastrophic for users. We need safety checks. However, this balance of extremes means there are complex tradeoffs here. Safety checks are not expensive but, in aggregate and in hot paths, the costs of safety checks can add up. That is why it is important that most redundant ones be eliminated, so that projects have the performance budget for the ones that are either not redundant, or at least not obviously so. The best way to eliminate safety checks is via automatic compiler optimizations. That is one piece of code that we need to get right once, and then a whole class of redundant checks go away. Because usually the redundancy is locally obvious. E.g.
It is totally fine that Other times, however, that is not possible. When that happens, we have a choice to make:
Now, the invariant here is such that neither (1) nor (2) is not currently viable. Our language does not have the tools to capture that In these cases, ideally the balance would tilt towards (4) as much as possible. The consequences of getting these wrong as so extreme that we generally don't want to take on that risk. But, for this function and for this check, I think (3) is the clear winner: First, we're within the STL, and in Second, we must look at the nature of the risk. This is not a safety check for bugs in the calling code. (If it were, I would pretty much always argue for (4).) This is a safety check for bugs in libc++. While that is indeed possible, I think it's quite obvious why the precondition holds here. Moreover, that safety check is to catch things like -1 accidentally making its way into a Finally, we look at the second-order effects. Because this is specifically |
FWIW I think this makes sense. I wouldn't want to start doing this all over the place though, as a mere maintenance concern. |
Agreed, yeah. If we find ourselves wanting to do this a lot, we probably need to start investing in better tools, like a smarter compiler or whatever else we need. |
E.g., if we had infinite time to burn on compiler smarts, we could make some kind of |
@davidben FWIW, I fully agree with your writeup -- we should try to avoid redundant overhead since this frees up our performance budget for other checks and makes enabling the feature more palatable. I also agree that very small inefficiencies can add up. While there is certainly a tradeoff involved (I'm concerned less about the extra code but more about the fact that hardening concerns affect the normal code flow), I think it's a reasonable pattern to use internal functions within the implementation that assume that invariants hold (and I'm pretty sure we do use that pattern in some existing parts of our code). @mordante Do you still have concerns about this patch? Similar to @davidben, I also feel that while a microbenchmark would be easy enough to produce, this difference will only really show up in a real application; even then I'll acknowledge that it's unlikely to be huge, but IMO little inefficiencies can easily add up, and it can be difficult to debug when there is no single stand-out culprit. I feel we should prevent that from happening in the first place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidben LGTM, thanks!
I had some minor nits about one of the comments, and since I know you're not available this week from the other thread, I just went ahead and applied the change -- please let me know if you have any objections!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not too thrilled about this patch, but not too unhappy to keep blocking it.
There is one review comment to address before landing.
…1804) Summary: Fixes #91634. This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251551
Fixes #91634.
This could alternatively be done with an _LIBCPP_ASSUME, after #91801 lands, but would also require #91619 be fixed first. Given the dependencies, it seemed simplest to just make a private ctor.