Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string_view and string conversion operators and functions to cstring #4676

Merged
merged 2 commits into from
May 27, 2024

Conversation

vlstill
Copy link
Contributor

@vlstill vlstill commented May 24, 2024

The cstring type is used a lot in the codebase as it is the type of names in the IR. It is, however not a good idea to perform operations on cstring unless the result of the operations is intended to be persistent. In these cases, the operations should be rather performed on std::string or std::string_view (based on what kind of operations is desired). Note that creating a std::string_view from a cstring is currently always safe as cstring is persistent.

This PR adds explicit operators for conversion to both string and string_view as well as functions which serve the same purpose. I've opted to add these too as I find it syntactically more pleasing to do something like dec->name.name.string() than std::string(dec->name.name).

@vlstill vlstill self-assigned this May 24, 2024
@vlstill vlstill added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label May 24, 2024
@vlstill vlstill requested review from asl and fruffy May 24, 2024 15:10
lib/cstring.cpp Outdated Show resolved Hide resolved
@@ -165,6 +166,12 @@ class cstring {
const char *c_str() const { return str; }
operator const char *() const { return str; }

std::string string() const { return std::string(str); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this mess with our C++20 conversion efforts? I remember we ran into some issues related to ambiguous string conversion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not as all the added conversions are explicit. Ideally we would make the existing conversions explicit too, but that is probably a no-go (I've tried looking into making cstring -> IR::ID explicit and the number of changes that would be required was HUGE).

As for C++20, I think we will probably have to name all the combinations of comparison operators for the C++20 to work (the exact match has always precedence over cast, so we would need all comparsions between cstring, IR::ID at least (std::string and std::string_view too). Luckily defining them in just one direction should be enough, but unluckily the <=> is not in GCC 9 but in GCC 10, so that might be a problem. We would also need to make sure there is no possibility to get clashes by flipping argument order. I haven't tried that yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just bike-shedding: won't it be better to name str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str is the name of the (private) member that holds the actual const char * data. So that would require renaming it. Also I don't like .sv() as a .string_view() and if we have string_view() it is more consistent to have .string(). Naming is hard...

lib/cstring.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks!

I started to untangle cstring usage in diagnostics where they are created for no particular reasons. This is the correct step forward.

@asl asl mentioned this pull request May 24, 2024
@@ -203,8 +203,7 @@ cstring cstring::before(const char *at) const { return substr(0, at - str); }

cstring cstring::substr(size_t start, size_t length) const {
if (size() <= start) return cstring::empty;
std::string s = str;
return s.substr(start, length);
return cstring(string_view().substr(start, length));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this maybe return a string_view and have a constructor from a string_view? Not sure if that would break anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this. I'll try to take a look at that later, but I am not very happy about the idea of introducing a new implicit cast to cstring and without that I'm quite sure this would break something.

@vlstill vlstill added this pull request to the merge queue May 27, 2024
Merged via the queue into main with commit e1d5766 May 27, 2024
17 checks passed
@vlstill vlstill deleted the vstill/cstring-str-conv branch May 27, 2024 13:59
explicit operator std::string() const { return string(); }

std::string_view string_view() const { return std::string_view(str); }
explicit operator std::string_view() const { return string_view(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it... What if we do the opposite?

  • Conversion to string_view should be implicit
  • Construction from const char* should be explicit

Here is the rationale: currently it is not possible to have a string_view function argument if the function is assumed to accept both cstring and const char*. E.g. the following:

void foo(cstring);
void foo(string_view);

foo("bar")

Is ambiguous as cstring could be constructed from char* implicitly, string_view could be constructed from char* implicitly, but we cannot drop the cstring overload as cstring cannot be implicitly converted to string_view.

Certainly, there are lots of examples in codebase that do e.g.

cstring getName() const {  return "Foo"; }

that will be broken. But IMO these should be all fixed to cstring::literal call and explicit conversion to cstring should emphasize that this is not a cheap operation (involving strlen + map lookup).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants