Replaced std::string with std::string_view and removed excessive copies in cudf::io #17734

Merged
merged 19 commits into from
Feb 4, 2025

lamarrr
Contributor

@lamarrr lamarrr commented Jan 14, 2025

Description

As part of the improvement effort discussed in #15907, this merge request removes some of the excessive std::string copies and uses std::string_view in place of std::string when the lifetime semantics are clear.

In this MR, std::string is replaced only in linear functions and constructors, not in structs, because there are no established ownership or lifetime semantics there to guarantee that the string_views will not outlive their source.
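The distinction above can be sketched as follows. This is a minimal illustration, not code from the PR: `has_extension` and `reader_options` are hypothetical names. The function only reads its views during the call, so they cannot dangle; the struct may outlive the caller's buffer, so it keeps an owning std::string.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: the views are only read for the duration of the call,
// so the caller's string is guaranteed to outlive them.
inline bool has_extension(std::string_view path, std::string_view ext) {
  return path.size() >= ext.size() &&
         path.substr(path.size() - ext.size()) == ext;
}

// Hypothetical options struct: it can outlive the buffer it was built from,
// so it stores an owning std::string rather than a std::string_view.
struct reader_options {
  explicit reader_options(std::string_view path) : path_(path) {}
  std::string const& path() const { return path_; }

 private:
  std::string path_;  // owns a copy of the characters
};
```

Storing a std::string_view member instead would compile, but the member would dangle as soon as the source string is destroyed.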
There were also some cases of excessive copies. For example, consider:

struct source_info {
  source_info(std::string const& s) : str{s} {}

 private:
  std::string str;
};

In the above example, the string is likely to be allocated twice if a temporary or string literal is used to construct "s": once for the temporary and once for the copy into str. Taking the parameter by value and moving it avoids the second allocation:

struct source_info {
  source_info(std::string s) : str{std::move(s)} {}

 private:
  std::string str;
};

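The string is allocated only once in all scenarios: a temporary is constructed directly into s and then moved into str, while an lvalue is copied once into s and then moved.
This also applies to std::vector, where it is arguably worse because there is no small-vector optimization comparable to std::string's small-string optimization (SSO).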
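The difference between the two constructors can be checked with a small counting type. This is a sketch under stated assumptions, not code from the PR: `tracked_string`, `source_info_by_ref`, `source_info_by_value`, and the two helper functions are hypothetical names, and the code assumes C++17 (for `static inline` members and guaranteed copy elision).

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for std::string that counts deep copies.
struct tracked_string {
  static inline int copies = 0;
  std::string value;

  tracked_string(char const* s) : value{s} {}
  tracked_string(tracked_string const& other) : value{other.value} { ++copies; }
  tracked_string(tracked_string&& other) noexcept : value{std::move(other.value)} {}
};

// Mirrors the first source_info: const-reference parameter, then a copy.
struct source_info_by_ref {
  source_info_by_ref(tracked_string const& s) : str{s} {}

 private:
  tracked_string str;
};

// Mirrors the second source_info: by-value parameter, then a move.
struct source_info_by_value {
  source_info_by_value(tracked_string s) : str{std::move(s)} {}

 private:
  tracked_string str;
};

// Number of deep copies made when the argument is a temporary.
template <typename Sink>
int copies_from_temporary() {
  tracked_string::copies = 0;
  Sink sink{tracked_string{"a path long enough to defeat the small-string optimization"}};
  return tracked_string::copies;
}

// Number of deep copies made when the argument is a named object (lvalue).
template <typename Sink>
int copies_from_lvalue() {
  tracked_string named{"a path long enough to defeat the small-string optimization"};
  tracked_string::copies = 0;
  Sink sink{named};
  return tracked_string::copies;
}
```

With a temporary, the const-reference version makes one deep copy and the by-value version makes none. With an lvalue, both make exactly one, so the by-value form is never worse in copy count and only adds a cheap move.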

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@lamarrr lamarrr requested a review from a team as a code owner January 14, 2025 16:23
@github-actions github-actions bot added the libcudf label Jan 14, 2025
@lamarrr lamarrr changed the title replaced std::string with std::string_view and removed excessive copies in cudf::io Replaced std::string with std::string_view and removed excessive copies in cudf::io Jan 14, 2025
@lamarrr lamarrr added the feature request, improvement, and non-breaking labels and removed the feature request label Jan 14, 2025
@mhaseeb123
Member

Nice optimization. General question, many of the strings in the PR (except for paths etc) are very tiny (< 10 chars) which may not impact performance or space otherwise. Should we just leave them as is?

@lamarrr
Contributor Author

lamarrr commented Jan 15, 2025

> Nice optimization. General question, many of the strings in the PR (except for paths etc) are very tiny (< 10 chars) which may not impact performance or space otherwise. Should we just leave them as is?

Only a few. Most of them aren't tiny (< 16 characters) in the general case. Regardless, incurring extra allocations even for small strings (> 16 && < 1'000 characters) when they aren't needed isn't ideal either, as they lead to memory fragmentation in long-running programs.

@mhaseeb123 mhaseeb123 removed their assignment Jan 15, 2025
Contributor

@kingcrimsontianyu kingcrimsontianyu left a comment

Approved. But I still have a different opinion that std::string_view should not be used in associative containers.
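The comment does not spell out its reasoning. One common concern is that a std::string_view key does not own its characters, so the map silently depends on the source strings staying alive. A possible alternative, sketched here as an assumption rather than as what the PR does, is to keep owning std::string keys and use a transparent comparator so lookups can still take a std::string_view without building a temporary std::string. The names `column_index_map` and `find_column` are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Owning keys; std::less<> is a transparent comparator, which enables
// heterogeneous lookup (C++14 and later).
using column_index_map = std::map<std::string, int, std::less<>>;

// Looks up a column by name without constructing a temporary std::string.
// Returns -1 when the name is not present.
inline int find_column(column_index_map const& columns, std::string_view name) {
  auto const it = columns.find(name);
  return it == columns.end() ? -1 : it->second;
}
```

This keeps the allocation-free lookup that motivates std::string_view while leaving ownership of the keys with the container.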

@mhaseeb123
Member

Looks like we have build errors from replacing strings with string_views at certain places!

@mhaseeb123
Member

> Looks like we have build errors from replacing strings with string_views at certain places!

Looks like we are getting some runtime errors in libcudf with this PR as well

Member

@mhaseeb123 mhaseeb123 left a comment

Looks like some Python tests are now failing due to extra " chars in some of the output cols.

@mhaseeb123 mhaseeb123 requested a review from vuule January 30, 2025 01:22
@lamarrr lamarrr requested a review from davidwendt January 30, 2025 14:08
@lamarrr lamarrr changed the base branch from branch-25.02 to branch-25.04 February 3, 2025 15:12
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
@lamarrr
Contributor Author

lamarrr commented Feb 4, 2025

/merge

@rapids-bot rapids-bot bot merged commit 7baf1e9 into rapidsai:branch-25.04 Feb 4, 2025
107 checks passed
Labels
improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change)
7 participants