-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] move creation of character vectors in some methods to C++ side #4256
Merged
Merged
Changes from 6 commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
2aab103
[R-package] move creation of character vectors in some methods to C++…
jameslamb 3d6e989
convert LGBM_BoosterGetEvalNames_R
jameslamb 63856c2
convert LGBM_BoosterDumpModel_R and LGBM_BoosterSaveModelToString_R
jameslamb 826ae22
remove debugging code
jameslamb 1f07b05
update docs
jameslamb f84b84d
remove comment
jameslamb 6e80210
Merge branch 'master' into r/encode-char
jameslamb 147e1c2
add handling for larger model strings
jameslamb c272765
handle large strings in feature and eval names
jameslamb 31ce4bd
got long feature names working
jameslamb 1ca0f5e
more fixes
jameslamb d5a47aa
linting
jameslamb 179bcec
resize
jameslamb d5f5178
Apply suggestions from code review
jameslamb 26599ea
Merge branch 'master' into r/encode-char
jameslamb 6430f62
stricter test
jameslamb 63d7acd
Merge branch 'master' into r/encode-char
jameslamb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if
1024 * 1024
is not enough to save some big model? With removed "try with default len, repeat with actual if not enough" (if (act_len > buf_len)
) this now looks like a regression compared to the current fully correct implementation.C API docs says:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OHHHH I see now, thank you for that explanation. I misunderstood the purpose of the code on the R side that was calling this function twice.
Ok yes you're right, that work needs to be done here. Will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've made these changes in recent commits.
I'm really glad you pointed this out, because it also made me realize an opportunity to allow larger feature names! Right now, the code in
LGBM_DatasetGetFeatureNames_R
onmaster
will not allow any feature names longer than 256 characters.LightGBM/R-package/src/lightgbm_R.cpp
Line 163 in f831808
LightGBM/R-package/src/lightgbm_R.cpp
Line 179 in f831808
The code below throws an error on
{lightgbm}
3.2.1, but works as of this branch.But it should be possible to! Based on
LightGBM/include/LightGBM/c_api.h
Line 295 in f831808
So I've updated the calls to
LGBM_DatasetGetFeatureNames_R
andLGBM_BoosterGetEvalNames_R
to retry with a larger buffer on long names.I've added tests to this PR to check that this is working as expected.
LGBM_DatasetGetFeatureNames_R()
:test_dataset.R
LGBM_BoosterGetEvalNames_R()
: I could not find a way to generate a large string value for this, but I might misunderstand howLGBM_BoosterGetEvalNames
works. Opened [docs] what should LGBM_BoosterGetEvalNames be used for? #4264 with a question.LGBM_BoosterSaveModelToString_R()
:test_lgb.Booster.R
LGBM_BoosterDumpModel_R()
:test_lgb.Booster.R
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
I think we can do the same for Python wrapper.
LightGBM/python-package/lightgbm/basic.py
Lines 1896 to 1900 in f831808
LightGBM/python-package/lightgbm/basic.py
Lines 3273 to 3277 in f831808
LightGBM/python-package/lightgbm/basic.py
Lines 3472 to 3476 in f831808
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! But I think it should be a separate PR. I'd really like to focus on finishing #3016 soon.