Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] Move error handling into C++ side #4163

Merged
merged 15 commits into from
Apr 22, 2021

Conversation

jameslamb
Copy link
Collaborator

Short description

#4155 fixes a critical issue facing Windows users of the R package, but to implement it the R package needs to switch from using wrapper function lgb.call() to directly using the builtin .Call(). This PR is a first step in that refactoring.

Longer description

Today, the interaction between R code in {lightgbm} and C++ code in lib_lightgbm works like this:

  1. R initializes an integer call_state
  2. R calls C++ code with .Call()
  3. On the C++ side, if anything goes wrong then call_state is updated to a non-0 value, and an error message is stored in the place that LGBM_GetLastError() can find it
  4. When control returns to R, it checks the value of call_state. If it is non-0, R calls lgb.last_error(), which calls C++ function LGBM_GetLastError(), then raises an R exception using stop().

This back-and-forth is the reason that almost all of {lightgbm}'s calls to C++ code use an internal wrapper function lgb.call().

lgb.call <- function(fun_name, ret, ...) {
# Set call state to a zero value
call_state <- 0L
# Check for a ret call
if (!is.null(ret)) {
call_state <- .Call(
fun_name
, ...
, ret
, call_state
, PACKAGE = "lib_lightgbm"
)
} else {
call_state <- .Call(
fun_name
, ...
, call_state
, PACKAGE = "lib_lightgbm"
)
}
call_state <- as.integer(call_state)
# Check for call state value post call
if (call_state != 0L) {
lgb.last_error()
}
return(ret)
}

It seems that the use of that function (not calling .Call() directly with a name that matches a symbol exported from the library) is not allowed if you use R's native routine registration, which is blocking the bugfix in #4155 (see that PR's description for more background).

This is a first step towards factoring out lgb.call(), as described in #4155 (comment). This PR proposes just raising errors directly from the C++ side. Using Rf_error() from R, those still become regular R exceptions. So this change does not have any user-facing impact. I've added a unit test to make that expectation explicit.

If call_state isn't necessary for raising errors, why doesn't this PR just remove it from all the interfaces defined in LightGBM_R.h?

I did spent a while trying this, but unfortunately it can result in this warning every time code .Call() is used:

converting NULL pointer to R NULL

This comes from Rinternals.h, at https://github.com/wch/r-source/blob/d22ee2fc0dc8142b23eed9f46edf76ea9d3ca69a/src/main/dotcode.c#L538-L541. I think those now-unnecessary uses of call_state can be cleaned up as part of #3016.

@shiyu1994
Copy link
Collaborator

Just wondering about the next step after this PR. Does that mean we need to replace every lgb.call with .Call wrapped by tryCatch?

@jameslamb
Copy link
Collaborator Author

Just wondering about the next step after this PR. Does that mean we need to replace every lgb.call with .Call wrapped by tryCatch?

Next, every lgb.call() would be replaced by .Call() (in #4155). But it won't be necessary to introduce any new tryCatch() stuff in the R package, since as of this PR that logic moves to the C++ side.

@shiyu1994
Copy link
Collaborator

Then I have no question about this PR. But it seems that checks on macOS failed.

@jameslamb
Copy link
Collaborator Author

OpenMP was not found, and should be when testing the CRAN package on macOS

Looks like this was similar to the issue that briefly blocked the most recent release (#4169 (comment)).

https://github.com/microsoft/LightGBM/pull/4163/checks?check_run_id=2325957398

==> Downloading https://homebrew.bintray.com/bottles/libomp-11.1.0.catalina.bottle.tar.gz
curl: (22) The requested URL returned error: 403 Forbidden
Error: Failed to download resource "libomp"

I'll merge in the latest master to this and re-run the jobs. If we're going to re-run jobs anyway, might as well also update the PR to the latest master!

If you're ok with the change @shiyu1994 , could you approve the PR?

@jameslamb
Copy link
Collaborator Author

jameslamb commented Apr 14, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/748772010

Status: failure ❌.

@jameslamb
Copy link
Collaborator Author

Hey cool example of failing valgrind tests blocking a merge! Nice job @StrikerRUS !

image

I'll fix this tonight.

@jameslamb
Copy link
Collaborator Author

Ok I just pushed a change that, based on running valgrind locally, I think will test this PR without triggering any issues from valgrind.

I originally wanted to test that errors are raised correctly by constructing a Dataset where the size of data and label don't match. But that results in this valgrind finding.

==3577== 40 bytes in 1 blocks are definitely lost in loss record 46 of 2,254
==3577==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3577==    by 0x1525F509: __gnu_cxx::new_allocator<float>::allocate(unsigned long, void const*) (new_allocator.h:115)
==3577==    by 0x1525D31B: std::allocator_traits<std::allocator<float> >::allocate(std::allocator<float>&, unsigned long) (alloc_traits.h:460)
==3577==    by 0x15259D57: std::_Vector_base<float, std::allocator<float> >::_M_allocate(unsigned long) (stl_vector.h:346)
==3577==    by 0x15259EDE: std::_Vector_base<float, std::allocator<float> >::_M_create_storage(unsigned long) (stl_vector.h:361)
==3577==    by 0x1525506E: std::_Vector_base<float, std::allocator<float> >::_Vector_base(unsigned long, std::allocator<float> const&) (stl_vector.h:305)
==3577==    by 0x1536F8E2: std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&) (stl_vector.h:511)
==3577==    by 0x155BE08F: LGBM_DatasetSetField_R (lightgbm_R.cpp:224)
==3577==    by 0x49422DE: R_doDotCall (dotcode.c:610)
==3577==    by 0x494D3CA: do_dotcall (dotcode.c:1281)
==3577==    by 0x49A07AE: bcEval (eval.c:7072)
==3577==    by 0x498C38F: Rf_eval (eval.c:727)

I think that has something to do with how this vector is filled:

std::vector<float> vec(len);
#pragma omp parallel for schedule(static, 512) if (len >= 1024)
for (int i = 0; i < len; ++i) {
vec[i] = static_cast<float>(R_REAL_PTR(field_data)[i]);
}

I suspect that that issue might be fixed as part of #3016. Either way, it's not directly related to this PR.

I changed the unit test in this PR to use an init_score of the wrong size, to use this branch of code that doesn't need to allocate a new vector:

CHECK_CALL(LGBM_DatasetSetField(R_GET_PTR(handle), name, R_REAL_PTR(field_data), len, C_API_DTYPE_FLOAT64));
.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Apr 17, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/757732931

Status: success ✔️.

@jameslamb
Copy link
Collaborator Author

Sorry, realized I still had the label "in progress" on this! This is ready for review.

@jameslamb
Copy link
Collaborator Author

@shiyu1994 sorry to bother you, could you look at this PR again and approve it if you agree with the changes?

Some of the discussion from #4207 and #4216 are giving new urgency to #3016, and I'd like to handle this PR and #4155 as steps towards that.

@shiyu1994
Copy link
Collaborator

@jameslamb Sorry for the delay. I've checked and already approved.

@jameslamb
Copy link
Collaborator Author

No problem, thanks very much for the help!

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants