-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add glog-like CHECK and LOG for k2. #114
Conversation
+1,i like this version, simple enough and easy to use like glog |
Looks OK to me.. will wait to hear from Meixu though. |
c48dbb9
to
29e9872
Compare
29e9872
to
3da00ee
Compare
k2/csrc/cuda/log_test.cu
Outdated
|
||
__global__ void DummyKernel(int *b, int a) { | ||
K2_LOG(INFO) << "In kernel"; | ||
K2_CHECK_LT(*b, a) << K2_KERNEL_DEBUG_STR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's better if CHECK
can print out K2_KERNEL_DEBUG_STR
automatically when called in kernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csukuangfj Bear in mind that a lot of the time we write kernel code in lambda expressions that could get evaluated on the kernel or on the CPU.
Also I'd rather not have to do explicitly << K2_KERNEL_DEBUG_STR
every time, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When more threads are here, like this <<<2, 2>>>
, and to compare with printf, add a printf line below. The code used is:
__device__ void DummyKernel(int *b, int a) {
K2_LOG(INFO) << "In kernel";
K2_CHECK_EQ(*b, a) << K2_KERNEL_DEBUG_STR << " asdfsad " << a << " = a";
printf("[%d, %d, %d]: a=%d, b=%d\n", threadIdx.x, threadIdx.y, threadIdx.z, a, *b);
*b += 1;
K2_CHECK_EQ(*b, a) << K2_KERNEL_DEBUG_STR;
}
__global__ void EntryKernel(int *b, int a) {
DummyKernel(b, a);
}
.
.
.
EntryKernel<<<2, 2>>>(b, a);
That indicate overload the "<<" would make msg mess up with theads as each "<<" call printf one-time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's better if CHECK can print out K2_KERNEL_DEBUG_STR automatically when called in kernel
Done.
There is also a DCHECK.
k2/csrc/cuda/log_test.cu
Outdated
K2_CHECK_EQ(ret, cudaSuccess) << "Failed to allocate memory"; | ||
|
||
ret = cudaMemcpy(b, &a, sizeof(a), cudaMemcpyHostToDevice); | ||
K2_CHECK_EQ(ret, cudaSuccess) << "Failed to copy memory to gpu"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would like to define macro K2_CHECK_CUDA_ERROR
for such usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... currently, K2_CHECK_CUDA_ERROR is for checking cudaError_t return codes, which I think makes sense..
I'm OK in principle with having different macro names for usage inside kernels (or lambdas) but not sure if it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very simple to support K2_CHECK_CUDA_ERROR
:
#define K2_CHECK_CUDA_ERROR(x) K2_CHECK_EQ(x, cudaSuccess)
Not sure if we need another macro.
BTW, I find that the macro name K2_CHECK_CUDA_ERROR
is too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K2_CHECK_CUDA_ERROR does more than that, it prints out the CUDA error
code's name. (Or if it doesn't, it should).
Done.
K2_CHECK_CUDA_ERROR does more than that, it prints out the CUDA error
code's name. (Or if it doesn't, it should).
…On Fri, Sep 4, 2020 at 11:02 AM Fangjun Kuang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/cuda/log_test.cu
<https://github.com/danpovey/k2/pull/114#discussion_r483358900>:
> +__global__ void DummyKernel(int *b, int a) {
+ K2_LOG(INFO) << "In kernel";
+ K2_CHECK_LT(*b, a) << K2_KERNEL_DEBUG_STR;
+ *b += 1;
+ K2_CHECK_EQ(*b, a) << K2_KERNEL_DEBUG_STR;
+}
+
+TEST(Log, Cuda) {
+ K2_LOG(INFO) << "Test log for cuda";
+ int a = 10;
+ int *b;
+ auto ret = cudaMalloc(&b, sizeof(a));
+ K2_CHECK_EQ(ret, cudaSuccess) << "Failed to allocate memory";
+
+ ret = cudaMemcpy(b, &a, sizeof(a), cudaMemcpyHostToDevice);
+ K2_CHECK_EQ(ret, cudaSuccess) << "Failed to copy memory to gpu";
It's very simple to support K2_CHECK_CUDA_ERROR:
#define K2_CHECK_CUDA_ERROR(x) K2_CHECK_EQ(x, cudaSuccess)
Not sure if we need another macro.
BTW, I find that the macro name K2_CHECK_CUDA_ERROR is too long.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#discussion_r483358900>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4LUSBQUXREXS5CTETSEBKENANCNFSM4QUZCQCQ>
.
|
I'll fix all the comments this night. |
} | ||
} | ||
|
||
__host__ __device__ const Logger &operator<<(const char *s) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK much about cuda, but does it work on host&device to overload "<<" like this? I haven't tried to compile and test this. But I think if it works, Nvidia would not just provide a rewritten printf
but the stream connected one, b/c everyone like the stream format (except its formatting part).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@songmeixu I think it's OK. Full C++ is supported on the device, including c++14, it's just the standard C library
which is not fully supported, so the operator itself is not the issue.
I am still open to using glog but you need to explain to us your previous comments, better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my below comments @danpovey≈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know why it's not working now. As the printf that rewritten by nvidia must have some thread guard for the stream. And that's also the reason why we shouldn't log things in cuda, it's performance killer. Just print msg if error happens is OK, that's why I put K2_MAKE_ERROR_CHECK
to silence some msg from cuda in debug.h, @danpovey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK much about cuda, but does it work on host&device to overload "<<" like this?
I have never opened a pullrequest without making my code pass the local unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test would say nothing for right case..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see the "<<" problem is solved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test would say nothing for right case..
I've added some death tests. You can see what the error messages look like from the logs in this colab notebook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see the "<<" problem is solved.
Normally please avoid printing messages in the kernel. The race condition happens
only in case of a check error with multiple threads without any synchronization.
The check error will abort the program. So I don't think the issue of <<
really matters.
Guys, after talking more with Meixu, I think there may not really be a problem with glog. I think it might be easiest to just use glog for CPU-only code and our own K2_ macros for code that might be on GPU (i.e. in lambdas or kernels). |
Please wait minutes for me to explain it clearly. |
OK. For CPU-only log, should we use marcos from glog directly (e.g. |
Okay, have sorted my thoughts. case-1: K2 uses glog, upper framework doesn't. |
@songmeixu we can always provide a method to supply flags to glog, e.g. by exposing some Python API when we wrap the code. That's fine as far as I am concerned. |
The solution doesn't have to be perfect but it has to be done, and very soon. @songmeixu do you have a proposal that you could finish today? |
case-2: K2 use glog, upper FW use too, but it's have candidate which is a glog-like one. (As pytorch and many others do). If the upper FW choose to use glog-like one, then the glog lib that k2 used should be provided by who? |
I can do it today. But in the way i think right. As I have pushed a PR about it, but didn't get understand. I could make it clear through usage examples and tests today. |
I can see that the linking etc. might get complicated. |
I guess for @csukuangfj 's version, he may provide some configuration to let user provide file name to log. I'm just thinking, as @songmeixu said, using third-party log libraries (e.g. |
OK fine, I agree, making our own logging/checking macros will be simpler
and less prone to error.
Meixu, do you think you could finish something today?
…On Fri, Sep 4, 2020 at 2:55 PM Haowen Qiu ***@***.***> wrote:
I guess for @csukuangfj <https://github.com/csukuangfj> 's version, he
may provide some configuration to let user provide file name to log. I'm
just thinking, as @songmeixu <https://github.com/songmeixu> said, using
third-party log libraries (e.g. glog) will lead to some problems
(especially for case-2), then using glog together with k2 will double the
problems? that's why I now prefer to only use our own log systems, no glog,
no other 3rd party log libraries.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#issuecomment-686953595>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO5WGG4TIAC7MYWGD4DSECFNJANCNFSM4QUZCQCQ>
.
|
It's not about the custom logger could or not, it's about it shouldn't. |
For this PR, firstly I want to compile it and pass some tests. As I doubt it works with the stream format. |
case-3: K2 use custom logger like here, upper FW don't use the same obviously. |
IMO we don't really need to configure the logging messages. If it works,
it works; if it crashes, it crashes. We don't need to overthink this right
now.
…On Fri, Sep 4, 2020 at 3:01 PM Meixu Song ***@***.***> wrote:
case-3: K2 use custom logger like here, upper FW don't use the same
obviously.
How cloud the user to config K2's logging msg?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#issuecomment-686956365>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO2IXM2IJHWMF3TKUY3SECGEJANCNFSM4QUZCQCQ>
.
|
If we just create our ow log systems (no any dependencies on glog), we can just expose API to let user configure (e.g. file name to log, log level) our log systems. RE stream format, I guess it works, as it seems @csukuangfj has written some kernel code to test. |
case4: K2 use glog, upper FW use glog, and as case-2, it has a glog-like candidate. Now, the k2 own its glog-like too. And K2 don't provide the glog lib, just include the header. |
Sure, for asserts. But we should have a macro Also, I don't think we should be using assert at all in macros like K2_CHECK_EQ. I want those macros to print the values involved, like glog and loguru do-- at least on CPU. |
OK, so in that case the program would die next time we call
K2_CHECK_CUDA_ERROR.
…On Sun, Sep 6, 2020 at 5:06 PM Meixu Song ***@***.***> wrote:
And actully assert just return a msg and cudaErrorAssert, no abort would
be caused. I guess a man may not search assert failed msg of logs.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#issuecomment-687732891>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO2JNIHLUNFYHMDSZ2LSENGIVANCNFSM4QUZCQCQ>
.
|
So, one call K2_PARANOID_ASSERT, must check outside.. And K2_CHECK_CUDA_ERROR say nothing in NDEBUG.. Hi @danpovey, I have enough confidence about this logging thing, as I investigated a lot. But I still check thoughts here actually wrong, I guess it's better ignore it instead of discuss, there is no end to let everyone understand why to do like this. I would just refine it and make PR for review. I also don't wanna you put energy at edge functions. I really give enough truth about why this not that. And nothing from guys argue with me... |
Isn't that enough? How can you let a thread running on GPU to kill the process running on the CPU? |
Ok, as you ask, it abort for host assert. |
Yes, exactly.
It actually does. Please see the log message from the colab notebook:
|
You means each call should be look like this, then no
|
sorry, I have a different opinion with you on this point. |
Meixu: K2_CHECK_CUDA_ERROR does check the code even if NDEBUG is defined, I asked for this. |
About this, I am same as most coders write .cu around github. |
Meixu: we will be checking the error status each time, even if NDEBUG is defined (but not synchronizing if NDEBUG is defined). Don't argue about this, it is my decision. |
I think I have posted in the above:
Not sure if you have ever tried to compile and run the code or tried to see the log message from the above notebook. |
Yes I see it works, and does address my problems with the existing stuff. Is there any way you can fix the conflicts? I may merge. |
I know it's ok to compile, and I do test that get failed, which with results was post above. Do you answer the problem of "<<" . thread == 0 is not an answer. |
I think it's better make things clear. In case that we can't work together.. I can't watch error happened in face and I do question it, while my thing was said error, with what??? |
I agree the << thing is not 100% ideal, but I think this will happen fairly
rarely and it will be possible to figure out what happened by looking
carefully at the log.
…On Sun, Sep 6, 2020 at 5:47 PM Meixu Song ***@***.***> wrote:
I know it's ok to compile, and I do test that get failed, which with
results was post above. Do you answer the problem of "<<" . thread == 0 is
not an answer.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#issuecomment-687741824>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO2MCJR3VIJACRC64I3SENLBLANCNFSM4QUZCQCQ>
.
|
Meixu, making threats like that is not the way to work with others. Why don't you take a break and we'll talk when your mood is better? |
If just care about one thread, is't enough for the kernel checking? It's not the same status for each thread as I learnd from many sources. |
I don't waste my time on these too. The worst thing is this, do it once, explain how much time? |
Yes, we can't do if (threadIdx.x == 0) { ... }. We may just have to accept multiple threads printing the same thing, hopefully this won't happen too much. |
... an internal implementation with printf would probably be more ideal in terms of not getting lines mixed up. |
Ok, after this, I guess my energy is getting fire. To kill the "<<" talk, consider about multiple kernel running as the same time. When I face a problem, I do learn from others, as I dare not to argue without learning enough. |
Sure, it's not ideal, a printf-based implementation would be better but I
think it can be done later on, while keeping the interfaces the same.
I like that the interface is consistent with glog. We could even use the
same macros for CPU debugging.
The part of the message that's printed by explicitly calling << would have
to be printed as a separate string,
but that may not happen so often, and I don't think the mixing of log
messages is so terrible. We could perhaps later create
a global GPU memory address which we set to nonzero on error, and use
appropriate memory fences to ensure only one
thread prints the error message.
…On Sun, Sep 6, 2020 at 6:11 PM Meixu Song ***@***.***> wrote:
Ok, after this, I guess my energy is getting fire. To kill the "<<" talk,
consider about multiple kernel running as the same time. When I face a
problem, I do learn from others, as I dare not to argue without learning
enough.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#issuecomment-687746966>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4UMXS2J5HBEPKHBX3SENN63ANCNFSM4QUZCQCQ>
.
|
Is there a logger works fine on both, I think no. As the device gives the dead end. The most conventional way for both. It's raises an exception, which gets handled at host. But we don't want that heavy. This way is pytorch/tf adopted. As to third-party, they just check the API for user. |
Sorry, I was busy.
I guess As for the issue |
Actually I think it is not so helpful to print out log message in the kernel, may be we can even just use something like cuda |
Haowen:
I think we can just keep the current interface but change the
implementation if needed so that in the kernel, the assertion values
are printed out using the same printf statement as everything else. That's
compatible with allowing the "<<" notation after the
K2_ASSERT invocation; if the << is not used, it will still be 1 statement.
…On Mon, Sep 7, 2020 at 9:29 AM Haowen Qiu ***@***.***> wrote:
Actually I think it is not so helpful to print out log message in the
kernel, may be we can even just use something like cuda assert , it will
also print message like function_name, blockID, thread id, assertion
values, that should be good enough for our requirement. BTW, IIRC,
@songmeixu <https://github.com/songmeixu> 's version K2_CHECK just call
assert so it's ok for this usage, of course we need to rename the macro
with a new name to avoid name conflict, e.g. K2_CUDA_CHECK? or just use
assert?
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/114#issuecomment-687967121>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO3YJCB7A3K25TXD35LSEQZP7ANCNFSM4QUZCQCQ>
.
|
OK, got it. |
Guys, please keep the logging module simple. If glog is excluded from k2, please avoid introducing any other third party logging modules.
I just wrote a simple glog-like CHECK and LOG for k2. If you find it useful and more readable, I'll continue to add support
for debug check and log.