
add glog-like CHECK and LOG for k2. #114

Merged
merged 8 commits into k2-fsa:master from fangjun-logger
Sep 6, 2020

Conversation

csukuangfj
Collaborator

Guys, please keep the logging module simple. If glog is excluded from k2, please avoid introducing any other third-party logging modules.

I just wrote a simple glog-like CHECK and LOG for k2. If you find it useful and more readable, I'll continue to add support for debug checks and logging.
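For reference, here is a minimal sketch of the kind of host/device streaming logger being proposed. The class name, level names, and abort behaviour below are illustrative assumptions, not the exact code merged in this PR, and it assumes the file is compiled as CUDA source (.cu):

#include <cstdio>
#include <cstdlib>

namespace k2 {

enum LogLevel { kINFO, kFATAL };

class Logger {
 public:
  __host__ __device__ Logger(const char *filename, int line, LogLevel level)
      : level_(level) {
    printf("[%s] %s:%d ", level == kFATAL ? "F" : "I", filename, line);
  }

  __host__ __device__ ~Logger() {
    printf("\n");
    if (level_ == kFATAL) {
#if defined(__CUDA_ARCH__)
      __trap();  /* abort() is not available in device code */
#else
      abort();
#endif
    }
  }

  __host__ __device__ const Logger &operator<<(const char *s) const {
    printf("%s", s);
    return *this;
  }

  __host__ __device__ const Logger &operator<<(int i) const {
    printf("%d", i);
    return *this;
  }

 private:
  LogLevel level_;
};

}  // namespace k2

#define K2_LOG(level) k2::Logger(__FILE__, __LINE__, k2::k##level)

/* Note: a and b are evaluated twice here (once for the test, once for the
   message), a point that comes up again later in this thread.  A production
   version would also guard against dangling-else, as glog does. */
#define K2_CHECK_LT(a, b)                                                \
  if (!((a) < (b)))                                                      \
  K2_LOG(FATAL) << "Check failed: " #a " < " #b " (" << (a) << " vs. "   \
                << (b) << ") "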

@qindazhu
Collaborator

qindazhu commented Sep 3, 2020

+1, I like this version; it's simple enough and as easy to use as glog.

@danpovey
Collaborator

danpovey commented Sep 3, 2020

Looks OK to me. Will wait to hear from Meixu, though.


__global__ void DummyKernel(int *b, int a) {
  K2_LOG(INFO) << "In kernel";
  K2_CHECK_LT(*b, a) << K2_KERNEL_DEBUG_STR;
Collaborator

Maybe it's better if CHECK can print out K2_KERNEL_DEBUG_STR automatically when called in a kernel.

Collaborator

@csukuangfj Bear in mind that a lot of the time we write kernel code in lambda expressions that could get evaluated in a kernel or on the CPU.
Also, I'd rather not have to write << K2_KERNEL_DEBUG_STR explicitly every time, if possible.
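A hedged sketch of how the macro could detect that it is running in device code and append the kernel marker itself: __CUDA_ARCH__ is defined only while compiling for the device, so a __host__ __device__ helper (including one reached from a host/device lambda) can select the suffix at compile time. The helper name below is illustrative, not the one in this PR.

__host__ __device__ inline const char *KernelDebugSuffix() {
#if defined(__CUDA_ARCH__)
  // Device compilation path: tell the reader this came from a kernel.
  return " (in kernel; see block/thread indices)";
#else
  return "";
#endif
}

// The check macro can then do the appending itself, e.g.
//   K2_LOG(FATAL) << "Check failed: ..." << KernelDebugSuffix();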

Collaborator

With more threads, e.g. a <<<2, 2>>> launch, and a printf line added below for comparison, the code used is:

__device__ void DummyKernel(int *b, int a) {
  K2_LOG(INFO) << "In kernel";
  K2_CHECK_EQ(*b, a) << K2_KERNEL_DEBUG_STR << " asdfsad " << a << " = a";
  printf("[%d, %d, %d]: a=%d, b=%d\n", threadIdx.x, threadIdx.y, threadIdx.z, a, *b);
  *b += 1;
  K2_CHECK_EQ(*b, a) << K2_KERNEL_DEBUG_STR;
}

__global__ void EntryKernel(int *b, int a) {
  DummyKernel(b, a);
}
...
  EntryKernel<<<2, 2>>>(b, a);

I got the results below:
(screenshot of the interleaved output, not preserved in this transcript)

That indicates that overloading "<<" makes messages from different threads get mixed up, since each "<<" call issues its own printf.
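A sketch of the mitigation being discussed here, under the assumption that the final design may differ: CUDA's device printf emits the output of each call as one unit, so formatting the whole failure message in a single printf keeps lines from different threads from being cut up mid-message, unlike a chain of "<<" calls that each issue their own printf.

__device__ void ReportCheckFailure(const char *file, int line, int lhs, int rhs) {
  // One printf call per failing thread: the line may still be duplicated by
  // other failing threads, but it will not be interleaved within the line.
  printf("[F] %s:%d Check failed: (%d vs. %d) [block (%d,%d,%d), thread (%d,%d,%d)]\n",
         file, line, lhs, rhs,
         blockIdx.x, blockIdx.y, blockIdx.z,
         threadIdx.x, threadIdx.y, threadIdx.z);
}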

Collaborator Author

Maybe it's better if CHECK can print out K2_KERNEL_DEBUG_STR automatically when called in a kernel.

Done.

There is also a DCHECK.

K2_CHECK_EQ(ret, cudaSuccess) << "Failed to allocate memory";

ret = cudaMemcpy(b, &a, sizeof(a), cudaMemcpyHostToDevice);
K2_CHECK_EQ(ret, cudaSuccess) << "Failed to copy memory to gpu";
Collaborator

I would like to define a macro K2_CHECK_CUDA_ERROR for such usages.

Collaborator

... currently, K2_CHECK_CUDA_ERROR is for checking cudaError_t return codes, which I think makes sense.
I'm OK in principle with having different macro names for usage inside kernels (or lambdas), but I'm not sure it's necessary.

Collaborator Author

It's very simple to support K2_CHECK_CUDA_ERROR:

#define K2_CHECK_CUDA_ERROR(x) K2_CHECK_EQ(x, cudaSuccess)

Not sure if we need another macro.

BTW, I find that the macro name K2_CHECK_CUDA_ERROR is too long.

Collaborator Author

K2_CHECK_CUDA_ERROR does more than that: it prints out the CUDA error code's name. (Or if it doesn't, it should.)

Done.
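Conceptually, the macro can be as small as the sketch below (an approximation of the idea, not the verbatim merged code); cudaGetErrorString() is what turns the numeric status into messages like "out of memory" in the test log further down.

/* Note: x is evaluated twice, so it should be a plain variable such as ret. */
#define K2_CHECK_CUDA_ERROR(x) \
  K2_CHECK_EQ(x, cudaSuccess) << " Error: " << cudaGetErrorString(x) << ". "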

@danpovey
Collaborator

danpovey commented Sep 4, 2020 via email

@csukuangfj
Collaborator Author

I'll fix all the comments tonight.

megazone87
megazone87 previously approved these changes Sep 4, 2020
}
}

__host__ __device__ const Logger &operator<<(const char *s) const {
Collaborator

I don't know much about CUDA, but does it work on host and device to overload "<<" like this? I haven't tried to compile and test it. But I think if it worked, NVIDIA would not just provide a rewritten printf; they would provide a stream-based one too, because everyone likes the stream format (except for its formatting part).

Collaborator

@songmeixu I think it's OK. Full C++ is supported on the device, including C++14; it's just the standard C library that is not fully supported, so the operator itself is not the issue.

I am still open to using glog, but you need to explain your previous comments to us better.

Collaborator

Please check my comments below, @danpovey.

Collaborator

I know why it's not working now: the printf rewritten by NVIDIA must have some thread guard for the stream. That's also the reason we shouldn't log things in CUDA code; it's a performance killer. Just printing a message when an error happens is OK; that's why I put K2_MAKE_ERROR_CHECK in debug.h to silence some messages from CUDA, @danpovey.

Collaborator Author

I don't know much about CUDA, but does it work on host and device to overload "<<" like this?

I have never opened a pull request without making my code pass the local unit tests.

Collaborator

The unit test would say nothing for the passing case.

Collaborator

I cannot see that the "<<" problem is solved.

Collaborator Author

The unit test would say nothing for the passing case.

I've added some death tests. You can see what the error messages look like from the logs in this Colab notebook.

Collaborator Author

I cannot see that the "<<" problem is solved.

Normally, please avoid printing messages in the kernel. The race condition happens only in the case of a check failure with multiple threads and no synchronization.

A check failure will abort the program, so I don't think the "<<" issue really matters.

k2/csrc/cuda/log_test.cu
@megazone87 megazone87 self-requested a review September 4, 2020 05:54
@megazone87 megazone87 dismissed their stale review September 4, 2020 05:57

error manipulation

@danpovey
Collaborator

danpovey commented Sep 4, 2020

Guys, after talking more with Meixu, I think there may not really be a problem with glog. I think it might be easiest to just use glog for CPU-only code and our own K2_ macros for code that might be on GPU (i.e. in lambdas or kernels).

@megazone87
Collaborator

Please wait a few minutes for me to explain it clearly.

@qindazhu
Collaborator

qindazhu commented Sep 4, 2020

I think it might be easiest to just use glog for CPU-only code and our own K2_ macros for code that might be on GPU (i.e. in lambdas or kernels).

OK. For CPU-only logging, should we use macros from glog directly (e.g. CHECK_LT) or K2_ macros that wrap the glog macros (e.g. K2_CHECK_LT)? I just feel it's somewhat chaotic to use two different sets of macro names.

@megazone87
Collaborator

Okay, I have sorted out my thoughts.

case-1: K2 uses glog, the upper framework doesn't.
Now the user wants to use their own logging system, e.g. output to a file (which differs from the glog default). How can they arrange for the INFO and FATAL messages to go to the corresponding file? The user first needs to know that k2 uses glog, which they may not, and they also need to know how to initialize glog the way they want, even though they have no plans to get involved with glog. What should they do?

@danpovey
Collaborator

danpovey commented Sep 4, 2020

@songmeixu we can always provide a method to supply flags to glog, e.g. by exposing some Python API when we wrap the code. That's fine as far as I am concerned.
And I agree with @qindazhu that it is a bit of a mess to have to use 2 different macro names. One possibility might be
for us to use #ifdefs to only override the glog names when compiling CUDA code. I was hoping that someone would solve these issues. @songmeixu do you have a proposal or plan?
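A rough sketch of the #ifdef idea, just to make it concrete (the header path and macro set are assumptions; nothing here was decided in this PR): __CUDACC__ is defined whenever nvcc compiles the translation unit, so CUDA files can pick up the device-safe K2_ versions while CPU-only files keep the familiar glog names.

#ifdef __CUDACC__
  #include "k2/csrc/cuda/log.h"   /* hypothetical header providing K2_CHECK_EQ, K2_LOG, ... */
  #define CHECK_EQ(a, b) K2_CHECK_EQ(a, b)
  #define CHECK_LT(a, b) K2_CHECK_LT(a, b)
  #define LOG(level)     K2_LOG(level)
#else
  #include <glog/logging.h>       /* provides CHECK_EQ, CHECK_LT, LOG(...) */
#endif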

@danpovey
Collaborator

danpovey commented Sep 4, 2020

The solution doesn't have to be perfect but it has to be done, and very soon. @songmeixu do you have a proposal that you could finish today?

@megazone87
Collaborator

megazone87 commented Sep 4, 2020

case-2: K2 uses glog, and the upper framework uses it too, but the framework also has a glog-like alternative available (as PyTorch and many others do). If the upper framework chooses the glog-like one, then who should provide the glog library that k2 uses?
If K2 provides a glog library, fine; but if the user then changes their mind and wants glog again, they provide another glog library (frameworks do this via a config check for whether glog is requested). In that situation, at compile or link time, could there be compatibility or redefinition problems?

@megazone87
Collaborator

@songmeixu we can always provide a method to supply flags to glog, e.g. by exposing some Python API when we wrap the code. That's fine as far as I am concerned.
And I agree with @qindazhu that it is a bit of a mess to have to use 2 different macro names. One possibility might be
for us to use #ifdefs to only override the glog names when compiling CUDA code. I was hoping that someone would solve these issues. @songmeixu do you have a proposal or plan?

I can do it today, but in the way I think is right. I already pushed a PR about it, but it wasn't understood. I can make it clear through usage examples and tests today.

@danpovey
Collaborator

danpovey commented Sep 4, 2020

I can see that the linking etc. might get complicated.
My feeling is we should use lots of CHECK macros but not really use LOG except when debugging, as it's not clear when we'd want to log stuff otherwise.
I'm fine with defining CHECK macros and LOG macros (for debugging!) and having them just print to stderr and abort if necessary.
Just a single set of macros, K2_XXX, would be fine with me. And not use glog.

@qindazhu
Collaborator

qindazhu commented Sep 4, 2020

I guess @csukuangfj 's version may provide some configuration to let the user supply a file name to log to. I'm just thinking: as @songmeixu said, using third-party log libraries (e.g. glog) will lead to some problems (especially for case-2), so wouldn't using glog together with k2 double those problems? That's why I now prefer to use only our own log system: no glog and no other third-party log libraries.

@danpovey
Collaborator

danpovey commented Sep 4, 2020 via email

@megazone87
Collaborator

I guess @csukuangfj 's version may provide some configuration to let the user supply a file name to log to. I'm just thinking: as @songmeixu said, using third-party log libraries (e.g. glog) will lead to some problems (especially for case-2), so wouldn't using glog together with k2 double those problems? That's why I now prefer to use only our own log system: no glog and no other third-party log libraries.

It's not about whether a custom logger could do it; it's that it shouldn't.

@megazone87
Collaborator

For this PR, I first want to compile it and run some tests, as I doubt it works with the stream format.

@megazone87
Collaborator

case-3: K2 uses a custom logger, like the one here, and the upper framework obviously doesn't use the same one.
How could the user configure K2's logging messages?

@danpovey
Collaborator

danpovey commented Sep 4, 2020 via email

@qindazhu
Collaborator

qindazhu commented Sep 4, 2020

case-3: K2 uses a custom logger, like the one here, and the upper framework obviously doesn't use the same one.
How could the user configure K2's logging messages?

If we just create our own log system (with no dependency on glog), we can simply expose an API to let the user configure it (e.g. the file name to log to, the log level).

RE the stream format, I guess it works, as it seems @csukuangfj has written some kernel code to test it.

@megazone87
Collaborator

megazone87 commented Sep 4, 2020

case-4: K2 uses glog, the upper framework uses glog, and, as in case-2, the framework also has a glog-like alternative. Now k2 has its own glog-like implementation too, and K2 doesn't provide the glog library, just includes the header.
In K2, through a macro-definition check (the kind glog provides), if glog is present in the framework, fine, we use it; if not, the framework still has to follow the glog-like interface, and so do we, since we have our own glog-like implementation.
Please try to attack this situation now. Are the above problems solved?

@danpovey
Collaborator

danpovey commented Sep 6, 2020

Thus it's fine to just use assert, guarded by K2_PARANOID. Why consider __assertfail, which is called inside the !NDEBUG part of assert?

Sure, for asserts. But we should have a macro K2_FATAL_ERROR(format, ....) which will die regardless of NDEBUG.

Also, I don't think we should be using assert at all in macros like K2_CHECK_EQ. I want those macros to print the values involved, like glog and loguru do, at least on the CPU.
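A hedged sketch of the K2_FATAL_ERROR idea: the name and printf-style signature come from the comment above, not from merged code, and the body is a host-side guess (device code would need printf plus __trap() instead of fprintf/abort).

#include <cstdio>
#include <cstdlib>

#define K2_FATAL_ERROR(...)                                  \
  do {                                                       \
    std::fprintf(stderr, "[F] %s:%d ", __FILE__, __LINE__);  \
    std::fprintf(stderr, __VA_ARGS__);                       \
    std::fprintf(stderr, "\n");                              \
    std::abort();  /* dies regardless of NDEBUG */           \
  } while (0)

Usage would then look like K2_FATAL_ERROR("failed to open %s", filename);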

@danpovey
Collaborator

danpovey commented Sep 6, 2020 via email

@megazone87
Collaborator

So, one call to K2_PARANOID_ASSERT must be checked outside... And K2_CHECK_CUDA_ERROR says nothing under NDEBUG...

Hi @danpovey, I have enough confidence about this logging thing, as I have investigated it a lot. But I still see ideas here that I think are actually wrong; I guess it's better to ignore them than to keep discussing, since there is no end to making everyone understand why it should be done this way. I will just refine my version and make a PR for review. I also don't want you to spend energy on these edge functions.

I have given plenty of evidence for why it should be done one way and not the other, and nobody has really argued with that...

@csukuangfj
Collaborator Author

And actually assert just returns a message and cudaErrorAssert; no abort is caused

Isn't that enough? How can a thread running on the GPU kill the process running on the CPU?
The programmer has to check the return value of every call, if any. I'm not sure whether you have ever checked the return values of system calls while doing systems programming.

@megazone87
Collaborator

megazone87 commented Sep 6, 2020

And actually assert just returns a message and cudaErrorAssert; no abort is caused

Isn't that enough? How can a thread running on the GPU kill the process running on the CPU?
The programmer has to check the return value of every call, if any. I'm not sure whether you have ever checked the return values of system calls while doing systems programming.

OK, since you ask: it does abort for a host-side assert.

@csukuangfj
Collaborator Author

OK, so in that case the program would die next time we call
K2_CHECK_CUDA_ERROR.

Yes, exactly.


I want those macros to print the values involved, like glog and loguru do, at least on the CPU.

It actually does. Please see the log messages from the Colab notebook:

Running main() from /content/k2/build_debug/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 3 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from LogDeathTest
[ RUN      ] LogDeathTest.NegativeCases
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:69 This will crash the program
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:74 Check failed: a == b (10 vs. 20) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:75 Check failed: a != c (10 vs. 10) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:77 Check failed: b <= a (20 vs. 10) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:78 Check failed: b < a (20 vs. 10) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:80 Check failed: a >= b (10 vs. 20) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:81 Check failed: a > b (10 vs. 20) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:84 Check failed: ret == cudaSuccess (2 vs. 0)  Error: out of memory. 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:87 Check failed: ret == cudaSuccess (710 vs. 0)  Error: device-side assert triggered. 
[I] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:97 Check for debug build
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:98 This will crash the program
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:100 Check failed: a == b (10 vs. 20) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:101 Check failed: a != c (10 vs. 10) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:103 Check failed: b <= a (20 vs. 10) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:104 Check failed: b < a (20 vs. 10) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:106 Check failed: a >= b (10 vs. 20) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:107 Check failed: a > b (10 vs. 20) 
[F] /content/k2/k2/csrc/cuda/log_test.cu:TestBody:110 Check failed: ret == cudaSuccess (3 vs. 0)  Error: initialization error. 

@megazone87
Collaborator

You mean each call should look like this? Then no:

K2_CUDA_SAFE_CALL(a)

@csukuangfj
Collaborator Author

You mean each call should look like this? Then no:

Sorry, I have a different opinion from you on this point.

@danpovey
Collaborator

danpovey commented Sep 6, 2020

So, one call to K2_PARANOID_ASSERT must be checked outside... And K2_CHECK_CUDA_ERROR says nothing under NDEBUG...

Hi @danpovey, I have enough confidence about this logging thing, as I have investigated it a lot. But I still see ideas here that I think are actually wrong; I guess it's better to ignore them than to keep discussing, since there is no end to making everyone understand why it should be done this way. I will just refine my version and make a PR for review. I also don't want you to spend energy on these edge functions.

I have given plenty of evidence for why it should be done one way and not the other, and nobody has really argued with that...

Meixu: K2_CHECK_CUDA_ERROR does check the code even if NDEBUG is defined; I asked for this.

@megazone87
Collaborator

About this, I do the same as most people who write .cu code on GitHub.

@danpovey
Collaborator

danpovey commented Sep 6, 2020

Meixu: we will be checking the error status each time, even if NDEBUG is defined (but not synchronizing if NDEBUG is defined). Don't argue about this, it is my decision.
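To make this policy concrete, a sketch under the assumption that K2_CUDA_SAFE_CALL (the name that appears earlier in this thread) is the wrapper in question: the status is checked unconditionally, and only the expensive cudaDeviceSynchronize() is dropped when NDEBUG is defined.

#ifndef NDEBUG
  #define K2_CUDA_SYNC_IF_DEBUG() cudaDeviceSynchronize()
#else
  #define K2_CUDA_SYNC_IF_DEBUG() ((void)0)
#endif

#define K2_CUDA_SAFE_CALL(expr)                              \
  do {                                                       \
    (expr);                                                  \
    K2_CUDA_SYNC_IF_DEBUG();                                 \
    cudaError_t e = cudaGetLastError();                      \
    K2_CHECK_CUDA_ERROR(e);  /* checked even with NDEBUG */  \
  } while (0)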

@csukuangfj
Collaborator Author

So, one call to K2_PARANOID_ASSERT must be checked outside... And K2_CHECK_CUDA_ERROR says nothing under NDEBUG...

I think I have already posted this above:

Guys, if you do not want, or do not have time, to compile the code, please refer to the Colab notebook that I just wrote for this pull request. You can see from the log that the code compiles and runs.

I'm not sure whether you have tried to compile and run the code, or looked at the log messages in the notebook above. K2_CHECK_CUDA_ERROR always prints something in case of failure.

@danpovey
Collaborator

danpovey commented Sep 6, 2020

Yes I see it works, and does address my problems with the existing stuff. Is there any way you can fix the conflicts? I may merge.

@megazone87
Collaborator

I know it compiles; I ran a test that failed, and the results were posted above. Have you answered the "<<" problem? thread == 0 is not an answer.

@megazone87
Collaborator

I think it's better to make things clear, in case we can't work together. I can't watch an error happen in front of me without questioning it, while my own work gets called wrong, based on what?

@danpovey
Collaborator

danpovey commented Sep 6, 2020 via email

@danpovey
Collaborator

danpovey commented Sep 6, 2020

Meixu, making threats like that is not the way to work with others. Why don't you take a break and we'll talk when your mood is better?

@megazone87
Collaborator

If we only care about one thread, is that enough for kernel checking? The status is not the same for each thread, as I have learned from many sources.

@megazone87
Collaborator

I don't want to waste my time on these things either. The worst thing is this: do it once, and then how much time do I spend explaining it?

@danpovey
Collaborator

danpovey commented Sep 6, 2020

Yes, we can't do if (threadIdx.x == 0) { ... }. We may just have to accept multiple threads printing the same thing, hopefully this won't happen too much.

@danpovey
Collaborator

danpovey commented Sep 6, 2020

... an internal implementation with printf would probably be more ideal in terms of not getting lines mixed up.
But Meixu, I asked you many times to make things like K2_CHECK_EQ print the values, but it never happened.
At least this PR is ready!

@megazone87
Collaborator

OK, after this I guess my energy is burning out. To put the "<<" discussion to rest, consider multiple kernels running at the same time. When I face a problem, I do learn from others, as I dare not argue without learning enough.

@danpovey
Collaborator

danpovey commented Sep 6, 2020 via email

@megazone87
Collaborator

Is there a logger that works fine on both host and device? I think not, since the device is a dead end. The most conventional approach for both is to raise an exception that gets handled on the host, but we don't want something that heavy. That's the approach PyTorch/TensorFlow adopted. As for third parties, they just check the API for the user.

@danpovey danpovey merged commit 45187b6 into k2-fsa:master Sep 6, 2020
@csukuangfj
Collaborator Author

Is there any way you can fix the conflicts? I may merge.

Sorry, I was busy.


Also IMO it shouldn't be necessary to ever evaluate args more than once, because we can use auto, no,
to make a variable?

I guess auto will not work for the current implementation. We can invoke a function to avoid the second evaluation in the operator << in case of check failure. If this is really needed, I can create another pull request to address it.

As for the "<<" issue on the GPU, to be honest, I don't know how to avoid the output from different threads being interleaved in stdout without any synchronization. But this happens only in the case of a check failure, after which the program aborts. Someone may use LOG(xxx) in a kernel, but I think this can be avoided through code review.
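On the double-evaluation point above, a minimal sketch of the "invoke a function" idea (the helper name is hypothetical, and it deliberately drops the streaming of extra messages to keep the example short): both arguments are evaluated exactly once when passed to the helper, so the failure path never re-evaluates the original expressions.

#include <cstdio>
#include <cstdlib>

template <typename T1, typename T2>
__host__ __device__ void CheckEqOnce(const T1 &a, const T2 &b,
                                     const char *file, int line,
                                     const char *expr) {
  if (!(a == b)) {
    // a and b are the already-evaluated arguments; nothing is evaluated again here.
    printf("[F] %s:%d Check failed: %s\n", file, line, expr);
#if !defined(__CUDA_ARCH__)
    abort();
#endif
  }
}

#define K2_CHECK_EQ_ONCE(a, b) \
  CheckEqOnce((a), (b), __FILE__, __LINE__, #a " == " #b)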

@qindazhu
Collaborator

qindazhu commented Sep 7, 2020

Actually, I think it is not that helpful to print log messages in a kernel. Maybe we can even just use something like CUDA's assert: it also prints a message with the function name, block ID, thread ID, and the failed assertion, which should be good enough for our requirements. BTW, IIRC, @songmeixu 's version of K2_CHECK just calls assert, so it's OK for this usage; of course we'd need to rename the macro to avoid a name conflict, e.g. K2_CUDA_CHECK? Or just use assert?
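A minimal sketch of this suggestion, with K2_CUDA_CHECK used as the hypothetical name from the comment: CUDA's device-side assert already prints the file, line, block and thread indices and the failed expression, and it is compiled out when NDEBUG is defined.

#include <cassert>

#define K2_CUDA_CHECK(cond) assert(cond)

__global__ void AddOneKernel(int *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    K2_CUDA_CHECK(data != nullptr);  // on failure, prints file/line/block/thread and traps
    data[i] += 1;
  }
}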

@danpovey
Collaborator

danpovey commented Sep 7, 2020 via email

@qindazhu
Collaborator

qindazhu commented Sep 7, 2020

OK, got it.

@csukuangfj csukuangfj deleted the fangjun-logger branch September 14, 2020 02:33