Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a logging alternative #80

Closed
wants to merge 8 commits into from
Closed

add a logging alternative #80

wants to merge 8 commits into from

Conversation

megazone87
Copy link
Collaborator

Add a lightweight alternative logger, that make it be able to use "glog-same" API on platforms that glog may be unavailable and too heavy.

@megazone87 megazone87 mentioned this pull request Aug 3, 2020
enable_testing()

list(APPEND CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/cmake)
include(cpplint)
if(K2_USE_GLOG)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(K2_USE_GLOG)
  include(glog)
endif()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

@@ -1,2 +1,3 @@
add_subdirectory(util)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put it into csrc. No need to create a new directory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed RE no need for new directory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap.

@@ -11,7 +11,7 @@
#include <utility>
#include <vector>

#include "glog/logging.h"
#include "k2/util/Logging.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow the code style and sort the headers
in alphabetic order.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All filenames should be in lowercase.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what may differ with logging.* in glog. But on case-insensitive platforms (windows), it's useless. Thus I changed to lowercase.

Changed to the alphabetic order.

@@ -0,0 +1,57 @@
// $FILE_PATHNAME $HEADER_FILENAME $HEADER_PATHNAME
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please replace this line with the one used by other k2 files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


#include "k2/util/Logging.h"

#ifdef K2_USE_GLOG
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to use glog directly? K2 is a training framework and I don't think
mobile devices have the power for training. As far as I know, glog
is supported on Linux, macOS and Windows.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, k2 would be used in inference as well.
Of course that will require some things to be worked on!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mobile is one thing, the other more important reason is about glog itself actually. After investigations, I found glog is not beautiful as I thought. Even the logging system itself is a trade-off/confusing concept, that would easily be noisy, expose too much global flags/macro, takes too much responsibility (exception/error handling). So in conclusion, I found there isn't the best practice. However, glog with one minimal glog-like logging is preferred by many great projects. A minimal glog could be very simple to just support minimal functions of glog. And loguru turn out to be the expected one. So, with loguru, the two-logger thought is implemented as this PR.

#else // !K2_USE_GLOG
namespace k2 {
bool InitK2Logging(int *argc, char **argv) {
// When doing InitCaffeLogging, we will assume that caffe's flag parser has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the comment contain caffe?
Is the code copied from caffe? Please add some comment
in the code to indicate where it is from.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the argc,argv stuff may not be necessary as I don't anticipate using k2 much in command line programs.

Copy link
Collaborator Author

@megazone87 megazone87 Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm, let me explain this. Before coding, I investigated pytorch, tf, bazel, and other projects that make use of glog, and many alternative logging methods. To find one best practice for logging, some common thought is used here. I try to keep minimal and non-specific to some project codes here. As here, it declares an internal function of glog, which is a common hack as it said, so I leave it as it is. However, this comment shouldn't be here. As the loguru here does parse the arg. The comment is removed now.


To @danpovey :
This function is necessary to init both loggers. At least, to configure the log to stdout or logfile. So I prefer to keep it right now and make the changes we need later.

if (*argc == 0)
return true;

loguru::init(*argc, argv);
Copy link
Collaborator

@csukuangfj csukuangfj Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name of functions should be in Uppercase. init->Init.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the name init is from loguru

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is from loguru.

@@ -0,0 +1,32 @@
//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first line of every file should be the path to the file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,32 @@
//
// Created by songmeixu (songmeixu@outlook.com) on 2020/8/3.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the header style of other files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@songmeixu And I think it's better if you declare your copyright under Xiaomi Corporation, as Dan and I did.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it's from my IDE old setting, and I forget it in some files. Changes have been made referring to the existing style.

@@ -0,0 +1,1875 @@
#ifndef _WIN32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please indicate the origin of this file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#include <algorithm>
#include <limits>
#include <stack>
#include <unordered_map>
#include <vector>

#include "glog/logging.h"
#include "k2/csrc/connect.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corresponding header file (i.e. connect.h here) should be put at the top of .cc file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

#ifdef K2_USE_GLOG
// Google glog's api does not have an external function that allows one to check
// if glog is initialized or not. It does have an internal function - so we are
// declaring it here. This is a hack but has been used by a bunch of others too.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE has been used by a bunch of others, it's better if we could list some famous project here with the corresponding code links.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020

Thanks! Hopefully @csukuangfj can review the changes when he has time.

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020

Cool... can you please also try to familiarize yourself with cub's own asserty things, CubDebug() and CubDebugExit()?
The tricky thing is that quite a lot of the code to run in CUDA will also be CPU code, so we need an assert
that can compile to something on the device and host. Perhaps that might mean creating a version of some of the
glob macros that can compile on device and that uses printf.
The compilation of this codebase is going to be a little unconventional, we'll put basically everything through nvcc,
which will pass most code through to the host compiler.
The reason is to enable lambdas that can execute on host and device. You'll see examples in compose.cc in my branch.

@megazone87
Copy link
Collaborator Author

Cool... can you please also try to familiarize yourself with cub's own asserty things, CubDebug() and CubDebugExit()?
The tricky thing is that quite a lot of the code to run in CUDA will also be CPU code, so we need an assert
that can compile to something on the device and host. Perhaps that might mean creating a version of some of the
glob macros that can compile on device and that uses printf.
The compilation of this codebase is going to be a little unconventional, we'll put basically everything through nvcc,
which will pass most code through to the host compiler.
The reason is to enable lambdas that can execute on host and device. You'll see examples in compose.cc in my branch.

I see. I have started to do it.

@@ -33,11 +35,16 @@ if(WIN32 AND BUILD_SHARED_LIBS)
set(BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)
endif()

# build options
option(K2_USE_GLOG "Flag that chooses logger from [GLOG, NON-GLOG (loguru)]" OFF)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this off by default? Since we install gtest by default, doesn't that already require glog?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@songmeixu
when to enable this option?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OFF is set for test, and help figure out what we need for logging that is outside a minimal one.
The gtest doesn't depend on glog, the former is for unit test, while the latter is for logging. Though the implementations are similar and very simple for some cases (*_EQ), I think there is a design principle to isolate these two.

#include "k2/csrc/array.h"
#include "k2/csrc/util.h"
#include "k2/csrc/util/logging.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you were going to get rid of the directory util/?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm.. there is more code than I realized, now wondering whether we should keep util/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect more subdirs would be here, as k2 surely needs more function ((IDK) IO, bin ..) that should distribute more organized.

And I think the dlpack related codes under python/csrc/ may be better under csrc/dlpack as dlpack is not for pybind essentially (dlpack gives a way to transfer between data structures, as here we use it transfer k2.Tensor with torch.Tensor). Pytorch put it as from torch.utils.dlpack import to_dlpack in py API. I think we would need the class k2.Tensor from current python/csrc/tensor.h.

Anyway, I hold no opinion about the right project structure🤗. Changes happen fast. So, util/ is OK for now?

@@ -0,0 +1,33 @@
// k2/csrc/util/logging_is_google_glog.h
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filename wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filename is referred from pytorch, and I think it's clear (TF use default/logging.h (customized minimal glog) and google/build_config/logging.h(glog) to distinguish each other.).

@qindazhu
Copy link
Collaborator

qindazhu commented Aug 4, 2020

Oh, BTW, which branch should this PR be merged into? master or cuda, or both?

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020

Meixu, am I correct that to use this, one is supposed to use things like CHECK_EQ and so on? (That's what I seem to see in loguru.h, for instance). I'm concerned that a method that relies on macros without prefixes may easily clash with external codebases such as PyTorch; at some point we need a .cc file that #includes both torch and k2. Or is there some reason why this is not a problem?

Also: where is one supposed to go to see what the interface of the logging is? It would be nice if logging.h explained this clearly: I tried by it goes down a rabbit hole of conditional includes.

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020

Mm, it's up to you guys, but maybe master first and then merge it to cuda?

@@ -0,0 +1,1894 @@
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest downloading these two files via cmake. You can refer to k2/cmake for how pybind11 is downloaded and included into k2.

// Google glog's api does not have an external function that allows one to check
// if glog is initialized or not. It does have an internal function - so we are
// declaring it here. This is a hack but has been used by a bunch of others too.
// (e.g. Torch).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better if you add the corresponding code link of PyTorch? (and I don't know what Torch here is, sorry)

@qindazhu
Copy link
Collaborator

qindazhu commented Aug 4, 2020

Mm, it's up to you guys, but maybe master first and then merge it to cuda?

+1, master first and then cuda.
Note for assertion code in cuda, we may merge the those code first to cuda branch as there's no nvcc build support on master now (since we have decided to do cuda stuff on cuda branch for now and merge back into master at some time.)

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020

About including both (say) PyTorch and k2, or TensorFlow and k2, I'm thinking that to avoid problems we could arrange to only include a small part of k2 code that doesn't require the logging #include, in the file that we do those PyTorch/TF #includes. That way we don't have to worry so much about clashing macros. BUT this does worry me a bit. E.g. what if someone were to try to include both k2 and torch in a project? Would it even work?
I'm thinking it would cause linking issues, at least.

@qindazhu
Copy link
Collaborator

qindazhu commented Aug 4, 2020

And again, I suggest wrapper glog (or loguru, but I really don't think it's necessary...) into some functions/macros with some name prefixed with K2_ or else, which will keep scalability in the future.

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020

How about we just #include glog as-is without anything else special, wrap it with K2_ for future compatibility via e.g.

#define K2_CHECK_EQ CHECK_EQ

and leave it till later to have this loguru stuff? I feel like that would keep things future-compatible with the minimum amount of complexity at this stage.

@csukuangfj
Copy link
Collaborator

How about we just #include glog as-is without anything else special, wrap it with K2_ for future compatibility via e.g.

#define K2_CHECK_EQ CHECK_EQ
and leave it till later to have this loguru stuff? I feel like that would keep things future-compatible with the minimum amount of complexity at this stage.

+2

@megazone87
Copy link
Collaborator Author

Meixu, am I correct that to use this, one is supposed to use things like CHECK_EQ and so on? (That's what I seem to see in loguru.h, for instance).

Correct. The same macro name is undef and define in loguru with the same signature. So no changes for codes use glog.

I'm concerned that a method that relies on macros without prefixes may easily clash with external codebases such as PyTorch; at some point we need a .cc file that #includes both torch and k2. Or is there some reason why this is not a problem?

Once glog is used for library (best to be shared), the glog init/config/link.. management responsibility is transfer to the final user (application level). Worrying but unfortunately, it is the doomed usage from the official:
google/glog#83
google/glog#244
kubernetes/kubernetes#61006

You see why glog is problematic. However, pytorch/tf (what k2 would live together) and many good packages use glog and follow this way.
In case somehow the glog is misuse, unavailable, unsuitable, confusing .., there is another minimal glog-like logger for configuration. Here loguru is it.

Thus, K2_ prefix is against the rule of glog, as it would make things worse. Think about multiple libraries link together, and keep in mind glog only init once. At least, the user would lost in configurations.

To own K2_ prefix is good if exclude glog totally and uses customized one (loguru is still one promising option. Its macro would append _F to be glog-dislike if set LOGURU_REPLACE_GLOG = 0.).

Also: where is one supposed to go to see what the interface of the logging is? It would be nice if logging.h explained this clearly: I tried by it goes down a rabbit hole of conditional includes.

Yes, loguru provides some flags to tune, like set LOGURU_REPLACE_GLOG=1 to imitate glog. LOGURU_REPLACE_GLOG=1 is set to be the default, the usage is as same as glog (except only part of common features/macro of glog are supported). As to the usage of glog, it is simple to refer to some articles, e.g. http://rpg.ifi.uzh.ch/docs/glog.html


I'm not a loguru fan, a logger replacement could introduce for reasons like: to be compatible with nvcc. I mean loguru is still in consideration, IDK if it is applicable (it support printf instead of stream.), probably give my answer soon.

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020 via email

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020 via email

@qindazhu
Copy link
Collaborator

qindazhu commented Aug 4, 2020

I'm OK with writing those K2_CHECK by ourselves. The only thing I worry about is its robustness for complex scenarios (e.g. multiple thread,file system support, cache, etc.). But maybe we don't really need to handle those scenarios.
BTW, I like the sytle of log in protobuf, it's so lightweight and easy to understand. But it seems there's no stack trace in it?

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020 via email

@qindazhu
Copy link
Collaborator

qindazhu commented Aug 4, 2020

Mm, the multi-threaded stuff can be handled, I think, by first formatting to a string and then writing with fwrite() or whatever it is, then it becomes an atomic operation. In CUDA, I think we'd just print to stderr using fprintf directly, assuming you can get fprintf there.

On Tue, Aug 4, 2020 at 9:36 PM Haowen Qiu @.***> wrote: I'm OK with writing those K2_CHECK by ourselves. The only thing I worry about is its robustness for complex scenarios (e.g. multiple thread,file system support, cache, etc.). But maybe we don't really need to handle those scenarios. BTW, I like the sytle of log in protobuf, it's so lightweight and easy to understand. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#80 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZFLOZBQQXPXH43SEQ6EDLR7AFG7ANCNFSM4PTOHZ6A .

Yes, so if we decide to write those CHECKs by ourselves (instead of based on glog or other log systems), we need to list what features (multiple-thread, stace trace, etc.) we would like to support first.. That maybe needs some time to investigate.
Maybe it's better if we can start from glog first (as we have added it in our codebase already). I mean, based on glog to write some K2_CHECK, and then we can start to use those macros to do real work. we can just update those K2_CHECK_ macros later if we would like to replace glog with something else, this will not affect those code calling K2_CHECK.

@danpovey
Copy link
Collaborator

danpovey commented Aug 4, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants