Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging level #38

Merged
merged 4 commits into from
Nov 16, 2016
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/common/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ redis:
hiredis:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried running

@ray.remote
def f():
  return 1

And it printed

[ERROR] (state/table.c:70: errno: Operation now in progress) Table command with timer ID 0 failed

If that is something that happens normally while Ray is functioning correctly, then there it shouldn't be an error or even a warning. It should be INFO or DEBUG, right?

git submodule update --init --recursive -- "thirdparty/hiredis" ; cd thirdparty/hiredis ; make

test: CFLAGS += -DRAY_COMMON_LOG_LEVEL=4
test: hiredis redis $(BUILD)/common_tests $(BUILD)/task_table_tests $(BUILD)/object_table_tests $(BUILD)/db_tests $(BUILD)/io_tests $(BUILD)/task_tests $(BUILD)/redis_tests FORCE
./thirdparty/redis-3.2.3/src/redis-server &
sleep 1s
Expand Down
2 changes: 1 addition & 1 deletion src/common/common.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ unique_id globally_unique_id(void) {
/* Use /dev/urandom for "real" randomness. */
int fd;
if ((fd = open("/dev/urandom", O_RDONLY)) == -1) {
LOG_ERR("Could not generate random number");
LOG_ERROR("Could not generate random number");
}
unique_id result;
read(fd, &result.id[0], UNIQUE_ID_SIZE);
Expand Down
39 changes: 31 additions & 8 deletions src/common/common.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,20 +11,42 @@

#include "utarray.h"

#ifndef RAY_COMMON_DEBUG
#define RAY_COMMON_DEBUG 0
#define RAY_COMMON_INFO 1
#define RAY_COMMON_WARNING 2
#define RAY_COMMON_ERROR 3
#define RAY_COMMON_FATAL 4

/* Default logging level is INFO. */
#ifndef RAY_COMMON_LOG_LEVEL
#define RAY_COMMON_LOG_LEVEL RAY_COMMON_INFO
#endif

#if (RAY_COMMON_LOG_LEVEL > RAY_COMMON_DEBUG)
#define LOG_DEBUG(M, ...)
#else
#define LOG_DEBUG(M, ...) \
fprintf(stderr, "[DEBUG] (%s:%d) " M "\n", __FILE__, __LINE__, ##__VA_ARGS__)
#endif

#define LOG_ERR(M, ...) \
fprintf(stderr, "[ERROR] (%s:%d: errno: %s) " M "\n", __FILE__, __LINE__, \
errno == 0 ? "None" : strerror(errno), ##__VA_ARGS__)

#if (RAY_COMMON_LOG_LEVEL > RAY_COMMON_INFO)
#define LOG_INFO(M, ...)
#else
#define LOG_INFO(M, ...) \
fprintf(stderr, "[INFO] (%s:%d) " M "\n", __FILE__, __LINE__, ##__VA_ARGS__)
#endif

#if (RAY_COMMON_LOG_LEVEL > RAY_COMMON_ERROR)
#define LOG_ERROR(M, ...)
#else
#define LOG_ERROR(M, ...) \
fprintf(stderr, "[ERROR] (%s:%d: errno: %s) " M "\n", __FILE__, __LINE__, \
errno == 0 ? "None" : strerror(errno), ##__VA_ARGS__)
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may have introduced some compiler warnings.

state/table.c:70:57: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
    LOG_ERROR("Table command with timer ID %ld failed", timer_id);
                                           ~~~          ^~~~~~~~
                                           %lld
./common.h:44:52: note: expanded from macro 'LOG_ERROR'
          errno == 0 ? "None" : strerror(errno), ##__VA_ARGS__)


#if (RAY_COMMON_LOG_LEVEL > RAY_COMMON_FATAL)
#define LOG_FATAL(M, ...)
#else
#define LOG_FATAL(M, ...) \
do { \
fprintf(stderr, "[FATAL] (%s:%d) " M "\n", __FILE__, __LINE__, \
Expand All @@ -34,10 +56,11 @@
backtrace_symbols_fd(buffer, calls, 1); \
exit(-1); \
} while (0);
#endif

#define CHECKM(COND, M, ...) \
if (!(COND)) { \
LOG_ERR("Check failure: %s \n" M, #COND, ##__VA_ARGS__); \
#define CHECKM(COND, M, ...) \
if (!(COND)) { \
LOG_ERROR("Check failure: %s \n" M, #COND, ##__VA_ARGS__); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECKM definitely should be fatal.

That said, do we actually want LOG_WARN, LOG_ERROR, and LOG_FATAL? It seems like either we can recover, in which case it should be LOG_WARN, or we can't, in which case it should be LOG_FATAL.

I think pretty much all of the LOG_ERRORs that we currently have should potentially be fatal errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was just following standard logging levels found in other systems here. I think it makes sense to differentiate between LOG_WARN and LOG_ERROR. Here is a good post that talks about the different levels: link.

I'll look through the LOG_ERRORs and see which ones might need to be LOG_FATAL.

}

#define CHECK(COND) CHECKM(COND, "")
Expand Down
28 changes: 14 additions & 14 deletions src/common/io.c
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ int bind_inet_sock(const int port, bool shall_listen) {
struct sockaddr_in name;
int socket_fd = socket(PF_INET, SOCK_STREAM, 0);
if (socket_fd < 0) {
LOG_ERR("socket() failed for port %d.", port);
LOG_ERROR("socket() failed for port %d.", port);
return -1;
}
name.sin_family = AF_INET;
Expand All @@ -42,22 +42,22 @@ int bind_inet_sock(const int port, bool shall_listen) {
int on = 1;
/* TODO(pcm): http://stackoverflow.com/q/1150635 */
if (ioctl(socket_fd, FIONBIO, (char *) &on) < 0) {
LOG_ERR("ioctl failed");
LOG_ERROR("ioctl failed");
close(socket_fd);
return -1;
}
if (setsockopt(socket_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
LOG_ERR("setsockopt failed for port %d", port);
LOG_ERROR("setsockopt failed for port %d", port);
close(socket_fd);
return -1;
}
if (bind(socket_fd, (struct sockaddr *) &name, sizeof(name)) < 0) {
LOG_ERR("Bind failed for port %d", port);
LOG_ERROR("Bind failed for port %d", port);
close(socket_fd);
return -1;
}
if (shall_listen && listen(socket_fd, 5) == -1) {
LOG_ERR("Could not listen to socket %d", port);
LOG_ERROR("Could not listen to socket %d", port);
close(socket_fd);
return -1;
}
Expand All @@ -77,14 +77,14 @@ int bind_ipc_sock(const char *socket_pathname, bool shall_listen) {
struct sockaddr_un socket_address;
int socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (socket_fd < 0) {
LOG_ERR("socket() failed for pathname %s.", socket_pathname);
LOG_ERROR("socket() failed for pathname %s.", socket_pathname);
return -1;
}
/* Tell the system to allow the port to be reused. */
int on = 1;
if (setsockopt(socket_fd, SOL_SOCKET, SO_REUSEADDR, (char *) &on,
sizeof(on)) < 0) {
LOG_ERR("setsockopt failed for pathname %s", socket_pathname);
LOG_ERROR("setsockopt failed for pathname %s", socket_pathname);
close(socket_fd);
return -1;
}
Expand All @@ -93,7 +93,7 @@ int bind_ipc_sock(const char *socket_pathname, bool shall_listen) {
memset(&socket_address, 0, sizeof(struct sockaddr_un));
socket_address.sun_family = AF_UNIX;
if (strlen(socket_pathname) + 1 > sizeof(socket_address.sun_path)) {
LOG_ERR("Socket pathname is too long.");
LOG_ERROR("Socket pathname is too long.");
close(socket_fd);
return -1;
}
Expand All @@ -102,12 +102,12 @@ int bind_ipc_sock(const char *socket_pathname, bool shall_listen) {

if (bind(socket_fd, (struct sockaddr *) &socket_address,
sizeof(struct sockaddr_un)) != 0) {
LOG_ERR("Bind failed for pathname %s.", socket_pathname);
LOG_ERROR("Bind failed for pathname %s.", socket_pathname);
close(socket_fd);
return -1;
}
if (shall_listen && listen(socket_fd, 5) == -1) {
LOG_ERR("Could not listen to socket %s", socket_pathname);
LOG_ERROR("Could not listen to socket %s", socket_pathname);
close(socket_fd);
return -1;
}
Expand All @@ -125,22 +125,22 @@ int connect_ipc_sock(const char *socket_pathname) {

socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (socket_fd < 0) {
LOG_ERR("socket() failed for pathname %s.", socket_pathname);
LOG_ERROR("socket() failed for pathname %s.", socket_pathname);
return -1;
}

memset(&socket_address, 0, sizeof(struct sockaddr_un));
socket_address.sun_family = AF_UNIX;
if (strlen(socket_pathname) + 1 > sizeof(socket_address.sun_path)) {
LOG_ERR("Socket pathname is too long.");
LOG_ERROR("Socket pathname is too long.");
return -1;
}
strncpy(socket_address.sun_path, socket_pathname,
strlen(socket_pathname) + 1);

if (connect(socket_fd, (struct sockaddr *) &socket_address,
sizeof(struct sockaddr_un)) != 0) {
LOG_ERR("Connection to socket failed for pathname %s.", socket_pathname);
LOG_ERROR("Connection to socket failed for pathname %s.", socket_pathname);
return -1;
}

Expand All @@ -154,7 +154,7 @@ int connect_ipc_sock(const char *socket_pathname) {
int accept_client(int socket_fd) {
int client_fd = accept(socket_fd, NULL, NULL);
if (client_fd < 0) {
LOG_ERR("Error reading from socket.");
LOG_ERROR("Error reading from socket.");
return -1;
}
return client_fd;
Expand Down
35 changes: 19 additions & 16 deletions src/common/state/redis.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,11 @@
#include "redis.h"
#include "io.h"

#define LOG_REDIS_ERR(context, M, ...) \
LOG_INFO("Redis error %d %s; %s", context->err, context->errstr, M)
#define LOG_REDIS_ERROR(context, M, ...) \
LOG_ERROR("Redis error %d %s; %s", context->err, context->errstr, M)

#define LOG_REDIS_DEBUG(context, M, ...) \
LOG_DEBUG("Redis error %d %s; %s", context->err, context->errstr, M)

#define CHECK_REDIS_CONNECT(CONTEXT_TYPE, context, M, ...) \
do { \
Expand All @@ -26,7 +29,7 @@
LOG_FATAL("could not allocate redis context"); \
} \
if (_context->err) { \
LOG_REDIS_ERR(_context, M, ##__VA_ARGS__); \
LOG_REDIS_ERROR(_context, M, ##__VA_ARGS__); \
exit(-1); \
} \
} while (0);
Expand Down Expand Up @@ -202,7 +205,7 @@ void redis_object_table_add(table_callback_data *callback_data) {
(void *) callback_data->timer_id, "SADD obj:%b %d",
id.id, sizeof(object_id), db->client_id);
if ((status == REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context, "could not add object_table entry");
LOG_REDIS_DEBUG(db->context, "could not add object_table entry");
}
}

Expand All @@ -216,7 +219,7 @@ void redis_object_table_lookup(table_callback_data *callback_data) {
(void *) callback_data->timer_id,
"SMEMBERS obj:%b", id.id, sizeof(object_id));
if ((status == REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context, "error in object_table lookup");
LOG_REDIS_DEBUG(db->context, "error in object_table lookup");
}
}

Expand Down Expand Up @@ -247,7 +250,7 @@ void redis_result_table_add(table_callback_data *callback_data) {
"SET result:%b %b", id.id, sizeof(object_id),
(*result_task_id).id, sizeof(task_id));
if ((status == REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context, "Error in result table add");
LOG_REDIS_DEBUG(db->context, "Error in result table add");
}
}

Expand Down Expand Up @@ -292,11 +295,11 @@ void redis_result_table_lookup_object_callback(redisAsyncContext *c,
(void *) callback_data->timer_id, "HGETALL task:%b",
(*result_task_id).id, sizeof(task_id));
if ((status == REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context, "Could not look up result table entry");
LOG_REDIS_DEBUG(db->context, "Could not look up result table entry");
}
} else if (reply->type == REDIS_REPLY_NIL) {
/* The object with the requested ID was not in the table. */
LOG_ERR("Object's result not in table.");
LOG_ERROR("Object's result not in table.");
result_table_lookup_callback done_callback = callback_data->done_callback;
if (done_callback) {
done_callback(callback_data->id, NULL, callback_data->user_context);
Expand All @@ -318,7 +321,7 @@ void redis_result_table_lookup(table_callback_data *callback_data) {
(void *) callback_data->timer_id, "GET result:%b",
id.id, sizeof(object_id));
if ((status == REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context, "Error in result table lookup");
LOG_REDIS_DEBUG(db->context, "Error in result table lookup");
}
}

Expand Down Expand Up @@ -413,8 +416,8 @@ void redis_object_table_subscribe(table_callback_data *callback_data) {
"SUBSCRIBE __keyspace@0__:%b add", id.id,
sizeof(object_id));
if ((status == REDIS_ERR) || db->sub_context->err) {
LOG_REDIS_ERR(db->sub_context,
"error in redis_object_table_subscribe_callback");
LOG_REDIS_DEBUG(db->sub_context,
"error in redis_object_table_subscribe_callback");
}
}

Expand Down Expand Up @@ -453,7 +456,7 @@ void redis_task_table_get_task(table_callback_data *callback_data) {
(void *) callback_data->timer_id, "HGETALL task:%b",
id.id, sizeof(task_id));
if ((status == REDIS_ERR) || db->sub_context->err) {
LOG_REDIS_ERR(db->sub_context, "Could not get task from task table");
LOG_REDIS_DEBUG(db->sub_context, "Could not get task from task table");
}
}

Expand Down Expand Up @@ -506,7 +509,7 @@ void redis_task_table_publish(table_callback_data *callback_data,
(char *) spec, task_spec_size(spec));
}
if ((status = REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context, "error setting task in task_table_add_task");
LOG_REDIS_DEBUG(db->context, "error setting task in task_table_add_task");
}
}

Expand All @@ -518,8 +521,8 @@ void redis_task_table_publish(table_callback_data *callback_data,
task_size(task));

if ((status == REDIS_ERR) || db->context->err) {
LOG_REDIS_ERR(db->context,
"error publishing task in task_table_add_task");
LOG_REDIS_DEBUG(db->context,
"error publishing task in task_table_add_task");
}
}
}
Expand Down Expand Up @@ -617,7 +620,7 @@ void redis_task_table_subscribe(table_callback_data *callback_data) {
(char *) node.id, sizeof(node_id), data->state_filter);
}
if ((status == REDIS_ERR) || db->sub_context->err) {
LOG_REDIS_ERR(db->sub_context, "error in task_table_register_callback");
LOG_REDIS_DEBUG(db->sub_context, "error in task_table_register_callback");
}
}

Expand Down
2 changes: 1 addition & 1 deletion src/common/state/table.c
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ int64_t table_timeout_handler(event_loop *loop,
if (callback_data->retry.num_retries == 0) {
/* We didn't get a response from the database after exhausting all retries;
* let user know, cleanup the state, and remove the timer. */
LOG_ERR("Table command with timer ID %ld failed", timer_id);
LOG_ERROR("Table command with timer ID %ld failed", timer_id);
if (callback_data->retry.fail_callback) {
callback_data->retry.fail_callback(
callback_data->id, callback_data->user_context, callback_data->data);
Expand Down
6 changes: 3 additions & 3 deletions src/plasma/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ BUILD = build
all: $(BUILD)/plasma_store $(BUILD)/plasma_manager $(BUILD)/plasma_client.so $(BUILD)/example $(BUILD)/libplasma_client.a

debug: FORCE
debug: CFLAGS += -DRAY_COMMON_DEBUG=1
debug: CFLAGS += -DRAY_COMMON_LOG_LEVEL=0
debug: all

clean:
Expand Down Expand Up @@ -35,8 +35,8 @@ common: FORCE
git submodule update --init --recursive
cd ../common; make

# Set the request timeout low for testing purposes.
test: CFLAGS += -DRAY_TIMEOUT=50
# Set the request timeout low and logging level at FATAL for testing purposes.
test: CFLAGS += -DRAY_TIMEOUT=50 -DRAY_COMMON_LOG_LEVEL=4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep the ERROR level, then maybe this should be ERROR. If not, the FATAL is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the level at FATAL because some of the tests test error scenarios, which is why we have so many log messages right now.

# First, build and run all the unit tests.
test: $(BUILD)/manager_tests FORCE
./build/manager_tests
Expand Down
4 changes: 2 additions & 2 deletions src/plasma/malloc.c
Original file line number Diff line number Diff line change
Expand Up @@ -61,11 +61,11 @@ int create_buffer(int64_t size) {
return -1;
}
if (unlink(file_name) != 0) {
LOG_ERR("unlink error");
LOG_ERROR("unlink error");
return -1;
}
if (ftruncate(fd, (off_t) size) != 0) {
LOG_ERR("ftruncate error");
LOG_ERROR("ftruncate error");
return -1;
}
return fd;
Expand Down
8 changes: 4 additions & 4 deletions src/plasma/plasma_client.c
Original file line number Diff line number Diff line change
Expand Up @@ -411,13 +411,13 @@ void plasma_disconnect(plasma_connection *conn) {
int plasma_manager_try_connect(const char *ip_addr, int port) {
int fd = socket(PF_INET, SOCK_STREAM, 0);
if (fd < 0) {
LOG_ERR("could not create socket");
LOG_ERROR("could not create socket");
return -1;
}

struct hostent *manager = gethostbyname(ip_addr); /* TODO(pcm): cache this */
if (!manager) {
LOG_ERR("plasma manager %s not found", ip_addr);
LOG_ERROR("plasma manager %s not found", ip_addr);
return -1;
}

Expand All @@ -428,7 +428,7 @@ int plasma_manager_try_connect(const char *ip_addr, int port) {

int r = connect(fd, (struct sockaddr *) &addr, sizeof(addr));
if (r < 0) {
LOG_ERR(
LOG_ERROR(
"could not establish connection to manager with id %s:%d (may have run "
"out of ports)",
&ip_addr[0], port);
Expand Down Expand Up @@ -485,7 +485,7 @@ void plasma_fetch(plasma_connection *conn,
nbytes = recv(conn->manager_conn, (uint8_t *) &reply, sizeof(reply),
MSG_WAITALL);
if (nbytes < 0) {
LOG_ERR("Error while waiting for manager response in fetch");
LOG_ERROR("Error while waiting for manager response in fetch");
success = 0;
} else if (nbytes == 0) {
success = 0;
Expand Down
Loading