Avoid modifying log level globally #1944

Merged · 3 commits · Nov 13, 2023
22 changes: 14 additions & 8 deletions sdk/python/kubeflow/training/api/training_client.py
@@ -24,8 +24,7 @@
 from kubeflow.training.constants import constants
 from kubeflow.training.utils import utils
 
-logging.basicConfig(format="%(message)s")
-logging.getLogger().setLevel(logging.INFO)
+logger = logging.getLogger(__name__)

status_logger = utils.StatusLogger(
header="{:<30.30} {:<20.20} {}".format("NAME", "STATE", "TIME"),
@@ -222,7 +221,7 @@ def create_job(
             f"Failed to create {job_kind}: {namespace}/{job.metadata.name}"
         )
 
-        logging.info(f"{job_kind} {namespace}/{job.metadata.name} has been created")
+        logger.debug(f"{job_kind} {namespace}/{job.metadata.name} has been created")

def get_job(
self,
@@ -771,7 +770,7 @@ def get_job_logs(
         replica_index: Optional[int] = None,
         follow: bool = False,
         timeout: int = constants.DEFAULT_TIMEOUT,
-    ):
+    ) -> Dict[str, str]:
"""Print the training logs for the Job. By default it returns logs from
the `master` pod.

@@ -801,6 +800,10 @@ def get_job_logs(
             timeout: Optional, Kubernetes API server timeout in seconds
                 to execute the request.
 
+        Returns:
+            Dict[str, str]: A dictionary in which the keys are pod names and the
+                values are the corresponding logs.
+
         Raises:
             ValueError: Job replica type is invalid.
             TimeoutError: Timeout to get Job pods.
@@ -819,6 +822,7 @@
             timeout=timeout,
         )
 
+        logs_dict = {}
         if pods and follow:
             log_streams = []
             for pod in pods:
@@ -849,7 +853,7 @@ def get_job_logs(
                         if logline is None:
                             finished[index] = True
                             break
-                        logging.info("[Pod %s]: %s", pods[index], logline)
+                        print(f"[Pod {pods[index]}]: {logline}")
Comment from andreyvelich (Member):
Thank you for updating the logging.
@droctothorpe Do we want to use print here instead of logging?
Maybe we should keep the logging in the SDK consistent?
For example, KFP uses logging.info as well: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/client/client.py#L470
And they also set up logging here: https://github.com/kubeflow/pipelines/blob/a9279843946183429f6572516acee6523de36e53/sdk/python/kfp/cli/__main__.py#L23

Reply from droctothorpe (Contributor, PR author):

Thanks, @andreyvelich! I'm still wrapping my head around some of these nuances so take these answers with a grain of salt (Python logging is surprisingly complex).


> Do we want to use print here instead of logging?

print is more performant and recommended by the Python docs:

[screenshot of the "When to use logging" guidance table from the Python logging HOWTO]

Source: https://docs.python.org/3/howto/logging.html#when-to-use-logging
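That guidance boils down to: use print for the ordinary console output a program exists to produce, and logging for diagnostic events. A minimal sketch (the function names are invented for the example):

```python
import logging

logger = logging.getLogger(__name__)

def double(value):
    # Diagnostic event: routed through logging so the consuming application
    # can format, filter, or silence it.
    logger.info("doubling %s", value)
    return value * 2

def report(value):
    # Ordinary console output -- the display the user asked for: use print.
    print(f"Result: {double(value)}")
```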


> And they also setup logging here:
> https://github.com/kubeflow/pipelines/blob/a9279843946183429f6572516acee6523de36e53/sdk/python/kfp/cli/__main__.py#L23

Note that they only do this in the CLI. That's because the CLI is a user-facing API so to speak, i.e. it's not meant to be consumed by other libraries that have their own opinions on how logging should be configured. From the Python docs:

> It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements.

Source: https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library
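The library-side pattern the docs recommend — attach only a NullHandler and let the application configure everything else — looks roughly like this (a sketch, not the SDK's actual code; the logger name is hypothetical):

```python
import logging

# Library module: create a named logger and attach only a NullHandler.
# NullHandler emits nothing itself and suppresses the "no handlers could
# be found" warning; records still propagate to whatever handlers the
# application configures.
logger = logging.getLogger("kubeflow.training.example")  # hypothetical name
logger.addHandler(logging.NullHandler())

def do_work():
    logger.info("doing work")  # invisible until the application opts in
    return "done"

# Application side: the consumer decides what is visible.
logging.basicConfig(format="%(message)s", level=logging.INFO)
```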

Comment from andreyvelich (Member), Nov 13, 2023:

I think that makes sense, thanks for sharing these links!
I agree that we should have an appropriate logger for the SDK (e.g. logging.getLogger(__name__)) so we don't use the root logger, but I still think we should use logging to log some data for the user.

The problem with print is that the user can't identify which library generated the output, whereas with logging we can configure that.

What do you think about this:

  • We can introduce a new parameter to TrainingClient() called verbose, and we can configure this parameter for various levels of logging: https://docs.python.org/3/library/logging.html#levels. We can start with INFO, WARNING, and ERROR.
  • Depending on this parameter, we can configure our logger in the constructor. For example, for verbose=20:

    self.logger = logging.getLogger(__name__)
    if verbose == 20:
        self.logger.setLevel(logging.INFO)

  • Then we can use self.logger.info(), self.logger.warning(), or self.logger.error() when it is required to print some data.

  • If the user doesn't want to see any logs, they can always override it as follows:

    logger = logging.getLogger("kubeflow.training.api.training_client")
    logger.setLevel(logging.NOTSET)

Or provide verbose=0 as a TrainingClient() parameter:

    client = TrainingClient(verbose=0)

WDYT @droctothorpe @johnugeorge @kuizhiqing ?
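The proposal above might be sketched as follows (hypothetical: the verbose parameter and its level mapping are suggestions from this thread, not released SDK API, and the real client takes many other constructor parameters that are omitted here):

```python
import logging

class TrainingClient:
    # Sketch of the suggested constructor (hypothetical API).
    def __init__(self, verbose: int = logging.INFO):
        self.logger = logging.getLogger(__name__)  # module logger, not root
        if verbose == logging.INFO:
            self.logger.setLevel(logging.INFO)
        elif verbose == logging.WARNING:
            self.logger.setLevel(logging.WARNING)
        elif verbose == logging.ERROR:
            self.logger.setLevel(logging.ERROR)
        # verbose=0 (NOTSET) leaves the level to the logging hierarchy.

    def create_job(self, name: str) -> str:
        self.logger.info("Job %s has been created", name)
        return name
```

With this shape, `TrainingClient(verbose=0)` is the opt-out path described above.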

except queue.Empty:
break
elif pods:
@@ -860,10 +864,12 @@ def get_job_logs(
                     namespace,
                     container=constants.JOB_PARAMETERS[job_kind]["container"],
                 )
-                logging.info("The logs of pod %s:\n %s", pod, pod_logs)
+                logs_dict[pod] = pod_logs
             except Exception:
                 raise RuntimeError(f"Failed to read logs for pod {namespace}/{pod}")
 
+        return logs_dict

def update_job(
self,
job: constants.JOB_MODELS_TYPE,
@@ -908,7 +914,7 @@ def update_job(
         except Exception:
             raise RuntimeError(f"Failed to update {job_kind}: {namespace}/{name}")
 
-        logging.info(f"{job_kind} {namespace}/{name} has been updated")
+        logger.debug(f"{job_kind} {namespace}/{name} has been updated")

def delete_job(
self,
@@ -950,4 +956,4 @@ def delete_job(
         except Exception:
             raise RuntimeError(f"Failed to delete {job_kind}: {namespace}/{name}")
 
-        logging.info(f"{job_kind} {namespace}/{name} has been deleted")
+        logger.debug(f"{job_kind} {namespace}/{name} has been deleted")
7 changes: 3 additions & 4 deletions sdk/python/kubeflow/training/utils/utils.py
@@ -25,8 +25,7 @@
 from kubeflow.training import models
 
 
-logging.basicConfig(format="%(message)s")
-logging.getLogger().setLevel(logging.INFO)
+logger = logging.getLogger(__name__)


class StatusLogger:
@@ -39,9 +38,9 @@ def __init__(self, header, column_format):
 
     def __call__(self, *values):
        if self.first_call:
-            logging.info(self.header)
+            logger.debug(self.header)
            self.first_call = False
-        logging.info(self.column_format.format(*values))
+        logger.debug(self.column_format.format(*values))
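StatusLogger's contract — emit the header on the first call, then one formatted row per call — can be sketched standalone (print stands in for logger.debug here so the example is self-contained; the header and column format mirror the ones used in training_client.py):

```python
class StatusLogger:
    """Emit a header once, then a formatted row for each call."""

    def __init__(self, header: str, column_format: str):
        self.header = header
        self.column_format = column_format
        self.first_call = True

    def __call__(self, *values):
        if self.first_call:
            print(self.header)
            self.first_call = False
        print(self.column_format.format(*values))

# Mirrors how training_client.py constructs it:
status_logger = StatusLogger(
    header="{:<30.30} {:<20.20} {}".format("NAME", "STATE", "TIME"),
    column_format="{:<30.30} {:<20.20} {}",
)
```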


class FakeResponse: