Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

Closed
chiragjn opened this issue Mar 21, 2020 · 17 comments · Fixed by #365
Closed

[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

chiragjn opened this issue Mar 21, 2020 · 17 comments · Fixed by #365

Comments

@chiragjn
Copy link

chiragjn commented Mar 21, 2020

When ._proxy_to_logger is called, Python's built-in logging calls makeRecord that computes attributes that might be useful for some use cases (the most common example being filename/module and line number)

All of that information is available in ProcessorFormatter's .format at which structlog just copies the msg attribute (event_dict) and drops the rest.

My proposal is to allow copying attributes from LogRecord into the event_dict


EDIT: See #253 (comment) for a better idea


Roughly, that would look like

class ProcessorFormatter(logging.Formatter):

    def __init__(self, ..., copy_logrecord_attrs: Tuple[str] = ()):
        ...
        self.copy_logrecord_attrs = copy_logrecord_attrs

    def format(self, record):
        record = logging.makeLogRecord(record.__dict__)
        try:
            ...
            ed = {attr: getattr(record, attr, '?') for attr in self.copy_logrecord_attrs}
            ed.update(**record.msg.copy())
        except AttributeError:
            ...

Here, I am assuming

  1. It is better to allow values from attached processors to take precedence over copying from LogRecord
  2. The user of the library understands what attributes when copied are serializable

Would love to hear your thoughts on this and if it's acceptable I can send a PR too.

EDIT: After studying the code a bit more, I realized that I can get those LogRecord attributes rendered simply by passing a custom format using fmt keyword arg, but that would mean losing the niceties of using JSONRenderer

@chiragjn
Copy link
Author

Another stupid question, I tried digging through git history but still confused

def format(self, record):
"""
Extract ``structlog``'s `event_dict` from ``record.msg`` and format it.
"""
# Make a shallow copy of the record to let other handlers/formatters
# process the original one
record = logging.makeLogRecord(record.__dict__)
try:
# Both attached by wrap_for_formatter
logger = self.logger if self.logger is not None else record._logger
meth_name = record._name
# We need to copy because it's possible that the same record gets
# processed by multiple logging formatters. LogRecord.getMessage
# would transform our dict into a str.
ed = record.msg.copy()

is the .copy() on line 510 done to avoid any mutation by self.processor? because the comment on 507-509 seems to be taken care by line 501 by creating a new instance of LogRecord

@hynek
Copy link
Owner

hynek commented Mar 25, 2020

Is this related to #209?

@chiragjn
Copy link
Author

chiragjn commented Mar 25, 2020

@hynek Not really, 209 wants to support the extra argument available on stdlib's Logger instance methods

My request is to be able to copy some of these attributes https://github.com/python/cpython/blob/950c6795aa0ffa85e103a13e7a04e08cb34c66ad/Lib/logging/__init__.py#L290-L354 that stdlib's formatter uses on the format template given to it.

Structlog makes a LogRecord, but drops everything except the .msg attr

My use case is to be able to add module and line number to logs, for which I wrote a processor

from typing import Optional, Dict, Any
import structlog

def add_module_and_lineno(logger: structlog.BoundLogger, name: str, event_dict: Dict[str, Any]) -> Dict[str, Any]:
    # noinspection PyProtectedMember
    frame, module_str = structlog._frames._find_first_app_frame_and_name(additional_ignores=[__name__])
    # frame has filename, caller and line number
    event_dict['module'] = module_str
    event_dict['lineno'] = frame.f_lineno
    return event_dict

I was going to submit a PR to make this part of structlog's stdlib but then I thought about customizability, then did some code reading and realised structlog already has all relevant attributes already computed via stdlib's LogRecord init

@hynek
Copy link
Owner

hynek commented Mar 28, 2020

So it's loosely related to #246? I'm sorry but the stdlib code is a corner that I don't know that good myself and have to work myself into it every single time again.

Generally speaking I'm happy to try to preserve as much data as possible, as long as it fits the overall design.

@chiragjn
Copy link
Author

chiragjn commented Mar 29, 2020

I saw the PR you mentioned, but that seems it will only work for logs emitted from non-structlog Logger instances, that too only if that function is connected via foreign_pre_chain on ProcessorFormatter

The processor, I posted above, is a structlog processor and works only for logs emitted using BoundLogger which I am okay with, just that I have to use some protected members and the frame fetching is being done twice, once via logging.Logger._log which will put it into LogRecord object and a second time via my processor which will put it into event_dict.

On some further thinking, I think a better design would be to give a pre_render_hook argument to ProcessorFormatter that allows the user to do whatever they like before Serializer/Renderer (Json, KeyValue, Console, whatever) is called

class ProcessorFormatter(logging.Formatter):

    def __init__(self, ..., pre_render_hook: Optional[Callable] = None):
        ...
        self.pre_render_hook = pre_render_hook

    def format(self, record):
        record = logging.makeLogRecord(record.__dict__)
        ...
        if self.pre_render_hook is not None:
            ed = self.pre_render_hook(record=record, event_dict=ed)
        record.msg = self.processor(logger, meth_name, ed)
        return super(ProcessorFormatter, self).format(record)

Then I can easily do

def add_module_and_lineno(record: LogRecord, event_dict: Dict[str, Any]) -> Dict[str, Any]:
    event_dict['module'] = record.module
    event_dict['lineno'] = record.lineno
    return event_dict

LOGGING = {
    'formatters': {
        'json_formatter': {
            '()': structlog.stdlib.ProcessorFormatter,
            'processor': structlog.processors.JSONRenderer(),
            'pre_render_hook': add_module_and_lineno,
        },
    }
}

Well, this can also be achieved anywhere, even in the renderer (self.processor) if it can get access to the LogRecord object

record.msg = self.processor(logger, meth_name, ed, record)

I can easily inherit from structlog.processors.JSONRenderer and override __call__ for my functionality


I am not sure what design would work best here, thoughts on this?

@DrPyser
Copy link

DrPyser commented Jun 16, 2020

I'd also really like to exploit information included in LogRecords.

Where/when in the processor chain does the LogRecord first becomes available?
Ideally, all useful information would be extracted from the LogRecord as early as possible, allowing formatting/rendering and other processing to remain agnostic to the underlying logging layer?

I've been trying to use ProcessorFormatter with a custom foreign_pre_chain to extract the LogRecord attributes, but it does not seem to apply to loggers immediately:

def process_log_record(logger, method_name, event_dict, *args, **kwargs):
    record: logging.LogRecord = event_dict.get("_record")
    if record:
        event_dict = dict(event_dict, **{
            "module_name": record.module,
            "function_name": record.funcName,
            "line_number": record.lineno,
            "process_name": record.processName,
            "process_id": record.process,
            "file_name": record.pathname,
            "thread_name": record.threadName,
            "thread_id": record.thread,
        })
    return event_dict


def configure():
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.stdlib.PositionalArgumentsFormatter(),
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.UnicodeDecoder(),
            process_log_record,
            structlog.stdlib.ProcessorFormatter.wrap_for_formatter
        ],
        logger_factory=structlog.stdlib.LoggerFactory(),
        wrapper_class=LoggerClass
    )
    # formatter = structlog.stdlib.ProcessorFormatter(
    #     processor=structlog.dev.ConsoleRenderer
    # )
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": True,
        "formatters": {
            "dev": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(),
                "foreign_pre_chain": [process_log_record]
            }
        },
        "handlers": {
            "default": {
                "()": logging.StreamHandler,
                "level": "DEBUG",
                "formatter": "dev"
            }
        },
        "loggers": {
            "": {
                "handlers": ["default"],
                "level": "DEBUG",
                "propagate": True
            }
        }
    })

configure()
structlog.get_logger().log("Test")

@DrPyser
Copy link

DrPyser commented Jun 16, 2020

The first logging calls seemed to be handled by a _FixedFindCallerLogger instance.

@chiragjn
Copy link
Author

@DrPyser
You'll get _record only when using a logging.Logger instance with ProcessFormatter

ed = {"event": record.getMessage(), "_record": record}

for proc in self.foreign_pre_chain or ():
ed = proc(logger, meth_name, ed)

You don't get access to LogRecord instance in processors with structlog.stdlib.BoundLogger because the record instance is only made after calling all the processors, in fact, ProcessFormatter.wrap_for_formatter, the last processor in the chain, decides the args and kwargs to pass to LogRecord.__init__

args, kw = self._process_event(method_name, event, event_kw)
return getattr(self._logger, method_name)(*args, **kw)

Hence, the feature request :)

Test your example with a logging.Logger instance, it should most likely work

logging.get_logger().log("Test")

@neara
Copy link

neara commented Jun 25, 2020

Would absolutely love to have this feature.
I ended up manually extracting the params:

import inspect
import os
import threading

import structlog._frames


def add_standard_logger_params(_logger, _method_name, event_dict):
    record = event_dict.get("_record")
    if record:
        event_dict["funcName"] = record.funcName
        event_dict["thread"] = record.thread
        event_dict["pathname"] = record.pathname
        event_dict["lineno"] = record.lineno
    else:
        frame, _module_str = structlog._frames._find_first_app_frame_and_name(
            additional_ignores=[__name__]
        )
        frame_info = inspect.getframeinfo(frame)
        event_dict["funcName"] = frame_info.function
        event_dict["thread"] = threading.get_ident()
        event_dict["pathname"] = frame_info.filename
        event_dict["lineno"] = frame_info.lineno

    event_dict["process"] = os.getpid()
    return event_dict

But i would love to skip the manual extraction of these parameters and rely on std logging already doing the job.

@hynek
Copy link
Owner

hynek commented Jun 26, 2020

Well, TBH we'll probably add just something like that to structlog. :) The question here is how much control do we need to give the user? Is it OK to always dump all info into the context? I'd rather not have huge if-then-else trees.

@chiragjn
Copy link
Author

chiragjn commented Jun 26, 2020

@hynek Any thoughts on the idea in #253 (comment) ?
Just like processors allow the user access to the event_dict and the LogRecord in a callable and let them do what they want before the actual rendering happens.

Personally I think, if the LogRecord was created before running structlog processors then maybe things would simpler because then processors can access to the LogRecord instance. But that level of changes would be massive. I am sure there must be reasons why things came to be the way they are

logging.Logger -> LogRecord.__init__ -> structlog processors -> renderer
structlog.stdlib.BoundLogger -> structlog processors -> LogRecord.__init__ -> renderer

@neara
Copy link

neara commented Jun 28, 2020

Well, TBH we'll probably add just something like that to structlog. :) The question here is how much control do we need to give the user? Is it OK to always dump all info into the context? I'd rather not have huge if-then-else trees.

I think best case scenario, if it's possible, would be to provide access to LogRecord and let user decide what info should be in the event_dict via custom processor, defined in structlog.configure. It feels a little bit more intuitive to me. But then it will probably cause double initialization of log record (in structlog.stdlib.ProcessorFormatter). I might be wrong about this.

Maybe easiest would be to add a pre_render_chain in structlog.stdlib.ProcessorFormatter and applying it on logger, method_name, event_dict before calling self.processor(...) (similar to how foreign_pre_chain is applied).

This was the way I was thinking to solve this without touching the library.
On the negative side, this is a bit out of order as to why define processors in 2 different places and when to use what.

@pybuche
Copy link

pybuche commented Nov 2, 2020

@hynek looks like @neara proposal could be implemented straight away in the stdlib.py module.
Is there any blocker not to do it?

@hynek
Copy link
Owner

hynek commented Nov 4, 2020

gestures at everything 😞

But mostly because I was preoccupied with attrs lately and couldn't justify spending significant FOSS time on structlog, since it's not part of Tidelift yet (= not enough companies would subscribe = no income).

@hynek
Copy link
Owner

hynek commented Nov 11, 2021

Hi everyone, sorry very much for for the long silence. I did keep having this on my mind and felt sufficiently guilty about it. Given structlog's modularity, I hope everyone was able to find a solution on their own anyways.

Now, I've dived into extracting data from LogRecords and it's…adventurous but doable: #209 (comment)

Now, after re-re-re-re-re-re-reading this thread, my understanding is that the problem is that LogRecords that have been created by wrap_for_formatter are not processed by a chain – correct?

While a pre_render_hook would solve the problem, I suspect it would fit better structlog's design to give it a proper chain. It could be even called pre_chain since it's not decidedly not foreign. We could also add something like record._from_structlog = True that gets deleted at the end to allow for processor re-use.


Does any of this make sense?

@hynek
Copy link
Owner

hynek commented Nov 11, 2021

OK after looking at it, related but better idea:

processor becomes processors and record gets _record and _from_structlog attached.

This separates the pre_chain that has legit special stuff to deal with and "let's extract data from a LogRecord" that's useful for everybody.

@hynek
Copy link
Owner

hynek commented Nov 11, 2021

I have implemented this idea in #365 – please let me know what you think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants