[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

chiragjn · 2020-03-21T22:02:46Z

When ._proxy_to_logger is called, Python's built-in logging calls makeRecord that computes attributes that might be useful for some use cases (the most common example being filename/module and line number)

All of that information is available in ProcessorFormatter's .format at which structlog just copies the msg attribute (event_dict) and drops the rest.

My proposal is to allow copying attributes from LogRecord into the event_dict

EDIT: See #253 (comment) for a better idea

Roughly, that would look like

class ProcessorFormatter(logging.Formatter):

    def __init__(self, ..., copy_logrecord_attrs: Tuple[str] = ()):
        ...
        self.copy_logrecord_attrs = copy_logrecord_attrs

    def format(self, record):
        record = logging.makeLogRecord(record.__dict__)
        try:
            ...
            ed = {attr: getattr(record, attr, '?') for attr in self.copy_logrecord_attrs}
            ed.update(**record.msg.copy())
        except AttributeError:
            ...

Here, I am assuming

It is better to allow values from attached processors to take precedence over copying from LogRecord
The user of the library understands what attributes when copied are serializable

Would love to hear your thoughts on this and if it's acceptable I can send a PR too.

EDIT: After studying the code a bit more, I realized that I can get those LogRecord attributes rendered simply by passing a custom format using fmt keyword arg, but that would mean losing the niceties of using JSONRenderer

The text was updated successfully, but these errors were encountered:

chiragjn · 2020-03-22T00:52:02Z

Another stupid question, I tried digging through git history but still confused

structlog/src/structlog/stdlib.py

Lines 495 to 510 in 36eeb18

    
               def format(self, record): 
        
                   """ 
        
                   Extract ``structlog``'s `event_dict` from ``record.msg`` and format it. 
        
                   """ 
        
                   # Make a shallow copy of the record to let other handlers/formatters 
        
                   # process the original one 
        
                   record = logging.makeLogRecord(record.__dict__) 
        
                   try: 
        
                       # Both attached by wrap_for_formatter 
        
                       logger = self.logger if self.logger is not None else record._logger 
        
                       meth_name = record._name 
        
                       # We need to copy because it's possible that the same record gets 
        
                       # processed by multiple logging formatters.  LogRecord.getMessage 
        
                       # would transform our dict into a str. 
        
                       ed = record.msg.copy()

is the .copy() on line 510 done to avoid any mutation by self.processor? because the comment on 507-509 seems to be taken care by line 501 by creating a new instance of LogRecord

hynek · 2020-03-25T11:13:59Z

Is this related to #209?

chiragjn · 2020-03-25T14:27:24Z

@hynek Not really, 209 wants to support the extra argument available on stdlib's Logger instance methods

My request is to be able to copy some of these attributes https://github.com/python/cpython/blob/950c6795aa0ffa85e103a13e7a04e08cb34c66ad/Lib/logging/__init__.py#L290-L354 that stdlib's formatter uses on the format template given to it.

Structlog makes a LogRecord, but drops everything except the .msg attr

My use case is to be able to add module and line number to logs, for which I wrote a processor

from typing import Optional, Dict, Any
import structlog

def add_module_and_lineno(logger: structlog.BoundLogger, name: str, event_dict: Dict[str, Any]) -> Dict[str, Any]:
    # noinspection PyProtectedMember
    frame, module_str = structlog._frames._find_first_app_frame_and_name(additional_ignores=[__name__])
    # frame has filename, caller and line number
    event_dict['module'] = module_str
    event_dict['lineno'] = frame.f_lineno
    return event_dict

I was going to submit a PR to make this part of structlog's stdlib but then I thought about customizability, then did some code reading and realised structlog already has all relevant attributes already computed via stdlib's LogRecord init

hynek · 2020-03-28T09:08:16Z

So it's loosely related to #246? I'm sorry but the stdlib code is a corner that I don't know that good myself and have to work myself into it every single time again.

Generally speaking I'm happy to try to preserve as much data as possible, as long as it fits the overall design.

chiragjn · 2020-03-29T13:45:06Z

I saw the PR you mentioned, but that seems it will only work for logs emitted from non-structlog Logger instances, that too only if that function is connected via foreign_pre_chain on ProcessorFormatter

The processor, I posted above, is a structlog processor and works only for logs emitted using BoundLogger which I am okay with, just that I have to use some protected members and the frame fetching is being done twice, once via logging.Logger._log which will put it into LogRecord object and a second time via my processor which will put it into event_dict.

On some further thinking, I think a better design would be to give a pre_render_hook argument to ProcessorFormatter that allows the user to do whatever they like before Serializer/Renderer (Json, KeyValue, Console, whatever) is called

class ProcessorFormatter(logging.Formatter):

    def __init__(self, ..., pre_render_hook: Optional[Callable] = None):
        ...
        self.pre_render_hook = pre_render_hook

    def format(self, record):
        record = logging.makeLogRecord(record.__dict__)
        ...
        if self.pre_render_hook is not None:
            ed = self.pre_render_hook(record=record, event_dict=ed)
        record.msg = self.processor(logger, meth_name, ed)
        return super(ProcessorFormatter, self).format(record)

Then I can easily do

def add_module_and_lineno(record: LogRecord, event_dict: Dict[str, Any]) -> Dict[str, Any]:
    event_dict['module'] = record.module
    event_dict['lineno'] = record.lineno
    return event_dict

LOGGING = {
    'formatters': {
        'json_formatter': {
            '()': structlog.stdlib.ProcessorFormatter,
            'processor': structlog.processors.JSONRenderer(),
            'pre_render_hook': add_module_and_lineno,
        },
    }
}

Well, this can also be achieved anywhere, even in the renderer (self.processor) if it can get access to the LogRecord object

record.msg = self.processor(logger, meth_name, ed, record)

I can easily inherit from structlog.processors.JSONRenderer and override __call__ for my functionality

I am not sure what design would work best here, thoughts on this?

DrPyser · 2020-06-16T19:16:31Z

I'd also really like to exploit information included in LogRecords.

Where/when in the processor chain does the LogRecord first becomes available?
Ideally, all useful information would be extracted from the LogRecord as early as possible, allowing formatting/rendering and other processing to remain agnostic to the underlying logging layer?

I've been trying to use ProcessorFormatter with a custom foreign_pre_chain to extract the LogRecord attributes, but it does not seem to apply to loggers immediately:

def process_log_record(logger, method_name, event_dict, *args, **kwargs):
    record: logging.LogRecord = event_dict.get("_record")
    if record:
        event_dict = dict(event_dict, **{
            "module_name": record.module,
            "function_name": record.funcName,
            "line_number": record.lineno,
            "process_name": record.processName,
            "process_id": record.process,
            "file_name": record.pathname,
            "thread_name": record.threadName,
            "thread_id": record.thread,
        })
    return event_dict


def configure():
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.stdlib.PositionalArgumentsFormatter(),
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.UnicodeDecoder(),
            process_log_record,
            structlog.stdlib.ProcessorFormatter.wrap_for_formatter
        ],
        logger_factory=structlog.stdlib.LoggerFactory(),
        wrapper_class=LoggerClass
    )
    # formatter = structlog.stdlib.ProcessorFormatter(
    #     processor=structlog.dev.ConsoleRenderer
    # )
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": True,
        "formatters": {
            "dev": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(),
                "foreign_pre_chain": [process_log_record]
            }
        },
        "handlers": {
            "default": {
                "()": logging.StreamHandler,
                "level": "DEBUG",
                "formatter": "dev"
            }
        },
        "loggers": {
            "": {
                "handlers": ["default"],
                "level": "DEBUG",
                "propagate": True
            }
        }
    })

configure()
structlog.get_logger().log("Test")

DrPyser · 2020-06-16T19:39:44Z

The first logging calls seemed to be handled by a _FixedFindCallerLogger instance.

chiragjn · 2020-06-17T04:57:36Z

@DrPyser
You'll get _record only when using a logging.Logger instance with ProcessFormatter

structlog/src/structlog/stdlib.py

Line 514 in db1e529

ed = {"event": record.getMessage(), "_record": record}

structlog/src/structlog/stdlib.py

Lines 537 to 538 in db1e529

    
           for proc in self.foreign_pre_chain or (): 
        
               ed = proc(logger, meth_name, ed)

You don't get access to LogRecord instance in processors with structlog.stdlib.BoundLogger because the record instance is only made after calling all the processors, in fact, ProcessFormatter.wrap_for_formatter, the last processor in the chain, decides the args and kwargs to pass to LogRecord.__init__

structlog/src/structlog/_base.py

Lines 186 to 187 in db1e529

    
           args, kw = self._process_event(method_name, event, event_kw) 
        
           return getattr(self._logger, method_name)(*args, **kw)

Hence, the feature request :)

Test your example with a logging.Logger instance, it should most likely work

logging.get_logger().log("Test")

neara · 2020-06-25T15:25:28Z

Would absolutely love to have this feature.
I ended up manually extracting the params:

import inspect
import os
import threading

import structlog._frames


def add_standard_logger_params(_logger, _method_name, event_dict):
    record = event_dict.get("_record")
    if record:
        event_dict["funcName"] = record.funcName
        event_dict["thread"] = record.thread
        event_dict["pathname"] = record.pathname
        event_dict["lineno"] = record.lineno
    else:
        frame, _module_str = structlog._frames._find_first_app_frame_and_name(
            additional_ignores=[__name__]
        )
        frame_info = inspect.getframeinfo(frame)
        event_dict["funcName"] = frame_info.function
        event_dict["thread"] = threading.get_ident()
        event_dict["pathname"] = frame_info.filename
        event_dict["lineno"] = frame_info.lineno

    event_dict["process"] = os.getpid()
    return event_dict

But i would love to skip the manual extraction of these parameters and rely on std logging already doing the job.

hynek · 2020-06-26T13:08:40Z

Well, TBH we'll probably add just something like that to structlog. :) The question here is how much control do we need to give the user? Is it OK to always dump all info into the context? I'd rather not have huge if-then-else trees.

chiragjn · 2020-06-26T14:59:48Z

@hynek Any thoughts on the idea in #253 (comment) ?
Just like processors allow the user access to the event_dict and the LogRecord in a callable and let them do what they want before the actual rendering happens.

Personally I think, if the LogRecord was created before running structlog processors then maybe things would simpler because then processors can access to the LogRecord instance. But that level of changes would be massive. I am sure there must be reasons why things came to be the way they are

logging.Logger -> LogRecord.__init__ -> structlog processors -> renderer
structlog.stdlib.BoundLogger -> structlog processors -> LogRecord.__init__ -> renderer

neara · 2020-06-28T13:13:27Z

Well, TBH we'll probably add just something like that to structlog. :) The question here is how much control do we need to give the user? Is it OK to always dump all info into the context? I'd rather not have huge if-then-else trees.

I think best case scenario, if it's possible, would be to provide access to LogRecord and let user decide what info should be in the event_dict via custom processor, defined in structlog.configure. It feels a little bit more intuitive to me. But then it will probably cause double initialization of log record (in structlog.stdlib.ProcessorFormatter). I might be wrong about this.

Maybe easiest would be to add a pre_render_chain in structlog.stdlib.ProcessorFormatter and applying it on logger, method_name, event_dict before calling self.processor(...) (similar to how foreign_pre_chain is applied).

This was the way I was thinking to solve this without touching the library.
On the negative side, this is a bit out of order as to why define processors in 2 different places and when to use what.

pybuche · 2020-11-02T16:32:01Z

@hynek looks like @neara proposal could be implemented straight away in the stdlib.py module.
Is there any blocker not to do it?

hynek · 2020-11-04T12:08:14Z

gestures at everything 😞

But mostly because I was preoccupied with attrs lately and couldn't justify spending significant FOSS time on structlog, since it's not part of Tidelift yet (= not enough companies would subscribe = no income).

hynek · 2021-11-11T08:01:06Z

Hi everyone, sorry very much for for the long silence. I did keep having this on my mind and felt sufficiently guilty about it. Given structlog's modularity, I hope everyone was able to find a solution on their own anyways.

Now, I've dived into extracting data from LogRecords and it's…adventurous but doable: #209 (comment)

Now, after re-re-re-re-re-re-reading this thread, my understanding is that the problem is that LogRecords that have been created by wrap_for_formatter are not processed by a chain – correct?

While a pre_render_hook would solve the problem, I suspect it would fit better structlog's design to give it a proper chain. It could be even called pre_chain since it's not decidedly not foreign. We could also add something like record._from_structlog = True that gets deleted at the end to allow for processor re-use.

Does any of this make sense?

hynek · 2021-11-11T08:26:26Z

OK after looking at it, related but better idea:

processor becomes processors and record gets _record and _from_structlog attached.

This separates the pre_chain that has legit special stuff to deal with and "let's extract data from a LogRecord" that's useful for everybody.

hynek · 2021-11-11T09:13:05Z

I have implemented this idea in #365 – please let me know what you think.

hynek mentioned this issue Jun 25, 2020

Implement add_function_name() processor in stdlib module #246

Closed

6 tasks

hynek mentioned this issue Jan 7, 2021

New release (20.2) has a typing issue #290

Closed

hynek mentioned this issue Nov 11, 2021

stdlib/ProcessorFormatter: allow for complete chains on top of records #365

Merged

hynek closed this as completed in #365 Nov 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

chiragjn commented Mar 21, 2020 •

edited

Loading

chiragjn commented Mar 22, 2020

hynek commented Mar 25, 2020

chiragjn commented Mar 25, 2020 •

edited

Loading

hynek commented Mar 28, 2020

chiragjn commented Mar 29, 2020 •

edited

Loading

DrPyser commented Jun 16, 2020

DrPyser commented Jun 16, 2020

chiragjn commented Jun 17, 2020

neara commented Jun 25, 2020

hynek commented Jun 26, 2020

chiragjn commented Jun 26, 2020 •

edited

Loading

neara commented Jun 28, 2020

pybuche commented Nov 2, 2020

hynek commented Nov 4, 2020

hynek commented Nov 11, 2021

hynek commented Nov 11, 2021

hynek commented Nov 11, 2021

[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

[Feature-Request] Allow copying attrs from stdlib LogRecord into structlog's msg #253

Comments

chiragjn commented Mar 21, 2020 • edited Loading

chiragjn commented Mar 22, 2020

hynek commented Mar 25, 2020

chiragjn commented Mar 25, 2020 • edited Loading

hynek commented Mar 28, 2020

chiragjn commented Mar 29, 2020 • edited Loading

DrPyser commented Jun 16, 2020

DrPyser commented Jun 16, 2020

chiragjn commented Jun 17, 2020

neara commented Jun 25, 2020

hynek commented Jun 26, 2020

chiragjn commented Jun 26, 2020 • edited Loading

neara commented Jun 28, 2020

pybuche commented Nov 2, 2020

hynek commented Nov 4, 2020

hynek commented Nov 11, 2021

hynek commented Nov 11, 2021

hynek commented Nov 11, 2021

chiragjn commented Mar 21, 2020 •

edited

Loading

chiragjn commented Mar 25, 2020 •

edited

Loading

chiragjn commented Mar 29, 2020 •

edited

Loading

chiragjn commented Jun 26, 2020 •

edited

Loading