-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate Structured Logging #3
Comments
Need to consider a couple of things:
|
So basically the problem is this:
Therefore any usage by a computer should always use |
One additional issue is the "level of verbosity". This is basically controlled by Right now verbosity only controls Something like:
A similar situation would apply to errors being thrown over the GRPC boundary as discussed here MatrixAI/Polykey#249. I reckon something can be automated if we integrate the design of This would mean something like catch a local agent error, write that out to STDERR on the agent on the RPC handler, then create a new error to be propagated to the GRPC client. We have to separate client errors from server errors like in HTTP where there is 4xx errors vs 5xx errors. 5xx errors would be caught by the exit handler, and put to STDERR. 4xx errors are not, they would be emitted by the RPC handler. We would only log them out to STDERR if the RPC handler handled the error, and also that this is a problem for the agent itself. |
Node v16 should be reporting the error chain nicely now. |
@emmacasolin FYI this issue addresses the idea of how the exceptions (including the error chaining) will eventually be displayed on the CLI or STDERR logging. |
Note that the cause property is actually an ES2022 feature, but nodev16 does support it, but because we haven't fully updated our standards to it, we will forget that it actually exists on the standard errors. |
Any kind of structured logging will be intended for machine consumption and filtering. That's because any logging of server errors may contain a long stack string, and that information is likely to be quite noisy on basic log readers. |
@emmacasolin this would still be needed to ensure that |
Some thoughts... We've been hitting various performance and other network buggy issues which are hard to debug because of concurrency and that asynchronous work are happening out of order. I think we need to improve our structured logging to provide a better observability into the runtime of our program. To do this, I'm thinking that we need to incorporate some new ideas:
Point is flamegraph was only invented by Brendan Gregg back in 2013/2014. But the same idea can be applied to view an entire trace of the entire distributed system. But also works for a local system too. But the tooling isn't really solid though and seems heavyweight for local debugging. So something inbetween might be useful. |
Some clarification on terminology. There's a "standard" for structured logging as exemplified by https://www.structlog.org/. In that structure, it is not using JSON, but using a human readable structure. This issue is more about JSON as a structured logging format. I believe the human readable structure as demonstrated by structlog is a good idea, and we can move our logging format to fit that format eventually. But JSON logging is simply more powerful for any machine analysis of logs. However atm, I'm not sure what the deal is with information like trace #4, and how structlog would handle information like that. It's not human readable to have information like that on one single line (which is the default for most loggers), it's simply too much information. And if we want to preserve human readability, newlines/tabs/columns on the terminal must be used. The question is, is it really possible to enable easy machine parseability while still using whitespace as a formatting operator for human readability on the terminal? Maybe the goals are in conflict. Some more info here: hynek/structlog#407 I think we should go as far as we can to maintain structured logging for both human readability AND machine parseability. In #4 I suggested that if we ever do need to use literal newlines or whitespace operator for formatting, it must always be done with some indentation. That way, one can say that any newline that has But it's a bit more complicated than that. Because you need a "lookbehind" to say that whatever is before the So this should be fine as that terminates the message:
But this should not be considered an end to the message
But this would get confused with:
Where it is 2 messages. So if there are spaces after Then machines should be able to parse individual messages. |
However do note that such usage of newlines are unlikely to be parseable easily, since it depends on logging analysers that know that this can occur. I reckon that it should be possible to code a custom parser for lnav that understands such a formatter though. However it looks like it still isn't entirely a solved problem: tstack/lnav#410 (multi-line matching). This further shows that any usage of multiple lines for a log can only really be usable on the terminal and custom log parsers. But at that point the JSON structure should be used instead. |
@tegefaulkes @emmacasolin follow this for more structured logging research. |
Also I noticed that structlog talks about "context", that almost sounds like opentelemetry's idea of spans. My opinion is that "context" can be done by hooking into It does seem that decorators would be the most expressive way of doing this, as this is basically a sort of AOP (aspect oriented programming) cross-cutting concern. We would want to avoid having to add lots of line-noise, but still be able to drop down to imperative logging for flexible control in edge cases where a decorator just doesn't work. |
Because PK is a decentralised system, we do have calls that propagate from 1 agent to another agent. Debugging this is crucial, but not necessarily possible in production unless we control all the parties to the interaction. Right now there's a bug in the NAT testing PR where signalling involves agent 1 asking the seed node to ask agent 2 to reverse hole punch. If there's a delay anywhere here, we end up with timeout errors. |
Structured graph visualisation with contexts/spans/traces. I think we take a page from the git graph, and use things like: https://github.com/nicoespeon/gitgraph.js. Along with the |
To solve this issue, I'm just going to incorporate json as formatting style. In terms of the actual data being logged, it'd be similar to what is already available under our formatting. I might incorporate similar keywords to what is already available to align. But the contexts and spans would have to be done in a separate issue. A JSON format would be one step towards producing traces JSON data that can then be analysed and visualised. There is a relationship between this and "state" simulacra, but somewhat independent. State simulacra is generally specific to the state being modelled, while tracing is very generic and applies to any contextful based systems. |
@emmacasolin please change up PK to start using the |
Specification
Structured logging is to use JSON on the STDERR logs. This exposes more of the data, and allows us to create a standard for the set of fields to be used.
There's no real official standard for structured logging. But journalctl and syslog defines some common fields. The fields can be similar to our
formatting
that we already use, but this time in a JSON format. Additional data fields can be encoded too.We could put a
json
formatting in somewhere that provides the JSON serialisation.This can be especially useful since we are also looking into adding error chaining, which we have done in a few places. But by creating our own
@matrixai/error
we can implement error chaining. MatrixAI/Polykey#304Additional context
data
which required manual serialisationTasks
The text was updated successfully, but these errors were encountered: