Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerous improvements on logging and metrics #461

Merged
merged 11 commits into from
May 31, 2023
Merged

Conversation

robklg
Copy link
Contributor

@robklg robklg commented May 28, 2023

Fixes #420 (omit multi line output in the logs)
Fixes part one of #81 (although was already handled mostly by Werner)

And lots of other things I came across:

Logging:

  • Improved mappings from storage errors to FTP errors
  • Moved many log messages from INFO to DEBUG level
  • Introduced more user friendly log messages at INFO level
  • Improved the log messages for all commands, making sure the log entries are clear and 'actionable'
  • Introduced nice additional information on transfers (duration, size and transfer speed)
  • Added some error logs when there were none

Metrics:

  • Added a specific counter metric for data transfer commands, with a verdict on whether it's a server or client error
  • Added in-flight count on transferred bytes (also labeled by command) so that we can calculate average transfer speed

Some bug fixes:

  • LIST and NLST weren't returning replies correctly
  • RETR was sending no Reply on the Control channel when there was no Data connection, causing the client to hang
  • Bug with REST (restart) command not working correctly

@robklg
Copy link
Contributor Author

robklg commented May 28, 2023

I will break this up into multiple commits and do a thorough check myself, so no need to review yet.

@robklg robklg force-pushed the robby/logging-and-metrics branch from f4f87dd to 77a5770 Compare May 28, 2023 20:39
@robklg robklg marked this pull request as ready for review May 28, 2023 20:40
@robklg robklg force-pushed the robby/logging-and-metrics branch from dc2e846 to fae5e36 Compare May 29, 2023 19:06
@robklg robklg requested a review from hannesdejager May 30, 2023 19:13
Copy link
Collaborator

@hannesdejager hannesdejager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few suggestions but in general I'm happy. Thanks this is super cool changes.

robklg added 10 commits May 31, 2023 22:18
When RETR can't find the data connection, it must Reply or the client will get stuck.
Part of effort to make INFO logging more useful.

- Introduced new log lines, oriented to the end user
- Include the path where appropriate
- Added a command prefix to most log messages, to clarify the context
Particularly the STAT <path> command can sometimes pollute the log.
We don't want to dump directory listings into the log stream.

In this commit I've totally stripped the multiline output (only indicating the number lines for information). I didn't
really see the need for outputting any of the other multilines either.
When the client couldn't connect to GCS over HTTP, it would return a permanent error.
We should let the client retry, so now we return a transient error instead.
We were not correctly mapping a bunch of errors. Most of these could lead to returning retryable LocalError's to the
client, while they are not retryable.
Added/updated metrics:

- Total session count (note: counter instead of the existing gauge)
- Added total transfers (ftp_transferred_total), it also collapses all errors to a client-error, server-error or permission-error for use in Service Level measurements
- Added sent/received bytes (ftp_sent_bytes, ftp_received_bytes) for all transfers (list, retr, stor) updated 'in flight' for average speed calculation of file and directory listing transfers. (This is implemented with a custom MeasuringWriter)
- Clarified description that backend read byte and file count only count the successful ones

Logging changes:

- Changed ControlChanMiddleware to log events and replies at DEBUG level (better readable INFO messages replace it)
- Log human readable duration, bytes copied and transfer speed for each transfer command (retr, stor and the list commands)
This change makes sure the start position is copied from the session at the moment the RETR or STOR command is
received.
@robklg robklg force-pushed the robby/logging-and-metrics branch from f5edca0 to 8158e1b Compare May 31, 2023 20:20
@robklg robklg force-pushed the robby/logging-and-metrics branch from e7f800d to 82a16cb Compare May 31, 2023 20:34
@robklg robklg merged commit c1c3a73 into master May 31, 2023
@robklg robklg deleted the robby/logging-and-metrics branch May 31, 2023 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't output multi line responses in INFO level log
2 participants