Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SysLog Support for Incremental Reads #956

Open
jsjoberg01 opened this issue Oct 20, 2020 · 15 comments
Open

SysLog Support for Incremental Reads #956

jsjoberg01 opened this issue Oct 20, 2020 · 15 comments

Comments

@jsjoberg01
Copy link

The concept is to use logs for forensic analysis, so yes as debug, but not as an exception more as always there. Problem is, by the time you know you want a log it will be gone if you don’t get it to a persistent, accessible storage place.

Algorithm I’m trying to support is:
On Timer:

  • Extract un-read logs
  • If enough logs have been extracted, package and send

The current API supports reading all the logs in the buffer at once and then clearing the logs (with the race condition that any logs that occur during the reading out will be lost, OR need to lock out all loggers during the read which is not acceptable).

The proposed update to the API is the addition of a single function that instead of setting the Buffer size based of EndIdx, sets the Buffer size based on the difference between the Buffer->LastOffset passed in and the writeIdx. This is a traditional circular buffer interface and is well supported within the code as it stands, just needs an API to initialize the Buffer correctly.

Advantages are:

  • Traditional circular buffer interface so supports a well established model
  • Allows readings of logs without loss of logs or undesirable locking
  • No impact to current code, so introduces no risk to current systems
@jphickey
Copy link
Contributor

The current API supports reading all the logs in the buffer at once and then clearing the logs (with the race condition that any logs that occur during the reading out will be lost, OR need to lock out all loggers during the read which is not acceptable).

Have you identified a race condition in the existing ES syslog implementation here? If so I think that warrants a separate bug ticket for investigation.

@jphickey
Copy link
Contributor

My general recommendation for a persistent syslog is to leverage a PSP-based solution if possible, to write all OS_printf() information into whatever form of persistent storage that your platform offers that is best suited for the task. For instance on Linux these can be piped to the system log daemon - which present it in conjunction with all other system activity (useful) and also should automatically compress/archive old logs per its configuration, or even can be sent off-board to an external archive. This may require some mods to the PSP/OSAL to implement, but it really isn't complicated at all to do.

Note that CFE ES APIs are generally governed by the CFE requirements document, so new functionality and features at this level will require a lot more process to implement (stakeholder buy-in, requirements, rationale/use cases, unit tests, etc). Not that it can't be done .... but solutions that can be localized on a platform via PSP are much easier to implement in this regard.

@jsjoberg01
Copy link
Author

I believe this additional function is covered under the same requirement that drives the CFE_ES_SysLogReadStart_Unsync function, as it is a sister function.

The PSP solution would also lose any cFS logs/events only only capture my application, which is not desirable for forensics. Besides I would rather leverage all the work and coding that has been put into cFS as apposed to doing a one-off solution .

@jsjoberg01
Copy link
Author

The current API supports reading all the logs in the buffer at once and then clearing the logs (with the race condition that any logs that occur during the reading out will be lost, OR need to lock out all loggers during the read which is not acceptable).

Have you identified a race condition in the existing ES syslog implementation here? If so I think that warrants a separate bug ticket for investigation.

I'm not sure it is a bug with the code so much as it is a bug with the model. The code takes out a lock on the shared data and copies important information into the local buffer, so that is good. But, per the model, the only way to know what has been read and what has not been read is to call SysLogClear after all the Reads. That interim time between the initial call to ReadStart and the subsequent SysLogClear could have calls to Append or Write (in a multi-threaded system) and those logs will get lost on the call to SysLogClear. The only way to stop the loss of logs on a multi-threaded system in the current model is to hold the lock during the entire process of reading the logs out and block all loggers.

The model is fine for a single threaded system, or one that can tolerate blocking of all threads.

@jsjoberg01
Copy link
Author

jsjoberg01 commented Oct 20, 2020 via email

@jphickey
Copy link
Contributor

Yes - I do concur that the with the way the "Clear" is actually a separate command from the "Write" that there is an opportunity to miss any messages that get written between these two operations.

I would consider this a bug - whether its a bug in the model or code, it's still a bug - as we the should at least have an option for an atomic (at least WRT to the log) dump+clear command to avoid this hole. Should write a separate issue about this, so we can decide if that window is worth closing or not.

@jphickey
Copy link
Contributor

The PSP solution would also lose any cFS logs/events only only capture my application, which is not desirable for forensics. Besides I would rather leverage all the work and coding that has been put into cFS as apposed to doing a one-off solution .

No, I'm not talking about making a new PSP function, I'm talking about redirecting all output done via OS_printf() - which includes syslogs - to the destination of your choice. This is fairly easy, and it will get you all log messages, and doesn't require modification of ES to do so. You would still be using all of ES log functions as they are - just adding a platform-specific "tee" to also write all your logs to persistent storage, not just the console.

@skliper
Copy link
Contributor

skliper commented Oct 20, 2020

Is this intended for within the context of an App? If so I second Joe's recommendation of taking a look at using events (specifically debug events). They provide both reporting via telemetry and recording to file, along with filtering, support of event types, capability to trigger other actions since they are a message, etc. Syslog is more focused on either before event services is up and running, before the app has registered with event services, or outside the context of an app.

@jsjoberg01
Copy link
Author

" just adding a platform-specific "tee" to also write all your logs to persistent storage, not just the console."

Sorry, by "persistent and accessible" I meant terrestrial storage.

@skliper
Copy link
Contributor

skliper commented Oct 20, 2020

If you do consider using events, the Data Store (DS) app can write the events to files, File Storage (FS) helps with file manipulation, CCSDS File Transfer Protocol App (CF) helps in moving files off the target. Typically cmd/tlm utilities record events sent also if you don't want to store on the target, with tlm being store and foward typically when not available (so no messages lost).

@jsjoberg01
Copy link
Author

In truth this is as a bug against CFE_ES_SysLogReadStart_Unsync. CFE_ES_SysLogReadStart_Unsync works fine when using LogMode_DISCARD, however in LogMode_OVERWRITE it is wrong to base SizeLeft on EndIdx, since the actual size is based on the wrapping WriteIdx. Downside of this view is it causes changes to currently working and fielded code, but it is The Right Answer.

@skliper skliper added the bug label Jan 19, 2021
@skliper skliper added this to the 7.0.0 milestone Jan 19, 2021
@skliper
Copy link
Contributor

skliper commented Jan 20, 2021

In truth this is as a bug against CFE_ES_SysLogReadStart_Unsync. CFE_ES_SysLogReadStart_Unsync works fine when using LogMode_DISCARD, however in LogMode_OVERWRITE it is wrong to base SizeLeft on EndIdx, since the actual size is based on the wrapping WriteIdx. Downside of this view is it causes changes to currently working and fielded code, but it is The Right Answer.

Could you provide the steps to create the bug you are seeing, and actual output showing the issue? SizeLeft based on EndIdx (minus the mangled message part that gets subtracted) seems right to me since it's the "high water mark" even after the wrap. As in there's SizeLeft number of bytes in the log that need to be output (which is true before or after the wrap, as long as the mangled part is subtracted). Note we do functionally test this and errors haven't been reported, so either we are missing a scenario (which happens) or it's working as intended.

@skliper skliper removed this from the 7.0.0 milestone Jan 20, 2021
@jsjoberg01
Copy link
Author

Sorry, I'm not currently in a position to show output.

The steps to reproduce are:

  • Write enough logs to wrap the log buffer (ex. if your buffer supports 32 entries, write 30, read 30, write another 30)
  • Check the SizeLeft, it will be 32, not 30, even though there is only 30 logs "left" since the last read.

Since the buffer is an overwrite/wrapping buffer, the size left should be the number of un-read logs not the total buffer size.

@skliper
Copy link
Contributor

skliper commented Jan 21, 2021

Oh, that the current expected behavior if you don't clear the log (it's a high water mark, not new messages since last read), and if you request another dump it'll do the 30 new ones plus the old 2. Although it's actually an index, not number of entries but that's a different topic. So this request sounds more like an enhancement to change how the log works vs a "bug". Thanks for the background. Your use case is different than the classic use case of a dump being the entire log every time.

@skliper skliper added enhancement and removed bug labels Jan 21, 2021
@skliper
Copy link
Contributor

skliper commented Jan 21, 2021

Another side note - it defaults into overwrite for power on reset and discard for warm. Even though it's set to discard, it'll continue writing over the old messages until hitting the end of the buffer. That's probably less than desirable since the point of changing to discard is to maintain the old messages so they can be dumped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants