SysLog Support for Incremental Reads #956
Have you identified a race condition in the existing ES syslog implementation here? If so, I think that warrants a separate bug ticket for investigation.
My general recommendation for a persistent syslog is to leverage a PSP-based solution if possible, to write all OS_printf() information into whatever form of persistent storage your platform offers that is best suited for the task. For instance, on Linux these can be piped to the system log daemon, which presents them in conjunction with all other system activity (useful), should automatically compress/archive old logs per its configuration, and can even send them off-board to an external archive. This may require some mods to the PSP/OSAL to implement, but it really isn't complicated to do.
Note that CFE ES APIs are generally governed by the CFE requirements document, so new functionality and features at this level will require a lot more process to implement (stakeholder buy-in, requirements, rationale/use cases, unit tests, etc.). Not that it can't be done, but solutions that can be localized on a platform via the PSP are much easier to implement in this regard.
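For illustration, here is a minimal sketch of that kind of redirect on Linux, assuming a hypothetical console-output hook in the PSP (the PSP_Console* names are made up; the real integration point depends on the OSAL/PSP version in use):

```c
#include <stddef.h>
#include <syslog.h>

/* Hypothetical PSP hook: called once at startup */
void PSP_ConsoleInit(void)
{
    /* "cFS" is an arbitrary identifier; LOG_LOCAL0 is a typical facility */
    openlog("cFS", LOG_PID, LOG_LOCAL0);
}

/* Hypothetical PSP hook: receives each formatted OS_printf() line */
void PSP_ConsoleWrite(const char *Str, size_t Len)
{
    /* syslog(3) hands the line to the system logger, which archives,
       rotates, and can forward it off-board per its own configuration */
    syslog(LOG_INFO, "%.*s", (int)Len, Str);
}
```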
I believe this additional function is covered under the same requirement that drives the CFE_ES_SysLogReadStart_Unsync function, as it is a sister function. The PSP solution would also lose any cFS logs/events and only capture my application's, which is not desirable for forensics. Besides, I would rather leverage all the work and coding that has been put into cFS as opposed to doing a one-off solution.
I'm not sure it is a bug with the code so much as a bug with the model. The code takes out a lock on the shared data and copies the important information into the local buffer, so that is good. But, per the model, the only way to know what has been read and what has not is to call SysLogClear after all the Reads. In the interim between the initial call to ReadStart and the subsequent SysLogClear, there could be calls to Append or Write (in a multi-threaded system), and those logs will be lost on the call to SysLogClear. The only way to stop the loss of logs on a multi-threaded system in the current model is to hold the lock during the entire process of reading the logs out, blocking all loggers. The model is fine for a single-threaded system, or one that can tolerate blocking of all threads.
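A self-contained toy model of that window (not cFE code; a single-threaded interleaving stands in for the multi-threaded race):

```c
#include <stdio.h>

#define LOG_SIZE 4
static const char *Log[LOG_SIZE];
static int Count = 0;

static void Append(const char *Msg) { if (Count < LOG_SIZE) Log[Count++] = Msg; }
static void ReadAll(void) { for (int i = 0; i < Count; ++i) printf("read: %s\n", Log[i]); }
static void Clear(void) { Count = 0; }

int main(void)
{
    Append("msg 1");
    ReadAll();       /* reader dumps the log: prints "msg 1"        */
    Append("msg 2"); /* a logger appends between the read and...    */
    Clear();         /* ...the clear, which discards "msg 2" unread */
    ReadAll();       /* prints nothing: "msg 2" is lost for good    */
    return 0;
}
```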
Moved into comments section of Issue 956 so we aren’t duplicating effort 😊
Yes - I do concur that with the way the "Clear" is actually a separate command from the "Write", there is an opportunity to miss any messages that get written between the two operations. I would consider this a bug - whether it's a bug in the model or the code, it's still a bug - as we should at least have an option for an atomic (at least WRT the log) dump+clear command to avoid this hole. We should write a separate issue about this, so we can decide if that window is worth closing or not.
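For the sake of discussion, a sketch of what an atomic dump+clear could look like, with a pthread mutex standing in for the ES shared-data lock and a made-up log structure:

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical shared log; names are illustrative, not cFE's */
typedef struct { char Data[1024]; size_t WriteIdx; } SysLog_t;

static SysLog_t        SharedLog;
static pthread_mutex_t LogLock = PTHREAD_MUTEX_INITIALIZER;

/* Copy out and reset under one lock acquisition, so no Append/Write
   can land between the dump and the clear */
size_t SysLogDumpAndClear(char *Out, size_t OutSize)
{
    pthread_mutex_lock(&LogLock);
    size_t n = (SharedLog.WriteIdx < OutSize) ? SharedLog.WriteIdx : OutSize;
    memcpy(Out, SharedLog.Data, n);
    SharedLog.WriteIdx = 0;   /* clear while still holding the lock */
    pthread_mutex_unlock(&LogLock);
    return n;                 /* any file I/O happens after unlock  */
}
```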
No, I'm not talking about making a new PSP function, I'm talking about redirecting all output done via OS_printf().
Is this intended for use within the context of an app? If so, I second Joe's recommendation of taking a look at using events (specifically debug events). They provide both reporting via telemetry and recording to a file, along with filtering, support for event types, and the capability to trigger other actions since they are messages. Syslog is more focused on before event services is up and running, before the app has registered with event services, or outside the context of an app.
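For reference, sending a debug event from an app looks roughly like this (the event-type macro spelling varies across cFE versions, the event ID here is hypothetical, and the app must have registered with EVS first):

```c
#include "cfe.h"

#define MY_APP_DIAG_EID 42  /* hypothetical event ID for this app */

void MyApp_ReportState(int State)
{
    /* DEBUG events are filterable and flow through telemetry, where DS
       can record them to files and CF can move the files off the target */
    CFE_EVS_SendEvent(MY_APP_DIAG_EID, CFE_EVS_EventType_DEBUG,
                      "Diagnostic: state=%d", State);
}
```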
Sorry, by "persistent and accessible" I meant terrestrial storage. |
If you do consider using events, the Data Store (DS) app can write the events to files, File Services (FS) helps with file manipulation, and the CCSDS File Delivery Protocol app (CF) helps in moving files off the target. Typically ground cmd/tlm utilities also record events as they are sent, if you don't want to store them on the target, with telemetry typically being store-and-forward when a link isn't available (so no messages are lost).
In truth this is a bug against CFE_ES_SysLogReadStart_Unsync. CFE_ES_SysLogReadStart_Unsync works fine when using LogMode_DISCARD; however, in LogMode_OVERWRITE it is wrong to base SizeLeft on EndIdx, since the actual size is based on the wrapping WriteIdx. The downside of this view is that it requires changes to currently working and fielded code, but it is The Right Answer.
Could you provide the steps to create the bug you are seeing, and actual output showing the issue? SizeLeft based on EndIdx (minus the mangled message part that gets subtracted) seems right to me, since it's the "high water mark" even after the wrap. That is, there are SizeLeft bytes in the log that need to be output (which is true before or after the wrap, as long as the mangled part is subtracted). Note we do functionally test this and errors haven't been reported, so either we are missing a scenario (which happens) or it's working as intended.
Sorry, I'm not currently in a position to show output. The steps to reproduce are: dump the log without clearing it, let more messages be written, then dump again. Since the buffer is an overwrite/wrapping buffer, the size left should be the number of unread logs, not the total buffer size.
Oh, that's the current expected behavior if you don't clear the log (it's a high water mark, not new messages since the last read), and if you request another dump it'll include the 30 new ones plus the old 2. Although it's actually an index, not a number of entries, but that's a different topic. So this request sounds more like an enhancement to change how the log works than a "bug". Thanks for the background. Your use case is different from the classic use case of a dump being the entire log every time.
Another side note - it defaults to overwrite for a power-on reset and discard for a warm reset. Even though it's set to discard, it'll continue writing over the old messages until hitting the end of the buffer. That's probably less than desirable, since the point of changing to discard is to maintain the old messages so they can be dumped.
The concept is to use the logs for forensic analysis - so yes, as debug, but not as an exception; more as always there. The problem is, by the time you know you want a log, it will be gone if you don't get it to a persistent, accessible storage place.
The algorithm I'm trying to support is:
On Timer: read any syslog entries written since the last read and append them to persistent storage.
The current API supports reading all the logs in the buffer at once and then clearing the log (with the race condition that any logs written while they are being read out will be lost, OR the need to lock out all loggers during the read, which is not acceptable).
The proposed update to the API is the addition of a single function that, instead of setting the buffer size based on EndIdx, sets the buffer size based on the difference between the Buffer->LastOffset passed in and the wrapping WriteIdx. This is a traditional circular buffer interface and is well supported within the code as it stands; it just needs an API to initialize the Buffer correctly (see the sketch below).
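A minimal sketch of what such a sister function could compute; the struct layout and the _FromOffset name are illustrative, not actual cFE declarations:

```c
#include <stddef.h>

typedef struct
{
    size_t SizeLeft;    /* bytes still to be read out           */
    size_t LastOffset;  /* reader's position from the last read */
} SysLogReadBuffer_t;

/* WriteIdx is the log's wrapping write position, LogSize its capacity */
void SysLogReadStart_FromOffset(SysLogReadBuffer_t *Buffer,
                                size_t WriteIdx, size_t LogSize)
{
    /* Classic circular-buffer arithmetic: only the bytes written since
       LastOffset are pending, wherever the wrap point happens to be.
       (A buffer that wrapped a full lap looks empty in this simple form;
       a real implementation would track that overflow separately.) */
    Buffer->SizeLeft = (WriteIdx + LogSize - Buffer->LastOffset) % LogSize;
}
```

On each timer tick the caller would invoke this, read out SizeLeft bytes, and advance LastOffset to WriteIdx; nothing is ever cleared, so no messages fall into the read/clear window.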
Advantages are that no logs are lost between reads, the lock does not need to be held across the entire read-out, and the existing circular buffer code is reused.