-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error reporting #414
Comments
option 1 seems OK. I understand the appeal of option 2 but it seems to add complexity, with no real benefit - our system is just not that error-complex. I dislike the idea of an automatic reset_lasterror(), except maybe in only one case only: sentence_create(). The idea is that the user could make a whole bunch of API calls, and then check if there was an error, at the end. That error code would only get reset when starting with a new sentence. |
After trying to implement both, option (2) is indeed much more complicated, and has no added functionality. Regarding resetting messages: In order to see that the new error message has the needed functionality, I will try to write a simple GUI parser application. |
An old trick that I've seen with C interfaces is to have a stack of errors, and the user has to pop the stack until empty, to get them all. That way, you can return multiple error conditions, if needed. The user, of course, won't know wexactly where they might have occurred, but that is their fault, not ours. |
My current implementation has a API to get all errors at once, so usual usage doesn't need too much efforts, and also has an API to fetch the messages/errors one by one. It does that by internally concatenating all the messages to one string, and providing a vector of pointers into this strings for fetching individual errors. This is essentially equivalent to the said stacked errors, but in a thought that the usual usage will be to print all the stacked messages at once (because the current library may emit several messages per API call). However, a similar alternate implementation may have just the error stacked, and an API call option to concatenate them. I will try both implementation to see which is simpler and easier to use. |
Hi Linas, I need you comments in order to refine the API. After that I will also implement a Python bindings. Because the said error reporting method implementation is compatible by default to the current way, languages without these bindings will just function as before. I'm not sure about several implementation details. General implementation details:
Instead, I suggest to use a symbolic severity level, i.e. to use something like: Longer names are also possible, like
New API functions:
link_public_api(linkgrammar_error_handler)
linkgrammar_set_error_handler(linkgrammar_error_handler, linkgrammar_error_data *);
link_public_api(char *)
linkgrammar_format_error(linkgrammar_error *lge);
link_public_api(int)
linkgrammar_print_errors(linkgrammar_error_handler, linkgrammar_error_data *);
link_public_api(int)
linkgrammar_clear_errors(void); New API data definitions:
typedef enum
{
Fatal = 1,
Error,
Warn,
Info,
Debug,
None
} linkgrammar_error_severity;
typedef struct linkgrammar_error {
/* err_ctxt ec; */
linkgrammar_error_severity sev;
const char *msg;
} linkgrammar_error;
typedef int (*linkgrammar_error_handler)(linkgrammar_error *, linkgrammar_error_data *);
typedef struct linkgrammar_error_data {
linkgrammar_error_severity stdout_sev;
void *data;
} linkgrammar_error_data;
const char *linkgrammar_error_level[] =
{
"Fatal", "Error", "Warning", "Info", "Debug", "None", NULL
}; |
To have shorter names, I think its ok to just use I do NOT like the If there is an error, then there needs to be some way of passing useful information about it: e.g. if there is a bug in a dictionary, then the error message needs to say "error at line number NN in dictionary DD" where DD is either a file-name, or we pass a pointer to So, the original |
As usual, I guess I was not clear enough. I will try to clarify it:
(BTW, if desired, it is easy to transparently add a source-code function, file and line for the originating message. This can be added later with a local change in the error functions.)
Think of an editor like Hence To sum up:
Its only component that is "not minimal" and unused for now is the error stack. I didn't find a good use for it, since the callback function method (along with its API-user data per function) seems to me flexible enough for all purposes. BTW, I used
However, I see now that there are already many corpus/sence/cluster functions that start with |
oh OK,.. then why not have just |
Initially I did it this way. However, I wanted to consolidate the Debug/Trace facility with the error facility, and to provide a way to have the Info/Debug/Trace messages (and even the Warning messages, which are essentially informational in the LG library) output by default on stdout so they line-up with the results, while the real error messages (Error/fatal) still output in the traditional manner to (EDIT) stderr. The above is the current default behavior, but if the API user would like to change it, to that end I added the Some other questions remain that I need to resolve (for details see my initial message from today).
|
On Tue, Nov 8, 2016 at 7:22 PM, Amir Plivatsky notifications@github.com
? But the callback already gets the error severity level in the first argument -- why would it needs a second copy?
? where is that?
numerically is usually easiest, that way, you can just do a I'm going for the "principle of least surprise" -- this API should resemble other popular C api's as much as possible, so that new programmers can just look at it, and guess what it does, without having to carefully study it. That's why I like the I think that almost everyone uses enums for error levels. I don't really care if the enums are strings or ints. I don't really know what "best practices" are for C code error reporting these days. I sort-of knew what they were 20 years ago, but lost touch.
since its now in the public api, I guess that would be needed...
sure. you don't have to do that, but it is commonly done...
Yes! they should be combined. Again -- "best practices", "principle of least surprise", and just the right amount of "keep it simple". I think Einstein said "as simple as possible, but no simpler".
Yes, this seems to be typical. |
The callback function is called as following:
It is currently defined as static function, and not directly provided in the API.
An integer value is especifically good for easy > or < comparisons.
Maybe something very complex like in libssl, which also allows error stack manipulation and fine-grain error formatting, or more simple like in libvirt. A good error facility may even allow setting error handler per error type, with inheritance etc. I think all of that is too complex for our simple needs so I tried to implement a minimal error facility (the error stack doesn't belong to a minimal implementation and I'm not sure how much it is useful, but the code for it is small and already implemented). (While mentioning an error type (above) - I thought that maybe an error type member can be added to So a remaining question is whether |
The idea of issue #420 is an API to the results of the tokenizer. I would like to implement it just after new error handling facility. I also had several proposals of how to define the whole tokenizer operation tokenizer in a definition file - which is a kind of API to define a tokenizer (one idea is to use regexp-like definitions - partially implemented - differently than the demo in the current code). Another idea, which may be very appealing, is to define the tokens type order by LG grammar. After all, which token type can come after which one is a kind of grammar. |
OK, I understand what you are doing, now. I guess its OK.
Yeah, that's probably needed. |
Hi Linas, Here is the current status of the new error handling code:
Current C API:
Also the following is in the public API:
Regarding Current Python API: I also created Python bindings for the new error facility, include a test case (which is not simple).
I started to write A C GUI (GTK+ based, with a minimal change in the current link-parser program) that needs an error facility like the one I added, and can serve as a test if it is adequate for the job. No other language bindings yet, but there is nothing to hurry since the new error facility is totally compatible to the current code (absolutely no changes in link-parser). Internally (defined in
TLS is defined by |
I completed these change (improve error reporting). Among the fixes:
Features of the improve error reporting facility:
I propose the following line in ChangeLog:
I also can add a file with summarizes the internal API of the new error facility (a short version of my message above). I would like to send this change as PR right away, because I already had to handle several merge conflicts due to you recent changes. (The strangest one was on Changelog, which I haven't changed yet). P.S. |
Hi Lians,
The discussion started at PR #400. Here is the relevant section:
In order that the library will be able to get embedded seamlessly in applications (like grammar checkers or GUIs) it should not print such error messages, but instead only provide an error indication, and put the error in a variable (a per-thread variable if multi-threaded), to be retrieved by a new function link-grammar_error(). Such an arrangement is found in other libraries.
I thought of two different ways to implement such a scheme:
There will be several function to retrieve the errors, one per object (or one function can be used if the "messages" member is always the first one in the corresponding structure).
But option (2) has a problem: If the dictionary couldn't be opened, there is no object in which to store the error due to which it couldn't be opened (there can be many). My proposed solution for that is use NULL as the object when the returned dict is NULL (and in that case retrieve the error from a global variable).
Another thing to note is that most of the API functions, including each of the parse_option_get_*() functions, will have to start with a call to reset_lasterror(), so errors will not get accumulated over API calls - there can be several error messages from the same API call, so they are allowed to be accumulated per API call, and to be returned at once by get_linkgrammar_lasterror().
I already implemented most of (1), but then thought again on (2), so I need you guidance regarding what to actually implement.
Since I intend to initially make it completely compatible with the current way of error reporting (to the extent that no change will be needed in existing applications) there is no need to do things totally the "right" way from the start. We can just declare the new error reporting way as "experimental" for several releases until it get stabilized.
The text was updated successfully, but these errors were encountered: