Refactor and generalize warning system to message system #338

sim642 · 2021-08-31T16:08:14Z

This PR refactors #255 further.

Changes

Replace string-based warnings_table with record-based Messages.Table which keeps all metadata.
Generalize from warnings to messages with different meaning and severity (error, warning, info, debug, success).
Remove may/must certainty and use warning/error respectively instead.
Replace warning type with a list of tags: category (previously warning type) or CWE.
Make message text mandatory, remove inconvenient () argument.
Replace string argument with Pretty format. This makes it much nicer to insert data into message text, a la Printf and the existing trace functions.
Remove warn, which used to contain location but deduplicate without it, and replace it with warn_noloc, which doesn't contain location at all.
Rename warn_each, which has current location, to warn as the most common case.
Add context information to all messages (if possible) via the global Messages.current_context (a la Tracing.current_loc etc). These context are completely abstract (Obj.t) though, so they're not very useful with their integer (hash) representation, but that's a separate more general issue.
Removes unused tracking (not tracing!) mechanism.
Removes unsound bailwith/Bailure.
Reimplement message printing using Format and its semantic tags for terminal colors.
Add json-messages result type for exporting all messages (with their metadata) to external tools, e.g. IDE integration.

TODO

Update documentation
Manually tune to_yojson of messages further? Not now, if needed then later on master.
Use OCaml format strings for message functions like tracing functions? Currently those functions require concatenating a string as argument to insert dynamic data into the message.
What to do about warn vs warn_each? See Refactor and generalize warning system to message system #338 (comment).

…less of dbg.debug

sim642 · 2021-09-02T09:06:14Z

While doing this, I discovered that warn vs warn_each is much more subtle than I expected. I had thought that warn_each would warn with location and warn would just warn generally without location. But that's not the case, because warn output also contained a location (!Tracing.current_loc).

So all warnings have a location, but the non-each vs each just determines how they're deduplicated. warns are deduplicated just by the message, so if multiple locations give the same exact message, only the first (with its location) is shown. warn_eachs are deduplicated by the pair (message, location), so the only time anything is ignored is when the same location gives the same warning multiple times.

Since both kinds include a location, it feels like that should be the default and called warn. The suffix "each" is very non-descriptive (each what?). We might also want warnings/messages without location (e.g. configuration problems etc). Currently this isn't possible (or rather, such warnings will just have locUnknown displayed, which isn't very user-friendly).

So my question is: do we actually want the previous warn behavior that includes location but deduplicated just by text? It's not very clear to me what this weird behavior is for.

The current refactored state preserves all the old behavior for all message severities, not just warnings, but while we're at it, it would also make sense to rethink this. Currently the message record has two different locations to accommodate the old behavior (loc: location option and print_loc: location).

vogler · 2021-09-02T19:41:33Z

src/analyses/base.ml

@@ -803,13 +801,13 @@ struct
        match (eval_rv a gs st n) with
        | `Address adr ->
          (if AD.is_null adr
-           then M.warn_each ~must:true ~warning:(M.Warning.Behavior.Undefined.nullpointer_dereference ()) ()
+           then M.error_each ~category:M.Category.Behavior.Undefined.nullpointer_dereference ~tags:[M.Tag.CWE 476] "Must dereference NULL pointer"


Maybe it makes sense to abstract warnings that have multiple calls in order to avoid inconsistencies.
A nullpointer_dereference warning should include its CWE and text.

I'm not sure how necessary this would be because there shouldn't be duplication in warnings in the first place: each one should be emitted by an analysis responsible for it. And even here there are subtle differences between the two: the error is for a must-dereference (mentioned in the message) and the warning is for a may-dereference (mentioned in the message).

And coupling categories with CWEs etc is precisely what we want to avoid in some other places, for example integer overflows. The complete messaging for those is not yet there, but the idea is as follows. Unlike signed overflows, unsigned overflows are not undefined behavior, but they are still considered under the same CWE. While signed overflows should also have an undefined behavior category tag in addition to the integer overflow one. And to be extra precise, there's separate CWEs for overflow and underflow.

So all the subtle variations intentionally should have different tags (and probably messages).

And even here there are subtle differences between the two: the error is for a must-dereference (mentioned in the message) and the warning is for a may-dereference (mentioned in the message).

The text then duplicates the information given by warn/error.

And coupling categories with CWEs etc is precisely what we want to avoid in some other places, for example integer overflows.

Makes sense, but there you can still have different calls.

I'm not sure how necessary this would be because there shouldn't be duplication in warnings in the first place: each one should be emitted by an analysis responsible for it.

The two calls here are identical besides warn/error.
There may be several analysis reporting the same warning.
Maybe it's not that common. I just spotted this bit and imagined it may be a source of duplicated code.
But it's also not clear how to best abstract it. Could introduce warn/error functions that take pre-defined message records.
It may also be worth thinking about whether it's viable to define all warnings/errors (not debug) in one place (similar to defaults.ml) and then just reference them. This would avoid duplication and give a list of everything we report.

The text then duplicates the information given by warn/error.

This is true for must/may, but it's probably for the user's benefit, such they don't have to deduce it from whether it's an error or just a warning. The exact difference between errors and warnings might be analysis-dependent though and more general distinction than must/may. Attaching must/may to every message also doesn't make sense.

There may be several analysis reporting the same warning.

The fact that different analyses report the same thing sounds like a duplication in the first place though.

But it's also not clear how to best abstract it. Could introduce warn/error functions that take pre-defined message records.
It may also be worth thinking about whether it's viable to define all warnings/errors (not debug) in one place (similar to defaults.ml) and then just reference them.

This would be a possible direction indeed if we want to very strictly enumerate all our warnings, a la some compilers or analyzers which specify unique IDs for each warning type. But given that most of the messages still have unknown category and haven't been even properly reviewed or categorized broadly, I think that's too much of a stretch for now.

vogler · 2021-09-02T20:04:20Z

We might also want warnings/messages without location (e.g. configuration problems etc). Currently this isn't possible (or rather, such warnings will just have locUnknown displayed, which isn't very user-friendly).

👍

The suffix "each" is very non-descriptive (each what?).
So my question is: do we actually want the previous warn behavior that includes location but deduplicated just by text? It's not very clear to me what this weird behavior is for.

My assumption was that it was intended as a 'fix warning and re-analyze' mode of using goblint.
For big codebases there may be a lot of warnings. But OTOH it's then also not ideal to re-analyze.
For this use-case it does not make sense to keep it, since one can filter the list of warnings afterwards.
Another use-case may be warning about some state transition where one is not interested in all the sources (e.g. became multi-threaded, unsound, ?), but this also seems like it can be filtered after.

sim642 · 2021-09-03T06:54:02Z

For this use-case it does not make sense to keep it, since one can filter the list of warnings afterwards.
Another use-case may be warning about some state transition where one is not interested in all the sources (e.g. became multi-threaded, unsound, ?), but this also seems like it can be filtered after.

Indeed, deduplication by just message could be done as preprocessing for message display whereever needed. I guess it'd even be possible to implement a transformation that doesn't completely drop secondary messages but replaces them all with a single grouped message.

Although even for going multithreaded, just reporting the first analyzed location is a bit weird because you may have two branches, which both go multithreaded, but then you'd only see a warning about one of them doing that, while it might look like through the other branch you don't.
So to be reasonable, the warning deduplication would have to be somehow flow-sensitive and track what must have been warned already. That already sounds fairly advanced though since warnings would be kept in the domain (?).

vogler · 2021-09-03T08:18:17Z

So to be reasonable, the warning deduplication would have to be somehow flow-sensitive and track what must have been warned already. That already sounds fairly advanced though since warnings would be kept in the domain (?).

Yes, these 'dependent' warnings were another idea for this.
I agree, without a warning lattice, warn is not really useful and we can remove it in favor of warn_each.

vesalvojdani · 2021-09-03T08:48:55Z

Yes, this was a bad design decision. All sorts of filtering and ranking should happen after the analysis. Glancing over current usages of M.warn versus M.warn_each, it seems we are fairly random about it. I can see a couple of sensible usages, such as M.warn "Write to unknown address: privatization is unsound, but in many cases warn is wrongly used.

I was wondering if we should have some specific tag for "state transition" to assist future plans of filtering/grouping warnings, so then you could add that tag when getting rid of the warn cases, but since warn/warn_each are not used consistently and deliberately, we'll have to think that through anyway...

sim642 · 2021-09-03T10:07:29Z

I agree, without a warning lattice, warn is not really useful and we can remove it in favor of warn_each.

Removing the old warn, it would also make sense to rename warn_each -> warn, such that the simplest warning function would by default be with the current location from within the analysis, which should be the most common case for warnings. It would also get rid of the cryptic "each" in the name.

But if we want to then have a properly locationless warnings (unlike the old warnings which would print locUnknown somehow), then what should the function for that be named?

I guess it also might be possible to make warn do the same thing (not print/store location) if Tracing.current_loc is locUnknown or something. That would make warning completely automatic based on whether its before or during the analysis.
I'm a bit worried though because some inserted nodes/edges might still have locUnknown during the analysis, so warning at those would behave unusually.

sim642 · 2021-09-06T09:50:58Z

But if we want to then have a properly locationless warnings (unlike the old warnings which would print locUnknown somehow), then what should the function for that be named?

I called it warn_noloc for now, but it's nowhere used. All the old semi-location-sensitive warnings now use locations properly, but it's easy to switch them to warn_noloc on a per-case basis.

vogler · 2021-09-10T09:01:57Z

Removing the old warn, it would also make sense to rename warn_each -> warn, such that the simplest warning function would by default be with the current location from within the analysis, which should be the most common case for warnings. It would also get rid of the cryptic "each" in the name.

Yes, that's what I meant.

That would make warning completely automatic based on whether its before or during the analysis.

That would be the nicest solution.
For now, instead of warn_noloc, could warn not have an optional argument noloc?

sim642 · 2021-09-10T09:22:36Z

For now, instead of warn_noloc, could warn not have an optional argument noloc?

It would be more verbose though because that would be an optional bool argument, which must be called like ~noloc:true or unit, which must be called like ~noloc:(). Both seem a bit annoying.

Also it seemingly allows for a confusing combination where you can call it with ~noloc:true ~loc:someLoc, which doesn't really make sense.

sim642 added 30 commits August 25, 2021 11:16

Remove practically unused tracking (not tracing!)

b466336

Remove unused Config.experimental

27a3ea7

Remove gccwarn option

36063d2

Remove Messages.waitWhat and StopTheWorld

f4035f6

Remove unused Messages.warn_urgent

d41b6e5

Remove unused Messages.soundness variable

219ef11

Remove confusing Messages.warnings variable and print warnings regard…

dedea29

…less of dbg.debug

Add warnings to warning_table regardless of options

5e86bdb

Add more general Message record, use for deduplication

6c44903

Deduplicate message string construction

5689bb4

Add context to message as Obj.t

676ca5c

Add messages list to keep order

ac2667b

Remove warn_all usage outside Messages

e72cba5

Move common code to warn_all

57b0eb9

Remove report_error

c788fac

Simplify Messages.debug

448be7b

Remove loc argument on warn_all

231d0bb

Replace warn context argument with Messages.current_context

f5ddc4c

Deduplicate should_warn checks

33aefa7

Add severity to messages

8a95383

Replace must warnings with error severity messages

e858524

Consider Error as warning for regression testing

74f0c01

Remove warning certainty

4679fdf

Remove Debug warning type which is now covered by Debug severity instead

b2de8d9

Use Debug severity for Messages.debug

757a7b9

Add should_warn to Message

db214e5

Deduplicate Messages internal functions

3f1911e

Make warning message text mandatory

e8cd360

Remove unit argument from warning

2d4ebcc

Inline some internal Messages functions

7ee34ca

sim642 added 8 commits September 1, 2021 11:04

Remove useless () arguments from message category convenience functions

b41b33d

Add documentation for refactored message system

0372f53

Remove outdated warning documentation

f9bec05

Move messaging up in documentation navigation

effdb8b

Use List.concat instead of BatList.concat_map for OCaml 4.09

f1b40ab

Fix MessageUtil.colors_on always checking stdout

9e6dd95

Use pretty instead of show for warnings in Deadlock

d0497eb

Remove useless unit statements in MessageCategory

0260ea4

vogler reviewed Sep 2, 2021

View reviewed changes

sim642 added 7 commits September 6, 2021 12:21

Remove non-each message functions

b3e30c8

Remove print_loc from message

23302ef

Remove _each suffix from message functions

452a0a3

Add _noloc message functions for future use

84a09d8

Add info severity message functions

4e7aecd

Update messaging documentation about locations

8f0aaf2

Merge branch 'master' into messages-refactor

268d651

sim642 merged commit 6c35eca into master Sep 6, 2021

sim642 deleted the messages-refactor branch September 6, 2021 13:59

sim642 mentioned this pull request Sep 6, 2021

Use message system for output #344

Closed

4 tasks

sim642 mentioned this pull request Sep 10, 2021

Fix overflow and dead branch warnings in Juliet tests #346

Closed

6 tasks

michael-schwarz mentioned this pull request Mar 21, 2022

Categorize Remaining Warnings #55

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor and generalize warning system to message system #338

Refactor and generalize warning system to message system #338

sim642 commented Aug 31, 2021 •

edited

Loading

sim642 commented Sep 2, 2021

vogler Sep 2, 2021

sim642 Sep 3, 2021

vogler Sep 3, 2021 •

edited

Loading

sim642 Sep 3, 2021

vogler commented Sep 2, 2021

sim642 commented Sep 3, 2021

vogler commented Sep 3, 2021

vesalvojdani commented Sep 3, 2021

sim642 commented Sep 3, 2021

sim642 commented Sep 6, 2021

vogler commented Sep 10, 2021

sim642 commented Sep 10, 2021

Refactor and generalize warning system to message system #338

Refactor and generalize warning system to message system #338

Conversation

sim642 commented Aug 31, 2021 • edited Loading

Changes

TODO

sim642 commented Sep 2, 2021

vogler Sep 2, 2021

Choose a reason for hiding this comment

sim642 Sep 3, 2021

Choose a reason for hiding this comment

vogler Sep 3, 2021 • edited Loading

Choose a reason for hiding this comment

sim642 Sep 3, 2021

Choose a reason for hiding this comment

vogler commented Sep 2, 2021

sim642 commented Sep 3, 2021

vogler commented Sep 3, 2021

vesalvojdani commented Sep 3, 2021

sim642 commented Sep 3, 2021

sim642 commented Sep 6, 2021

vogler commented Sep 10, 2021

sim642 commented Sep 10, 2021

sim642 commented Aug 31, 2021 •

edited

Loading

vogler Sep 3, 2021 •

edited

Loading