Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve -MG #903

Open
ISSOtm opened this issue Jul 4, 2021 · 23 comments
Open

Improve -MG #903

ISSOtm opened this issue Jul 4, 2021 · 23 comments
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM

Comments

@ISSOtm
Copy link
Member

ISSOtm commented Jul 4, 2021

-MG only reports new dependencies one by one, it would be much better to do so in batches: generating more dependencies per step reduces the amount of RGBASM invocations, which is better.

Some testing with GCC shows that it reports all missing deps, and treats all missing files as empty. (#if, #ifdef etc. are evaluated normally.)

This could in theory cause some problems:

Data:
INCBIN "file.bin"
IF @ - Data == 0
    FAIL "Wut, the file is empty?"
ENDC

...to which I suggest aborting normally on any fatal error. This would be similar to the current behavior, but probably less common. Note however that GCC honors #error, still producing the deps, but returning 1 (so the first make errors out, but further ones work correctly). Maybe this could be better handled by skipping all conditional blocks entirely, still producing fatal errors encountered along the way?

... Overall, this sounds incredibly fragile, and I'm really not confident in this change, but it's largely a necessity if we want

@ISSOtm ISSOtm added enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM labels Jul 4, 2021
@Rangi42
Copy link
Contributor

Rangi42 commented Jul 4, 2021

"skipping all conditional blocks entirely" wouldn't allow cases like this:

IF DEF(_GOLD)
	INCBIN "gfx/pokemon/eevee/front_gold.dimensions"
ELIF DEF(_SILVER)
	INCBIN "gfx/pokemon/eevee/front_silver.dimensions"
ENDC

@Rangi42 Rangi42 added this to the v0.5.2 milestone Jul 4, 2021
@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 4, 2021

Then how should they be handled? Conditionally evaluating all blocks won't work either for example due to macro arg expansion...

(As a side note, I feel like we're fighting an uphill battle regarding this due to not being able to run -M as a preprocessing step...)

@Rangi42
Copy link
Contributor

Rangi42 commented Jul 4, 2021

I'm not sure; pret uses a custom scan_includes tool that ignores all rgbasm semantics and just scans for lines starting with INCLUDE/INCBIN. (Which doesn't even let us do things like INCLUDE "{VERSION}/foo.inc" or INCBIN STRCAT("build/", "bar.bin").)

The initial report sparking this gave this example:

section "x", ROM0
incbin "a"
incbin "b"

Without -M -MG both incbins complain about missing files; with -M -MG the first is printed as a dependency but the second isn't. (sect_BinaryFile calls fstk_FindFile calls printDep; then sect_BinaryFile can't open the file so complains and sets failedOnMissingInclude = true; I'm actually not sure why the second incbin isn't printed anyway by the if (generatedMissingIncludes) printDep(path); clause in fstk_FindFile.)

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 5, 2021

That's because failedOnMissingInclude causes YYACCEPT in parser.y

@aaaaaa123456789
Copy link
Member

Dependency listing is one of those things where accurate output is more important than fast output. If a project relies on dependency output in its build system, inaccurate output might cause a file not to be rebuilt, leaving stale code/data in the final binary.

Scanning assembly files for inclusion statements is a trivial task if anyone wants to build a faster but simpler tool. (I once wrote a scan_includes equivalent in all of half an hour, testing included. In pure ISO C17.) RGBASM should be as correct as possible in this aspect — projects that want the faster behavior can easily implement it.

@daid
Copy link
Contributor

daid commented Jul 5, 2021

Scanning assembly files for inclusion statements is a trivial task if anyone wants to build a faster but simpler tool.

Unless you have to deal with rgbds macros and equs.

I currently first run rgbasm and then append the resulting dependency file with a grep + sed to add all the extra incbin statements as well, it's ugly, but it works in the common case.

RGBASM should be as correct as possible in this aspect — projects that want the faster behavior can easily implement it.

Depends on your definition of "correct". From my perspective, the current behavior is incorrect. I ask a file for it's dependencies, and it does not list all dependencies.

I had earlier questions about -MP behavior, and the reply I got back then was "current behavior matches gcc". And for this specific issue, the behavior does not match gcc. With -MG gcc lists all includes, and process all defines it does have, and if any #error is encountered it aborts with an error.

Maybe this could be better handled by skipping all conditional blocks entirely

That sounds like a really bad plan.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 5, 2021

Unless you have to deal with rgbds macros and equs.

The counter-argument is "we don't use those, so our tool doesn't account for them". This is valid, as "this specific codebase doesn't use these features" lets you optimize for their lack. However, they introduce additional dependencies (typically Python or a C compiler), so removing their need would be useful.

I ask a file for it's dependencies, and it does not list all dependencies. [...] With -MG gcc lists all includes, and process all defines it does have, and if any #error is encountered it aborts with an error.

You are correct, but this is by necessity. C dependency generation is performed by the preprocessor, which only cares about its own syntax:

% cat c.c
#include "a.h"

rofl 1 { "ah yes" }?$;:
% gcc -MM -MG -MP c.c
c.o: c.c a.h

a.h:

And in addition, the preproc's syntax is not context-sensitive; that is, a C file can be correctly parsed (by the preproc) even when some of the files to be included are missing. (I believe #if etc. blocks cannot "straddle" files, but correct me on that if I'm wrong.)

RGBASM is not so lucky due to EQUS.

% cat a.asm
INCLUDE "{INC}"
PRINTLN 16 tiles
% cat b.inc
tiles EQUS "* 16"
% cat c.inc
% rgbasm -DINC=b.inc a.asm
$100
% rgbasm -DINC=c.inc a.asm
ERROR: a.asm(2)
    syntax error, unexpected identifier
error: Assembly aborted (1 error)!

The rest of the syntax can largely be handled: treat "required" yet non-existent variables as 0, ignore macros, etc. But the lack of an EQUS can (and likely will) introduce parsing errors.

Even then, since macros can generate INCLUDEs, dependencies can be hidden even when not transitive (a.asm defines a macro ninja that INCLUDE "c.inc", includes b.inc which defines samurai EQUS "ninja", then a.asm calls samurai as if it were a macro. Contrived? Certainly.)
The presence or absence of an INCBIN can be easily tested for as well, since the file's size is not known, therefore PC must advance by some number. This can further alter the chain in non-obvious ways, more than just FAIL or tripping asserts (assert @ - DataStart == $1234, "File does not have expected size"). How should those be handled?

That sounds like a really bad plan.

Since it appears that ignoring all IFs is wrong, then that should be restricted to all IFs whose expression generates a syntax error, I guess. Though introducing more behavioral differences between normal parsing and after a dependency turned out to be missing is a can of worms that I'm definitely not opening.

All in all, the lack of robustness in RGBASM programs is why the current behavior is what it is: after a failed INCBIN, and even moreso a failed INCLUDE, here be dragons. Only one dependency is currently listed per invocation out of necessity, not laziness. Even reporting syntax errors (gcc -MM -MG still errors out while printing nothing if the file contains #if defined(kaboom!) is not reliable, so reporting FAILs as well does not sound like a good idea.

This could be improved while staying relatively simple and safe: only abort on the first conditional block or fatal error after a missing dep, which would at least account for a chain of INCBINs not followed by strict assertions, and some INCLUDE chains.

@aaaaaa123456789
Copy link
Member

A fully accurate mode could be introduced by adding a flag that simply assembles the input into /dev/null and outputs the list of files it read. This will obviously be very slow, but also 100% correct (bar any randomness in the input) for any method that needs that, such as @daid's tool.

I'm not sure of whether this would be worth it, though.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 5, 2021

I've just explained that it is not possible to assemble the input without the dependencies in the general case.

To do so in one go, RGBASM would need to be able to invoke the command responsible for generating the file, and proceed once the file has been generated. (This would flip the paradigm from RGBASM telling make what it needs to RGBASM invoking make itself, though I've proposed this idea long ago and it was shot down by ax6.)

@daid
Copy link
Contributor

daid commented Jul 5, 2021

Only one dependency is currently listed per invocation out of necessity, not laziness.

I disagree, as without -MG the parsing continues after a missing incbin. So it can continue, but decides not to. For some extreme edge case reasons.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 5, 2021

Parsing continues after a failed INCBIN as "current state known to be incorrect, let's try reporting more errors for this batch"; -MG instead tries to avoid being incorrect, so it aborts immediately. Again, I agree this is worth changing, what do you think about the changes I suggested above?

@daid
Copy link
Contributor

daid commented Jul 5, 2021

what do you think about the changes I suggested above?

Sorry, could you be more specific to which changes you are referring? (there is a lot of info in this thread, and my brain is a bit slow today, so most likely my fault for not seeing the exact changes you are referring to)

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 5, 2021

This could be improved while staying relatively simple and safe: only abort on the first conditional block or fatal error after a missing dep, which would at least account for a chain of INCBINs not followed by strict assertions, and some INCLUDE chains.

and

I've just explained that it is not possible to assemble the input without the dependencies in the general case.

To do so in one go, RGBASM would need to be able to invoke the command responsible for generating the file, and proceed once the file has been generated. (This would flip the paradigm from RGBASM telling make what it needs to RGBASM invoking make itself, though I've proposed this idea long ago and it was shot down by ax6.)

@daid
Copy link
Contributor

daid commented Jul 5, 2021

Ah, that first suggestion on aborting on a fatal error after a missing deb most likely solve most common cases. You could still craft very specific examples where this gives the "wrong" result.

data:
incbin "a"
IF data - @ == 0
incbin "b"
ENDC

(Invoking make from rgbasm sounds like a bad idea indeed, as it breaks with how everything else in the world does things, yes, it could potentially work, but it opens up a whole world of new problems)

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 5, 2021

Ah, that first suggestion on aborting on a fatal error after a missing deb most likely solve most common cases. You could still craft very specific examples where this gives the "wrong" result.

data:
incbin "a"
IF data - @ == 0
incbin "b"
ENDC

This kind of "wrong result" is why I'm suggesting aborting on the first conditional following a missing dep as well.

(Invoking make from rgbasm sounds like a bad idea indeed, as it breaks with how everything else in the world does things, yes, it could potentially work, but it opens up a whole world of new problems)

The reason why I brought it up again is because I have seen a tool do this at my job.

@Rangi42
Copy link
Contributor

Rangi42 commented Jul 27, 2021

only abort on the first conditional block or fatal error after a missing dep

This sounds too vulnerable to false positives. For example:

TitleGFX::
INCBIN "title.2bpp.lz"
TitleTilemap::
if DEF(_GOLD)
INCBIN "title_gold.tilemap.lz"
else
INCBIN "title_silver.tilemap.lz"
endc

I think it should be the user's responsibility to not write INCLUDEs/INCBINs which (a) make hasn't already generated and (b) conditionally determine subsequent INCLUDEs/INCBINs. So the user could choose to write something like this:

Data:
INCBIN "file.bin"
IF (@ - Data == 0) && !DEF(MAKEDEPEND)
    FAIL "Wut, the file is empty?"
ENDC

and then be sure to pass -D MAKEDEPEND along with -MG.

(Or, add a built-in MAKEDEPEND variable defined if -M was passed.)

Regardless of how/whether we change -MG behavior, there's still an inconsistency with how it currently runs.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 27, 2021

I agree it's very vulnerable, but it's a conservative and easy to implement option. It may be worth implementing that behavior, and observing the results; given its relative simplicity, it should be easy to revert to switch to something more complex.

@Rangi42
Copy link
Contributor

Rangi42 commented Jul 27, 2021

Maybe so, although I'd like to avoid releasing it in one version and switching to something else in the next.

Also regarding -M: gcc suppresses warnings when that is passed, rgbasm could do the same.

Edit: I think this snippet from polishedcrystal would already fail with that behavior: (it uses scan_includes.c, not rgbasm -M, but that's just for performance reasons; either ought to work)

MaxStatSparkleGFX:
INCBIN "gfx/stats/sparkle.2bpp"

EVChartPals:
if !DEF(MONOCHROME)
...

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 27, 2021

Maybe so, although I'd like to avoid releasing it in one version and switching to something else in the next.

Why? The behavior is pretty much unspecified, and the feature sees little enough use that I think we can get away with some mild behavioral changes for now.

Also regarding -M: gcc suppresses warnings when that is passed, rgbasm could do the same.

Only because -M alone, unlike -MM, only runs the preproc stage, skipping generation of many warnings altogether. (It's possible that preproc warnings get suppressed as well.) [EDIT: To clarify, I agree this would be nice, but it would require buffering all warning generation to the end of assembly, which I'm not keen on doing at least yet.]

Edit: I think this snippet from polishedcrystal would already fail with that behavior:

MaxStatSparkleGFX:
INCBIN "gfx/stats/sparkle.2bpp"

EVChartPals:
if !DEF(MONOCHROME)
...

Well, it would only batch the first few the first time, but on the second run, after all deps before the if are generated, it would proceed normally past the conditional. Still an improvement, though I agree improvable at least in theory.

@Rangi42
Copy link
Contributor

Rangi42 commented Jul 27, 2021

I'm used to pret's way of doing things, so maybe this is wrong; but I would expect most INCBINs to not be preparing for later conditionals, and most conditionals to be unrelated to any prior INCBINs. And when a conditional does depend on a prior INCLUDE, it's most likely a file that already exists, e.g. INCLUDE "constants.inc".

Would it make sense as a new flag?

  • Without -MG, a missing included file will immediately error out.
  • With -MG, a missing included file will be added as a dependency; but this is vulnerable to the cases discussed above (which might be rare or could manually handle with -D MAKEDEPEND) including the issue where "the first is printed as a dependency but the second isn't" (needs fixing).
  • With "-Mg", a missing included file will be added as a dependency, but a subsequent IF, FAIL, or ASSERT FATAL will immediately error out.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 27, 2021

I really don't think it's an issue not to print everything at once, but batching should be done whenever possible to improve performance.

I'd really like to avoid an extra flag, because that's a strong commit to backwards compat; as a last-ditch kludge, I can suggest computing the expressions to IFs while suppressing errors (via a flag), and if the expression was not successfully evaluated, then exit this iteration early, otherwise proceed normally.

This covers most conditionals suggested by the above, and allows detecting whether a file has been generated via DEF(), which may be useful. The kludge can go away once we have #663.

@Rangi42
Copy link
Contributor

Rangi42 commented Jul 27, 2021

That kludge sounds fragile in the same ways as my no-lazy-expressions implementation of short-circuiting && and || (#665). How about just fixing the "failedOnMissingInclude causes YYACCEPT" issue for now, and once lazy evaluation exists, then do "-MG exits if an IF expression can't be evaluated"?

@ISSOtm
Copy link
Member Author

ISSOtm commented Jul 27, 2021

Worth trying, though I'd wait until #849 is done to have some testing.

@Rangi42 Rangi42 removed this from the v0.9.0 milestone Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM
Projects
None yet
Development

No branches or pull requests

4 participants