Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Functions to read the contents of text and binary files #933

Open
Rangi42 opened this issue Nov 11, 2021 · 8 comments
Open
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM

Comments

@Rangi42
Copy link
Contributor

Rangi42 commented Nov 11, 2021

INCLUDE and INCBIN can include a file's contents directly as assembly code or data, but there are times when it would be nice to access a file's contents arbitrarily. For example, you might want to do something different depending on the size of the file; or reuse a tilemap with some offset added to all the bytes; or use part of a file depending on an rgbasm calculation; or so on. This currently has to be handled by including files generated with custom tools according to particular Makefile rules.

If we had arbitrary-length strings (#650) and arrays (#67), this could be accomplished with just two functions:

  • READFILE("file.txt") would return a string of the file's characters
  • READBIN("file.dat") would return an array of the file's bytes

For both functions, if the file doesn't exist they would abort with a "No such file" error, or under -M dependency-generating mode they would act consistent with INCLUDE and INCBIN after #903 gets resolved.

Note that READFILE would read the file's bytes as-is, which may not be valid UTF-8. It's already possibly to create a non-UTF-8 .asm file with arbitrary bytes in a string literal, so this is not a new issue. Some functions like STRCAT and STRIN wouldn't care about invalid UTF-8, others like STRLEN and STRSUB would print a "Invalid UTF-8 byte" error.

@Rangi42 Rangi42 added this to the v0.5.3 milestone Nov 11, 2021
@Rangi42 Rangi42 added enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM labels Nov 11, 2021
@Rangi42
Copy link
Contributor Author

Rangi42 commented Nov 11, 2021

#67 (comment) discusses some possible syntaxes and issues with native arrays. If we ended up not adding them, another approach to reading binary file content would be to just read it as a string too, and just trust the user not to do anything with the string that expects valid UTF-8.

This would require adding a few byte-based functions (STRBYTELEN and STRBYTESUB at least) to use on "binary strings". String functions like STRCAT and STRRPL currently don't care about how a string is encoded, so they could be used on binary strings too...

...except that even they are implemented with C's own strings functions, so would behave incorrectly if the binary string has any $00 bytes. Even the ongoing PR #650 to have dynamic strings would probably still have that limitation.

(I'm also not really a fan of adding this kind of implicit limitation, where the notion of "valid string" vs "binary string" is all up to the user's understanding and has no type-level enforcement in the rgbasm language. It's also less flexible than arrays, which could store any numbers, not just bytes. But I'm bringing it up because arrays have issues of their own, and would be a large feature to add just for the same of READBIN.)

@Rangi42
Copy link
Contributor Author

Rangi42 commented Nov 12, 2021

C++ std::string does support storing NUL bytes, so presumably if we used C++ std lib algorithms instead of C's string.h functions, they would work as expected for "binary strings".

#include <iostream>
#include <string>
int main() {
    std::string s("hello");
    std::cout << s.size() << " " << s << std::endl;
    s[2] = '\0';
    std::cout << s.size() << " " << s << std::endl;
    return 0;
}
5 hello
5 helo

@Rangi42
Copy link
Contributor Author

Rangi42 commented Nov 12, 2021

Currently pokegold uses a custom tool to measure the size of a 5x5, 6x6, or 7x7 PNG image and encode that as $55, $66, or $77. This could replace its INCBIN "front.dimensions" lines with a simpler INCDIMENSIONS "front.2bpp" macro (named that way so scan_includes could easily be adapted to handle it):

MACRO INCDIMENSIONS
    def n = STRBYTELEN(READBIN(\1)) ; or STRLEN(READFILE(\1)) pending #938
    if n == 5 * 5 tiles
        db $55
    elif n == 6 * 6 tiles
        db $66
    elif n == 7 * 7 tiles
        db $77
    else
        fail "Invalid front pic size for \1"
    endc
ENDM

(pokecrystal can't do this because its front pics have extra animation tiles.)

@Rangi42
Copy link
Contributor Author

Rangi42 commented Nov 12, 2021

READFILE could also take optional start and length arguments like INCBIN. This would be more efficient for use cases that only need part of a file, and would let pokecrystal do this (with INCDIMENSIONS "front.png" instead of INCDIMENSIONS "front.2bpp"):

MACRO INCDIMENSIONS
    pushc
    setcharmap ascii
    DEF png_ihdr_width EQUS READFILE(\1, 16, 4)
    DEF n = STRSUB("{png_ihdr_width}", 1, 1) << 24 | \
            STRSUB("{png_ihdr_width}", 2, 1) << 16 | \
            STRSUB("{png_ihdr_width}", 4, 1) <<  8 | \
            STRSUB("{png_ihdr_width}", 5, 1)
    if n == 5 * TILE_LENGTH
        db $55
    elif n == 6 * TILE_LENGTH
        db $66
    elif n == 7 * TILE_LENGTH
        db $77
    else
        fail "Invalid front pic size for \1"
    endc
    PURGE png_ihdr_width
    popc
ENDM

@Rangi42
Copy link
Contributor Author

Rangi42 commented Dec 14, 2023

We could compare this proposal with WLA-DX's .FOPEN, etc directives.

.fopen "data.bin" fp
.fsize fp t
.repeat t
.fread fp d
.db d+26
.endr
.undefine t, d

@ISSOtm
Copy link
Member

ISSOtm commented Dec 16, 2023

I don't like exposing arbitrary file I/O because it's just complicated:

  • Try eagerly reading all of the file, and potentially OOM for large files that could've been processed piecemeal, or block infinitely from e.g. pipes;
  • Try exposing piecemeal read functions (à la WLA-DX), and shift all of the complexity onto the user, and still occasionally block infinitely.

...so for the time being, I still prefer people building ad-hoc tools and solutions when such needs arise. Please understand that I'm not trying to be a grouch raining on everyone's parade, but also reining in what I fear to be scope creep.

Note

I am however leaving this issue open so that people can still voice any support, use cases, or design suggestions.

@Rangi42
Copy link
Contributor Author

Rangi42 commented Dec 16, 2023

Try eagerly reading all of the file, and potentially OOM for large files that could've been processed piecemeal, or block infinitely from e.g. pipes;

Don't we have exactly this theoretical problem already, with INCBIN? I'd expect a READFILE function to work just like that, except putting the data in an array, not directly in the ROM.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 16, 2023

In theory yes, but not really because INCBIN means that all of the file's contents must be dumped into the ROM anyway, and we have to allocate that space anyway for a few reasons.

@Rangi42 Rangi42 removed this from the v0.9.0 milestone Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM
Projects
None yet
Development

No branches or pull requests

2 participants