Lists/arrays #67

Open
Sanqui opened this issue May 5, 2015 · 15 comments
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM

Comments

@Sanqui
Member

Sanqui commented May 5, 2015

MOVES EQU [$24, $3a]

    db LENGTH(MOVES)
    db MOVES[0], MOVES[1]

Is there any chance for something like this to happen at all? Is it even a good idea?

Should be made to work for strings too.

POKEMON_NAMES EQU ["BULBASAUR", "IVYSAUR"]

    db "So you want ", POKEMON_NAMES[STARTER1], "?@"
@Sanqui Sanqui changed the title Lists Arrays May 6, 2015
@Sanqui Sanqui changed the title Arrays Lists/arrays May 6, 2015
@AntonioND AntonioND added enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM labels Apr 2, 2018
@Rangi42
Contributor

Rangi42 commented Jan 2, 2021

An alternative with post-0.4.2 {interpolation} is to define many symbols with the "array index" as a suffix.

list_equ: MACRO
x EQUS "\1"
i = 0
SHIFT
REPT _NARG
; define `list_equs` too with EQUS here
{x}#{d:i} EQU \1 ; could have used '_' but '#' is valid in names
i = i + 1
SHIFT
ENDR
LENGTH_{x} EQU i
PURGE x
ENDM

	list_equ MOVES, $24, $3a

	db LENGTH_MOVES
	db MOVES#0, MOVES#1

	list_equs POKEMON_NAMES, "BULBASAUR", "IVYSAUR"

	db "So you want {POKEMON_NAMES#{d:STARTER1}}?@"

pokered's macros/scripts/maps.asm works somewhat like this, e.g. def_warps followed by many warps, and then def_warps_to iterates over all the defined warps. (Those macros will be simplified with the next rgbds release.)

def_warps: MACRO
REDEF _NUM_WARPS EQUS "_NUM_WARPS_\@"
	db _NUM_WARPS
_NUM_WARPS = 0
ENDM

warp: MACRO
	db \2, \1, \3, \4
REDEF _WARP_TO_NUM_{d:{_NUM_WARPS}} EQUS "warp_to \1, \2, _WARP_TO_WIDTH"
_NUM_WARPS = _NUM_WARPS + 1
ENDM

def_warps_to: MACRO
_WARP_TO_WIDTH = \1_WIDTH
	FOR N, _NUM_WARPS
		_WARP_TO_NUM_{d:N}
	ENDR
ENDM


	def_warps
	warp 14,  0, 5, LAST_MAP
	warp 14,  2, 1, SS_ANNE_1F

	def_warps_to VERMILION_DOCK

@Rangi42
Contributor

Rangi42 commented Jan 2, 2021

Here are some macros for working with lists (using features of post-0.5.0 rgbasm master): numeric lists https://pastebin.com/Ucngn8Pt and string lists https://pastebin.com/WMrJdSKk

	list MOVES, $24, $3a
	db LENGTH_MOVES
	db MOVES#1, MOVES#2

	slist POKEMON_NAMES, "BULBASAUR", "IVYSAUR"
	db "So you want {POKEMON_NAMES#{d:STARTER1}}?@"
	slist MONS
	slist_item "Squirtle"
	slist_item "Bulbasaur"
	slist_item "Charmander"
	slist_sort MONS
	slist_println MONS ; ["Bulbasaur", "Charmander", "Squirtle"]
	println LENGTH_MONS ; $3

	slist_copy STARTERS, MONS
	slist_append STARTERS, "Pikachu", "Eevee"
	slist_println STARTERS ; ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Eevee"]
	slist_replace STARTERS, "Pikachu", "Raichu"
	println "{STARTERS#4}" ; Raichu

	slist GEN2MONS, "Chikorita", "Cyndaquil", "Totodile"
	slist_delete STARTERS, 4
	slist_remove STARTERS, "Eevee"
	slist_extend STARTERS, GEN2MONS
	slist_purge GEN2MONS
	slist_println STARTERS ; ["Bulbasaur", "Charmander", "Squirtle", "Chikorita", "Cyndaquil", "Totodile"]

	slist_purge MONS
	assert !DEF(LENGTH_MONS)
	slist MONS, "Rattata", "Pidgey", "Pidgey", "Spearow", "Pidgey", "Spearow"
	slist_find N, MONS, "Pidgey"
	println N ; $2
	slist_rfind N, MONS, "Pidgey"
	println N ; $5
	slist_count N, MONS, "Pidgey"
	println N ; $3
	slist_remove_all MONS, "Pidgey"
	slist_set MONS, 2, "Fearow"
	slist_insert MONS, 3, "Raticate"
	slist_reverse MONS
	slist_println MONS ; ["Spearow", "Raticate", "Fearow", "Rattata"]

@Rangi42
Contributor

Rangi42 commented Jan 2, 2021

That actually demonstrates a possible use case for REDEF EQU versus using SET: it would prevent the user from directly changing a constant, while letting the appropriate macros conveniently do it.

__len = LENGTH_\1
PURGE LENGTH_\1
LENGTH_\1 EQU __len + 1
PURGE __len

; vs

REDEF LENGTH_\1 EQU LENGTH_\1 + 1

@ISSOtm
Member

ISSOtm commented Apr 20, 2021

Bump @Sanqui, what do you think of Rangi's suggested alternatives?

@Sanqui
Member Author

Sanqui commented Apr 22, 2021

Well, it's sort of hacky, and it would have been nice to have some syntactic sugar for this whole thing, but it would work for me, I think. I can't help but wonder if dictionaries would be possible with this approach too, to possibly enable loading entire JSON-like structures.

@ISSOtm
Member

ISSOtm commented Apr 22, 2021

Struct support has been requested (#98) and is currently shimmed, but native support is planned because rgbds-structs is a big pile of spaghetti hacks.

@Rangi42
Contributor

Rangi42 commented Nov 11, 2021

We were discussing native arrays in #rgbds. They would be useful as a return value from a hypothetical READBIN function to read the bytes of a file (as opposed to a READFILE reading the contents as a string, since the functions dealing with strings expect UTF-8 and would terminate on $00 bytes.)

Some possible syntaxes for array literals:

  • Brackets: [1, 2, 3] would be concise and familiar, but might be grammatically ambiguous (I'm not sure, since unlike strings, arrays wouldn't be usable as relocexprs)
  • Function: ARRAY(1, 2, 3) would be easy to implement and not introduce new syntax or punctuation, but is verbose
  • Sigil: #[1, 2, 3] would be unambiguous and concise, but looks weird

db, dw, and dl should work with arrays just like with strings, applying to each element of the array.

Most string functions would usefully have array counterparts, though I don't know if they should start with "ARRAY" or just "ARR" (I think "ARRAY" is more readable):

  • ARRAYLEN(arr): Returns the length of arr.
  • ARRAYVAL(arr, i): Returns the ith value in arr (1-indexed for consistency with strings; this would support negative indexes too like STRSUB). (Other names: ARRAYITEM, ARRAYNUM, ARRAYELEM, ARRAYAT?)
  • ARRAYCAT(arrs...): Concatenates arrs.
  • ARRAYIN(arr, val): Returns the first position of val in arr, or zero if it's not present.
  • ARRAYRIN(arr, val): Returns the last position of val in arr, or zero if it's not present.
  • ARRAYSUB(arr, pos, len): Returns a sub-array of arr, like arr[pos:pos+len] in Python.
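
To make the proposed semantics concrete, here is a minimal Python sketch of the functions listed above. It is only a model of the proposal, not rgbasm code: the lowercase function names and the `_to_offset` helper are made up for illustration, and the choice to raise an error on position 0 or an out-of-range position is an assumption (the proposal does not say what should happen there). Positions are 1-based, and negative positions count from the end, as described for ARRAYVAL.

```python
# Hypothetical Python model of the proposed array built-ins.
# The names mirror the proposal; none of this is actual rgbasm behavior.

def _to_offset(arr, pos):
    """Map a 1-based (or negative, counted from the end) position to a 0-based offset."""
    # Assumption: position 0 and out-of-range positions are errors.
    if pos == 0 or abs(pos) > len(arr):
        raise IndexError(f"position {pos} out of range for length {len(arr)}")
    return pos - 1 if pos > 0 else len(arr) + pos

def arraylen(arr):
    return len(arr)

def arrayval(arr, i):
    return arr[_to_offset(arr, i)]

def arraycat(*arrs):
    out = []
    for a in arrs:
        out.extend(a)
    return out

def arrayin(arr, val):
    # First 1-based position of val, or 0 when it is absent.
    for pos, item in enumerate(arr, start=1):
        if item == val:
            return pos
    return 0

def arrayrin(arr, val):
    # Last 1-based position of val, or 0 when it is absent.
    for pos in range(len(arr), 0, -1):
        if arr[pos - 1] == val:
            return pos
    return 0

def arraysub(arr, pos, length):
    # With 1-based positions this is arr[pos-1 : pos-1+length] in Python terms.
    start = _to_offset(arr, pos)
    return arr[start:start + length]

moves = [0x24, 0x3A, 0xF0]
first, last = arrayval(moves, 1), arrayval(moves, -1)
```

Returning 0 for "not found" only works because positions start at 1; with 0-based indexing ARRAYIN/ARRAYRIN would need a different sentinel.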

Macros like in list.asm could take care of more advanced array manipulation, like counting a value, removing/replacing the first/last/all of a value, sorting, reversing, etc. If any of them are found to be particularly useful, they can always be added in a later release. (Even the ARRAYIN/ARRAYRIN functions could be omitted, since for loops are sufficient and they might be rarely used.)

A question: how to assign an identifier to an array? DEF arr EQUA [1, 2, 3]?

There's a notable difference between arrays and strings. If you have DEF s EQUS "hello", you can't do STRLEN(s) because of string expansion; you have to do STRLEN("{s}"). This involves more typing but prevents you at the grammar level from saying STRLEN(x) for any identifier x which could be a number, label, undefined, etc. On the other hand, if you've defined arr as an array (somehow), I'm not sure how it should behave:

  • Should ARRAYLEN(arr) just work? Then would we rely on a runtime error/abort if you do ARRAYLEN(x) for some numeric/string/etc x?
  • Or should "array equates" act like string equates and expand during lexing? (I'd rather they not, we don't need to add more lexer-time special behavior.)
  • Maybe arrays should have a separate namespace from other identifiers? They could all start with #, not just literals. So DEF #arr2 EQUA ARRAYCAT(#arr1, #[4, 5, 6]) would be grammatical and DEF arr EQUA #[1, 2, 3] would not. But that could look bad with #s everywhere.

This proposal also doesn't address arrays of strings, which could be at least as useful. Example: have an array of all monster names, and in texts discussing MON_FOO, concatenate the value of the MON_FOOth entry of MON_NAMES. And some table of monster names would just be for i, ARRAYLEN(MON_NAMES) / db ARRAYVAL(MON_NAMES, i+1) / endr.

@aaaaaa123456789
Member

A few unsorted comments on your comment:

  • The only meaningful thing you can do with an array of strings is emit those strings, and that's already doable with EQUS expansion. There's no immediate need to support them.
  • I wouldn't be opposed to arrays having a namespace of their own. If they don't have one, then yes, they should act as their own data type and be accepted directly as arguments to functions that take array expressions. (What is an array expression, anyway? That would be an interesting question to answer.)
  • If arrays and strings are separate, it would be nice to be able to convert between them. This only requires three functions: STRCHARS(str) that converts a valid string into an array of UTF-8 codepoints, ARRAYSTR(arr) that does the opposite, and STRENCODE(str) that converts a string through the charmap into the raw data it would output. (This last function has no inverse, as charmap conversions aren't in general reversible.) Note the difference between STRCHARS (which is invertible and has a known encoding, making it ideal for metaprogramming) and STRENCODE (non-invertible and using the target's encoding, making it ideal for data generation).
  • 1-based indexing for arrays is evil and it should never be even considered. Even if strings use it. There's no reason to make that mistake twice, and it's a meme in programming circles for a reason.

@Rangi42
Contributor

Rangi42 commented Nov 12, 2021

  • An array expression is either an array literal or a built-in function call that returns an array. Just like how in parser.y a string is a T_STRING or a call to T_OP_STRSUB, T_OP_STRCAT, etc.
  • Those sound like reasonable functions, though I'd call them STRARRAY (so STRARRAY("<PK>") => ARRAY($3C, $50, $4B, $3E)), ARRAYSTR (so ARRAYSTR(ARRAY($41, $42, $43)) => "ABC"), and CHARARRAY (so CHARARRAY("<PK>") => ARRAY($E1) since charmap "<PK>", $e1). (Since the current STR* functions take strings and the CHAR* functions deal with charmap values.) Although I'm not certain all or any of those are necessary, at least not in an initial release with basic MVP arrays. Assuming arrays are added at all. STRCHARS/STRARRAY can be accomplished with a FOR loop and STRSUB, STRENCODE/CHARARRAY with a FOR loop and CHARSUB, and ARRAYSTR sounds suspicious since array values might not all be valid Unicode code points.
  • I wish rgbasm had zero-indexing from the beginning, but it doesn't, and I would much rather have STRSUB etc act consistently with ARRAYSUB etc. It's not without precedent: plenty of languages use 1-indexing, including many math-oriented ones (Fortran, Matlab, Mathematica, R, Julia).
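
The difference between the three conversions can be sketched in Python using the "<PK>" examples above. This is a model under stated assumptions, not rgbasm behavior: `strarray`, `arraystr`, and `chararray` are hypothetical stand-ins for the proposed functions, the charmap is passed as a plain dict, the longest-match rule for charmap entries is assumed, and falling back to UTF-8 bytes for unmapped characters is also an assumption.

```python
# Hypothetical model of STRARRAY / ARRAYSTR / CHARARRAY from the discussion.

def strarray(s):
    """String -> array of Unicode code points (invertible, charmap-independent)."""
    return [ord(ch) for ch in s]

def arraystr(arr):
    """Array of code points -> string; rejects values that are not valid code points."""
    for v in arr:
        if not 0 <= v <= 0x10FFFF or 0xD800 <= v <= 0xDFFF:
            raise ValueError(f"${v:X} is not a valid Unicode code point")
    return "".join(chr(v) for v in arr)

def chararray(s, charmap):
    """String -> array of charmap output values (not invertible in general)."""
    # Assumption: the longest matching charmap key wins at each position.
    keys = sorted((k for k in charmap if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(s):
        for key in keys:
            if s.startswith(key, i):
                out.append(charmap[key])
                i += len(key)
                break
        else:
            # Assumption: unmapped characters are emitted as their UTF-8 bytes.
            out.extend(s[i].encode("utf-8"))
            i += 1
    return out

charmap = {"<PK>": 0xE1}          # models: charmap "<PK>", $e1
code_points = strarray("<PK>")    # four code points, one per character
encoded = chararray("<PK>", charmap)  # a single charmap value
```

The `ValueError` path in `arraystr` is the "suspicious" case mentioned above: an arbitrary numeric array is not guaranteed to round-trip into a string.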

@aaaaaa123456789
Member

aaaaaa123456789 commented Nov 12, 2021 via email

@Rangi42
Contributor

Rangi42 commented Nov 12, 2021

ARRAYCAT is potentially redundant, if we allow the ARRAY "constructor" to automatically flatten arrays. So you have DEF a1 EQUA ARRAY(1,2,3), then DEF a2 EQUA ARRAY(a1,4,5,6,ARRAY(7,8,9),10), and then a2 is ARRAY(1,2,3,4,5,6,7,8,9,10).
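
As a rough illustration of the flattening idea, here is a Python sketch. The `array` function is a hypothetical stand-in for the proposed ARRAY constructor, and Python lists stand in for array values; whether nested arrays would flatten recursively is assumed here.

```python
# Hypothetical model of a flattening ARRAY(...) constructor.

def array(*items):
    """Build a flat array; any argument that is itself an array is spliced in."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(array(*item))  # splice nested arrays (recursively)
        else:
            out.append(item)
    return out

a1 = array(1, 2, 3)
a2 = array(a1, 4, 5, 6, array(7, 8, 9), 10)
```

With this rule, concatenation is just `array(x, y)`, which is why a separate ARRAYCAT would be redundant. The trade-off is that arrays of arrays become impossible to express.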

@Rangi42
Contributor

Rangi42 commented Nov 12, 2021

A less serious but not entirely joking suggestion: once we have user-defined functions, we could add ARRAYMAP(arr, fn) to apply fn to each element of arr, and ARRAYFILTER(arr, fn) to select only the elements of arr for which fn returns nonzero/true. Or even ARRAYREDUCE(arr, fn, init=0) to apply a reducing function (e.g. if DEF plus(x, y) = x + y, then ARRAYREDUCE([1,2,3], plus) == 6, and ARRAYREDUCE([], plus, 42) == 42).
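
For reference, the intended behavior of these three functions can be modeled in a few lines of Python. The lowercase names are hypothetical stand-ins for the suggested ARRAYMAP, ARRAYFILTER, and ARRAYREDUCE; the default `init=0` follows the signature given above.

```python
# Hypothetical model of ARRAYMAP / ARRAYFILTER / ARRAYREDUCE.
from functools import reduce

def arraymap(arr, fn):
    """Apply fn to each element."""
    return [fn(x) for x in arr]

def arrayfilter(arr, fn):
    """Keep only the elements for which fn returns nonzero/true."""
    return [x for x in arr if fn(x)]

def arrayreduce(arr, fn, init=0):
    """Fold arr from the left with fn, starting from init."""
    return reduce(fn, arr, init)

def plus(x, y):
    return x + y

total = arrayreduce([1, 2, 3], plus)        # 0 + 1 + 2 + 3
fallback = arrayreduce([], plus, 42)        # empty array returns init
```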

@aaaaaa123456789
Member

aaaaaa123456789 commented Nov 12, 2021 via email

@Rangi42 Rangi42 added this to the v0.8.0 milestone Nov 3, 2023
@Rangi42 Rangi42 removed this from the v0.9.0 milestone Aug 6, 2024
@Rangi42
Contributor

Rangi42 commented Aug 21, 2024

Updated syntax for the very basic arrays in the original feature request:

MACRO def_array
	def \1#LEN equ _NARG - 1
	for idx, \1#LEN
		def argi = idx + 2
		def \1#{d:idx} equ \<argi>
	endr
ENDM

	def_array MOVES, $24, $3a
	db MOVES#LEN
	db MOVES#0, MOVES#1

MACRO def_str_array
	def \1#LEN equ _NARG - 1
	for idx, \1#LEN
		def argi = idx + 2
		def \1#{d:idx} equs \<argi>
	endr
ENDM

	def_str_array POKEMON_NAMES, "BULBASAUR", "IVYSAUR"
	def STARTER1 equ 0
	db "So you want {POKEMON_NAMES#{d:STARTER1}}?@"

Extensions like appending to an array are also pretty easy to implement:

MACRO append_array
	def \1#{d:\1#LEN} equ \2
	redef \1#LEN equ \1#LEN + 1
ENDM

	append_array MOVES, $f0
	assert MOVES#LEN == 3
	assert MOVES#2 == $f0

MACRO append_str_array
	def \1#{d:\1#LEN} equs \2
	redef \1#LEN equ \1#LEN + 1
ENDM

	append_str_array POKEMON_NAMES, "VENUSAUR"
	assert POKEMON_NAMES#LEN == 3
	assert !strcmp("{POKEMON_NAMES#2}", "VENUSAUR")
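
For readers less familiar with rgbasm macros, the following Python sketch models what the macros above do to the symbol table: each element becomes its own symbol named `NAME#index`, plus a `NAME#LEN` symbol. The `SymbolTable` class and its method names are invented for this illustration; the only rgbasm behavior it relies on is that `def ... equ` fails for an already-defined symbol while `redef` replaces it.

```python
# Hypothetical model of the NAME#index convention used by def_array/append_array.

class SymbolTable:
    def __init__(self):
        self.symbols = {}

    def define(self, name, value):
        """Models `def NAME equ VALUE`: defining an existing symbol is an error."""
        if name in self.symbols:
            raise KeyError(f"{name} already defined")
        self.symbols[name] = value

    def redefine(self, name, value):
        """Models `redef NAME equ VALUE`: replaces any previous definition."""
        self.symbols[name] = value

def def_array(table, name, *values):
    table.define(f"{name}#LEN", len(values))
    for idx, value in enumerate(values):     # 0-based, like the macro's `for idx`
        table.define(f"{name}#{idx}", value)

def append_array(table, name, value):
    length = table.symbols[f"{name}#LEN"]
    table.define(f"{name}#{length}", value)  # new element goes at index LEN
    table.redefine(f"{name}#LEN", length + 1)

table = SymbolTable()
def_array(table, "MOVES", 0x24, 0x3A)
append_array(table, "MOVES", 0xF0)
```

Because the "array" is just a naming convention, nothing stops user code from defining `MOVES#7` directly; only `MOVES#LEN` tells the macros where the array ends.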

(The possibilities for advanced features -- concatenating, searching, sorting, reversing, function-mapping, shuffling -- get increasingly more like a "real programming language" than "genuinely useful utilities for assembly metaprogramming", and I'm not sure it's worth dedicating language syntax/code/testing/maintenance to them.)

@Rangi42
Contributor

Rangi42 commented Aug 22, 2024

Are there any examples of lists/arrays in other assemblers? Prior art that we could get inspiration and use cases from?
