⎕ucs 10 uglies still not fixed in ]link for character data #240
-
Thanks for bringing this up, @bernecky
-
I'm slowly drifting towards adding extensions that declare a particular kind of transformation on serialisation and deserialisation. I wonder whether anyone really uses "charmat"s any more, but a .aplcv (character vector) and .aplvv (vector of vectors) extension, or perhaps a .strings extension to make it a little more APL-agnostic, seems like something we could add as a standard component in the next round of Link development. I think it would be very unfortunate to do anything other than "normalise" new lines to ⎕UCS 10; if that doesn't work well in current APL, we should work on how APL handles it.
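To make "normalise" concrete, here is a minimal sketch in Dyalog APL of folding CR LF pairs and lone CRs into ⎕UCS 10. The name Normalise is hypothetical and this is not Link's code, just one way it could be done:

```apl
cr lf←⎕UCS 13 10
⍝ Hypothetical helper, not part of Link.
⍝ Step 1: drop every CR that is immediately followed by LF.
⍝ Step 2: turn any remaining (lone) CR into LF.
Normalise←{lf@(=∘cr)⊢⍵/⍨~(cr,lf)⍷⍵}
Normalise 'one',cr,lf,'two',cr,'three',lf,'four'
```

Under those assumptions the result should hold 'one', 'two', 'three' and 'four' separated by single ⎕UCS 10 characters.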
-
I would maintain that the differences between different types of dfns are trivial compared to the difference between a simple vector with embedded newlines and a vector of vectors. The only alternative to using different extensions seems to be some kind of extension to the array notation that allows you to write text variables directly, without any decorators. Is that possible?
-
Without offering any opinion as to which way Link should be improved
I'll just do what Gil is asking.
We realised immediately that files containing
---------------
['phil''s'
'char'
'matrix']
------------
('phil''s'
'char'
'list')
-------
and
--------
'phil''s/rtext/rstring'
-----------
would be unacceptable, given that array notation is supposed to be an
improvement and that the data is explicit, requiring only a couple of APL
primitives to turn the content of an ordinary text file into any of
them. Hence our three additional file extensions: .charlist, .charmat
and .charstring.
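As a rough illustration of "a couple of APL primitives" (a sketch in Dyalog APL, not acre's actual code; the sample data and variable names are made up):

```apl
⍝ Sample lines; from a file they could be read with: lines←⊃⎕NGET 'demo.txt' 1
⍝ ('demo.txt' is a hypothetical file name)
lines←'phil''s' 'char' 'data'
list←lines                   ⍝ vector of character vectors, as for .charlist
mat←↑lines                   ⍝ character matrix, short rows padded, as for .charmat
str←¯1↓∊lines,¨⎕UCS 13       ⍝ simple vector joined with CR, as for .charstring
```

CR is used as the joiner here only because the message says acre uses CR; a different delimiter would work the same way.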
The outcome for strings of mixed delimiters is as follows:
cr nl←⎕UCS 13 10
str←'phil''s',cr,nl,'char',cr,'string'
file str.charstring contains
-------------
phil's
\nchar
string
----------
str←'phil''s',nl,cr,'char',cr,'string'
file str.charstring contains
-------------
phil's\n
char\nstring
----------
We use ⎕ED, ⎕NPUT and ⎕NGET, so the default delimiter is left to Dyalog
to dictate, but acre uses CR, as does Dyalog. The round trip is successful.
Adding "\r" in a .charstring file results in an additional break in the
string.
Beyond this my research is lacking, as it has seemed to work as expected.
yours
Phil
On 2021-03-16 09:25, Gilgamesh Athoraya wrote:
@bernecky <https://github.com/bernecky> 's input on his use cases is
welcome here and @PhilLast <https://github.com/PhilLast> might have some
more insight into the issues encountered with implementing the .char*
extensions in Acre.
My gut feeling is that it would be good to support the common scenarios
of working with plain text files outside the APL editor, files that
materialize in APL as a matrix, a vector of vectors or a simple string.
If you want a matrix or vector of vectors in APL where items (cells in
matrix or lines in vov) contain embedded EOL characters then you have to
resort to array notation.
-
Erratum: The second example should have been:
----------
str←'phil''s',nl,cr,'char',nl,'string'
file str.charstring contains
-------------
phil's\n
char\nstring
----------
while
----------
str←'phil''s',nl,cr,'char',cr,'string'
file str.charstring /would/ contain
-------------
phil's\n
char
string
----------
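The rule these corrected examples suggest (CR becomes the line break in the file, LF is written as the two characters \n) can be sketched in Dyalog APL. This is an inference from the examples, not acre's code, and ToFile is a hypothetical name:

```apl
cr nl←⎕UCS 13 10
⍝ Hypothetical helper: text as it would be handed to the file writer.
⍝ Each LF is replaced by the two characters \n; CR is left alone
⍝ so that it acts as the line break in the file.
ToFile←{∊(⊂'\n')@(=∘nl)⊢⍵}
ToFile 'phil''s',nl,cr,'char',nl,'string'
```

If the inference is right, the result should correspond to the two file lines phil's\n and char\nstring shown above.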
P
On 2021-03-16 11:36, Phil Last wrote:
-
Ah, I see. My apologies for using the term "designed". Bob
-
I conflated two distinct issues into one. Mea culpa. Here is one of them:
And then, as they say in Russia, it got worse:
I am trying to use ]link to keep things under git. Here is
what a similar file looks like under bash, when maintained
by ]link:
The real data looks like this:
This is on a Linux box, running the latest ]link, from the Ides of March: e626cb7
The treatment of LF (NL) makes it effectively impossible to use a normal text editor, or even
a grep that wants to know about NL, to operate on files resulting from ]link. They are also ugly as sin.
Uglier, actually.