This is also the case in Plinth: for instance, $$(compile [|| "\255" ||]) :: CompiledCode BuiltinByteString results in #c3bf. This is because GHC converts a String literal into a Literal containing a UTF-8 encoded ByteString. In this particular case, the plugin sees unpackCStringUtf8# "\195\191".
This is a problem, not only because of the inconsistency between ByteString and BuiltinByteString, but more importantly because it means there is no easy way to construct a BuiltinByteString literal in Plinth in which some bytes are between 128 and 255. Thus, it is better to adopt ByteString's behavior.
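The discrepancy described above can be reproduced in plain Haskell, without Plinth. The following is a minimal sketch that assumes only the `bytestring` and `text` packages; `toHex` is a helper introduced here for illustration. It contrasts ByteString's `IsString` behavior (via `Data.ByteString.Char8.pack`, which keeps only the low 8 bits of each Char) with UTF-8 encoding of the same one-character string.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Numeric (showHex)

-- Illustrative helper (not part of any library): render a ByteString
-- as lowercase hex, two digits per byte.
toHex :: BS.ByteString -> String
toHex = concatMap byteHex . BS.unpack
  where
    byteHex w =
      let h = showHex w ""
       in if length h == 1 then '0' : h else h

main :: IO ()
main = do
  -- ByteString's IsString instance behaves like Char8.pack:
  -- each Char is truncated to a single byte.
  putStrLn (toHex (BSC.pack "\255")) -- prints ff
  -- UTF-8 encoding the same string yields two bytes, matching the
  -- #c3bf result reported for BuiltinByteString.
  putStrLn (toHex (TE.encodeUtf8 (T.pack "\255"))) -- prints c3bf
```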
To do so, we'll need to update both the IsString instance of BuiltinByteString and the plugin.
Updating the instance is easy, because the plugin does not compile the unfolding of the instance, but instead has special logic to handle it directly. So we can write any Haskell code in the instance, which means we can simply delegate to ByteString's instance.
Regarding the plugin, we'll need to update the special logic that handles BuiltinByteString's IsString instance. There are two places, here and here. The first deals with stringToBuiltinByteString, and the second deals with fromString @BuiltinByteString.
In both places, we currently use stringExprContent to extract the literal ByteString from the CoreExpr. We need to update this extraction logic to detect unpackCStringUtf8# bs. Instead of returning bs, it should return fromString @ByteString . Text.unpack . Text.decodeUtf8' $ bs. Note that the extraction logic for BuiltinString should stay the same.
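The expression `fromString @ByteString . Text.unpack . Text.decodeUtf8' $ bs` reads as shorthand, since `decodeUtf8'` returns an `Either`. A minimal sketch of the same idea outside the plugin might look as follows. `literalBytes` is a hypothetical name, and how the plugin should report a decoding failure is left open here; the sketch simply propagates the `Left`.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper: given the UTF-8 bytes found inside
-- `unpackCStringUtf8# bs`, recover the bytes that ByteString's
-- IsString instance would produce for the original literal.
-- Decoding failures are returned as Left rather than handled.
literalBytes :: BS.ByteString -> Either UnicodeException BS.ByteString
literalBytes = fmap (BSC.pack . T.unpack) . TE.decodeUtf8'

main :: IO ()
main = do
  -- [195, 191] is the UTF-8 encoding of '\255'.
  print (fmap BS.unpack (literalBytes (BS.pack [195, 191]))) -- Right [255]
  -- ASCII bytes pass through unchanged.
  print (fmap BS.unpack (literalBytes (BSC.pack "abc"))) -- Right [97,98,99]
```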
Property-based tests should be added verifying that fromString behaves consistently for ByteString and BuiltinByteString, as well as for String and BuiltinString.
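As a rough idea of the shape such a property could take, here is a sketch using QuickCheck (assumed to be available). It does not touch BuiltinByteString or the plugin: `viaUtf8Literal` is a hypothetical stand-in that models the proposed extraction (UTF-8 encode as GHC would, then decode and truncate), and the generator is deliberately restricted to Chars in the range 0 to 255. A real test would compare against the actual BuiltinByteString instance instead.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC
import Data.String (fromString)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Test.QuickCheck (Gen, Property, choose, forAll, listOf, quickCheck, (===))

-- Hypothetical stand-in for the proposed behaviour: encode the string
-- as UTF-8, then decode it again and truncate each Char to one byte.
viaUtf8Literal :: String -> BS.ByteString
viaUtf8Literal s =
  case TE.decodeUtf8' (TE.encodeUtf8 (T.pack s)) of
    Left err -> error ("unexpected decoding failure: " ++ show err)
    Right t -> BSC.pack (T.unpack t)

-- Strings whose characters all fit in a single byte.
byteString8 :: Gen String
byteString8 = listOf (choose ('\0', '\255'))

-- The stand-in should agree with ByteString's own IsString instance.
prop_consistentWithByteString :: Property
prop_consistentWithByteString =
  forAll byteString8 $ \s ->
    viaUtf8Literal s === (fromString s :: BS.ByteString)

main :: IO ()
main = quickCheck prop_consistentWithByteString
```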
This is a breaking change for anyone currently using the IsString instance, but probably nobody is? But anyway: if we're not going to fix it we should probably get rid of it, it's just a footgun.
This will work for actual overloaded string literals, but won't work for uses of fromString on other strings. That's fine - we never handle such things anyway since we can't deal with values of type String.
It seems kind of awkward to me to have this in the frontend but have no way of doing the corresponding Text/ByteString conversions in actual UPLC. It's hard to fix that without adding new builtins, though, which we probably don't want to do.
Have we considered asking people to create bytestring literals from lists of integers? e.g. [0x01, 0x02]. We could potentially even make this nice by using OverloadedLists.
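For comparison, this is what the list-of-integers style looks like for an ordinary Haskell ByteString, using `Data.ByteString.pack`. It is only an illustration of the idea: whether and how a similar form would be supported for BuiltinByteString in Plinth is not settled here, and `exampleBytes` is a name made up for this sketch.

```haskell
module Main where

import qualified Data.ByteString as BS

-- Illustrative value: every byte is written out explicitly, so no
-- text encoding is involved and values above 127 are unambiguous.
exampleBytes :: BS.ByteString
exampleBytes = BS.pack [0x01, 0x02, 0xff]

main :: IO ()
main = print (BS.unpack exampleBytes) -- [1,2,255]
```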
The IsString instance of Haskell's ByteString converts a String to a ByteString differently from the IsString instance of BuiltinByteString, which uses UTF-8 encoding and is therefore only consistent with ByteString for Chars <= 127.