Skip to content

Commit

Permalink
Standardize builtin Bytes type
Browse files Browse the repository at this point in the history
 * Currently only base-16 represenation
 * No functions to do something meaningful with Bytes values

Fixes dhall-lang#1179
  • Loading branch information
mmhat committed Feb 10, 2023
1 parent d605c7a commit 6adfaa5
Show file tree
Hide file tree
Showing 11 changed files with 169 additions and 13 deletions.
30 changes: 25 additions & 5 deletions standard/Parser.hs
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ import Data.Text (Text)
import Data.Void (Void)
import Numeric.Natural (Natural)
import Prelude hiding (exponent, takeWhile)
import Text.Megaparsec.Char (char)
import Text.Megaparsec.Char (char, string)

import Syntax
( Builtin(..)
Expand Down Expand Up @@ -58,6 +58,8 @@ import Text.Megaparsec
import qualified Control.Monad.Combinators.NonEmpty as Combinators.NonEmpty
import qualified Crypto.Hash as Hash
import qualified Data.ByteArray.Encoding as ByteArray.Encoding
import qualified Data.ByteString.Char8 as ByteString8
import qualified Data.ByteString.Base16 as Base16
import qualified Data.Char as Char
import qualified Data.List.NonEmpty as NonEmpty
import qualified Data.Map as Map
Expand Down Expand Up @@ -448,6 +450,19 @@ interpolation = do
textLiteral :: Parser TextLiteral
textLiteral = doubleQuoteLiteral <|> singleQuoteLiteral

bytesBase16Literal :: Parser ByteString
bytesBase16Literal = do
string "0x\""

chunks <- many (satisfy hexDig)

char '"'

return (Base16.decodeBase16Lenient $ ByteString8.pack chunks)

bytesLiteral :: Parser ByteString
bytesLiteral = bytesBase16Literal

reservedKeywords :: [Text]
reservedKeywords =
[ "if"
Expand Down Expand Up @@ -542,8 +557,8 @@ forallKeyword = void "forall"
forallSymbol :: Parser ()
forallSymbol = void ""

forall :: Parser ()
forall = forallSymbol <|> forallKeyword
forall_ :: Parser ()
forall_ = forallSymbol <|> forallKeyword

with :: Parser ()
with = void "with"
Expand Down Expand Up @@ -584,6 +599,7 @@ builtin =
<|> _Integer
<|> _Double
<|> _Text
<|> _Bytes
<|> _List
<|> _Date
<|> _TimeZone
Expand Down Expand Up @@ -682,6 +698,9 @@ _Double = do "Double"; return Double
_Text :: Parser Builtin
_Text = do "Text"; return Text

_Bytes :: Parser Builtin
_Bytes = do "Bytes"; return Bytes

_List :: Parser Builtin
_List = do "List"; return List

Expand Down Expand Up @@ -1474,7 +1493,7 @@ hash = do
let base16 = Text.Encoding.encodeUtf8 (Text.pack hexDigits)

bytes <- case ByteArray.Encoding.convertFromBase Base16 base16 of
Left string -> fail string
Left err -> fail err
Right bytes -> return (bytes :: ByteString)

case Hash.digestFromByteString bytes of
Expand Down Expand Up @@ -1564,7 +1583,7 @@ expression =

return (foldr (\(x, mA, a) -> Let x mA a) b bindings)
)
<|> (do forall
<|> (do forall_

whsp

Expand Down Expand Up @@ -1972,6 +1991,7 @@ primitiveExpression =
<|> (do n <- naturalLiteral; return (NaturalLiteral n))
<|> (do n <- integerLiteral; return (IntegerLiteral n))
<|> (do t <- textLiteral; return (TextLiteral t))
<|> (do x <- bytesLiteral; return (BytesLiteral x))
<|> (do "{"

whsp
Expand Down
2 changes: 2 additions & 0 deletions standard/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,8 @@ s'' : A multi-line `Text` literal with only one line
''
ss
s'' : A multi-ine `Text` literal with more than one line
0x"0123456789abcdef" : A base16-encoded `Bytes` literal
```

You will see this notation in judgments that perform induction on lists,
Expand Down
9 changes: 9 additions & 0 deletions standard/alpha-normalization.md
Original file line number Diff line number Diff line change
Expand Up @@ -597,6 +597,15 @@ alphaNormalize (TextLiteral (Chunks xys₀ z)) = TextLiteral (Chunks xys₁ z)
```


───────────────────────────────────────────
0x"0123456789abcdef" ↦ 0x"0123456789abcdef"


```haskell
alphaNormalize (BytesLiteral xs) = BytesLiteral xs
```


───────
{} ↦ {}

Expand Down
24 changes: 24 additions & 0 deletions standard/beta-normalization.md
Original file line number Diff line number Diff line change
Expand Up @@ -1058,6 +1058,30 @@ betaNormalize (Builtin TextShow ) = Builtin TextShow
betaNormalize (Builtin TextReplace) = Builtin TextReplace
```

## `Bytes`

The `Bytes` type is in normal form:


─────────────
Bytes ⇥ Bytes


```haskell
betaNormalize (Builtin Bytes) = Builtin Bytes
```

An `Bytes` literal is in normal form:


───────────────────────────────────────────
0x"0123456789abcdef" ⇥ 0x"0123456789abcdef"


```haskell
betaNormalize (BytesLiteral xs) = BytesLiteral xs
```

## `List`

The `List` type-level function is in normal form:
Expand Down
56 changes: 53 additions & 3 deletions standard/binary.md
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,10 @@ matching their identifier.
encode(Text) = "Text"


───────────────────────
encode(Bytes) = "Bytes"


─────────────────────
encode(List) = "List"

Expand Down Expand Up @@ -410,6 +414,7 @@ encode (Builtin Natural ) = TString "Natural"
encode (Builtin Integer ) = TString "Integer"
encode (Builtin Double ) = TString "Double"
encode (Builtin Text ) = TString "Text"
encode (Builtin Bytes ) = TString "Bytes"
encode (Builtin List ) = TString "List"
encode (Builtin Date ) = TString "Date"
encode (Builtin Time ) = TString "Time"
Expand Down Expand Up @@ -1050,6 +1055,21 @@ odd elements being strings and the even ones being interpolated expressions.
Note: this means that the first and the last encoded elements are always strings,
even if they are empty strings.

### `Bytes`

Encode `Bytes` literals by decoding it from its encoded representation to a CBOR
Major-2 byte string:


base16decode(0x"0123456789abcdef") = rawBytes
──────────────────────────────────────────────────
encode(0x"0123456789abcdef") = [ 33, b"rawBytes" ]


```haskell
encode (BytesLiteral xs) = TList [ TInt 33, TBytes xs ]
```

### `assert`

An assertion encoded its own annotation (regardless of whether the annotation is
Expand Down Expand Up @@ -1079,6 +1099,7 @@ Imports are encoded as a list where the first three elements are always:
* The import type is `0` for when importing a Dhall expression (the default)
* The import type is `1` for importing `as Text`
* The import type is `2` for importing `as Location`
* The import type is `3` for importing `as Bytes`

For example, if an import does not specify an integrity check or import type
then the CBOR expression begins with:
Expand Down Expand Up @@ -1203,10 +1224,19 @@ instead of `0`:
encode(import as Text) = [ 24, x, 1, xs… ]


If you import `as Location`, then the third element encoding the import type is `2`
If you import `as Bytes`, then the third element encoding the import type is `3`
instead of `0`:


encode(import) = [ 24, x, 0, xs… ]
───────────────────────────────────────────
encode(import as Bytes) = [ 24, x, 3, xs… ]


If you import `as Location`, then the third element encoding the import type is
`2` instead of `0`:


encode(import) = [ 24, x, 0, xs… ]
──────────────────────────────────────────
encode(import as Location) = [ 24, x, 2, xs… ]
Expand All @@ -1223,6 +1253,7 @@ encode (Import importType₀ importMode₀ hash₀) =
importMode₁ = case importMode₀ of
Code -> TInt 0
RawText -> TInt 1
RawBytes -> TInt 3
Location -> TInt 2

importType₁ = case importType₀ of
Expand Down Expand Up @@ -1354,7 +1385,7 @@ encode (With e₀ (k₀ :| ks₀) v₀) = TList [ TInt 29, e₁, TList (k₁ : k

k₁ = encodeKey k₀
ks₁ = map encodeKey ks₀

encodeKey DescendOptional = TInt 0
encodeKey (Label k) = TString k
```
Expand Down Expand Up @@ -1555,6 +1586,10 @@ a built-in identifier if it matches any of the following strings:
decode("Text") = Text


─────────────────────────
decode("Bytes") = Bytes


─────────────────────────
decode("List") = List

Expand Down Expand Up @@ -1972,6 +2007,16 @@ Decode a CBOR array beginning with a `18` as a `Text` literal:
decode([ 18, "a", b₁, "c", d₁, "e", …, "x", y₁, "z" ]) = "a${b₀}c${d}e…x${y₀}z"


### `Bytes`

Decode a CBOR array beginning with a `33` as a `Bytes` literal:


base16encode(rawBytes) = 0x"0123456789abcdef"
──────────────────────────────────────────────────
decode([ 33, b"rawBytes" ]) = 0x"0123456789abcdef"


### `assert`

Decode a CBOR array beginning with a `19` as an `assert`:
Expand All @@ -1984,7 +2029,7 @@ Decode a CBOR array beginning with a `19` as an `assert`:

### Imports

Decode a CBOR array beginning with a `24` as an import
Decode a CBOR array beginning with a `24` as an import.

The decoding rules are the exact opposite of the encoding rules:

Expand Down Expand Up @@ -2034,6 +2079,11 @@ The decoding rules are the exact opposite of the encoding rules:
decode([ 24, x, 1, xs… ]) = import as Text


decode([ 24, x, 0, xs… ]) = import
───────────────────────────────────────────
decode([ 24, x, 3, xs… ]) = import as Bytes


decode([ 24, x, 0, xs… ]) = import
──────────────────────────────────────────────
decode([ 24, x, 2, xs… ]) = import as Location
Expand Down
8 changes: 8 additions & 0 deletions standard/dhall.abnf
Original file line number Diff line number Diff line change
Expand Up @@ -356,6 +356,11 @@ interpolation = "${" complete-expression "}"

text-literal = (double-quote-literal / single-quote-literal)

; %x22 = '"'
bytes-base16-literal = "0x" %x22 *HEXDIGIT %x22

bytes-literal = bytes-base16-literal

; RFC 5234 interprets string literals as case-insensitive and recommends using
; hex instead for case-sensitive strings
;
Expand Down Expand Up @@ -942,6 +947,9 @@ primitive-expression =
; '"ABC"'
/ text-literal
;
; '0x"01234567689abcdef"'
/ bytes-literal
;
; "{ foo = 1 , bar = True }"
; "{ foo : Integer, bar : Bool }"
/ "{" whsp [ "," whsp ] record-type-or-literal whsp "}"
Expand Down
23 changes: 20 additions & 3 deletions standard/imports.md
Original file line number Diff line number Diff line change
Expand Up @@ -677,11 +677,28 @@ If an import ends with `as Text`, import the raw contents of the file as a


Carefully note that `"s"` in the above judgment is a Dhall `Text` literal. This
implies that if you an import an expression as `Text` and you also protect the
import with a semantic integrity check then the you encode the string literal
as a Dhall expression and then hash that. The semantic integrity check is not a
implies that if you import an expression as `Text` and you also protect the
import with a semantic integrity check then you encode the string literal
as a Dhall expression and hash that. The semantic integrity check is not a
hash of the raw underlying text.

If an import ends with `as Bytes`, import the raw contents of the file as a
`Bytes` value instead of importing the file a Dhall expression:


parent </> import₀ = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = 0x"0123456789abcdef" using responseHeaders ; Read the raw contents of the file
corsCompliant(parent, child, responseHeaders)
─────────────────────────────────────────────────────────────
(Δ, parent) × Γ ⊢ import₀ as Bytes ⇒ 0x"0123456789abcdef" ⊢ Γ


Similar to the `as Text` import, note that `0x"0123456789abcdef"` in the above
judgment is a Dhall `Bytes` literal and the same observation concerning semantic
integraty checks applies: The semantic integrity check is not a hash of the raw
underlying byte string but the one of the encoded Dhall expression.

If an import ends with `as Location`, import its location as a value of type
`< Local : Text | Remote : Text | Environment : Text | Missing >` instead of
Expand Down
9 changes: 9 additions & 0 deletions standard/shift.md
Original file line number Diff line number Diff line change
Expand Up @@ -660,6 +660,15 @@ shift d x m (TextLiteral (Chunks xys₀ z)) = TextLiteral (Chunks xys₁ z)
t₁ = shift d x m t₀
```


───────────────────────────────────────────────────────
↑(d, x, m, 0x"0123456789abcdef") = 0x"0123456789abcdef"


```haskell
shift _d _x _m (BytesLiteral xs) = BytesLiteral xs
```

───────────────────
↑(d, x, m, {}) = {}

Expand Down
6 changes: 4 additions & 2 deletions standard/standard.cabal
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
cabal-version: >=1.10
cabal-version: >=2.0
name: standard
version: 1.0.0
synopsis: Literate Haskell semantics for Dhall
Expand All @@ -10,12 +10,12 @@ build-type: Simple

library
build-depends: base
, base16
, bytestring
, cborg
, containers
, cryptonite
, half
, markdown-unlit
, megaparsec
, memory
, parser-combinators
Expand All @@ -39,6 +39,8 @@ library
, GeneralizedNewtypeDeriving
, OverloadedStrings
, TypeFamilies
, TypeOperators
build-tool-depends: markdown-unlit:markdown-unlit
ghc-options: -pgmL markdown-unlit -Wall -Werror

executable dhall
Expand Down
9 changes: 9 additions & 0 deletions standard/substitution.md
Original file line number Diff line number Diff line change
Expand Up @@ -652,6 +652,15 @@ substitute (TextLiteral (Chunks xys₀ z)) x n e = TextLiteral (Chunks xys₁ z)
```


────────────────────────────────────────────────────
0x"0123456789abcdef"[x@n ≔ e] = 0x"0123456789abcdef"


```haskell
substitute (BytesLiteral xs) _x _n _e = BytesLiteral xs
```


────────────────
{}[x@n ≔ e] = {}

Expand Down
Loading

0 comments on commit 6adfaa5

Please sign in to comment.