feat: CDDL grammar correction for RFC8610 #61

Status: Merged (43 commits, merged Jan 30, 2024)
Commits (43 in total; the diff view below shows changes from 14 commits):
37643c2 feat: example files (apskhem, Jan 12, 2024)
217c16f refactor: rename test files (apskhem, Jan 15, 2024)
40bda27 refactor: rename from rfc9615 to rfc9165 (apskhem, Jan 15, 2024)
45dfa8f feat: error display for testing multiple files (apskhem, Jan 15, 2024)
19aa070 feat: sort file reading (apskhem, Jan 15, 2024)
b94d969 fix: whitespace problem for ',', braces, and // (apskhem, Jan 16, 2024)
b0607d3 feat: group elements initial unit test (apskhem, Jan 16, 2024)
4267afd fix: comments (apskhem, Jan 16, 2024)
10884a6 fix: grammar correction for comments (apskhem, Jan 16, 2024)
f2c9e22 fix: grammar correcttion according to abnf (apskhem, Jan 17, 2024)
5dd9c20 fix: remove genericarg from typename and groupname (apskhem, Jan 17, 2024)
1c0bf32 fix: unit tests (apskhem, Jan 17, 2024)
1e7d1ce feat: initial group_elements test cases (apskhem, Jan 17, 2024)
8e1cc71 fix: put atomic test back (apskhem, Jan 17, 2024)
7c9fa06 revert: deleted grammar and tests (apskhem, Jan 18, 2024)
f0cf1e6 fix: correct test cases (apskhem, Jan 19, 2024)
3c2bbd3 feat: type decl tests (apskhem, Jan 22, 2024)
0682361 feat: type decl tests (apskhem, Jan 22, 2024)
a2d738c feat: setup type decl test (apskhem, Jan 22, 2024)
a3410f6 feat: type1 test cases (apskhem, Jan 22, 2024)
68e61b4 feat: composition testing (apskhem, Jan 22, 2024)
0c5942a feat: rules test cases (apskhem, Jan 22, 2024)
0754589 feat: rules test cases (apskhem, Jan 23, 2024)
58ec3ac feat: all examples from rfc8610 (apskhem, Jan 23, 2024)
e72029e feat: add rule level tests (apskhem, Jan 24, 2024)
4cc1736 refactor: use general passes and fails function in unit test (apskhem, Jan 24, 2024)
e1bf0ba fix: error msg for cddl (apskhem, Jan 24, 2024)
d391c81 fix: cddl test file name reading (apskhem, Jan 24, 2024)
6da42d3 fix: cspell (apskhem, Jan 24, 2024)
dd7f8a4 chore: lintfix (apskhem, Jan 24, 2024)
07eed72 refactor: cddl filter fn (apskhem, Jan 24, 2024)
edf532e refactor: reset (apskhem, Jan 24, 2024)
4a77203 fix: pub(crate) for unit tests (apskhem, Jan 25, 2024)
321aaaa chore: fmtfix (apskhem, Jan 25, 2024)
3689b59 refactor: don't dry util functions (apskhem, Jan 25, 2024)
0dca0dc fix: disable lint warning (apskhem, Jan 25, 2024)
dd11923 chore: fmtfix (apskhem, Jan 25, 2024)
05f863b refactor: change common path (apskhem, Jan 25, 2024)
29bef0e feat: common (apskhem, Jan 25, 2024)
188cb15 refactor: remove tmp vars (apskhem, Jan 25, 2024)
5140506 feat: common consts (apskhem, Jan 26, 2024)
353fa21 fix: add tmp allow dead code (apskhem, Jan 26, 2024)
ac3de3a fix: cspell (apskhem, Jan 30, 2024)
hermes/crates/cbork/cddl-parser/src/grammar/cddl_test.pest: 46 changes (32 additions, 14 deletions)
@@ -5,38 +5,56 @@
 
 // cspell: words intfloat hexfloat
 
-/// Test Expression for the S Rule.
+/// Test Expression for the `group` Rule.
+group_TEST = ${ SOI ~ group ~ EOI }
+
+/// Test Expression for the `grpchoice` Rule.
+grpchoice_TEST = ${ SOI ~ grpchoice ~ EOI }
+
+/// Test Expression for the `grpent` Rule.
+grpent_TEST = ${ SOI ~ grpent ~ EOI }
+
+/// Test Expression for the `memberkey` Rule.
+memberkey_TEST = ${ SOI ~ memberkey ~ EOI }
+
+/// Test Expression for the `bareword` Rule.
+bareword_TEST = ${ SOI ~ bareword ~ EOI }
+
+/// Test Expression for the `optcom` Rule.
+optcom_TEST = ${ SOI ~ optcom ~ EOI }
+
+/// Test Expression for the `occur` Rule.
+occur_TEST = ${ SOI ~ occur ~ EOI }
+
+/// Test Expression for the `S` Rule.
 S_TEST = ${ SOI ~ S ~ EOI }
 
-/// Test Expression for the COMMENT Rule.
+/// Test Expression for the `COMMENT` Rule.
 COMMENT_TEST = { SOI ~ COMMENT* ~ EOI }
 
-/// Test expression for the URL_BASE64 Rule.
-URL_BASE64_TEST = { SOI ~ URL_BASE64 ~ EOI }
-
-/// Test expression to the id Rule.
+/// Test expression to the `id` Rule.
 id_TEST = ${ SOI ~ id ~ EOI}
 
-/// Test expression to the bytes Rule.
+/// Test expression to the `bytes` Rule.
 bytes_TEST = ${ SOI ~ bytes ~ EOI}
 
-/// Test expression to the text Rule.
+/// Test expression to the `text` Rule.
 text_TEST = ${ SOI ~ text ~ EOI}
 
-/// Test expression to the uint Rule.
+/// Test expression to the `uint` Rule.
 uint_TEST = ${ SOI ~ uint ~ EOI}
 
-/// Test expression to the int Rule.
+/// Test expression to the `int` Rule.
 int_TEST = ${ SOI ~ int ~ EOI}
 
-/// Test expression to the intfloat Rule.
+/// Test expression to the `intfloat` Rule.
 intfloat_TEST = ${ SOI ~ intfloat ~ EOI}
 
-/// Test expression to the hexfloat Rule.
+/// Test expression to the `hexfloat` Rule.
 hexfloat_TEST = ${ SOI ~ hexfloat ~ EOI}
 
-/// Test expression to the number Rule.
+/// Test expression to the `number` Rule.
 number_TEST = ${ SOI ~ number ~ EOI}
 
-/// Test expression to the value Rule.
+/// Test expression to the `value` Rule.
 value_TEST = ${ SOI ~ value ~ EOI}
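Each `*_TEST` rule above wraps a grammar rule as `${ SOI ~ rule ~ EOI }`, so a test input only passes if the rule consumes all of it. The std-only Rust sketch below illustrates that idea without pest. `match_uint_prefix` and `matches_whole` are hypothetical helpers, and `match_uint_prefix` is a hand-written approximation of the `uint` rule in `rfc_8610.pest` (decimal without a leading zero, `0x` hex, `0b` binary, or a lone `0`, tried in order).

```rust
/// Hypothetical hand-written mirror of the grammar's `uint` rule.
/// Tries the four alternatives in order, like a PEG ordered choice, and
/// returns how many bytes of `input` the first successful one consumes.
fn match_uint_prefix(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    // Alternative 1: ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*
    if let Some(&first) = bytes.first() {
        if (b'1'..=b'9').contains(&first) {
            let digits = bytes[1..].iter().take_while(|b| b.is_ascii_digit()).count();
            return Some(1 + digits);
        }
    }
    // Alternative 2: "0x" ~ ASCII_HEX_DIGIT+
    if let Some(rest) = input.strip_prefix("0x") {
        let n = rest.bytes().take_while(|b| b.is_ascii_hexdigit()).count();
        if n > 0 {
            return Some(2 + n);
        }
    }
    // Alternative 3: "0b" ~ ASCII_BIN_DIGIT+
    if let Some(rest) = input.strip_prefix("0b") {
        let n = rest.bytes().take_while(|b| *b == b'0' || *b == b'1').count();
        if n > 0 {
            return Some(2 + n);
        }
    }
    // Alternative 4: "0"
    if input.starts_with('0') {
        return Some(1);
    }
    None
}

/// Analogue of a `${ SOI ~ rule ~ EOI }` test rule: the rule must succeed
/// and also consume the entire input.
fn matches_whole(input: &str, rule: fn(&str) -> Option<usize>) -> bool {
    rule(input) == Some(input.len())
}
```

Because the alternatives are tried in order and the first success wins, `"012"` only matches its leading `0`; the whole-input check then rejects it, in the same way `EOI` would make `uint_TEST` fail on that input.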
hermes/crates/cbork/cddl-parser/src/grammar/rfc_8610.pest: 127 changes (52 additions, 75 deletions)
@@ -5,73 +5,75 @@
 // cspell: words genericarg rangeop ctlop grpchoice memberkey bareword hexfloat intfloat
 // cspell: words SCHAR BCHAR PCHAR SESC FFFD Characterset Visiable
 
-cddl = {
+cddl = ${
     SOI
-    ~ S ~ rule+
+    ~ S ~ (rule ~ S)+
     ~ EOI
 }
 
-rule = {
-    ( typename ~ assignt ~ type)
-    | ( groupname ~ assigng ~ grpent)
+// -----------------------------------------------------------------------------
+// Rules
+rule = ${
+    (typename ~ S ~ assignt ~ S ~ type)
+    | (groupname ~ S ~ assigng ~ S ~ grpent)
 }
 
-typename = ${ id ~ genericparm? }
-groupname = ${ id ~ genericparm? }
+typename = { id }
+groupname = { id }
 
 assignt = { "=" | "/=" }
 assigng = { "=" | "//=" }
 
-genericparm = { "<" ~ id ~ ( "," ~ id )* ~ ">" }
-genericarg = { "<" ~ type1 ~ ( "," ~ type1)* ~ ">" }
-
-type = { type1 ~ ( S ~ "/" ~ type1)* }
-
-type1 = { type2 ~ ( S ~ ( rangeop | ctlop ) ~ type2)? }
+genericparm = ${ "<" ~ S ~ id ~ S ~ ("," ~ S ~ id ~ S)* ~ ">" }
+genericarg = ${ "<" ~ S ~ type1 ~ S ~ ("," ~ S ~ type1 ~ S)* ~ ">" }
 
-typename_arg = ${ typename ~ genericarg? }
-groupname_arg = ${ groupname ~ genericarg? }
+// -----------------------------------------------------------------------------
+// Type Declaration
+type = ${ type1 ~ (S ~ "/" ~ S ~ type1)* }
 
-tag6 = ${ "#" ~ "6" ~ ("." ~ uint)? ~ "(" ~ S ~ type ~ S ~ ")" }
-tag_generic = ${ "#" ~ ASCII_DIGIT ~ ("." ~ uint)? }
+type1 = ${ type2 ~ (S ~ (rangeop | ctlop) ~ S ~ type2)? }
 
-type2 = {
+type2 = ${
     value
-    | typename_arg
-    | ( "(" ~ type ~ ")" )
-    | ( "{" ~ group ~ "}" )
-    | ( "[" ~ group ~ "]" )
-    | ( "~" ~ typename_arg )
-    | ( "&" ~ "(" ~ group ~ ")" )
-    | ( "&" ~ groupname_arg )
-    | tag6
-    | tag_generic
+    | typename ~ genericarg?
+    | ("(" ~ S ~ type ~ S ~ ")")
+    | ("{" ~ S ~ group ~ S ~ "}")
+    | ("[" ~ S ~ group ~ S ~ "]")
+    | ("~" ~ S ~ typename ~ genericarg?)
+    | ("&" ~ S ~ "(" ~ S ~ group ~ S ~ ")")
+    | ("&" ~ S ~ groupname ~ genericarg?)
+    | ("#" ~ "6" ~ ("." ~ uint)? ~ "(" ~ S ~ type ~ S ~ ")")
+    | ("#" ~ ASCII_DIGIT ~ ("." ~ uint)?)
     | "#"
 }
 
 rangeop = { "..." | ".." }
 ctlop = ${ "." ~ id }
 
-group = { grpchoice ~ ( S ~ "//" ~ grpchoice)* }
+// -----------------------------------------------------------------------------
+// Group Elements
+group = ${ grpchoice ~ (S ~ "//" ~ S ~ grpchoice)* }
 
-grpchoice = { ( grpent ~ ","? )* }
+grpchoice = ${ (grpent ~ optcom)* }
 
 grpent = ${
-    ( (occur ~ S)? ~ (memberkey ~ S)? ~ type )
-    | ( (occur ~ S)? ~ groupname ~ genericarg? )
-    | ( (occur ~ S)? ~ "(" ~ S ~ group ~ S ~ ")" )
+    ((occur ~ S)? ~ (memberkey ~ S)? ~ type)
+    | ((occur ~ S)? ~ groupname ~ genericarg?)
+    | ((occur ~ S)? ~ "(" ~ S ~ group ~ S ~ ")")
 }
 
-memberkey = {
-    ( type1 ~ "^"? ~ "=>" )
-    | ( bareword ~ ":" )
-    | ( value ~ ":" )
+memberkey = ${
+    (type1 ~ S ~ ("^" ~ S)? ~ "=>")
+    | ((value | bareword) ~ S ~ ":")
 }
 
 bareword = { id }
 
+/// Optional Comma - Note eligible for producing pairs as this might be useful for linting
+optcom = { S ~ ("," ~ S)? }
+
 occur = {
-    ( uint? ~ "*" ~ uint? )
+    (uint? ~ "*" ~ uint?)
     | "+"
     | "?"
 }
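The `occur` rule only recognizes the syntax of an occurrence indicator. A minimal sketch of how a consumer of the parse tree might turn the matched text into bounds, assuming the meanings given in RFC 8610 (`?` for zero or one, `+` for one or more, `n*m` with `n` defaulting to 0 and `m` to no limit). `Occurrence`, `parse_dec`, and `parse_occur` are hypothetical names, not part of the crate, and only decimal bounds are handled.

```rust
/// Occurrence bounds of a group entry; `max: None` means "no upper bound".
#[derive(Debug, PartialEq)]
struct Occurrence {
    min: u64,
    max: Option<u64>,
}

/// Parses a plain decimal bound. Simplification: the grammar's `uint` also
/// accepts `0x`/`0b` forms and rejects leading zeros; neither is modelled here.
fn parse_dec(s: &str) -> Option<u64> {
    if !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Interprets a string accepted by the `occur` rule:
/// "?" = 0..1, "+" = 1..unbounded, "n*m" = n..m, where a missing n is 0
/// and a missing m is unbounded (so a bare "*" is 0..unbounded).
fn parse_occur(s: &str) -> Option<Occurrence> {
    match s {
        "?" => Some(Occurrence { min: 0, max: Some(1) }),
        "+" => Some(Occurrence { min: 1, max: None }),
        _ => {
            // uint? ~ "*" ~ uint?
            let (lo, hi) = s.split_once('*')?;
            let min = if lo.is_empty() { 0 } else { parse_dec(lo)? };
            let max = if hi.is_empty() { None } else { Some(parse_dec(hi)?) };
            Some(Occurrence { min, max })
        }
    }
}
```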
@@ -82,7 +84,7 @@ occur = {
 /// All Literal Values
 value = { number | text | bytes }
 
-/// Literal Numbers - A float if it has fraction or exponent; int otherwise
+/// Literal Numbers - A float if it has fraction or exponent; int otherwise
 number = { hexfloat | intfloat }
 
 /// Hex floats of the form -0x123.abc0p+12
@@ -103,58 +105,33 @@ int = ${ "-"? ~ uint }
 
 /// Unsigned Integers
 uint = ${
-    ( ASCII_NONZERO_DIGIT ~ ASCII_DIGIT* )
-    | ( "0x" ~ ASCII_HEX_DIGIT+ )
-    | ( "0b" ~ ASCII_BIN_DIGIT+ )
+    (ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*)
+    | ("0x" ~ ASCII_HEX_DIGIT+)
+    | ("0b" ~ ASCII_BIN_DIGIT+)
     | "0"
 }
 
 /// Literal Text
 text = ${ "\"" ~ SCHAR* ~ "\"" }
 
-/// Literal Bytes - Note CDDL Spec incorrectly defines b64''.
-bytes = ${ bytes_hex | bytes_b64 | bytes_text }
-bytes_hex = ${ "h" ~ "'" ~ HEX_PAIR* ~ "'" }
-bytes_b64 = ${ "b64" ~ "'" ~ URL_BASE64 ~ "'" }
-bytes_text = ${ "'" ~ BCHAR* ~ "'" }
+/// Literal Bytes.
stevenj marked this conversation as resolved.
+bytes = ${ bsqual? ~ "'" ~ BCHAR* ~ "'" }
+bsqual = { "h" | "b64" }
 
 // -----------------------------------------------------------------------------
 // Simple multiple character sequences
 
-/// identifier, called the `name` in the CDDL spec.
-id = ${
-    group_socket |
-    type_socket |
-    name
-}
-
-/// Special form of a name that represents a Group Socket.
-group_socket = ${ "$$" ~ ( ( "-" | "." )* ~ NAME_END )* }
-/// Special form of a name that represents a Type Socket.
-type_socket = ${ "$" ~ ( ( "-" | "." )* ~ NAME_END )* }
-/// General form of a name.
-name = ${ NAME_START ~ ( ( "-" | "." )* ~ NAME_END )* }
-
-/// A pair of hex digits. (Must always have even numbers of hex digits.)
-HEX_PAIR = _{ S ~ ASCII_HEX_DIGIT ~ S ~ ASCII_HEX_DIGIT ~ S }
-
-/// Whitespace is allowed and is ignored.
-/// This token will keep the whitespace, so it will need to handled when converted to binary.
-URL_BASE64 = _{ S ~ ( URL_BASE64_ALPHA ~ S)* ~ URL_BASE64_PAD? }
+/// identifier, called the `name` in the CDDL spec.
+id = ${ NAME_START ~ (("-" | ".")* ~ NAME_END)* }
stevenj marked this conversation as resolved.
 
 
 // -----------------------------------------------------------------------------
 // Characters, Whitespace and Comments
 
 S = _{ WHITESPACE* }
-WHITESPACE = _{ " " | "\t" | NEWLINE }
+WHITESPACE = _{ " " | "\t" | COMMENT | NEWLINE }
stevenj marked this conversation as resolved.
 COMMENT = _{ ";" ~ (PCHAR | "\t")* ~ NEWLINE }
 
-// URL Base64 Characterset.
-URL_BASE64_ALPHA = _{ ASCII_ALPHA | ASCII_DIGIT | "-" | "_" }
-// Optional Padding that goes at the end of Base64.
-URL_BASE64_PAD = _{ "~" }
-
apskhem marked this conversation as resolved.
 // Identifier Name Character sets.
 
 /// A name can start with an alphabetic character (including "@", "_", "$")
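The revised `id` rule drops the separate `group_socket` and `type_socket` rules; since `$` is already accepted as a first character, `$name` and `$$name` appear to be covered as ordinary identifiers. A std-only sketch of what `NAME_START ~ (("-" | ".")* ~ NAME_END)*` accepts follows. The `NAME_START` and `NAME_END` bodies lie outside the hunk shown, so the sketch assumes the RFC 8610 ABNF (`EALPHA = ALPHA / "@" / "_" / "$"` to start, with digits also allowed afterwards); `is_id` and its helpers are hypothetical.

```rust
/// Assumed NAME_START: an ASCII letter or "@", "_", "$".
fn is_name_start(c: char) -> bool {
    c.is_ascii_alphabetic() || matches!(c, '@' | '_' | '$')
}

/// Assumed NAME_END: NAME_START or an ASCII digit.
fn is_name_end(c: char) -> bool {
    is_name_start(c) || c.is_ascii_digit()
}

/// Returns true if the whole of `s` matches
/// NAME_START ~ (("-" | ".")* ~ NAME_END)*
fn is_id(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if is_name_start(c) => {}
        _ => return false,
    }
    // Every run of "-" / "." must be followed by a NAME_END character,
    // so an identifier can neither contain other characters nor end in "-" or ".".
    let mut pending_separator = false;
    for c in chars {
        if c == '-' || c == '.' {
            pending_separator = true;
        } else if is_name_end(c) {
            pending_separator = false;
        } else {
            return false;
        }
    }
    !pending_separator
}
```

Because each run of `-` or `.` must be followed by a `NAME_END`, an input such as `a-` is rejected here; in pest, `id` would match only the `a`, and `id_TEST` would then fail at `EOI`.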
@@ -172,12 +149,12 @@ PCHAR = _{ ASCII_VISIBLE | UNICODE_CHAR }
 SCHAR = _{ SCHAR_ASCII_VISIBLE | UNICODE_CHAR | SESC }
 
 /// The set of characters valid for a byte string.
-BCHAR = _{ BCHAR_ASCII_VISIBLE | UNICODE_CHAR | SESC | NEWLINE }
+BCHAR = _{ '\u{20}'..'\u{26}' | '\u{28}'..'\u{5B}' | '\u{5D}'..'\u{10FFFD}' | SESC | NEWLINE }
apskhem marked this conversation as resolved.
 
 /// Escaping code to allow invalid characters to be used in text or byte strings.
 SESC = ${ "\\" ~ (ASCII_VISIBLE | UNICODE_CHAR) }
 
-/// All Visiable Ascii characters.
+/// All Visible Ascii characters.
 ASCII_VISIBLE = _{ ' '..'~' }
 
 /// Ascii subset valid for text strings.
@@ -187,4 +164,4 @@ SCHAR_ASCII_VISIBLE = _{ ' '..'!' | '#'..'[' | ']'..'~' }
 BCHAR_ASCII_VISIBLE = _{ ' '..'&' | '('..'[' | ']'..'~' }
 
 /// Valid non ascii unicode Characters
-UNICODE_CHAR = _{ '\u{80}'..'\u{10FFFD}' }
+UNICODE_CHAR = _{ '\u{80}'..'\u{10FFFD}' }
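The PR replaces `bytes_hex`, `bytes_b64`, and `bytes_text` with a single `bytes` rule: an optional `bsqual` prefix (`h` or `b64`) followed by a quoted `BCHAR*` body, where `BCHAR` is now spelled out as code-point ranges that exclude `'` (U+0027) and `\` (U+005C). The std-only sketch below mirrors those rules to show what the revised grammar accepts. `BytesQualifier`, `parse_bytes`, and the helper functions are hypothetical, and decoding the hex or base64 payload is deliberately left out, since the `bytes` rule does not check it either.

```rust
/// The optional prefix matched by `bsqual`.
#[derive(Debug, PartialEq)]
enum BytesQualifier {
    Plain,  // no prefix
    Hex,    // "h"
    Base64, // "b64"
}

/// Unescaped BCHAR ranges from the revised rule; they skip "'" (U+0027)
/// and "\" (U+005C), which may only appear via SESC.
fn is_plain_bchar(c: char) -> bool {
    matches!(c as u32, 0x20..=0x26 | 0x28..=0x5B | 0x5D..=0x10FFFD)
}

/// Character allowed after "\" by SESC: ASCII_VISIBLE or UNICODE_CHAR.
fn is_escapable(c: char) -> bool {
    matches!(c as u32, 0x20..=0x7E | 0x80..=0x10FFFD)
}

/// Checks that `s` is exactly one `bytes` literal and returns the qualifier
/// and the raw text between the quotes. Escapes are checked but not decoded.
fn parse_bytes(s: &str) -> Option<(BytesQualifier, &str)> {
    // bsqual?: the two prefixes cannot overlap, so testing order does not matter.
    let (qual, rest) = if let Some(r) = s.strip_prefix("b64") {
        (BytesQualifier::Base64, r)
    } else if let Some(r) = s.strip_prefix('h') {
        (BytesQualifier::Hex, r)
    } else {
        (BytesQualifier::Plain, s)
    };
    let body = rest.strip_prefix('\'')?;
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // Closing quote: valid only if it is the last character of `s`.
            '\'' => return (i + 1 == body.len()).then(|| (qual, &body[..i])),
            // SESC: a backslash must be followed by an escapable character.
            '\\' => match chars.next() {
                Some((_, e)) if is_escapable(e) => {}
                _ => return None,
            },
            // NEWLINE ("\n", "\r\n" or "\r").
            '\n' | '\r' => {}
            c if is_plain_bchar(c) => {}
            _ => return None,
        }
    }
    None // missing closing quote
}
```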
hermes/crates/cbork/cddl-parser/src/lib.rs: 31 changes (14 additions, 17 deletions)
@@ -20,12 +20,12 @@ pub mod rfc_8610 {
     pub struct RFC8610Parser;
 }
 
-pub mod rfc_9615 {
+pub mod rfc_9165 {
     pub use pest::Parser;
 
     #[derive(pest_derive::Parser)]
     #[grammar = "grammar/rfc_8610.pest"]
-    #[grammar = "grammar/rfc_9615.pest"]
+    #[grammar = "grammar/rfc_9165.pest"]
     pub struct RFC8610Parser;
 }
 
@@ -34,7 +34,7 @@ pub mod cddl {
 
     #[derive(pest_derive::Parser)]
     #[grammar = "grammar/rfc_8610.pest"]
-    #[grammar = "grammar/rfc_9615.pest"]
+    #[grammar = "grammar/rfc_9165.pest"]
     #[grammar = "grammar/cddl_modules.pest"]
     pub struct RFC8610Parser;
 }
@@ -45,7 +45,7 @@ pub mod cddl_test {
     // Parser with DEBUG rules. These rules are only used in tests.
     #[derive(pest_derive::Parser)]
     #[grammar = "grammar/rfc_8610.pest"]
-    #[grammar = "grammar/rfc_9615.pest"]
+    #[grammar = "grammar/rfc_9165.pest"]
     #[grammar = "grammar/cddl_modules.pest"]
     #[grammar = "grammar/cddl_test.pest"] // Ideally this would only be used in tests.
     pub struct CDDLTestParser;
@@ -55,9 +55,9 @@ pub mod cddl_test {
 pub enum Extension {
     /// RFC8610 ONLY limited parser.
     RFC8610Parser,
-    /// RFC8610 and RFC9615 limited parser.
-    RFC9615Parser,
-    /// RFC8610, RFC9615, and CDDL modules.
+    /// RFC8610 and RFC9165 limited parser.
+    RFC9165Parser,
+    /// RFC8610, RFC9165, and CDDL modules.
     CDDLParser,
 }
 
@@ -68,7 +68,7 @@ pub const POSTLUDE: &str = include_str!("grammar/postlude.cddl");
 #[derive(Debug)]
 pub enum AST<'a> {
     RFC8610(Pairs<'a, rfc_8610::Rule>),
-    RFC9615(Pairs<'a, rfc_9615::Rule>),
+    RFC9165(Pairs<'a, rfc_9165::Rule>),
     CDDL(Pairs<'a, cddl::Rule>),
 }
 
@@ -77,8 +77,8 @@ pub enum AST<'a> {
 pub enum CDDLErrorType {
     /// An error related to RFC 8610 extension.
     RFC8610(Error<rfc_8610::Rule>),
-    /// An error related to RFC 9615 extension.
-    RFC9615(Error<rfc_9615::Rule>),
+    /// An error related to RFC 9165 extension.
+    RFC9165(Error<rfc_9165::Rule>),
     /// An error related to CDDL modules extension.
     CDDL(Error<cddl::Rule>),
 }
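`CDDLErrorType` holds one variant per parser because each `#[derive(pest_derive::Parser)]` generates its own `Rule` enum, so `Error<rfc_8610::Rule>`, `Error<rfc_9165::Rule>`, and `Error<cddl::Rule>` are distinct types that a single non-generic field cannot store. The std-only sketch below shows the same enum-wrapping pattern with stand-in types. `ParseError`, `RuleA`, `RuleB`, `AnyParseError`, `Which`, and the toy `parse` function are hypothetical and do not reproduce the crate's `CDDLError` or its boxing.

```rust
use std::fmt;

/// Stand-in for an error type that is generic over a parser's rule enum.
#[derive(Debug)]
struct ParseError<R> {
    rule: R,
    pos: usize,
}

/// Two distinct rule enums, as two separately derived parsers would produce.
#[derive(Debug, Clone, Copy)]
enum RuleA {
    Cddl,
}
#[derive(Debug, Clone, Copy)]
enum RuleB {
    Cddl,
}

/// One variant per concrete rule type lets a single, non-generic type
/// carry an error from any of the parsers.
#[derive(Debug)]
enum AnyParseError {
    A(ParseError<RuleA>),
    B(ParseError<RuleB>),
}

impl fmt::Display for AnyParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AnyParseError::A(e) => write!(f, "parser A: expected {:?} at {}", e.rule, e.pos),
            AnyParseError::B(e) => write!(f, "parser B: expected {:?} at {}", e.rule, e.pos),
        }
    }
}

/// Hypothetical parser selector, analogous to the `Extension` enum.
enum Which {
    A,
    B,
}

/// Toy "parser": succeeds on non-empty input and reports its length,
/// otherwise returns an error wrapped in the variant for the chosen parser.
fn parse(which: Which, input: &str) -> Result<usize, AnyParseError> {
    if !input.is_empty() {
        return Ok(input.len());
    }
    Err(match which {
        Which::A => AnyParseError::A(ParseError { rule: RuleA::Cddl, pos: 0 }),
        Which::B => AnyParseError::B(ParseError { rule: RuleB::Cddl, pos: 0 }),
    })
}
```

The caller matches on the variant when it needs the rule-specific details and can otherwise treat every failure through the one `AnyParseError` type, which is the role `CDDLErrorType` plays for `parse_cddl`.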
@@ -126,10 +126,10 @@ pub fn parse_cddl<'a>(
                 .map(AST::RFC8610)
                 .map_err(CDDLErrorType::RFC8610)
         },
-        Extension::RFC9615Parser => {
-            rfc_9615::RFC8610Parser::parse(rfc_9615::Rule::cddl, input)
-                .map(AST::RFC9615)
-                .map_err(CDDLErrorType::RFC9615)
+        Extension::RFC9165Parser => {
+            rfc_9165::RFC8610Parser::parse(rfc_9165::Rule::cddl, input)
+                .map(AST::RFC9165)
+                .map_err(CDDLErrorType::RFC9165)
         },
         Extension::CDDLParser => {
             cddl::RFC8610Parser::parse(cddl::Rule::cddl, input)
@@ -139,9 +139,6 @@ pub fn parse_cddl<'a>(
     };
 
     result.map(Box::new).map_err(|e| {
-        println!("{e:?}");
-        println!("{e}");
-
         Box::new(CDDLError::from(e))

Contributor:
I think the CDDLError type is redundant here; we could replace it with CDDLErrorType (after first renaming CDDLErrorType to CDDLError). Also, returning a Box here doesn't make sense; we can return a plain CDDLError.

Collaborator:
There are lifetime issues when data in the AST is not boxed, and this can include the error. But @apskhem can validate this.

apskhem (Collaborator, Author), Jan 24, 2024:
There was a technical problem that required having both CDDLError and CDDLErrorType: errors that are generic over rfc_8610::Rule, rfc_9165::Rule, and cddl::Rule cannot be stored dynamically in a single struct. It raised this error:

[Screenshot 2024-01-08 at 19 03 38]

     })
 }