-
Notifications
You must be signed in to change notification settings - Fork 115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass!
@@ -0,0 +1,106 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
License blocks in all files, please.
extern crate peg; | ||
|
||
fn main() { | ||
peg::cargo_build("src/edn.rustpeg"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add .gitattributes
and a modeline to have the .rustpeg
file marked as Rust?
Details:
https://github.com/github/linguist/blob/master/README.md#using-gitattributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fairly explicitly not rust though. It happens to have a use
keyword that works like Rust's, along with vaguely Rust like code blocks (except that the braces must match, even in strings), and comments (except that they're 'C' style not Rust style, i.e. multi-line comments don't nest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I suppose I was thinking that Rust syntax highlighting, indenting, etc. would be better than nothing? Up to you.
There are some syntax highlighting plugins for it:
https://github.com/treycordova/rustpeg.vim
with filetype set filetype=rust.rustpeg
, so perhaps just add that as a Vim modeline and hope Linguist figures it out?
use std::cmp::{Ordering, Ord, PartialOrd}; | ||
use ordered_float::OrderedFloat; | ||
|
||
/// We're using BTree{Set, Map} rather than Hash{Set, Map} because the BTree variants implement Hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ///
, add a little bit of explanatory documentation about what this enum is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree (and I've made it better) That said, we don't really have a defined API yet, so I'm OK with better docs being lower priority.
} | ||
} | ||
|
||
// TODO: There has to be a better way to do `as i32` for Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not after FromPrimitive
was removed. You can unsafely transmute, or you can use e.g., the num
crate's automatically derived version, but it basically turns into this.
/// (unlike the Hash variants which don't in order to preserve O(n) hashing time which is hard given | ||
/// recurrsive data structures) | ||
/// See https://internals.rust-lang.org/t/implementing-hash-for-hashset-hashmap/3817/1 | ||
/// TODO: We should probably Box the collection types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Collection types are effectively boxes already, no? (That is, a Vec
is a relatively small copyable structure…)
Do you have some interesting heterogeneous trait object issue in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well TBH I'm copying @jimblandy who uses Object(Box<HashMap<String, Json>>)
in his JSON example (see enum chapter). Except that I note the reality is different.
@@ -0,0 +1,770 @@ | |||
// TODO: Can't we do this just for tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this file in edn/tests/
?
|
||
#[export] | ||
integer -> Value = i:$( sign? digit+ ) { | ||
Value::Integer(i.parse::<i32>().unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i32
is probably wrong. Clojure automatically walks the numeric tower:
user=> {:foo 12345423857458454}
{:foo 12345423857458454}
user=> {:foo 123454238574584543434343434343}
{:foo 123454238574584543434343434343N}
but at the very least we will want to support microsecond timestamps, which are bigger than a 32-bit signed int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not good enough at Rust enough to be able to make more than a stab at the right answer here.
The trivial thing would be s/i32/i64/g "because 64 bits should be enough for anyone". But then because it wouldn't, s/i64/i128/g or s/i64/BigInt/g which suddenly gets weird and or slow.
We could have the parser auto select the smallest type that fits. Except that it seems awkward if a parse of a well defined data based on a known schema results in different types depending on the data.
My currently thought is that i64 is safe, predictable, easily converted and good enough for 99.9% of cases, and that we should fail to parse for numbers bigger than i64. And that when this becomes a problem we allow global configuration of the number type to use, and when this becomes cumbersome we allow some fancy case-by-case configuration.
Or Rust might do something fancy to help us with this problem that I'm not aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i64
is a good default, and we might consider adding a BigInt
alongside it — even if we only stub out the enum case at this point, and don't parse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The edn format docs say:
64-bit (signed integer) precision is expected. An integer can have the suffix N to indicate that arbitrary precision is desired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
#[test] | ||
fn test_symbol() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also test the symbols .
and $
.
|
||
keyword_char_initial = ":" | ||
// TODO: More chars here? | ||
keyword_char_subsequent = [a-z] / [A-Z] / [0-9] / "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future correction: both keywords and symbols can contain /
, but only once, and result in a namespaced keyword. That is, :foo/bar
has namespace "foo"
and name "bar"
.
(There are similar rules around .
, which divides up the namespace into a hierarchy, but can also appear as part of the name.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,7 @@ | |||
# Generated by Cargo | |||
# will have compiled files and executables | |||
/target/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file shouldn't be needed - our toplevel gitignore should handle these paths (or be updated to handle them)
I've been using https://github.com/Marwes/combine to parse an EDN AST into a yet more specialized transaction AST. One of the nice things about While thinking about how to provide pleasing error messages that include input positions, I realized that I want my EDN input to includes its input positions as well. (Either that or I want to parse my own text, so that I can provide meaningful error messages that include input positions and ranges.) This may not be something we want to handle at this time, but how hard is it to include input ranges using |
In answer to ncalexan's question about retaining input positions, I think the answer is to include some sort of line+col / offset structure alongside each Value. I don't have a concrete plan, but I believe it should work. |
The parser mostly works and has a decent test suite. It parses all the queries issued by tofino-user-agent with some caveats. Known flaws: * No support for tagged elements, comments, discarded elements or "'" * Incomplete support for escaped characters in strings and the range of characters that are allowed in keywords and symbols * Possible whitespace handling problems
|
||
license = "Apache-2.0" | ||
repository = "https://github.com/mozilla/datomish" | ||
description = "EDN Parser for Datomish" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two 'Datomish' to replace.
@@ -0,0 +1,2 @@ | |||
# barnardsstar | |||
An experimental EDN parser for Datomish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
|
||
#[export] | ||
integer -> Value = i:$( sign? digit+ ) { | ||
Value::Integer(i.parse::<i64>().unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that unwrap
here might be a bad thing — if I provide a query like
[:find ?foo :in $ :where [_ _ 12345678901234567890123456789012345678901234567890]]
— forgetting the 'N' — the parser will panic. Now, we could build a strategy that always handles panics in the parser, allowing us to avoid error handling, but can we instead signal a failure to parse at this point?
@@ -8,4 +8,17 @@ | |||
// CONDITIONS OF ANY KIND, either express or implied. See the License for the | |||
// specific language governing permissions and limitations under the License. | |||
|
|||
pub mod keyword; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've unrooted the existing Keyword
module with this change…
include!(concat!(env!("OUT_DIR"), "/edn.rs")); | ||
} | ||
|
||
fn main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be in a lib.rs
.
Float(OrderedFloat<f64>), | ||
Text(String), | ||
Symbol(String), | ||
Keyword(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this should be Keyword(Keyword)
, with the second being a reference to the richer namespace/name-containing Keyword
type defined in keyword.rs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… but that's #154. So roll on for now.
The parser mostly works and has a decent test suite. It parses all the queries issued by the Tofino UAS, with some caveats. Known flaws: * No support for tagged elements, comments, discarded elements or "'". * Incomplete support for escaped characters in strings and the range of characters that are allowed in keywords and symbols. * Possible whitespace handling problems.
The parser mostly works and has a decent test suite. It parses all the queries issued by the Tofino UAS, with some caveats. Known flaws: * No support for tagged elements, comments, discarded elements or "'". * Incomplete support for escaped characters in strings and the range of characters that are allowed in keywords and symbols. * Possible whitespace handling problems.
The parser mostly works and has a decent test suite. It parses all the
queries issued by tofino-user-agent with some caveats. Known flaws:
characters that are allowed in keywords and symbols