Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ParseData type #508

Merged
merged 27 commits into from
Sep 17, 2024
Merged

Add ParseData type #508

merged 27 commits into from
Sep 17, 2024

Conversation

lionel-
Copy link
Contributor

@lionel- lionel- commented Sep 9, 2024

This new type wraps the data frame of parse information generated by the R parser when:

  • A source file environment is provided to the parser (which is the case at R level when getOption("keep.source") is TRUE).
  • getOption("keep.parse.data") is TRUE (the default).

This parse data is interesting for the task of determining expression boundaries in a string because it is written by the parser even in the case of an incomplete or invalid input. Unfortunately there is no information about parse failures that we can use, but knowing the boundaries of valid expressions will prove very useful as it allows the following strategy:

  • Use parse data to figure out boundaries of complete expressions.
  • If there was an error, do a binary search with R_ParseVector() to figure out the boundaries of incomplete and invalid inputs in the last expression.

Progress towards posit-dev/positron#1326.

This work is supported by the following infrastructure improvements that are also part of this PR:

  • New DataFrame type that validates its input. Once constructed, the following guarantees are provided:

    • List storage and "data.frame" class
    • Dimensions and valid names
    • All columns are vectors and the same size as the data frame
  • There is a new Error::MissingColumnError returned from DataFrame::col() when an expected column is missing.

  • New harp::assert_class() returns a new Error::UnexpectedClass.

  • New harp::unreachable!() macro returns an Error::AnyhowError.

  • New .inherits(), .class(), and .dim() methods for RObject.

  • New coercion methods from &RObject to Vec<T> used to validate column types. These use a generic helper. I tried to integrate this helper in the Vector trait but couldn't solve compiler errors. Alternatively we could generate them with vector_macros but unless it saves a of work or substantially reduces the chance of undry typos we're probably better off with explicit code.

  • The iter() method is no longer generated by vector_macros, it's now part of the Vector trait. This allows generic implementations to create vector iterators.

  • The TryFrom methods for SEXP -> impl Vector now create empty vectors if passed NULL. This in turn is used in &RObject -> Vec<T> (this was necessary because the existing conversion method for RObject -> Vec<i32> now uses the new helper for &RObject and it supports NULL inputs, which seems reasonable enough for a conversion method).

@lionel- lionel- requested a review from DavisVaughan September 9, 2024 14:36
Comment on lines 26 to 33
// SAFETY: Protected by `list`
let obj = RObject::view(sexp);

let dim = unsafe { harp::df_dim(obj.sexp) }?;
let nrow = dim.num_rows as usize;
let ncol = list.obj.length() as usize;

let Some(names) = obj.names() else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels kind of weird. Can you not get names() out of a List? That seems to be the only need for the view()

Copy link
Contributor Author

@lionel- lionel- Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety note is about why we store an RObject view in DataFrame.obj.

I reworded a little to consistently use list.obj and moved the view later for construction.

Comment on lines 128 to +130
impl PartialEq for RObject {
// FIXME: At call sites, not obvious that this is doing identity comparisons.
// Can we require explicit `FOO.sexp == BAR.sexp`?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like that

crates/harp/src/object.rs Outdated Show resolved Hide resolved
crates/harp/src/vector/list.rs Outdated Show resolved Hide resolved
@lionel- lionel- merged commit b7985c6 into main Sep 17, 2024
3 checks passed
@lionel- lionel- deleted the feature/parse-data branch September 17, 2024 15:14
@github-actions github-actions bot locked and limited conversation to collaborators Sep 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants