Propagate locations throughout the parser #790

Closed
wants to merge 1 commit into from
src/ast/ddl.rs: 50 changes (27 additions, 23 deletions)
@@ -25,6 +25,7 @@ use sqlparser_derive::{Visit, VisitMut};

use crate::ast::value::escape_single_quote_string;
use crate::ast::{display_comma_separated, display_separated, DataType, Expr, Ident, ObjectName};
+use crate::location::Located;
use crate::tokenizer::Token;

/// An `ALTER TABLE` (`Statement::AlterTable`) operation
@@ -46,12 +47,12 @@ pub enum AlterTableOperation {
/// `DROP CONSTRAINT [ IF EXISTS ] <name>`
DropConstraint {
if_exists: bool,
-name: Ident,
+name: Located<Ident>,
Contributor

What is the purpose of wrapping this in Located<>?

Another alternative would be to add a location field to the Ident struct, like:

pub struct Ident {
    /// The value of the identifier without quotes.
    pub value: String,
    /// The starting quote if any. Valid quote characters are the single quote,
    /// double quote, backtick, and opening square bracket.
    pub quote_style: Option<char>,
    /// location in the source tree. <----- New Field
    location: Location,
}

Both would be breaking changes, but I am thinking of a future where there is location information on all structs. If all fields are wrapped in Located<..>, there will be a lot of repeated Located<> in the code.

If they each have a location field, then only the parser needs to set the location (though the tests will need updating too).

Contributor Author

Great question. For some background context, I've been working on a SQL compiler that uses the sqlparser-rs library as our parser. We added these locations to improve the error messages in the compiler.

We actually started by embedding the location in Ident, but there are places where we want to store the Ident itself (e.g. a map from Ident to some binder state about it), and in those places we don't care about the Location. There are also places where we want to do things like convert a string into an Ident, and if location is a field on the Ident, we have to make it empty/unknown. This isn't an issue in and of itself; it just means that we don't get the type system's help in enforcing whether or not an Ident has a location.
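A minimal Rust sketch of the trade-off described above. Everything here is illustrative rather than the PR's actual code: `Location` with `line`/`column`, a `Located<T>` with private `inner`/`location` fields and an `into_inner` method, and a `Binding` enum standing in for binder state are all assumptions. Parsed identifiers carry their positions, and the location is dropped explicitly when the Ident becomes a map key.

```rust
use std::collections::HashMap;

/// Hypothetical source position; the PR's real type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

/// Simplified Ident with no location field.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
}

/// Hypothetical wrapper pairing a node with where it was parsed.
#[derive(Debug, Clone)]
pub struct Located<T> {
    inner: T,
    location: Location,
}

impl<T> Located<T> {
    pub fn new(inner: T, location: Location) -> Self {
        Located { inner, location }
    }

    pub fn location(&self) -> Location {
        self.location
    }

    /// Explicitly discard the location once it is no longer needed.
    pub fn into_inner(self) -> T {
        self.inner
    }
}

/// Hypothetical binder state.
#[derive(Debug, PartialEq)]
enum Binding {
    Column(usize),
}

fn ident(value: &str) -> Ident {
    Ident { value: value.to_string(), quote_style: None }
}

fn main() {
    let parsed = vec![
        Located::new(ident("a"), Location { line: 1, column: 8 }),
        Located::new(ident("b"), Location { line: 1, column: 11 }),
    ];
    // Diagnostics can still use the position while it is attached.
    assert_eq!(parsed[1].location().column, 11);

    // The binder map is keyed by plain Ident: the type says "no location here",
    // so Hash/Eq never have to decide whether positions matter.
    let mut scope: HashMap<Ident, Binding> = HashMap::new();
    for (i, col) in parsed.into_iter().enumerate() {
        scope.insert(col.into_inner(), Binding::Column(i));
    }

    // An Ident built from a plain string needs no placeholder location.
    assert_eq!(scope.get(&ident("b")), Some(&Binding::Column(1)));
}
```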

A second benefit to wrapping in Located is that we can write a bunch of functions that take Located<T> (e.g. error messages) and use the location (yes, this could be accomplished with a Located trait too). It's essentially like a smart pointer.
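Both options mentioned in this paragraph can be sketched side by side. The simplified `Ident`, a `Located<T>` with public `inner`/`location` fields, and the trait name `HasLocation` are assumptions for illustration only: one diagnostic takes `Located<Ident>` directly, the other is generic over the trait.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ident {
    pub value: String,
}

impl fmt::Display for Ident {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.value)
    }
}

/// Hypothetical wrapper type.
#[derive(Debug, Clone)]
pub struct Located<T> {
    pub inner: T,
    pub location: Location,
}

/// Option 1: the diagnostic takes the wrapper, so callers cannot pass
/// an Ident that has no location.
fn unknown_column(col: &Located<Ident>) -> String {
    let Location { line, column } = col.location;
    format!("{line}:{column}: unknown column `{}`", col.inner)
}

/// Option 2: a trait (hypothetical name) that any node type could implement.
trait HasLocation {
    fn location(&self) -> Location;
}

impl<T> HasLocation for Located<T> {
    fn location(&self) -> Location {
        self.location
    }
}

fn error_at(node: &impl HasLocation, msg: &str) -> String {
    let loc = node.location();
    format!("{}:{}: {}", loc.line, loc.column, msg)
}

fn main() {
    let col = Located {
        inner: Ident { value: "price".to_string() },
        location: Location { line: 3, column: 14 },
    };
    println!("{}", unknown_column(&col)); // 3:14: unknown column `price`
    println!("{}", error_at(&col, "ambiguous reference")); // 3:14: ambiguous reference
}
```

The wrapper version gets the compiler to reject unlocated Idents; the trait version generalizes to any node type that implements `HasLocation`.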

All of that said, I'm happy to consider another approach — I only have perspective from the one project I've been working on and want to ensure that the contributions are aligned with the long term of the community around this repo.

Contributor

This isn't an issue in-and-of-itself, it just means that we won't get the typesystem's help to enforce whether or not an Ident has a location.

This makes sense. I guess I am worried about the implications for downstream crates of changing all these fields to Located, specifically in match expressions, where there are lots of patterns like https://github.com/apache/arrow-datafusion/blob/350cb47289a76e579b221fe374e4cf09db332569/datafusion/sql/src/expr/function.rs#L168-L184

Adding Located will require all such patterns to add a new level of Located. If we add location as a field instead, then at least the matches that already use `..` to ignore the remaining fields will need no changes.

match sql {
  FunctionArgs::Named {
    name, 
    arg: ...
    .. // <-- can be used to ignore location
  }
   ...

With Located, wouldn't this end up looking something like:

match sql {
  FunctionArgs::Named {
    name: Located { name, ... }, 
    arg: Located { ... }
    .. // <-- can be used to ignore location
  }
   ...
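The reviewer's concern can be checked with a small compile-level sketch. Two simplified modules model the alternatives; the `Arg` enum is a hypothetical stand-in, not sqlparser's or DataFusion's real `FunctionArg`. With a new field on `Ident`, an arm that already uses `..` keeps compiling; with `Located<Ident>`, an arm that destructures the Ident needs one more layer.

```rust
/// Variant A: the reviewer's suggestion, a location field on Ident.
mod with_field {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Location {
        pub line: u64,
        pub column: u64,
    }

    pub struct Ident {
        pub value: String,
        pub quote_style: Option<char>,
        pub location: Option<Location>, // newly added field
    }

    pub enum Arg {
        Named { name: Ident, arg: String },
    }

    pub fn name_of(a: &Arg) -> &str {
        match a {
            // Written before `location` existed; `..` now ignores it too,
            // so this arm compiles unchanged.
            Arg::Named { name: Ident { value, .. }, .. } => value.as_str(),
        }
    }
}

/// Variant B: the PR's approach, wrapping the field in Located<Ident>.
mod with_wrapper {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Location {
        pub line: u64,
        pub column: u64,
    }

    pub struct Ident {
        pub value: String,
        pub quote_style: Option<char>,
    }

    pub struct Located<T> {
        pub inner: T,
        pub location: Location,
    }

    pub enum Arg {
        Named { name: Located<Ident>, arg: String },
    }

    pub fn name_of(a: &Arg) -> &str {
        match a {
            // Patterns do not see through the wrapper, so the same arm
            // needs an extra `Located { .. }` layer.
            Arg::Named { name: Located { inner: Ident { value, .. }, .. }, .. } => value.as_str(),
        }
    }
}

fn main() {
    let a = with_field::Arg::Named {
        name: with_field::Ident { value: "x".to_string(), quote_style: None, location: None },
        arg: "1".to_string(),
    };
    let b = with_wrapper::Arg::Named {
        name: with_wrapper::Located {
            inner: with_wrapper::Ident { value: "y".to_string(), quote_style: None },
            location: with_wrapper::Location { line: 1, column: 1 },
        },
        arg: "2".to_string(),
    };
    assert_eq!(with_field::name_of(&a), "x");
    assert_eq!(with_wrapper::name_of(&b), "y");
}
```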

For converting a String into an Ident, what about an optional location? Something like:

pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
    location: Option<Location>,
}

And then have

impl FromStr for Ident {
    ...
    Ident {
        ...
        location: None,
    }
}
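A runnable version of this suggestion, assuming a simple `Location { line, column }` and a hand-written `FromStr` impl; the `with_location` constructor is hypothetical, and this is not sqlparser's actual `Ident` API. Note the last assertion: with a derived `PartialEq`, a parsed Ident and a string-built Ident with the same text compare unequal, which is the concern the author raises in response.

```rust
use std::convert::Infallible;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
    /// `None` when the Ident was not produced by the parser.
    location: Option<Location>,
}

impl Ident {
    /// Hypothetical constructor the parser would call, since it knows
    /// where the token started.
    pub fn with_location(value: &str, location: Location) -> Self {
        Ident { value: value.to_string(), quote_style: None, location: Some(location) }
    }

    pub fn location(&self) -> Option<Location> {
        self.location
    }
}

impl FromStr for Ident {
    type Err = Infallible;

    /// Idents built from plain strings have no source position.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(Ident { value: s.to_string(), quote_style: None, location: None })
    }
}

fn main() {
    let parsed = Ident::with_location("id", Location { line: 1, column: 8 });
    let built: Ident = "id".parse().unwrap();

    assert_eq!(parsed.location(), Some(Location { line: 1, column: 8 }));
    assert_eq!(built.location(), None);
    // The derived PartialEq also compares `location`, so these differ.
    assert_ne!(parsed, built);
}
```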

(yes, this could be accomplished with a Located trait too).

I agree this would be a better approach.

Contributor Author

@ankrgyl, Jan 21, 2023

with located wouldn't this end up looking something like

match sql {
  FunctionArgs::Named {
    name: Located { name, ... }, 
    arg: Located { ... }
    .. // <-- can be used to ignore location
  }
   ...

I don't think so, but apologies if I'm missing something. Located is exactly like a smart pointer (e.g. Box), so you can write:

match sql {
  FunctionArgs::Named {
    name,
    arg,
  } => ...
}

and then "pretend" name is just an Ident, except that you can also call .location() on it.
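A minimal sketch of the smart-pointer behavior described here, assuming `Located<T>` implements `Deref<Target = T>`. The field names, the `location()` accessor, the `is_quoted` helper, and the `Arg` enum are illustrative stand-ins, not the PR's code.

```rust
use std::ops::Deref;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
}

/// Hypothetical wrapper; field names are illustrative.
#[derive(Debug, Clone)]
pub struct Located<T> {
    inner: T,
    location: Location,
}

impl<T> Located<T> {
    pub fn new(inner: T, location: Location) -> Self {
        Located { inner, location }
    }

    pub fn location(&self) -> Location {
        self.location
    }
}

// Deref is what lets a Located<Ident> be used "as if" it were an Ident.
impl<T> Deref for Located<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner
    }
}

/// Simplified AST variant (not sqlparser's real type).
pub enum Arg {
    Named { name: Located<Ident>, arg: String },
}

/// Existing helper that knows nothing about locations.
fn is_quoted(ident: &Ident) -> bool {
    ident.quote_style.is_some()
}

fn describe(a: &Arg) -> String {
    match a {
        // The pattern is unchanged: `name` binds the whole Located<Ident>.
        Arg::Named { name, arg } => {
            // &Located<Ident> coerces to &Ident via Deref.
            let quoted = if is_quoted(name) { " (quoted)" } else { "" };
            let loc = name.location();
            // Field access (`name.value`) auto-derefs to the inner Ident.
            format!("{}{} = {} at {}:{}", name.value, quoted, arg, loc.line, loc.column)
        }
    }
}

fn main() {
    let a = Arg::Named {
        name: Located::new(
            Ident { value: "x".to_string(), quote_style: None },
            Location { line: 2, column: 5 },
        ),
        arg: "1".to_string(),
    };
    println!("{}", describe(&a)); // x = 1 at 2:5
}
```

One caveat: Deref only helps where `name` is bound as a whole. A pattern that destructures the Ident itself, such as `name: Ident { value, .. }`, still needs updating, because patterns do not auto-deref through user-defined types.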

In terms of converting String to Ident what about something like an optional location? Like

Yes, that's true, but the problem is that you no longer have the type system's help to know whether or not something has a location. For example, imagine you put these Idents into a map where you "don't care" about locations. You'd need Hash, Eq, etc. to ignore the location field. However, there may be a separate use case where you do care about locations. What do you do then?
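To illustrate the cost described here, a sketch of the `Option<Location>`-field design with hand-written `PartialEq`/`Eq`/`Hash` impls that ignore the location (all names are illustrative assumptions). Two Idents parsed at different positions collapse into one set entry, and code that does care about positions has to compare `location` explicitly.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
    pub location: Option<Location>,
}

// Equality deliberately ignores `location`, so the same name parsed at
// two different positions counts as one identifier.
impl PartialEq for Ident {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value && self.quote_style == other.quote_style
    }
}

impl Eq for Ident {}

// Hash must not include fields that `eq` ignores, or equal keys could
// hash differently and HashSet/HashMap lookups would break.
impl Hash for Ident {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.value.hash(state);
        self.quote_style.hash(state);
    }
}

fn main() {
    let first = Ident {
        value: "id".to_string(),
        quote_style: None,
        location: Some(Location { line: 1, column: 8 }),
    };
    let second = Ident {
        value: "id".to_string(),
        quote_style: None,
        location: Some(Location { line: 4, column: 2 }),
    };

    let mut seen = HashSet::new();
    seen.insert(first.clone());
    seen.insert(second.clone());
    assert_eq!(seen.len(), 1);

    // Code that does care about positions must remember to compare them
    // separately; `==` alone no longer tells the whole story.
    assert_eq!(first, second);
    assert_ne!(first.location, second.location);
}
```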

Contributor

Thank you for your thoughtful reply @ankrgyl. I plan to give this a try in the next few days. I am sorry; I just don't have very much focus time to devote to sqlparser at the moment.

This PR is definitely on my list.

Contributor Author

Thank you! This weekend I open sourced the project I'm working on that uses sqlparser-rs: https://github.com/qscl/queryscript. It's not ready for real use yet, but you can see the fork of sqlparser and, if you feel compelled, how we use it in the code (e.g. the VSCode extension uses Locations to show errors in exactly the right places). Happy to chat further about it offline too if helpful.

Contributor

I am sorry for my lack of communication here.

Basically, I have (obviously) extremely limited time for maintenance of this crate, and for that I apologize.

Given the limited maintenance bandwidth, I am very hesitant to accept a change that causes downstream churn, since that churn could require non-trivial effort to fix.

Basically, to merge a PR like this I would like to see what changes are required in an existing project that uses AST matching extensively. Any project that uses sqlparser-rs would do (though I am personally very familiar with https://github.com/apache/arrow-datafusion).

Contributor

So, specifically, I want to see a PR to an existing project that pins sqlparser to this branch and shows the tests / CI passing, as a demonstration of how much downstream impact this change would have.

Contributor Author

Posted as a top-level comment, whoops! #790 (comment)

cascade: bool,
},
/// `DROP [ COLUMN ] [ IF EXISTS ] <column_name> [ CASCADE ]`
DropColumn {
-column_name: Ident,
+column_name: Located<Ident>,
if_exists: bool,
cascade: bool,
},
@@ -75,25 +76,28 @@ pub enum AlterTableOperation {
},
/// `RENAME [ COLUMN ] <old_column_name> TO <new_column_name>`
RenameColumn {
-old_column_name: Ident,
-new_column_name: Ident,
+old_column_name: Located<Ident>,
+new_column_name: Located<Ident>,
},
/// `RENAME TO <table_name>`
RenameTable { table_name: ObjectName },
// CHANGE [ COLUMN ] <old_name> <new_name> <data_type> [ <options> ]
ChangeColumn {
-old_name: Ident,
-new_name: Ident,
+old_name: Located<Ident>,
+new_name: Located<Ident>,
data_type: DataType,
options: Vec<ColumnOption>,
},
/// `RENAME CONSTRAINT <old_constraint_name> TO <new_constraint_name>`
///
/// Note: this is a PostgreSQL-specific operation.
-RenameConstraint { old_name: Ident, new_name: Ident },
+RenameConstraint {
+old_name: Located<Ident>,
+new_name: Located<Ident>,
+},
/// `ALTER [ COLUMN ]`
AlterColumn {
-column_name: Ident,
+column_name: Located<Ident>,
op: AlterColumnOperation,
},
}
@@ -272,8 +276,8 @@ impl fmt::Display for AlterColumnOperation {
pub enum TableConstraint {
/// `[ CONSTRAINT <name> ] { PRIMARY KEY | UNIQUE } (<columns>)`
Unique {
-name: Option<Ident>,
-columns: Vec<Ident>,
+name: Option<Located<Ident>>,
+columns: Vec<Located<Ident>>,
/// Whether this is a `PRIMARY KEY` or just a `UNIQUE` constraint
is_primary: bool,
},
@@ -283,16 +287,16 @@ pub enum TableConstraint {
/// [ON UPDATE <referential_action>] [ON DELETE <referential_action>]
/// }`).
ForeignKey {
-name: Option<Ident>,
-columns: Vec<Ident>,
+name: Option<Located<Ident>>,
+columns: Vec<Located<Ident>>,
foreign_table: ObjectName,
-referred_columns: Vec<Ident>,
+referred_columns: Vec<Located<Ident>>,
on_delete: Option<ReferentialAction>,
on_update: Option<ReferentialAction>,
},
/// `[ CONSTRAINT <name> ] CHECK (<expr>)`
Check {
-name: Option<Ident>,
+name: Option<Located<Ident>>,
expr: Box<Expr>,
},
/// MySQLs [index definition][1] for index creation. Not present on ANSI so, for now, the usage
@@ -305,13 +309,13 @@ pub enum TableConstraint {
/// Whether this index starts with KEY (true) or INDEX (false), to maintain the same syntax.
display_as_key: bool,
/// Index name.
-name: Option<Ident>,
+name: Option<Located<Ident>>,
/// Optional [index type][1].
///
/// [1]: IndexType
index_type: Option<IndexType>,
/// Referred column identifier list.
-columns: Vec<Ident>,
+columns: Vec<Located<Ident>>,
},
/// MySQLs [fulltext][1] definition. Since the [`SPATIAL`][2] definition is exactly the same,
/// and MySQL displays both the same way, it is part of this definition as well.
@@ -332,9 +336,9 @@ pub enum TableConstraint {
/// Whether the type is followed by the keyword `KEY`, `INDEX`, or no keyword at all.
index_type_display: KeyOrIndexDisplay,
/// Optional index name.
-opt_index_name: Option<Ident>,
+opt_index_name: Option<Located<Ident>>,
/// Referred column identifier list.
-columns: Vec<Ident>,
+columns: Vec<Located<Ident>>,
},
}

@@ -490,7 +494,7 @@ impl fmt::Display for IndexType {
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct ColumnDef {
-pub name: Ident,
+pub name: Located<Ident>,
pub data_type: DataType,
pub collation: Option<ObjectName>,
pub options: Vec<ColumnOptionDef>,
@@ -526,7 +530,7 @@ impl fmt::Display for ColumnDef {
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct ColumnOptionDef {
-pub name: Option<Ident>,
+pub name: Option<Located<Ident>>,
pub option: ColumnOption,
}

@@ -559,7 +563,7 @@ pub enum ColumnOption {
/// }`).
ForeignKey {
foreign_table: ObjectName,
-referred_columns: Vec<Ident>,
+referred_columns: Vec<Located<Ident>>,
on_delete: Option<ReferentialAction>,
on_update: Option<ReferentialAction>,
},
@@ -611,8 +615,8 @@ impl fmt::Display for ColumnOption {
}
}

-fn display_constraint_name(name: &'_ Option<Ident>) -> impl fmt::Display + '_ {
-struct ConstraintName<'a>(&'a Option<Ident>);
+fn display_constraint_name(name: &'_ Option<Located<Ident>>) -> impl fmt::Display + '_ {
+struct ConstraintName<'a>(&'a Option<Located<Ident>>);
impl<'a> fmt::Display for ConstraintName<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
if let Some(name) = self.0 {
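Because `display_constraint_name` now receives `Option<Located<Ident>>`, the generated SQL only stays the same if `Located<T>` forwards `Display` to the wrapped value. The sketch below assumes that forwarding impl, and also assumes the (truncated) function body writes `CONSTRAINT <name> ` when a name is present. The `Located` field names are hypothetical, and the simplified `Ident` display skips sqlparser's quote escaping.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
}

// Simplified: the real Ident display also escapes embedded quote characters.
impl fmt::Display for Ident {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.quote_style {
            Some('[') => write!(f, "[{}]", self.value),
            Some(q) => write!(f, "{q}{}{q}", self.value),
            None => f.write_str(&self.value),
        }
    }
}

/// Hypothetical wrapper type.
#[derive(Debug, Clone)]
pub struct Located<T> {
    pub inner: T,
    pub location: Location,
}

// Forward Display to the wrapped value so the SQL output is unchanged.
impl<T: fmt::Display> fmt::Display for Located<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.inner, f)
    }
}

fn display_constraint_name(name: &'_ Option<Located<Ident>>) -> impl fmt::Display + '_ {
    struct ConstraintName<'a>(&'a Option<Located<Ident>>);
    impl<'a> fmt::Display for ConstraintName<'a> {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            // Assumed body: emit "CONSTRAINT <name> " only when a name was given.
            if let Some(name) = self.0 {
                write!(f, "CONSTRAINT {name} ")?;
            }
            Ok(())
        }
    }
    ConstraintName(name)
}

fn main() {
    let named = Some(Located {
        inner: Ident { value: "pk_users".to_string(), quote_style: None },
        location: Location { line: 1, column: 30 },
    });
    let unnamed: Option<Located<Ident>> = None;
    println!("[{}]", display_constraint_name(&named)); // [CONSTRAINT pk_users ]
    println!("[{}]", display_constraint_name(&unnamed)); // []
}
```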