Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding parametrized queries #93

Closed
edublancas opened this issue Jan 26, 2023 · 19 comments · Fixed by #137
Closed

adding parametrized queries #93

edublancas opened this issue Jan 26, 2023 · 19 comments · Fixed by #137
Assignees
Labels
stash Label used to categorize issues that will be worked on next

Comments

@edublancas
Copy link

edublancas commented Jan 26, 2023

many people are already familiar with parametrized queries since they're available in dbt and airflow, so we should add that. possibly with jinja?

%%sql --params {"corpus_name": "hamlet", "limit": 10}
SELECT word, SUM(word_count) as count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = {{corpus_name}}
GROUP BY word
ORDER BY count DESC
LIMIT {{limit}}

This is how BigQuery does it (but with a different syntax).

I'm unsure if the --params {"corpus_name": "hamlet", "limit": 10} if the best choice. perhaps something simpler like -p corpus_name hamlet -p limit 10? similar to what we do in ploomber-engine

@idomic
Copy link

idomic commented Jan 27, 2023

I like this.
I think that it should be inline vs passing everything under a flag?
It can be from variables in the notebooks for v1, I think I already saw a similar solution somewhere.

@edublancas
Copy link
Author

I think that it should be inline vs passing everything under a flag?

you mean something like this?

limit = 10
corpus_name = "something"
%%sql 
SELECT word, SUM(word_count) as count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = {{corpus_name}}
GROUP BY word
ORDER BY count DESC
LIMIT {{limit}}

@idomic
Copy link

idomic commented Jan 27, 2023

Exactly.
Kind of taking this to the next level:

from string import Template

template = Template("""
SELECT *
FROM my_data
LIMIT $limit
""")

limit_one = template.substitute(limit=1)
limit_two = template.substitute(limit=2)
%sql $limit_one

@edublancas
Copy link
Author

edublancas commented Jan 27, 2023

cool, I'd say there should be two modes. one where the user explicitly says "go grab my global scope" and another one where they pass parameters explicitly.

Here, we grab existing limit and corpus_name from existing scope (shorter version: %%sql -u):

%%sql --use-globals
SELECT word, SUM(word_count) as count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = {{corpus_name}}
GROUP BY word
ORDER BY count DESC
LIMIT {{limit}}

here, we pass values explicitly:

%%sql  -p limit 1 -p corpus_name stuff
SELECT word, SUM(word_count) as count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = {{corpus_name}}
GROUP BY word
ORDER BY count DESC
LIMIT {{limit}}

Mixing them will use the existing scope and override any passed values:

%%sql  -p limit 1 --use-globals
SELECT word, SUM(word_count) as count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = {{corpus_name}}
GROUP BY word
ORDER BY count DESC
LIMIT {{limit}}

thoughts?

@idomic
Copy link

idomic commented Jan 27, 2023

We can do this mix, I don't see why we need --use-globals though.
@yafimvo @tonykploomber @Alex-Monahan thoughts on this feature?

@edublancas
Copy link
Author

We can do this mix, I don't see why we need --use-globals though.

following the zen of python 😁: Explicit is better than implicit.

@idomic
Copy link

idomic commented Jan 27, 2023

lol sure.
Let's gather more feedback.

@edublancas
Copy link
Author

edublancas commented Jan 27, 2023

and, we could even combine it with the --save feature:

%%sql --save some_name --no-execute
SELECT word, SUM(word_count) as count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = {{corpus_name}}
GROUP BY word
ORDER BY count DESC
LIMIT {{limit}}

Then:

%sql -p limit 10 -p corpus_name one --with some_name
%sql -p limit 100 -p corpus_name another --with some_name

@idomic
Copy link

idomic commented Jan 27, 2023

That could be one example for the docs, some_name should probably be some_query.
Probably the no-execute will always be on in that scenario.

@edublancas
Copy link
Author

That could be one example for the docs, some_name should probably be some_query.

agree

Probably the no-execute will always be on in that scenario.

unless you use --use-globals

@edublancas edublancas added the stash Label used to categorize issues that will be worked on next label Feb 7, 2023
@tonykploomber
Copy link

tonykploomber commented Feb 17, 2023

@edublancas

For the --params, the abbreviation -p is conflict with our current --persist

-p / --persist Create a table name in the database from the named DataFrame ([example](https://jupysql.ploomber.io/en/latest/api/magic-sql.html#create-table))

Any other better naming suggestion?

@tonykploomber tonykploomber linked a pull request Feb 17, 2023 that will close this issue
3 tasks
@edublancas
Copy link
Author

let's use capital p: -P

@tonykploomber
Copy link

Let's think about if we want to keep this
Since: https://jupysql.ploomber.io/en/latest/intro.html?highlight=variable#variable-substitution

@edublancas
Copy link
Author

I looked at the existing behavior: it needs to be more consistent (see here) and support some new features. So let's go ahead and implement it with jinja2.Template.

I think the current implementation is happening here:

result = conn.session.execute(txt, user_namespace)

so we need to:

  • deprecate the existing implementation (see note below)
  • implement with jinja2
  • ensure we can re-render snippets when we change a parameter value in a snippet with the --with option (see here)

what's interesting about the existing implementation is that the original author decided to pass the user's namespace implicitly, so I think let's do it that way; once that's in, we can decide if we add the --use-globals and --parameter options.

example of the new api:

parameter = "value"
%%sql
select * from table where parameter = '{{parameter}}'

should resolve to:

select * from table where parameter = '{{value}}'

deprecation

I think we should keep the existing behavior for a bit to allow users to migrate.

I think we can detect queries with the existing behavior with some regex that looks for :something $something or {something}, and if so, show a warning so they use {{something}} (and show our slack link so they can ask for help)

@tonykploomber
Copy link

tonykploomber commented Feb 21, 2023

I think the original way they parsed the variable was happening earlier before our execution function:

Give this example:

# Suppose we execute following calls in notebook:
test_x = 3
%sql SELECT * FROM my_data limit $test_x

In the def execution function, where our main sql execute handler

    def execute(self, line="", cell="", local_ns={}):
        # skip
        print ("line: ", line)
        # print -> SELECT * FROM my_data limit 3
        command = SQLCommand(self, user_ns, line, cell)
        # args.line: contains the line after the magic with all options removed
        args = command.args

After some researches I think Ipython already parsed the line string with :variable or {variable}

Deprecation

Probably we need to find the way to control that parsing part (either throwing some warning message, or disable the :variable or {variable} in the future)

Add Double Curly Bracelets parsing support

In current codebase, if we execute something like:

test_x = 3
SELECT * FROM my_data limit '{{test_x}}'

It will become

    def execute(self, line="", cell="", local_ns={}):
        # skip
        print ("line: ", line)
        # print -> SELECT * FROM my_data limit '{test_x}'
        command = SQLCommand(self, user_ns, line, cell)
        # args.line: contains the line after the magic with all options removed
        args = command.args

We can still parse the SELECT * FROM my_data limit '{test_x}' to
SELECT * FROM my_data limit 3 to fulfill the feature, but I am not sure if this is a good approach because:

  1. Ipython already has the logic to handle the parsing part natively, the best approach is we can see & control the original parsing logic in Ipython then modify to our way
  2. Since the sql clause parsed from '{{test_x}}' to '{test_x}' (cause original parsing behavior exists), we need to parse again from '{test_x}' to 3, prob by high-level-api in jinja2 with variable_start_string and variable_end_string

Need more research on this...

@tonykploomber
Copy link

tonykploomber commented Feb 22, 2023

Update 2/22/23:

Design Proposal

There is a decorator called @no_var_expand which can be attached to def execute(self, line="", cell="", local_ns={}): function

Probably it's better to turn-off the default variable parsing since we want to have a full-control to the original cmd, the case like:

Since the sql clause parsed from '{{test_x}}' to '{test_x}'

would be odd & hard to handle after

After the variable parsing is turned off by our hand, we can write our custom parsers to handle the variable by ourself. There will be two cases:

  1. Neither --use-globals and --parameter is provided, which means we should still provide the legacy variable substitution usage. We use our custom parser (A) to handle ${variable} & :variable in line or cell string
  2. Either --use-globals and --parameter is provided, we use our custom parser (B) to handle {{variable}} in line or cell string

Custom parser A: only handle the ${variable} & :variable case, we might reuse the var_expand func in IPython
Custom parser B: only handle the {{variable}} case, we can just use jinja2 template.render function to parse the string

Finally, I think when the user provides something like ${variable} & :variable along with --use-globals and --parameter, like %sql --use-global SELECT * FROM my_table limit ${my_limit}, this mixin usage should not be allowed in this design

@edublancas Kinda complicate...but lmk what you think

@idomic
Copy link

idomic commented Feb 22, 2023

For both 1 & 2, we can add the feature, support it for the required minimum version and deprecate it in the next major, as per our deprecation policy. I think if there are no major design issues/concerns we should move to talk over an actual PR.

@edublancas
Copy link
Author

Alright, so I think let's break this into a few steps:

  1. Let's add the no_expand_var so IPython doesn't expand automatically. I think this won't break anything since we are already calling expand_var here
  2. add code to detect when people are using {a}, $a, :a and display a deprecation warning (note that :a is coming from SQLAlchemy
  3. Add the jinja2 templating logic. Let's first start without adding any extra args (so no --params and no --use-globals), just render values from the existing scope, you can get those from user_ns, this will expand '{{a}}'

questions @tonykploomber ?

@tonykploomber
Copy link

@edublancas
Sounds good to me now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stash Label used to categorize issues that will be worked on next
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants