Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some duckdb sql cannot run in jupysql #172

Closed
wonb168 opened this issue Feb 28, 2023 · 6 comments
Closed

some duckdb sql cannot run in jupysql #172

wonb168 opened this issue Feb 28, 2023 · 6 comments

Comments

@wonb168
Copy link

wonb168 commented Feb 28, 2023

  1. multi lines, must leave a blank after $$?
%sql create table test_table(sqlstr text);
%%sql
insert into test_table values ($$create table t(id int)
,code text$$)

Parser Error: syntax error at or near "create"
LINE 1: insert into test_table values ($create table t(id int
BUT, leave a blank between $$ and create, will OK

  1. can not run regexp_split_to_array?
%%sql 
select regexp_split_to_array(a.parts,','),regexp_split_to_array(b.parts,',')
from v_ddl a
inner join v_ddl2 b on a.word3=b.word3 and a.sqlstr<>b.sqlstr

raise ERROR:

Cell In[49], line 4
    select regexp_split_to_array(a.parts,','),regexp_split_to_array(b.parts,',')
           ^
SyntaxError: invalid syntax

BUT, this sql have no prolem in duckdb cli or in python.

@wonb168 wonb168 changed the title some sql cannot run in jupysql some duckdb sql cannot run in jupysql Feb 28, 2023
@edublancas
Copy link

Ok, so the problem with the first example is that $something is interpreted as variable substitution, that's why adding a space fixes the problem.

We're fixing this in #93 and $variable will no longer trigger variable substitution. We'll use the more common {{something}} format.

for the second example: I couldn't reproduce it (see my output below). I got a "Table with name v_ddl does not exist!" error, which makes sense. But you're getting "invalid syntax", which means that your Python session is interpreting the SQL code as Python; maybe you have some extra line breaks?

full session below:


first example

%load_ext sql
%sql duckdb://
%sql create table test_table(sqlstr text);

Console output (1/2):

*  duckdb://
Done.

Console output (2/2):

%%sql
insert into test_table values ($$create table t(id int)
,code text$$)

Console output (1/1):

*  duckdb://
(duckdb.ParserException) Parser Error: syntax error at or near "create"
LINE 1: insert into test_table values ($create table t(id int)
                                        ^
[SQL: insert into test_table values ($create table t(id int)
,code text$$)]
(Background on this error at: https://sqlalche.me/e/14/f405)

second example

%%sql 
select regexp_split_to_array(a.parts,','),regexp_split_to_array(b.parts,',')
from v_ddl a
inner join v_ddl2 b on a.word3=b.word3 and a.sqlstr<>b.sqlstr

Console output (1/1):

*  duckdb://
(duckdb.CatalogException) Catalog Error: Table with name v_ddl does not exist!
Did you mean "pg_am"?
[SQL: select regexp_split_to_array(a.parts,','),regexp_split_to_array(b.parts,',')
from v_ddl a
inner join v_ddl2 b on a.word3=b.word3 and a.sqlstr<>b.sqlstr]
(Background on this error at: https://sqlalche.me/e/14/f405)

@wonb168
Copy link
Author

wonb168 commented Mar 1, 2023

Second example:

with p1 as(select word3,unnest(regexp_split_to_array(a.parts,', \n')) p from v_ddl a) 
,p2 as(select word3,unnest(regexp_split_to_array(a.parts,', \n')) p from v_ddl2 a) 
select --concat('alter table ',p1.word3,' add ',p1.p,';'), 
p1.word3,p1.p 
from  p1 
left join p2 on p1.p=p2.p 
where p2.p is null

this WORKS, but if I add the concat line, raise error :
SyntaxError: invalid syntax. Perhaps you forgot a comma?
but this works in the cli also.

BUT, if I copy this sql into a new cell, also works, see the flowing:
image
WHY? maybe some proplem in jupysql cell?

@wonb168
Copy link
Author

wonb168 commented Mar 1, 2023

here is the whole test data:

%%sql
create table ddl (sqlstr text);
create table ddl2 (sqlstr text);
insert into ddl values ($$ CREATE TABLE public.test_partition_list (
    id integer,
    name character varying(64),
    fdate character varying(10)
)
 DISTRIBUTED BY (id) PARTITION BY LIST(fdate) 
          (
          PARTITION p1 VALUES('2017-01-01') WITH (tablename='test_partition_list_1_prt_p1', appendonly='false'), 
          PARTITION p2 VALUES('2017-01-02') WITH (tablename='test_partition_list_1_prt_p2', appendonly='false')
         );$$)
;
insert into ddl2 values ($$ CREATE TABLE public.test_partition_list (
    id integer,
    name character varying(64),
    fdate character varying(10)
)
 DISTRIBUTED BY (name) PARTITION BY LIST(fdate) 
          (
          PARTITION p2 VALUES('2017-01-02') WITH (tablename='test_partition_list_1_prt_p2', appendonly='false')
         );$$)
;
create view v_ddl as
select regexp_extract(sqlstr, '(\S+)', 1) left1
,regexp_extract(sqlstr, '(\S+\s\S+)', 1) left2
,regexp_extract(sqlstr, '\S+\s\S+\s(\S+)', 1) word3
-- 动作,除set、select、comment取首单词,其他取前2个单词
,case when left1 in ('SET','SELECT','COMMENT') then left1 else left2 end as action
-- 对象,只有create取,取动作后的第一个单词
,case when left1='CREATE' then word3 else sqlstr end as obj
-- 表的字段列表,函数的参数列表
,case when left2='CREATE TABLE' then regexp_extract(sqlstr, '(\(.*\))\s+DISTRIBUTED', 1,'s') --s跨行匹配,默认.不支持\n
--  when left2='CREATE FUNCTION' then regexp_extract(sqlstr, '(\(.*) DISTIRBUTED', 1) 函数也是动作后第一段非空
  end as cols
-- 表的分布键,DISTIRBUTED BY开始,括号结束'(DISTRIBUTED BY .*?\))'
,case when left2='CREATE TABLE' then regexp_extract(sqlstr, '(DISTRIBUTED BY .*?\))', 1) end as dkey 
-- 表的分区转array
,case when left2='CREATE TABLE' then regexp_extract(sqlstr, '\n(\s+PARTITION \S+ VALUES.*\))\s+\)', 1,'s') end as parts
,* 
from ddl;
create view v_ddl2 as
select regexp_extract(sqlstr, '(\S+)', 1) left1
,regexp_extract(sqlstr, '(\S+\s\S+)', 1) left2
,regexp_extract(sqlstr, '\S+\s\S+\s(\S+)', 1) word3
-- 动作,除set、select、comment取首单词,其他取前2个单词
,case when left1 in ('SET','SELECT','COMMENT') then left1 else left2 end as action
-- 对象,只有create取,取动作后的第一个单词
,case when left1='CREATE' then word3 else sqlstr end as obj
-- 表的字段列表,函数的参数列表
,case when left2='CREATE TABLE' then regexp_extract(sqlstr, '(\(.*\))\s+DISTRIBUTED', 1,'s') --s跨行匹配,默认.不支持\n
--  when left2='CREATE FUNCTION' then regexp_extract(sqlstr, '(\(.*) DISTIRBUTED', 1) 函数也是动作后第一段非空
  end as cols
-- 表的分布键,DISTIRBUTED BY开始,括号结束'(DISTRIBUTED BY .*?\))'
,case when left2='CREATE TABLE' then regexp_extract(sqlstr, '(DISTRIBUTED BY .*?\))', 1) end as dkey 
-- 表的分区转array
,case when left2='CREATE TABLE' then regexp_extract(sqlstr, '\n(\s+PARTITION \S+ VALUES.*\))\s+\)', 1,'s') end as parts
,* 
from ddl2;

then:

with p1 as(select word3,unnest(regexp_split_to_array(a.parts,', \n')) p from v_ddl a) 
,p2 as(select word3,unnest(regexp_split_to_array(a.parts,', \n')) p from v_ddl2 a) 
select concat('alter table ', p1.word3,' drop ', regexp_extract(p1.p,'(\S+\s+\S+)',1),';'), p1.word3, p1.p 
from  p1 
left join p2 on p1.word3=p2.word3 and p1.p=p2.p 
where p2.p is null

In jupysql error, but in cli it's OK!

@edublancas
Copy link

can you share the .py script (the one that you have in the screenshot)? you can remove stuff if there's sensitive info there.

@wonb168
Copy link
Author

wonb168 commented Mar 2, 2023

can you share the .py script (the one that you have in the screenshot)? you can remove stuff if there's sensitive info there.

issue.py.txt
because github not support py, I rename the file name as .txt.

@edublancas
Copy link

I see what the problem is! You cannot add comments before the cell magic.

This will break:

# %%
# stuff
%%sql
create table ddl (sqlstr text);
create table ddl2 (sqlstr text);

because Python will interpret the line after the # stuff comment as Python, instead of SQL.

You can create it in a separate cell:

# %% [markdown]
# some comment

# %%
# stuff
%%sql
create table ddl (sqlstr text);
create table ddl2 (sqlstr text);

Or use a SQL-style comment after the cell magic:

# %%
%%sql
-- stuff
create table ddl (sqlstr text);
create table ddl2 (sqlstr text);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants