Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support sqlalchemy.Executable in pandas.read_sql(sql) #10846

Closed
zyzhu2000 opened this issue Aug 18, 2015 · 16 comments
Closed

Support sqlalchemy.Executable in pandas.read_sql(sql) #10846

zyzhu2000 opened this issue Aug 18, 2015 · 16 comments
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@zyzhu2000
Copy link

Since pandas.read_sql() can always use sqlalchemy.Engine or Connection, it would be great if the sql parameter can accept sqlalchemy.Executable in addition to a string. sqlalchemy.Executable provides a standardized way of binding parameters and generating sql for different kinds of DB.

@jorisvandenbossche jorisvandenbossche added Enhancement IO SQL to_sql, read_sql, read_sql_query labels Aug 18, 2015
@jorisvandenbossche jorisvandenbossche added this to the Someday milestone Aug 18, 2015
@jorisvandenbossche
Copy link
Member

I think this is a good idea.

In practice, wouldn't this only be a Select object? As it only makes sense for Executables that return values to provide this to read_sql, and other Executables as insert, update, delete don't do this.

@zyzhu2000
Copy link
Author

Yes it should only be a Select or Text object. I especially like sqlalchemy's way to standardize parameters with text(). For example, I can just write text('select * from tbl where name=:value') regardless of actual backends.

@stephenpascoe
Copy link

I'm going to look at this for EuroSciPy sprints. Maybe get some tests in place at least.

@stephenpascoe
Copy link

I've got a test describing what I think we want at stephenpascoe@2a0bfff
Am I on the right track?

@zyzhu2000
Copy link
Author

I have read the above link. It is not the same as what I was thinking. SQLAlchemy allows us to create statements using objects, similar to creating ASTs.

One example of executing the 'select' statement can be found in the SQLAlchemy document.
>>> from sqlalchemy.sql import select
>>> s = select([users])
>>> result = conn.execute(s)

The above code is found here: http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html

If we could pass the object s into pd.read_sql() as parameter sql, that would be very nice.

In addtion to the select object, it is also possible to pass the text object.

from sqlalchemy import text
sql = text("select * from mytable where id=:myid")
result = conn.execute(sql, myid=x)

Note in the above code, we could always pass parameters in the syntax of ":myid" in the sql statement, regardless of the actual backend. SQLAlchemy will do the conversion to the proper db specific format inside.

Now if we could write something like:

pd.read_sql(sql, params={'myid' : my_id})

That would be great,

@s-celles
Copy link
Contributor

s-celles commented Sep 2, 2015

I like this idea of text object. But I wonder if it will protect against SQL injection. see #10899

What could happened if

myid = "1; DROP TABLE test_table;"

?

I think according http://stackoverflow.com/questions/19314342/python-sqlalchemy-pass-parameters-in-connection-execute that it's safer than sql.format(...)

@s-celles
Copy link
Contributor

s-celles commented Sep 2, 2015

I won't be

pd.read_sql(sql, params={'myid' : my_id})

but

pd.read_sql(sql, engine, params={'myid' : my_id})

@zyzhu2000
Copy link
Author

@scls19fr , regarding the SQL injection problem, I think the text() object idea will in fact prevent the SQL injection attack. If you pass in myid="1; DROP table test_table" , the execution will fail. The reason is that the parameter :myid in "select * from mytable where id=:myid" is a real parameter for the underlying DBAPI. For example, in the case of SQL Server, the above will be translated to "select * from mytable where id=?", this statement is then passed to MS SQL Server, along with the value of the parameter.

This is very different from "select * from mytable where id=%s" % (my_id), which suffers the SQL injection problem.

The stackoverflow article you cited in fact talks about how to prevent SQL Inject attack using the text() object instead of raw string.

text() is way safer than raw string.

@s-celles
Copy link
Contributor

s-celles commented Sep 2, 2015

I think so. Doc should also be improved accordingly. The use of prepared statement / bind variable should be shown in doc to avoid SQL Injection.

@stephenpascoe
Copy link

I believe we can already do this. E.g. see stephenpascoe@bebc6a6. These tests pass on my system.

I.e. this should work:

from sqlalchemy import text
sql = text("select * from mytable where id=:myid")
result = pd.read_sql(sql, engine, params={'myid': my_id})

The tests show we can also make this work for sqlalchemy.select() statements if you use sqlalchemy.bindparam()

@zyzhu2000
Copy link
Author

I would avoid bindparam() if at all possible. If we use bindparam(), sqlalchemy will try to 'bind the parameters' to SQL, whereas if we don't use it, the underlying SQL database will do the work. The latter will be faster and possibly safer.

@stephenpascoe
Copy link

I don't understand this. In the example tests these two sqlalchemy objects are equivalent:

name_text = sqlalchemy.text('select * from iris where name=:name')
name_select = sqlalchemy.select([iris]).where(iris.c.Name == sqlalchemy.bindparam('name'))

They both generate virtually the same SQL. name_text is a TextClause instance whereas name_select is a Selectable but they both inherit from Executable so can be executed on the engine. See Selectables and bindparam. I don't believe there would be any performance difference between these two statements.

I think there may be a way of adding bindparams() calls within pandas so that, if a param hasn't been bound to the FromClause when passed to read_sql, it will be added. This would make Selectables easier to use. I'm working on a patch for this.

@jorisvandenbossche
Copy link
Member

If text objects already work, always good to explicitly add a test for that! (so it keeps working)

@zyzhu2000
Copy link
Author

@stephenpascoe , I mean bindparam() should be avoided in any case, whether or not we use select object or text object, if possible.

Without losing generality, let's say myid=1 for the below example.

If we just do the following, without calling bindparam(),

from sqlalchemy import text
sql = text("select * from mytable where id=:myid")
result = conn.execute(sql, myid=x)

then sqlalchemy will translate the SQL statement to DB specific format along with the unbound parameters. For example, in the case of Microsoft SQL Server, the above will be translated to

SELECT * from mytable where id=?  

and the value 1 will be passed to Microsoft SQL server along with the above sql statement. Microsoft SQL server will then bind the parameter to the SQL statement.

In comparison, with bindparam(), sqlalchemy will try to bind the parameters itself. if we do bindparam(), the sqlahcmely statement

sql = text("select * from mytable where id=:myid")

will be translated to something like

SELECT * from mytable where id=1

BEFORE passing to MS SQL server..

In addition, bindparam() has a lot of limitations of data types it supports.

So the approach without using bindparam() is more efficient, supports more data types, and perhaps more secure.

@stephenpascoe
Copy link

@zyzhu2000 I'm sorry but I think you are mistaken. If you look at the documentation for [bindparam] it explains that it's purpose is exactly what you want: to allow a value to be bound to a parameter at execution time (and by the database when supported).

Either way, I don't think this ticket is the right place for debating sqlalchemy internals.

I've played around with adding where clauses automatically for Selectables and it isn't working. You end up creating a self-referencing select statement. Therefore I've submitted PR #10983 to provide tests and documentation of existing functionality.

@zyzhu2000
Copy link
Author

@stephenpascoe , I think you are right about bindparam(). I must have mistaken bindparam() with something else. Sorry about that.

jorisvandenbossche added a commit that referenced this issue Sep 22, 2015
TST/DOC #10846 Test and document use of SQLAlchemy expressions in read_sql()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

4 participants