Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db.dropcolumn: Circumvent length limit for sqlite3 SQL STDIN string #3273

Conversation

mmacata
Copy link
Contributor

@mmacata mmacata commented Nov 29, 2023

This PR supports a more recent sqlite3 command to drop a column within db.dropcolumn.
In sqlite3 3.35.0, the DROP COLUMN command was added. This PR makes use of the command if the sqlite3 version is high enough.

Background is a failing db.in.ogr command for a CSV file containing many columns:

GRASS cttest/PERMANENT:~ > db.in.ogr input=migr_emi2_L0.csv output=migr_emi2_L0
Traceback (most recent call last):
  File "/usr/local/grass83/scripts/db.in.ogr", line 189, in <module>
    main()
  File "/usr/local/grass83/scripts/db.in.ogr", line 172, in main
    grass.run_command(
  File "/usr/local/grass83/etc/python/grass/script/core.py", line 466, in run_command
    return handle_errors(returncode, result=None, args=args, kwargs=kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/grass83/etc/python/grass/script/core.py", line 345, in handle_errors
    raise CalledModuleError(module=module, code=code, returncode=returncode)
grass.exceptions.CalledModuleError: Module run `db.dropcolumn --q -f table=migr_emi2_L0 column=cat` ended with an error.
The subprocess ended with a non-zero return code: 1. See errors above the traceback or in the error output.

And with further debugging:

GRASS cttest/PERMANENT:~ > db.dropcolumn --q -f table=migr_emi2_L0 column=cat
WARNING: Deleting <cat> column which may be needed to keep table connected
         to a vector map
DBMI-SQLite driver error:
Error in sqlite3_prepare():
incomplete input

DBMI-SQLite driver error:
Error in sqlite3_prepare():
incomplete input

ERROR: Error while executing: 'CREATE TEMPORARY TABLE
       migr_emi2_L0_backup(NUTS_ID TEXT, TOTAL_COMPLET_NR_F_2021 DOUBLE
       PRECISION, TOTAL_COMPLET_NR_F_2020 DOUBLE PRECISION,
       TOTAL_COMPLET_NR_F_2019 DOUBLE PRECISION, TOTAL_COMPLET_NR_F_2018
       DOUBLE PRECISION, TOTAL_COMPLET_NR_F_2017 DOUBLE PRECISION,
...    
       Y43_REACH_NR_T_2017 DOUBLE PRECISION, Y43_REACH_NR_T_2016 DOUBLE
       PRECISION, Y43_REACH_NR_T_2015 DOUBLE PRECISION,
       Y44_COMPLET_NR_F_2021 DOUBLE PRECISION, Y4'
ERROR: Cannot continue (problem deleting column)

So the created SQL string becomes too long to be passed via stdin in "db.execute", input="-", database=database, driver=driver, stdin=sql command. This PR circumvents this by using the newer sqlite3 syntax if available.

@mmacata
Copy link
Contributor Author

mmacata commented Nov 29, 2023

(Backport to 8.3 would be nice - I am not allowed to add labels)

@nilason nilason added bug Something isn't working Python Related code is in Python database Related to database management backport to 8.3 labels Nov 29, 2023
@nilason nilason added this to the 8.4.0 milestone Nov 29, 2023
Copy link
Member

@wenzeslaus wenzeslaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great upgrade!

driver=driver,
)

if sqlite3_version >= "3.35.0":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing versions is a pain:

>>> "10.0.0" >= "3.35.0"
False
>>> [int(i) for i in "10.0.0".split(".")] >= [int(i) for i in "3.35.0".split(".")]
True

but:

>>> [int(i) for i in "4.0.0-dev".split(".")] >= [int(i) for i in "3.35.0".split(".")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: invalid literal for int() with base 10: '0-dev'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing only major and minor version (3.35) should be enough and skip the patch/minor part which is not really relevant to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now comparing major and minor versions as arrays of INTs in recent commit.

@echoix
Copy link
Member

echoix commented Dec 1, 2023

In a little concerned about the old and new code in there. I never did SQL in Python, but if a code equivalent to that was done in C# and PHP, what looks like concatenation of strings to create a SQL query, would be vulnerable to SQL injection.

I would expect that placeholders would be used and the parameters would be binded then executed.

mmacata and others added 2 commits December 1, 2023 13:09
Co-authored-by: Vaclav Petras <wenzeslaus@gmail.com>
@mmacata
Copy link
Contributor Author

mmacata commented Dec 1, 2023

In a little concerned about the old and new code in there. I never did SQL in Python, but if a code equivalent to that was done in C# and PHP, what looks like concatenation of strings to create a SQL query, would be vulnerable to SQL injection.

I would expect that placeholders would be used and the parameters would be binded then executed.

How could this look like? I read in some stackoverflow posts (no documentation quote at hand) that binding table or column names is not possible (for sqlite3), only values.
To secure this I can only think of adding the following, maybe even before the column check:

     if table not in gscript.read_command("db.tables"):
        gscript.fatal(_("Table <%s> not found in db.tables") % column)

but I am not sure if this is how db.dropcolumn should behave.

flags="c",
database=database,
driver=driver,
).split(".")[0:2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See maxsplit parameter which avoids the need for indexing later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the option to select only the first items when using maxsplit, or am I overseeing something? It will split until the given index but then adding all following into one item. This will cause the problem you described above:

>>> sqlite3_version = "4.0.0-dev"
>>> sqlite3_version.split('.',1)
['4', '0.0-dev']
>>> [int(i) for i in sqlite3_version.split('.',1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: invalid literal for int() with base 10: '0.0-dev'
>>> sqlite3_version.split('.',2)
['4', '0', '0-dev']
>>> [int(i) for i in sqlite3_version.split('.',2)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: invalid literal for int() with base 10: '0-dev'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmacata You're right, maxsplit might be used to limit the resulting list to three parts, eg:

>>> sqlite3_version = "4.0.0-dev"
>>> sqlite3_version.split('.', maxsplit=3)[:2]
['4', '0']

but the index is needed. Although that seems to me to be over doing things. Index should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenzeslaus do you also agee? Can this be merged as is?

@nilason
Copy link
Contributor

nilason commented Dec 1, 2023

In a little concerned about the old and new code in there. I never did SQL in Python, but if a code equivalent to that was done in C# and PHP, what looks like concatenation of strings to create a SQL query, would be vulnerable to SQL injection.
I would expect that placeholders would be used and the parameters would be binded then executed.

How could this look like? I read in some stackoverflow posts (no documentation quote at hand) that binding table or column names is not possible (for sqlite3), only values. To secure this I can only think of adding the following, maybe even before the column check:

     if table not in gscript.read_command("db.tables"):
        gscript.fatal(_("Table <%s> not found in db.tables") % column)

but I am not sure if this is how db.dropcolumn should behave.

I think this question on sql safety is out of scope for this PR.

Copy link
Contributor

@nilason nilason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@neteler
Copy link
Member

neteler commented Dec 14, 2023

I'll merge this if no objections.

@neteler neteler merged commit 045f43b into OSGeo:main Dec 15, 2023
18 checks passed
neteler pushed a commit that referenced this pull request Dec 15, 2023
…3273)

* db.dropcolumn: support more recent sqlite3 drop column command
* use sqlite_version()
* Update scripts/db.dropcolumn/db.dropcolumn.py
* enhance version comparison

Co-authored-by: Vaclav Petras <wenzeslaus@gmail.com>
@neteler neteler modified the milestones: 8.4.0, 8.3.2 Dec 15, 2023
HuidaeCho pushed a commit to HuidaeCho/grass that referenced this pull request Jan 9, 2024
…SGeo#3273)

* db.dropcolumn: support more recent sqlite3 drop column command
* use sqlite_version()
* Update scripts/db.dropcolumn/db.dropcolumn.py
* enhance version comparison

Co-authored-by: Vaclav Petras <wenzeslaus@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working database Related to database management Python Related code is in Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants