Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Steampipe hangs reliably under some conditions #583

Open
ajoga opened this issue Jun 28, 2023 · 1 comment
Open

Steampipe hangs reliably under some conditions #583

ajoga opened this issue Jun 28, 2023 · 1 comment

Comments

@ajoga
Copy link

ajoga commented Jun 28, 2023

Hello,

Following the discussion on Slack in this thread, and as suggested by @cbruno10, I'm logging an issue.

I can reliably make Steampipe hang given:

  • Steampipe v0.20.7 ; did not test other versions.
  • The use of the csv plugin v0.9.0 configured with two files ; no hang with v0.8.0.
  • A specific poorly written query -- its result is non-deterministic ; no hang when the query is deterministic, see below.
  • cache = true (https://steampipe.io/docs/guides/caching#server-level-cache-settings) ; no hang when cache is disabled.
  • Launched interactively, without a service, through steampipe query ./queries/tmp.sql.

Unfortunately I can't reduce the CSV files enough to publish them on Github. One is three lines long, including header, the other one is ~700 lines long. There is in both 10-15 columns.
When I try to trim the files more than that, I can't reliably get Steampipe to hang anymore for some reason.

The CSV files are of poor quality, some fields are quoted, not others. I have duplicate datas in all columns, and I suspect that there is an issue with the cache logic somehow around this.

The pseudonymized query is like this:

WITH tmp_table as (
    SELECT
        "AccountId" as account_id,
        "BusinessGroup" as business_group
    FROM
        csv."MYCSVfile"
)
SELECT
    acclist.id as account_id,
    (SELECT business_group from tmp_table WHERE tmp_table.account_id=acclist.id LIMIT 1) as bg
FROM
    csv."accounts" acclist /* list of accounts updated by generateSteampipeConfig.py */

Non-deterministic subquery:

  1. When I replace SELECT business_group from tmp_table WHERE tmp_table.account_id=acclist.id LIMIT 1 by SELECT MAX(business_group) from tmp_table WHERE tmp_table.account_id=acclist.id, then I do not experience the freeze.
  2. Alternatively, if I replace WITH tmp_table as ( by WITH tmp_table as MATERLIZED (, then I do not experience the freeze.

I have stashed an acceptance test locally (on this model) that reproduces the issue, if you'd like me to try another build. It runs on a debian machine and I run it through ~/steampipe/tests/acceptance$ ./run-local.sh ajoga.bats.
The hanging is such that even to get back my shell when I launch the test, I have to run $ killall steampipe && killall postgres && killall steampipe-plugin-csv.plugin in an other shell (advise welcome on how to tune bats to handle that ; export BATS_TEST_TIMEOUT=20 in the setup_file of the bats file didn't do it)

@judell
Copy link
Contributor

judell commented Jun 28, 2023

the CSV files are of poor quality, some fields are quoted, not others.

I wonder if a preprocessing step, for example using the Python module's QUOTE_ALL, would make a difference.

WITH tmp_table as MATERIALIZED

Interesting. There are cases where a seemingly unnecessary ORDER BY (turbot/steampipe#968) influences the query planner in ways that fix an otherwise failing query. We suspect that AS MATERIALIZED can have a similar influence in some cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants