
RPC server sometimes returns the wrong type #2984

Closed
aaronraff opened this issue Dec 29, 2020 · 2 comments · Fixed by #3499
Labels
bug Something isn't working rpc Issues related to dbt's RPC server

Comments

@aaronraff

Describe the bug

While using the RPC server, some values are returned with the wrong type. It looks like strings are being cast to floats somehow.

Steps To Reproduce

  1. Start up the rpc server (dbt rpc)
  2. Make a POST request with the following body:
{
    "jsonrpc": "2.0",
    "method": "run_sql",
    "id": "test3",
    "params": {
        "timeout": 60,
        "name": "test query",
        "sql": "d2l0aCBiYXNlIGFzICgKCiAgc2VsZWN0CgogICAgJ3Rlc3QnOjp2YXJjaGFyIGFzIHRlc3RfdmFyY2hhciwKCiAgICAyNTo6dmFyY2hhciBhcyB0ZXN0X2ludF92YXJjaGFyLAoKICAgICctMjUnOjp2YXJjaGFyIGFzIHRoaXNfd29ya3MKCikKCnNlbGVjdCAKCiAgKiwKCiAgTFBBRCh0ZXN0X3ZhcmNoYXIsIDEwLCAnMCcpIGFzIHBhZF92YXJjaGFyLAoKICBMUEFEKHRlc3RfaW50X3ZhcmNoYXIsIDEwLCAnMCcpIGFzIHBhZF9pbnRfdmFyY2hhciwKCiAgTFBBRCh0aGlzX3dvcmtzLCAxMCwgJzAnKSBhcyBwYWRfdGhpc193b3JrcwoKZnJvbSBiYXNlCg=="
    }
}

The SQL here is the base64 encoded version of:

with base as (

  select

    'test'::varchar as test_varchar,

    25::varchar as test_int_varchar,

    '-25'::varchar as this_works

)

select 

  *,

  LPAD(test_varchar, 10, '0') as pad_varchar,

  LPAD(test_int_varchar, 10, '0') as pad_int_varchar,

  LPAD(this_works, 10, '0') as pad_this_works

from base
  3. Copy the request_token from the response
  4. Make another request with the following body:
{
    "jsonrpc": "2.0",
    "method": "poll",
    "id": "test4",
    "params": {
        "request_token": <the-copied-request-token>,
        "logs": true,
        "logs_start": 0
    }
}
  5. Inspect the rows that are returned. You should see something like this:
"rows": [
    [
        "test",
        25.0,
        -25.0,
        "000000test",
        25.0,
        "0000000-25"
    ]
]
  6. If you run the same query in psql, you will get the following output:
 test_varchar | test_int_varchar | this_works | pad_varchar | pad_int_varchar | pad_this_works
--------------+------------------+------------+-------------+-----------------+----------------
 test         | 25               | -25        | 000000test  | 0000000025      | 0000000-25
  7. Notice that here pad_int_varchar is correctly padded, since it is treated as a varchar rather than an integer
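For convenience, the two-request flow above can be sketched in Python. The payload builders below mirror the JSON bodies shown in the steps (the helper names are my own, not part of dbt), and assume the RPC server is listening on its default port, 8580:

```python
import base64

RPC_URL = "http://localhost:8580/jsonrpc"  # dbt rpc listens on 8580 by default

def build_run_sql_payload(sql_text, request_id="test3", name="test query", timeout=60):
    """Build a run_sql JSON-RPC payload; the sql param must be base64 encoded."""
    return {
        "jsonrpc": "2.0",
        "method": "run_sql",
        "id": request_id,
        "params": {
            "timeout": timeout,
            "name": name,
            "sql": base64.b64encode(sql_text.encode("utf-8")).decode("ascii"),
        },
    }

def build_poll_payload(request_token, request_id="test4"):
    """Build a poll payload for the request_token returned by run_sql."""
    return {
        "jsonrpc": "2.0",
        "method": "poll",
        "id": request_id,
        "params": {
            "request_token": request_token,
            "logs": True,
            "logs_start": 0,
        },
    }

# Sending the requests needs an HTTP client and a running server, e.g.:
# import requests
# resp = requests.post(RPC_URL, json=build_run_sql_payload("select 1")).json()
# token = resp["result"]["request_token"]
# result = requests.post(RPC_URL, json=build_poll_payload(token)).json()
```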

Expected behavior

I would expect all of the values returned to be strings (since they were cast to varchar), not floats. As seen in step 5, the second-to-last value is not padded since it is being represented as a float.

Screenshots and log output

I don't have any other relevant logs or output, but I'm happy to add more of the RPC response to this issue if that is helpful!

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

Plugins:
  - postgres: 0.18.1
  - spark: 0.18.1.1

The operating system you're using:

macOS Catalina
Version 10.15.7

The output of python --version:

Python 3.7.7

Additional context

N/A

@aaronraff aaronraff added bug Something isn't working triage labels Dec 29, 2020
@jtcohen6 jtcohen6 added rpc Issues related to dbt's RPC server and removed triage labels Dec 31, 2020
@jtcohen6
Contributor

Thanks for this reproduction case @aaronraff! I have an even simpler one:

select '0000000025'::varchar as my_varchar_column

base64 encoded: c2VsZWN0ICcwMDAwMDAwMDI1Jzo6dmFyY2hhciBhcyBteV92YXJjaGFyX2NvbHVtbg==

"table": {
    "column_names": [
        "my_varchar_column"
    ],
    "rows": [
        [
            25.0
        ]
    ]
}

I think the issue here is with how the RPC server handles data types implicitly, rather than storing them explicitly alongside JSON results.

@jtcohen6
Contributor

jtcohen6 commented Jun 1, 2021

This is definitely an issue with agate. I dug a bit deeper into the example above. The method that turns the padded string into a number is table_from_data_flat, called by execute via get_result_from_cursor:

https://github.com/fishtown-analytics/dbt/blob/1e5a7878e5188fa58f91e2f257041f28fdb948fc/core/dbt/adapters/sql/connections.py#L112-L125

Here's the simple reproduction case:

>>> import dbt.clients.agate_helper
>>> data = [{'my_varchar_column': '0000000025'}]
>>> column_names = ['my_varchar_column']
>>> agate_tbl = dbt.clients.agate_helper.table_from_data_flat(
...             data,
...             column_names
...         )
>>> agate_tbl.print_table()
| my_varchar_column |
| ----------------- |
|                25 |

Agate does a lot of type inference under the hood. We enable user-supplied column type overrides for seeds, but I don't think that makes a lot of sense for one-off RPC queries. Really, we should be getting the data type from the database, though that may mean handling some low-level differences across cursors. Here's what cursor.description looks like for Postgres + Redshift:

(Column(name='my_varchar_column', type_code=1043),)

Versus Snowflake:

[('MY_VARCHAR_COLUMN', 2, None, 16777216, None, None, False)]

Whereas other databases, e.g. BigQuery, reimplement adapter.execute and use other methods to convert fetched results to an agate table. So the intervention needed may vary.
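For the Postgres/Redshift shape, the type_code in cursor.description is a pg_type OID (1043 is varchar, as in the tuple above). A minimal sketch of what "getting the data type from the database" could look like — the helper names and the coercion strategy here are hypothetical, not dbt's actual fix:

```python
# pg_type OIDs: 25 = text, 1042 = bpchar, 1043 = varchar; 20/21/23 = int8/int2/int4
PG_TEXT_OIDS = {25, 1042, 1043}
PG_INT_OIDS = {20, 21, 23}

def column_kind(type_code):
    """Classify a Postgres cursor.description type_code."""
    if type_code in PG_TEXT_OIDS:
        return "text"
    if type_code in PG_INT_OIDS:
        return "integer"
    return "unknown"

def coerce_row(row, type_codes):
    """Keep text columns as strings so values like '0000000025' keep their
    padding instead of being re-inferred as numbers."""
    return [
        str(value) if column_kind(code) == "text" else value
        for value, code in zip(row, type_codes)
    ]
```

As the comment notes, this only covers the DB-API cursors that expose OIDs; adapters like BigQuery that reimplement adapter.execute would need their own path.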
