SNOW-1665955: Semicolon breaks DataFrame.count() #2299

Open
Tim-Kracht opened this issue Sep 16, 2024 · 3 comments

Assignees: sfc-gh-sghosh
Labels: status-triage_done (Initial triage done, will be further handled by the driver team)

@Tim-Kracht
Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.11.10 (main, Sep 7 2024, 01:03:31) [Clang 15.0.0 (clang-1500.3.9.4)]

  2. What operating system and processor architecture are you using?

macOS-14.6.1-arm64-arm-64bit

  3. What are the component versions in the environment (pip freeze)?

asn1crypto==1.5.1
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.3.2
cloudpickle==2.2.1
cryptography==43.0.1
filelock==3.16.0
idna==3.8
packaging==24.1
platformdirs==4.3.2
pycparser==2.22
PyJWT==2.9.0
pyOpenSSL==24.2.1
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
snowflake-connector-python==3.12.1
snowflake-snowpark-python==1.21.1
sortedcontainers==2.4.0
tomlkit==0.13.2
typing_extensions==4.12.2
urllib3==2.2.2

  4. What did you do?

Called Session.sql() with a query that ends with a semicolon (;), then called DataFrame.collect() and DataFrame.count() on the resulting DataFrame.

  5. What did you expect to see?

I expected to see the same behavior from a SQL compilation perspective.
DataFrame.collect() worked as expected, but DataFrame.count() raised a SQL compilation error.
The same query without the semicolon runs fine in both cases.

from dotenv import load_dotenv
import logging
import os
import snowflake.snowpark as sp

def main():
    load_dotenv()
    args = {
        "account": os.getenv("SNOWFLAKE_ACCOUNT"),
        "user": os.getenv("SNOWFLAKE_USER"),
        "authenticator": "externalbrowser",
    }
    queries = (
        "select query_start_time from snowflake.account_usage.access_history limit 1",
        "select query_start_time from snowflake.account_usage.access_history limit 1;",
    )
    session = sp.Session.builder.configs(args).create()
    
    for query in queries:
        print()
        df = None
        result = None
        count = None

        print(f"{query=}")
        df = session.sql(query=query)
        
        try:
            result = df.collect()
            print(f"{result=}")
        except Exception as ex:
            print(f"{ex=}")

        try:
            count = df.count()
            print(f"{count=}")
        except Exception as ex:
            print(f"{ex=}")

if __name__ == "__main__":
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)
    main()

  6. Can you set logging to DEBUG and collect the logs?
query='select query_start_time from snowflake.account_usage.access_history limit 1'
2024-09-16 15:56:22,571 - MainThread cursor.py:916 - execute() - DEBUG - executing SQL/command
2024-09-16 15:56:22,571 - MainThread cursor.py:931 - execute() - DEBUG - query: [select query_start_time from snowflake.account_usage.access_history limit 1]
2024-09-16 15:56:22,571 - MainThread connection.py:1651 - _next_sequence_counter() - DEBUG - sequence counter: 1
2024-09-16 15:56:22,571 - MainThread cursor.py:641 - _execute_helper() - DEBUG - Request id: x
2024-09-16 15:56:22,571 - MainThread cursor.py:643 - _execute_helper() - DEBUG - running query [select query_start_time from snowflake.account_usage.access_history limit 1]
2024-09-16 15:56:22,571 - MainThread cursor.py:650 - _execute_helper() - DEBUG - is_file_transfer: True
2024-09-16 15:56:22,571 - MainThread connection.py:1312 - cmd_query() - DEBUG - _cmd_query
2024-09-16 15:56:22,571 - MainThread _query_context_cache.py:155 - serialize_to_dict() - DEBUG - serialize_to_dict() called
2024-09-16 15:56:22,571 - MainThread connection.py:1341 - cmd_query() - DEBUG - sql=[select query_start_time from snowflake.account_usage.access_history limit 1], sequence_id=[1], is_file_transfer=[False]
2024-09-16 15:56:22,572 - MainThread network.py:487 - request() - DEBUG - Opentelemtry otel injection failed because of: No module named 'opentelemetry'
2024-09-16 15:56:22,572 - MainThread network.py:1187 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 1/1 active sessions
2024-09-16 15:56:22,572 - MainThread network.py:886 - _request_exec_wrapper() - DEBUG - remaining request timeout: N/A ms, retry cnt: 1
2024-09-16 15:56:22,572 - MainThread network.py:868 - add_request_guid() - DEBUG - Request guid: x
2024-09-16 15:56:22,572 - MainThread network.py:1046 - _request_exec() - DEBUG - socket timeout: 60
2024-09-16 15:56:25,086 - MainThread connectionpool.py:474 - _make_request() - DEBUG - https://x.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=x&request_guid=x HTTP/1.1" 200 None
2024-09-16 15:56:25,087 - MainThread network.py:1073 - _request_exec() - DEBUG - SUCCESS
2024-09-16 15:56:25,088 - MainThread network.py:1192 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 0/1 active sessions
2024-09-16 15:56:25,088 - MainThread network.py:750 - _post_request() - DEBUG - ret[code] = None, after post request
2024-09-16 15:56:25,088 - MainThread network.py:776 - _post_request() - DEBUG - Query id: x
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:191 - deserialize_json_dict() - DEBUG - deserialize_json_dict() called: data from server: {'entries': [{'id': 0, 'timestamp': 1726516585060215, 'priority': 0}]}
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:232 - deserialize_json_dict() - DEBUG - deserialize {'id': 0, 'timestamp': 1726516585060215, 'priority': 0}
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:101 - _sync_priority_map() - DEBUG - sync_priority_map called priority_map size = 0, new_priority_map size = 1
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:127 - trim_cache() - DEBUG - trim_cache() called. treeSet size is 1 and cache capacity is 5
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:136 - trim_cache() - DEBUG - trim_cache() returns. treeSet size is 1 and cache capacity is 5
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:271 - deserialize_json_dict() - DEBUG - deserialize_json_dict() returns
2024-09-16 15:56:25,088 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516585060215, 0)
2024-09-16 15:56:25,088 - MainThread cursor.py:990 - execute() - DEBUG - sfqid: x
2024-09-16 15:56:25,088 - MainThread cursor.py:996 - execute() - DEBUG - query execution done
2024-09-16 15:56:25,088 - MainThread cursor.py:1010 - execute() - DEBUG - SUCCESS
2024-09-16 15:56:25,088 - MainThread cursor.py:1029 - execute() - DEBUG - PUT OR GET: False
2024-09-16 15:56:25,088 - MainThread cursor.py:1142 - _init_result_and_meta() - DEBUG - Query result format: arrow
2024-09-16 15:56:25,089 - MainThread cursor.py:1156 - _init_result_and_meta() - INFO - Number of results in first chunk: 1
2024-09-16 15:56:25,089 - MainThread server_connection.py:421 - run_query() - DEBUG - Execute query [queryID: x] select query_start_time from snowflake.account_usage.access_history limit 1
2024-09-16 15:56:25,089 - MainThread result_batch.py:68 - _create_nanoarrow_iterator() - DEBUG - Using nanoarrow as the arrow data converter
2024-09-16 15:56:25,089 - MainThread CArrowIterator.cpp:120 - CArrowIterator() - DEBUG - Arrow BatchSize: 1
2024-09-16 15:56:25,089 - MainThread CArrowChunkIterator.cpp:46 - CArrowChunkIterator() - DEBUG - Arrow chunk info: batchCount 1, columnCount 1, use_numpy: 0
2024-09-16 15:56:25,089 - MainThread nanoarrow_arrow_iterator.cpython-311-darwin.so:0 - __cinit__() - DEBUG - Batches read: 0
2024-09-16 15:56:25,089 - MainThread result_set.py:87 - result_set_iterator() - DEBUG - beginning to schedule result batch downloads
2024-09-16 15:56:25,089 - MainThread CArrowChunkIterator.cpp:70 - next() - DEBUG - Current batch index: 0, rows in current batch: 1
result=[Row(QUERY_START_TIME=datetime.datetime(2023, 9, 1, 6, 27, 7, 973000, tzinfo=<DstTzInfo 'America/Los_Angeles' PDT-1 day, 17:00:00 DST>))]
2024-09-16 15:56:26,071 - MainThread cursor.py:916 - execute() - DEBUG - executing SQL/command
2024-09-16 15:56:26,071 - MainThread cursor.py:931 - execute() - DEBUG - query: [SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowfla...]
2024-09-16 15:56:26,071 - MainThread connection.py:1651 - _next_sequence_counter() - DEBUG - sequence counter: 2
2024-09-16 15:56:26,071 - MainThread cursor.py:641 - _execute_helper() - DEBUG - Request id: x
2024-09-16 15:56:26,071 - MainThread cursor.py:643 - _execute_helper() - DEBUG - running query [SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowfla...]
2024-09-16 15:56:26,071 - MainThread cursor.py:650 - _execute_helper() - DEBUG - is_file_transfer: True
2024-09-16 15:56:26,071 - MainThread connection.py:1312 - cmd_query() - DEBUG - _cmd_query
2024-09-16 15:56:26,071 - MainThread _query_context_cache.py:155 - serialize_to_dict() - DEBUG - serialize_to_dict() called
2024-09-16 15:56:26,071 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516585060215, 0)
2024-09-16 15:56:26,071 - MainThread _query_context_cache.py:180 - serialize_to_dict() - DEBUG - serialize_to_dict(): data to send to server {'entries': [{'id': 0, 'timestamp': 1726516585060215, 'priority': 0, 'context': {}}]}
2024-09-16 15:56:26,071 - MainThread connection.py:1341 - cmd_query() - DEBUG - sql=[SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowfla...], sequence_id=[2], is_file_transfer=[False]
2024-09-16 15:56:26,072 - MainThread network.py:487 - request() - DEBUG - Opentelemtry otel injection failed because of: No module named 'opentelemetry'
2024-09-16 15:56:26,072 - MainThread network.py:1187 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 1/1 active sessions
2024-09-16 15:56:26,072 - MainThread network.py:886 - _request_exec_wrapper() - DEBUG - remaining request timeout: N/A ms, retry cnt: 1
2024-09-16 15:56:26,072 - MainThread network.py:868 - add_request_guid() - DEBUG - Request guid: x
2024-09-16 15:56:26,072 - MainThread network.py:1046 - _request_exec() - DEBUG - socket timeout: 60
2024-09-16 15:56:28,213 - MainThread connectionpool.py:474 - _make_request() - DEBUG - https://x.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=x&request_guid=x HTTP/1.1" 200 None
2024-09-16 15:56:28,214 - MainThread network.py:1073 - _request_exec() - DEBUG - SUCCESS
2024-09-16 15:56:28,215 - MainThread network.py:1192 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 0/1 active sessions
2024-09-16 15:56:28,215 - MainThread network.py:750 - _post_request() - DEBUG - ret[code] = None, after post request
2024-09-16 15:56:28,215 - MainThread network.py:776 - _post_request() - DEBUG - Query id: x
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:191 - deserialize_json_dict() - DEBUG - deserialize_json_dict() called: data from server: {'entries': [{'id': 0, 'timestamp': 1726516588190659, 'priority': 0}]}
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516585060215, 0)
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:232 - deserialize_json_dict() - DEBUG - deserialize {'id': 0, 'timestamp': 1726516588190659, 'priority': 0}
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:101 - _sync_priority_map() - DEBUG - sync_priority_map called priority_map size = 0, new_priority_map size = 1
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:127 - trim_cache() - DEBUG - trim_cache() called. treeSet size is 1 and cache capacity is 5
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:136 - trim_cache() - DEBUG - trim_cache() returns. treeSet size is 1 and cache capacity is 5
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:271 - deserialize_json_dict() - DEBUG - deserialize_json_dict() returns
2024-09-16 15:56:28,215 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516588190659, 0)
2024-09-16 15:56:28,216 - MainThread cursor.py:990 - execute() - DEBUG - sfqid: x
2024-09-16 15:56:28,216 - MainThread cursor.py:996 - execute() - DEBUG - query execution done
2024-09-16 15:56:28,216 - MainThread cursor.py:1010 - execute() - DEBUG - SUCCESS
2024-09-16 15:56:28,216 - MainThread cursor.py:1029 - execute() - DEBUG - PUT OR GET: False
2024-09-16 15:56:28,216 - MainThread cursor.py:1142 - _init_result_and_meta() - DEBUG - Query result format: arrow
2024-09-16 15:56:28,216 - MainThread cursor.py:1156 - _init_result_and_meta() - INFO - Number of results in first chunk: 1
2024-09-16 15:56:28,216 - MainThread server_connection.py:421 - run_query() - DEBUG - Execute query [queryID: x]  SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowflake.account_usage.access_history limit 1) LIMIT 1
2024-09-16 15:56:28,216 - MainThread result_batch.py:68 - _create_nanoarrow_iterator() - DEBUG - Using nanoarrow as the arrow data converter
2024-09-16 15:56:28,216 - MainThread CArrowIterator.cpp:120 - CArrowIterator() - DEBUG - Arrow BatchSize: 1
2024-09-16 15:56:28,216 - MainThread CArrowChunkIterator.cpp:46 - CArrowChunkIterator() - DEBUG - Arrow chunk info: batchCount 1, columnCount 1, use_numpy: 0
2024-09-16 15:56:28,216 - MainThread nanoarrow_arrow_iterator.cpython-311-darwin.so:0 - __cinit__() - DEBUG - Batches read: 0
2024-09-16 15:56:28,216 - MainThread result_set.py:87 - result_set_iterator() - DEBUG - beginning to schedule result batch downloads
2024-09-16 15:56:28,216 - MainThread CArrowChunkIterator.cpp:70 - next() - DEBUG - Current batch index: 0, rows in current batch: 1
count=1

query='select query_start_time from snowflake.account_usage.access_history limit 1;'
2024-09-16 15:56:28,218 - MainThread cursor.py:916 - execute() - DEBUG - executing SQL/command
2024-09-16 15:56:28,218 - MainThread cursor.py:931 - execute() - DEBUG - query: [select query_start_time from snowflake.account_usage.access_history limit 1;]
2024-09-16 15:56:28,218 - MainThread connection.py:1651 - _next_sequence_counter() - DEBUG - sequence counter: 3
2024-09-16 15:56:28,218 - MainThread cursor.py:641 - _execute_helper() - DEBUG - Request id: x
2024-09-16 15:56:28,218 - MainThread cursor.py:643 - _execute_helper() - DEBUG - running query [select query_start_time from snowflake.account_usage.access_history limit 1;]
2024-09-16 15:56:28,218 - MainThread cursor.py:650 - _execute_helper() - DEBUG - is_file_transfer: True
2024-09-16 15:56:28,218 - MainThread connection.py:1312 - cmd_query() - DEBUG - _cmd_query
2024-09-16 15:56:28,218 - MainThread _query_context_cache.py:155 - serialize_to_dict() - DEBUG - serialize_to_dict() called
2024-09-16 15:56:28,218 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516588190659, 0)
2024-09-16 15:56:28,219 - MainThread _query_context_cache.py:180 - serialize_to_dict() - DEBUG - serialize_to_dict(): data to send to server {'entries': [{'id': 0, 'timestamp': 1726516588190659, 'priority': 0, 'context': {}}]}
2024-09-16 15:56:28,219 - MainThread connection.py:1341 - cmd_query() - DEBUG - sql=[select query_start_time from snowflake.account_usage.access_history limit 1;], sequence_id=[3], is_file_transfer=[False]
2024-09-16 15:56:28,220 - MainThread network.py:487 - request() - DEBUG - Opentelemtry otel injection failed because of: No module named 'opentelemetry'
2024-09-16 15:56:28,220 - MainThread network.py:1187 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 1/1 active sessions
2024-09-16 15:56:28,220 - MainThread network.py:886 - _request_exec_wrapper() - DEBUG - remaining request timeout: N/A ms, retry cnt: 1
2024-09-16 15:56:28,220 - MainThread network.py:868 - add_request_guid() - DEBUG - Request guid: x
2024-09-16 15:56:28,220 - MainThread network.py:1046 - _request_exec() - DEBUG - socket timeout: 60
2024-09-16 15:56:30,058 - MainThread connectionpool.py:474 - _make_request() - DEBUG - https://x.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=x&request_guid=x HTTP/1.1" 200 None
2024-09-16 15:56:30,060 - MainThread network.py:1073 - _request_exec() - DEBUG - SUCCESS
2024-09-16 15:56:30,060 - MainThread network.py:1192 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 0/1 active sessions
2024-09-16 15:56:30,060 - MainThread network.py:750 - _post_request() - DEBUG - ret[code] = None, after post request
2024-09-16 15:56:30,060 - MainThread network.py:776 - _post_request() - DEBUG - Query id: x
2024-09-16 15:56:30,060 - MainThread _query_context_cache.py:191 - deserialize_json_dict() - DEBUG - deserialize_json_dict() called: data from server: {'entries': [{'id': 0, 'timestamp': 1726516590034491, 'priority': 0}]}
2024-09-16 15:56:30,060 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516588190659, 0)
2024-09-16 15:56:30,060 - MainThread _query_context_cache.py:232 - deserialize_json_dict() - DEBUG - deserialize {'id': 0, 'timestamp': 1726516590034491, 'priority': 0}
2024-09-16 15:56:30,061 - MainThread _query_context_cache.py:101 - _sync_priority_map() - DEBUG - sync_priority_map called priority_map size = 0, new_priority_map size = 1
2024-09-16 15:56:30,061 - MainThread _query_context_cache.py:127 - trim_cache() - DEBUG - trim_cache() called. treeSet size is 1 and cache capacity is 5
2024-09-16 15:56:30,061 - MainThread _query_context_cache.py:136 - trim_cache() - DEBUG - trim_cache() returns. treeSet size is 1 and cache capacity is 5
2024-09-16 15:56:30,061 - MainThread _query_context_cache.py:271 - deserialize_json_dict() - DEBUG - deserialize_json_dict() returns
2024-09-16 15:56:30,061 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516590034491, 0)
2024-09-16 15:56:30,061 - MainThread cursor.py:990 - execute() - DEBUG - sfqid: x
2024-09-16 15:56:30,061 - MainThread cursor.py:996 - execute() - DEBUG - query execution done
2024-09-16 15:56:30,061 - MainThread cursor.py:1010 - execute() - DEBUG - SUCCESS
2024-09-16 15:56:30,061 - MainThread cursor.py:1029 - execute() - DEBUG - PUT OR GET: False
2024-09-16 15:56:30,061 - MainThread cursor.py:1142 - _init_result_and_meta() - DEBUG - Query result format: arrow
2024-09-16 15:56:30,061 - MainThread cursor.py:1156 - _init_result_and_meta() - INFO - Number of results in first chunk: 1
2024-09-16 15:56:30,062 - MainThread server_connection.py:421 - run_query() - DEBUG - Execute query [queryID: x] select query_start_time from snowflake.account_usage.access_history limit 1;
2024-09-16 15:56:30,062 - MainThread result_batch.py:68 - _create_nanoarrow_iterator() - DEBUG - Using nanoarrow as the arrow data converter
2024-09-16 15:56:30,062 - MainThread CArrowIterator.cpp:120 - CArrowIterator() - DEBUG - Arrow BatchSize: 1
2024-09-16 15:56:30,062 - MainThread CArrowChunkIterator.cpp:46 - CArrowChunkIterator() - DEBUG - Arrow chunk info: batchCount 1, columnCount 1, use_numpy: 0
2024-09-16 15:56:30,062 - MainThread nanoarrow_arrow_iterator.cpython-311-darwin.so:0 - __cinit__() - DEBUG - Batches read: 0
2024-09-16 15:56:30,062 - MainThread result_set.py:87 - result_set_iterator() - DEBUG - beginning to schedule result batch downloads
2024-09-16 15:56:30,062 - MainThread CArrowChunkIterator.cpp:70 - next() - DEBUG - Current batch index: 0, rows in current batch: 1
result=[Row(QUERY_START_TIME=datetime.datetime(2023, 9, 9, 10, 2, 48, 666000, tzinfo=<DstTzInfo 'America/Los_Angeles' PDT-1 day, 17:00:00 DST>))]
2024-09-16 15:56:30,064 - MainThread cursor.py:916 - execute() - DEBUG - executing SQL/command
2024-09-16 15:56:30,064 - MainThread cursor.py:931 - execute() - DEBUG - query: [SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowfla...]
2024-09-16 15:56:30,064 - MainThread connection.py:1651 - _next_sequence_counter() - DEBUG - sequence counter: 4
2024-09-16 15:56:30,064 - MainThread cursor.py:641 - _execute_helper() - DEBUG - Request id: x
2024-09-16 15:56:30,064 - MainThread cursor.py:643 - _execute_helper() - DEBUG - running query [SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowfla...]
2024-09-16 15:56:30,064 - MainThread cursor.py:650 - _execute_helper() - DEBUG - is_file_transfer: True
2024-09-16 15:56:30,064 - MainThread connection.py:1312 - cmd_query() - DEBUG - _cmd_query
2024-09-16 15:56:30,064 - MainThread _query_context_cache.py:155 - serialize_to_dict() - DEBUG - serialize_to_dict() called
2024-09-16 15:56:30,064 - MainThread _query_context_cache.py:276 - log_cache_entries() - DEBUG - Cache Entry: (0, 1726516590034491, 0)
2024-09-16 15:56:30,065 - MainThread _query_context_cache.py:180 - serialize_to_dict() - DEBUG - serialize_to_dict(): data to send to server {'entries': [{'id': 0, 'timestamp': 1726516590034491, 'priority': 0, 'context': {}}]}
2024-09-16 15:56:30,065 - MainThread connection.py:1341 - cmd_query() - DEBUG - sql=[SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowfla...], sequence_id=[4], is_file_transfer=[False]
2024-09-16 15:56:30,065 - MainThread network.py:487 - request() - DEBUG - Opentelemtry otel injection failed because of: No module named 'opentelemetry'
2024-09-16 15:56:30,066 - MainThread network.py:1187 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 1/1 active sessions
2024-09-16 15:56:30,066 - MainThread network.py:886 - _request_exec_wrapper() - DEBUG - remaining request timeout: N/A ms, retry cnt: 1
2024-09-16 15:56:30,066 - MainThread network.py:868 - add_request_guid() - DEBUG - Request guid: x
2024-09-16 15:56:30,066 - MainThread network.py:1046 - _request_exec() - DEBUG - socket timeout: 60
2024-09-16 15:56:30,205 - MainThread connectionpool.py:474 - _make_request() - DEBUG - https://x.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=x&request_guid=x HTTP/1.1" 200 None
2024-09-16 15:56:30,207 - MainThread network.py:1073 - _request_exec() - DEBUG - SUCCESS
2024-09-16 15:56:30,207 - MainThread network.py:1192 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 0/1 active sessions
2024-09-16 15:56:30,207 - MainThread network.py:750 - _post_request() - DEBUG - ret[code] = 001003, after post request
2024-09-16 15:56:30,207 - MainThread network.py:776 - _post_request() - DEBUG - Query id: x
2024-09-16 15:56:30,207 - MainThread cursor.py:990 - execute() - DEBUG - sfqid: x
2024-09-16 15:56:30,207 - MainThread cursor.py:996 - execute() - DEBUG - query execution done
2024-09-16 15:56:30,207 - MainThread cursor.py:1071 - execute() - DEBUG - {'data': {'internalError': False, 'unredactedFromSecureObject': False, 'errorCode': '001003', 'age': 0, 'sqlState': '42000', 'queryId': 'x', 'line': -1, 'pos': -1, 'type': 'COMPILATION'}, 'code': '001003', 'message': "SQL compilation error:\nsyntax error line 1 at position 119 unexpected ';'.", 'success': False, 'headers': None}
ex=SnowparkSQLException("001003 (42000): x: SQL compilation error:\nsyntax error line 1 at position 119 unexpected ';'.", '1304', 'x')
2024-09-16 15:56:30,257 - MainThread connection.py:788 - close() - INFO - closed
2024-09-16 15:56:30,257 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-09-16 15:56:30,257 - MainThread connection.py:794 - close() - INFO - No async queries seem to be running, deleting session
2024-09-16 15:56:30,257 - MainThread network.py:1187 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 1/1 active sessions
2024-09-16 15:56:30,257 - MainThread network.py:886 - _request_exec_wrapper() - DEBUG - remaining request timeout: 5000 ms, retry cnt: 1
2024-09-16 15:56:30,257 - MainThread network.py:868 - add_request_guid() - DEBUG - Request guid: x
2024-09-16 15:56:30,257 - MainThread network.py:1046 - _request_exec() - DEBUG - socket timeout: 60
2024-09-16 15:56:30,431 - MainThread connectionpool.py:474 - _make_request() - DEBUG - https://x.snowflakecomputing.com:443 "POST /session?delete=true&request_guid=x HTTP/1.1" 200 None
2024-09-16 15:56:30,433 - MainThread network.py:1073 - _request_exec() - DEBUG - SUCCESS
2024-09-16 15:56:30,433 - MainThread network.py:1192 - _use_requests_session() - DEBUG - Session status for SessionPool 'x.snowflakecomputing.com', SessionPool 0/1 active sessions
2024-09-16 15:56:30,433 - MainThread network.py:750 - _post_request() - DEBUG - ret[code] = None, after post request
2024-09-16 15:56:30,437 - MainThread _query_context_cache.py:141 - clear_cache() - DEBUG - clear_cache() called
2024-09-16 15:56:30,437 - MainThread connection.py:807 - close() - DEBUG - Session is closed
2024-09-16 15:56:30,438 - MainThread session.py:594 - close() - DEBUG - No-op because session x had been previously closed.
2024-09-16 15:56:30,438 - MainThread connection.py:779 - close() - DEBUG - Rest object has been destroyed, cannot close session
2024-09-16 15:56:30,438 - MainThread session.py:607 - close() - INFO - Closed session: x
Tim-Kracht added the bug (Something isn't working) and needs triage (Initial RCA is required) labels on Sep 16, 2024
github-actions bot changed the title from "Semicolon breaks DataFrame.count()" to "SNOW-1665955: Semicolon breaks DataFrame.count()" on Sep 16, 2024
sfc-gh-sghosh self-assigned this on Sep 17, 2024
@sfc-gh-sghosh

Hello @Tim-Kracht ,

Thanks for raising the issue; we are looking into it and will update.

Regards,
Sujan

sfc-gh-sghosh added the status-triage (Issue is under initial triage) label and removed the needs triage (Initial RCA is required) label on Sep 17, 2024
@sfc-gh-sghosh

Hello @Tim-Kracht ,

The semicolon (;) is typically used to terminate SQL statements in interactive environments like SnowSQL, Snowflake Worksheets, or UI-based tools, where multiple queries can be executed in sequence.

However, when executing SQL programmatically (through the Snowpark API, the Python connector, or other programmatic interfaces), each call carries a single statement, so the semicolon is unnecessary and can cause syntax errors if included.

To fix the issue, you can either remove the semicolon from the query or strip it programmatically:

clean_query = query.rstrip(';')

queries = (
    "select query_start_time from snowflake.account_usage.access_history limit 1",
    "select query_start_time from snowflake.account_usage.access_history limit 1;"
)

for query in queries:
    print()
    df = None
    result = None
    count = None

    # Remove trailing semicolon if it exists
    clean_query = query.rstrip(';')
    
    print(f"{clean_query=}")
    df = session.sql(clean_query)
    
    try:
        result = df.collect()
        print(f"{result=}")
    except Exception as ex:
        print(f"{ex=}")

    try:
        count = df.count()
        print(f"{count=}")
    except Exception as ex:
        print(f"{ex=}")

Regards,
Sujan
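
One caveat with rstrip(';'): it only removes characters at the very end of the string, so a query ending in "; " or ";\n" keeps its semicolon. A slightly more defensive helper (a minimal sketch, not part of the Snowpark API; strip_trailing_semicolon is a hypothetical name) trims whitespace first:

def strip_trailing_semicolon(query: str) -> str:
    # Trim trailing whitespace, then any trailing semicolons, then any
    # whitespace that preceded them.
    return query.rstrip().rstrip(";").rstrip()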

sfc-gh-sghosh added the status-triage_done (Initial triage done, will be further handled by the driver team) label and removed the bug (Something isn't working) and status-triage (Issue is under initial triage) labels on Sep 17, 2024
@Tim-Kracht (Author) commented Dec 6, 2024

Thank you. Yes, I noted that it works when the semicolon is removed. I was mostly pointing out the inconsistent behavior:

I expected to see the same behavior from a SQL compilation perspective.
DataFrame.collect() worked as expected, but DataFrame.count() raised a SQL compilation error.
The same query without the semicolon runs fine in both cases.

In other words, why does DataFrame.collect() work with the semicolon while DataFrame.count() does not? Given the same query, I would expect both to work or both to fail.
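
For what it's worth, the DEBUG logs above appear to explain the asymmetry: collect() submits the SQL text exactly as written, while count() builds a new statement that embeds the original query as a subselect. The run_query() log line for the working case shows the generated statement:

SELECT count(1) AS "COUNT(LITERAL())" FROM (select query_start_time from snowflake.account_usage.access_history limit 1) LIMIT 1

With the trailing semicolon, the embedded text ends in "limit 1;", putting the semicolon inside the parenthesized subselect. That is invalid SQL, which matches the reported error: position 119 in "syntax error line 1 at position 119 unexpected ';'" corresponds to the embedded semicolon in the wrapped statement.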
