This page explains how to use PyEXASOL with maximum efficiency.
Wireless network bandwidth is usually the main bottleneck for office laptops. The compression flag enables zlib compression for both common fetching and fast HTTP transport. It may improve overall performance by a factor of 4-8x.
C = pyexasol.connect(... , compression=True)
Common fetching is fine for small data sets of up to 1M records. For anything bigger you should always consider HTTP transport (export_* and import_* functions). It avoids the creation of intermediate Python objects and scales much better.
pd = C.export_to_pandas('SELECT * FROM table')
C.export_to_file('my_file.csv', 'SELECT * FROM table')
C.import_from_pandas(pd, 'table')
C.import_from_file('my_file.csv', 'table')
Iterator syntax is much shorter and easier to use. Also, there is no need to check for None or an empty list [] to detect the end of the result set.
stmt = C.execute('SELECT * FROM table')
for row in stmt:
    print(row)
PyEXASOL does not support INSERT prepared statements on purpose. You should use the more efficient IMPORT command instead, even for small batch inserts. You may use import_from_iterable to insert data from a list of rows.
data = [
(1, 'John'),
(2, 'Gill'),
(3, 'Ben')
]
C.import_from_iterable(data, 'table')
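Any iterable is accepted, so rows can also be streamed from a generator instead of building a full list in memory. A small sketch with a hypothetical row generator:

def generate_rows():
    # hypothetical generator yielding one row at a time,
    # so the full data set never has to be held in memory
    for i in range(1, 1001):
        yield (i, f'user_{i}')

C.import_from_iterable(generate_rows(), 'table')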
Please note: if you want to INSERT only a single row into Exasol, you're probably doing something wrong. We strongly suggest using a row-based database (MySQL, PostgreSQL, etc.) to track the status of ETL jobs and avoiding any technical tables in Exasol.
Unlike the standard WebSocket Python driver, PyEXASOL supports full connection strings and node redundancy. For example, the connection string myexasol1..5:8563
will be expanded as:
myexasol1:8563
myexasol2:8563
myexasol3:8563
myexasol4:8563
myexasol5:8563
PyEXASOL tries to connect to a random node from this list. If that fails, it tries another random node. The main benefits of this approach are:
- Multiple connections are evenly distributed across the whole cluster;
- If one or more nodes are down, but the cluster is still operational due to redundancy, users will be able to connect without any problems or random error messages;
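For example, a minimal connection sketch using the range DSN above; user, password, and schema values are placeholders:

C = pyexasol.connect(
    dsn='myexasol1..5:8563',   # expands to myexasol1:8563 ... myexasol5:8563
    user='my_user',            # placeholder credentials
    password='my_password',
    schema='my_schema',
)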
PyEXASOL defaults to the standard json library for best compatibility. It is sufficient for the majority of use cases. However, if you are unhappy with HTTP transport and wish to load large amounts of data using standard fetching, we highly recommend trying a faster JSON library.
json_lib=rapidjson
pip install pyexasol[rapidjson]
Rapidjson provides a significant performance boost and is well maintained by its creators. PyEXASOL defaults to number_mode=NM_NATIVE. The Exasol server wraps big decimals in quotes and returns them as strings, so this should be a safe option.
json_lib=ujson
pip install pyexasol[ujson]
Ujson provides the best performance in our internal tests, but it has been abandoned by its creators. Also, float values may lose precision with ujson.
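A faster library is selected with the json_lib connection option; a minimal sketch, assuming the matching extra has been installed:

C = pyexasol.connect(... , json_lib='rapidjson')   # or 'ujson'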
You may try any other JSON library. All you need to do is overload the _init_json() method in ExaConnection.
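For example, a rough sketch of wiring in orjson by subclassing ExaConnection; the json_encode / json_decode attribute names below are an assumption and should be checked against the _init_json() source of your PyEXASOL version:

import orjson
import pyexasol

class OrjsonExaConnection(pyexasol.ExaConnection):
    def _init_json(self):
        # assumed internal attribute names; verify against ExaConnection source
        self.json_encode = lambda x, indent=False: orjson.dumps(x).decode()
        self.json_decode = lambda x: orjson.loads(x)

C = OrjsonExaConnection(dsn='myexasol1..5:8563', user='my_user', password='my_password')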
It is usually a good idea to use the get_columns and get_columns_sql functions to get the structure of tables instead of querying the EXA_ALL_COLUMNS system view.
cols = C.ext.get_columns('table')
cols = C.ext.get_columns_sql('SELECT a, b, c FROM table')
Queries to system views might be blocked by transaction locks, but ext function calls are not affected by this problem.
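The result maps column names to their type details, so it can be used directly in code. A small sketch using cols from the example above; the exact shape of the returned values may vary between PyEXASOL versions:

for name, info in cols.items():
    print(name, info)            # column name and its type details

column_list = ', '.join(cols)    # comma-separated list of column names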