This repository has been archived by the owner on Jan 2, 2024. It is now read-only.

Feature/#755 - Open append() method on datanodes #824

Merged
merged 6 commits into develop from feature/#755-append-to-datanodes
Nov 15, 2023

Conversation

trgiangdo
Member

@trgiangdo trgiangdo commented Nov 10, 2023

Avaiga/taipy#408

This PR opens a new .append() API for the CSV, Excel, JSON, MongoCollection, and SQLTable data node types.
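As a rough illustration of what an append amounts to in the simplest case (this is a sketch with hypothetical names, not the PR's actual implementation), appending to a CSV file with pandas boils down to writing with mode="a" and no header row:

```python
import os
import tempfile

import pandas as pd

def append_to_csv(path: str, data: pd.DataFrame) -> None:
    # Sketch: append rows to an existing CSV file without rewriting it;
    # header=False avoids repeating the column row in the middle of the file.
    data.to_csv(path, mode="a", index=False, header=False)

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
pd.DataFrame({"a": [1], "b": [2]}).to_csv(path, index=False)
append_to_csv(path, pd.DataFrame({"a": [3], "b": [4]}))
df = pd.read_csv(path)
print(len(df))  # 2
```

The real data nodes also have to handle exposed types and engine options, but the append-versus-rewrite distinction is the same.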

Notes on other data node types:

  • GenericDataNode: we could expose a new attribute letting the user define their own append method. Let me know if we want to do this.
  • InMemoryDataNode: we don't know what kind of data is stored in memory.
  • PickleDataNode: we don't know what kind of data is stored in the pickle file.
  • SQLDataNode: the user can simply modify the read query to append data.
  • ParquetDataNode: the current method is to read the data, append to it, then rewrite the whole file, which is slow and does not work properly with modin. With fastparquet there is a simpler way to do this, but I'm working on finding a better approach using pyarrow, which is our default engine.


github-actions bot commented Nov 10, 2023

☂️ Python Cov

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
8335 7902 95% 85% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
src/taipy/core/config/checkers/_data_node_config_checker.py 98% 🟢
src/taipy/core/config/data_node_config.py 98% 🟢
src/taipy/core/data/_abstract_sql.py 89% 🟢
src/taipy/core/data/_data_converter.py 97% 🟢
src/taipy/core/data/csv.py 95% 🟢
src/taipy/core/data/data_node.py 99% 🟢
src/taipy/core/data/excel.py 96% 🟢
src/taipy/core/data/json.py 96% 🟢
src/taipy/core/data/mongo.py 90% 🟢
src/taipy/core/data/parquet.py 98% 🟢
src/taipy/core/data/sql.py 98% 🟢
src/taipy/core/data/sql_table.py 100% 🟢
src/taipy/core/exceptions/exceptions.py 92% 🟢
TOTAL 96% 🟢

updated for commit: 2682eb4 by action🐍

@jrobinAV
Member

On SQLDataNode, I believe we need 2 different methods for writing and appending data. Writing will still need to write everything; the orchestrator will require that. The append method should then be used for manual editing.
Same for read and filter...

@trgiangdo
Member Author

According to https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#example-with-a-microsoft-sql-database-table_1, not really.

In that example, write_query_builder() returns 2 queries: the first deletes all data, and the next one writes the data.

There is no way to know for sure which table to delete data from when writing a SQLDataNode, so I don't think we can force it.

@jrobinAV
Member

> According to https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#example-with-a-microsoft-sql-database-table_1, not really.
>
> In that example, write_query_builder() returns 2 queries: the first deletes all data, and the next one writes the data.
>
> There is no way to know for sure which table to delete data from when writing a SQLDataNode, so I don't think we can force it.

My point is that the SQLDataNode must expose two methods like any other data node: write(all_data) and append(data).

  • The write basically executes the 2 SQL queries returned by the write_query_builder().
  • The append should execute one SQL query as well. This query can be different from the second query of the write_query_builder(). That means we need an append_query_builder(). Am I missing something?
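To make the distinction concrete, here is a hedged sketch of what the two builders could look like (the table name, column, and parameterized-query shape are all hypothetical; the real builders can return whatever the SQL backend expects):

```python
def write_query_builder(data):
    # Full rewrite: delete everything, then re-insert all rows.
    inserts = [("INSERT INTO sales (amount) VALUES (?)", (row["amount"],)) for row in data]
    return [("DELETE FROM sales", ())] + inserts

def append_query_builder(data):
    # Append only: insert the new rows, no delete.
    return [("INSERT INTO sales (amount) VALUES (?)", (row["amount"],)) for row in data]

queries = append_query_builder([{"amount": 10}, {"amount": 20}])
print(len(queries))  # 2
```

The key point is that the append builder omits the destructive first query, so it is safe to call on manual edits without wiping the table.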

@trgiangdo
Member Author

Oh, then it could be something like sql_dn.append(data, append_query_builder), where append_query_builder is a different method.
The signature would be different from other data nodes, though.

@jrobinAV
Member

No, the append_query_builder can be set in the data node config, like the write_query_builder, but it would be executed at run time.

The configure method could look like:

sales_history_cfg = Config.configure_sql_data_node(
    id="sales_history",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    read_query="SELECT * from sales",
    write_query_builder=write_query_builder,
    append_query_builder=append_query_builder,
    db_driver="ODBC Driver 17 for SQL Server",
    db_extra_args={"TrustServerCertificate": "yes"},
)

The data node implementation would look like this:

    def _do_append(self, data) -> None:
        # Build the append queries from the config-provided builder,
        # then run them on the data node's open SQL connection.
        queries = self.properties.get(self._APPEND_QUERY_BUILDER_KEY)(data)
        for query in queries:
            connection.execute(query)

What do you think?

@trgiangdo
Member Author

I see, that makes much more sense. Thank you.

Member

@jrobinAV jrobinAV left a comment


The test coverage is great! Thx!

jrobinAV
jrobinAV previously approved these changes Nov 10, 2023
@trgiangdo trgiangdo requested a review from jrobinAV November 10, 2023 15:56
@trgiangdo
Member Author

In the latest few commits:

  • Add an append() method to ParquetDataNode. However, this only works if the fastparquet package is installed.
  • Add an append() method to SQLDataNode, which executes the queries returned by the append_query_builder provided at the config level.
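The resulting flow, an append_query_builder configured up front and executed against the connection at run time, can be sketched end to end with SQLite (hypothetical table and builder names, standard library only, not the PR's actual code):

```python
import sqlite3

def append_query_builder(data):
    # Hypothetical builder: one parameterized INSERT per new row.
    return [("INSERT INTO sales (amount) VALUES (?)", (row["amount"],)) for row in data]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.execute("INSERT INTO sales (amount) VALUES (1)")

# What a _do_append-style method would do with the configured builder:
for query, params in append_query_builder([{"amount": 2}, {"amount": 3}]):
    conn.execute(query, params)
conn.commit()

rows = [r[0] for r in conn.execute("SELECT amount FROM sales ORDER BY amount")]
print(rows)  # [1, 2, 3]
```

Unlike the write path, nothing here deletes existing rows, so repeated appends accumulate data as expected.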

Resolved review threads:

  • tests/core/data/test_parquet_data_node.py (outdated)
  • src/taipy/core/data/sql.py
  • src/taipy/core/config/data_node_config.py
@trgiangdo trgiangdo requested a review from jrobinAV November 12, 2023 15:46
jrobinAV
jrobinAV previously approved these changes Nov 12, 2023
@trgiangdo trgiangdo merged commit 405568c into develop Nov 15, 2023
41 of 42 checks passed
@trgiangdo trgiangdo deleted the feature/#755-append-to-datanodes branch November 15, 2023 09:38