This repository has been archived by the owner on Jan 2, 2024. It is now read-only.

Feature/#755 - Open append() method on datanodes #824

Merged
merged 6 commits into develop from feature/#755-append-to-datanodes
Nov 15, 2023

Conversation

trgiangdo
Member

@trgiangdo trgiangdo commented Nov 10, 2023

Avaiga/taipy#408

This PR opens a new .append() API for the CSV, Excel, JSON, MongoCollection, and SQLTable data node types.
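As a rough illustration of what an append amounts to in the simplest case (this is a sketch with hypothetical names, not the PR's actual implementation), appending to a CSV file with pandas boils down to writing with mode="a" and no header row:

```python
import os
import tempfile

import pandas as pd

def append_to_csv(path: str, data: pd.DataFrame) -> None:
    # Sketch: append rows to an existing CSV file without rewriting it;
    # header=False avoids repeating the column row in the middle of the file.
    data.to_csv(path, mode="a", index=False, header=False)

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
pd.DataFrame({"a": [1], "b": [2]}).to_csv(path, index=False)
append_to_csv(path, pd.DataFrame({"a": [3], "b": [4]}))
df = pd.read_csv(path)
print(len(df))  # 2
```

The real data nodes also have to handle exposed types and engine options, but the append-versus-rewrite distinction is the same.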

Notes on other data node types:

  • GenericDataNode: we could expose a new attribute letting the user define their own append method. Let me know if we want to do this.
  • InMemoryDataNode: we don't know what kind of data is stored in memory.
  • PickleDataNode: we don't know what kind of data is stored in the pickle file.
  • SQLDataNode: the user can simply modify the read query to append data.
  • ParquetDataNode: the current method is to read the data, append to it, then rewrite the whole file, which is slow and does not work properly with modin. With fastparquet there is a simpler way to do this, but I'm working on finding a better approach using pyarrow, which is our default engine.


github-actions bot commented Nov 10, 2023

☂️ Python Cov

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
8335 7902 95% 85% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
src/taipy/core/config/checkers/_data_node_config_checker.py 98% 🟢
src/taipy/core/config/data_node_config.py 98% 🟢
src/taipy/core/data/_abstract_sql.py 89% 🟢
src/taipy/core/data/_data_converter.py 97% 🟢
src/taipy/core/data/csv.py 95% 🟢
src/taipy/core/data/data_node.py 99% 🟢
src/taipy/core/data/excel.py 96% 🟢
src/taipy/core/data/json.py 96% 🟢
src/taipy/core/data/mongo.py 90% 🟢
src/taipy/core/data/parquet.py 98% 🟢
src/taipy/core/data/sql.py 98% 🟢
src/taipy/core/data/sql_table.py 100% 🟢
src/taipy/core/exceptions/exceptions.py 92% 🟢
TOTAL 96% 🟢

updated for commit: 2682eb4 by action🐍

@jrobinAV
Member

On SQLDataNode, I believe we need 2 different methods for writing and appending data. Writing will still need to write everything; the orchestrator will require that. The append method should then be used for manual editing.
Same for read and filter...

@trgiangdo
Member Author

According to https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#example-with-a-microsoft-sql-database-table_1, not really.

In that example, write_query_builder() returns 2 queries: the first deletes all data, and the next one writes the data.

There is no way to know for sure which table to delete data from when writing a SQLDataNode, so I don't think we can force it.

@jrobinAV
Member

> According to https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#example-with-a-microsoft-sql-database-table_1, not really.
>
> In that example, write_query_builder() returns 2 queries: the first deletes all data, and the next one writes the data.
>
> There is no way to know for sure which table to delete data from when writing a SQLDataNode, so I don't think we can force it.

My point is that the SQLDataNode must expose two methods like any other data node: write(all_data) and append(data).

  • The write basically executes the 2 SQL queries returned by the write_query_builder().
  • The append should execute one SQL query as well. This query can be different from the second query of the write_query_builder(). That means we need an append_query_builder(). Am I missing something?
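To make the distinction concrete, here is a hedged sketch of what the two builders could look like (the table name, column, and parameterized-query shape are all hypothetical; the real builders can return whatever the SQL backend expects):

```python
def write_query_builder(data):
    # Full rewrite: delete everything, then re-insert all rows.
    inserts = [("INSERT INTO sales (amount) VALUES (?)", (row["amount"],)) for row in data]
    return [("DELETE FROM sales", ())] + inserts

def append_query_builder(data):
    # Append only: insert the new rows, no delete.
    return [("INSERT INTO sales (amount) VALUES (?)", (row["amount"],)) for row in data]

queries = append_query_builder([{"amount": 10}, {"amount": 20}])
print(len(queries))  # 2
```

The key point is that the append builder omits the destructive first query, so it is safe to call on manual edits without wiping the table.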

@trgiangdo
Member Author

Oh, then it could be something like sql_dn.append(data, append_query_builder), where append_query_builder is a different method.
The signature would be different from other data nodes, though.

@jrobinAV
Member

No, the append_query_builder can be set in the data node config, like the write_query_builder, but it would be executed at run time.

The configure method could look like:

sales_history_cfg = Config.configure_sql_data_node(
    id="sales_history",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    read_query="SELECT * from sales",
    write_query_builder=write_query_builder,
    append_query_builder=append_query_builder,
    db_driver="ODBC Driver 17 for SQL Server",
    db_extra_args={"TrustServerCertificate": "yes"},
)

The data node implementation would look like this:

    def _do_append(self, data) -> None:
        # Build the append queries from the config-provided builder,
        # then run them on the data node's open SQL connection.
        queries = self.properties.get(self._APPEND_QUERY_BUILDER_KEY)(data)
        for query in queries:
            connection.execute(query)

What do you think?

@trgiangdo
Member Author

I see, that makes much more sense. Thank you.

Member

@jrobinAV jrobinAV left a comment


The test coverage is great! Thx!

jrobinAV
jrobinAV previously approved these changes Nov 10, 2023
@trgiangdo trgiangdo requested a review from jrobinAV November 10, 2023 15:56
@trgiangdo
Member Author

In the latest few commits:

  • Add an append() method to ParquetDataNode. However, this only works if the fastparquet package is installed.
  • Add an append() method to SQLDataNode, which executes the queries returned by the append_query_builder provided at the config level.
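The resulting flow, an append_query_builder configured up front and executed against the connection at run time, can be sketched end to end with SQLite (hypothetical table and builder names, standard library only, not the PR's actual code):

```python
import sqlite3

def append_query_builder(data):
    # Hypothetical builder: one parameterized INSERT per new row.
    return [("INSERT INTO sales (amount) VALUES (?)", (row["amount"],)) for row in data]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.execute("INSERT INTO sales (amount) VALUES (1)")

# What a _do_append-style method would do with the configured builder:
for query, params in append_query_builder([{"amount": 2}, {"amount": 3}]):
    conn.execute(query, params)
conn.commit()

rows = [r[0] for r in conn.execute("SELECT amount FROM sales ORDER BY amount")]
print(rows)  # [1, 2, 3]
```

Unlike the write path, nothing here deletes existing rows, so repeated appends accumulate data as expected.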

Resolved review threads:

  • tests/core/data/test_parquet_data_node.py (outdated)
  • src/taipy/core/data/sql.py
  • src/taipy/core/config/data_node_config.py
@trgiangdo trgiangdo requested a review from jrobinAV November 12, 2023 15:46
jrobinAV
jrobinAV previously approved these changes Nov 12, 2023
@trgiangdo trgiangdo merged commit 405568c into develop Nov 15, 2023
41 of 42 checks passed
@trgiangdo trgiangdo deleted the feature/#755-append-to-datanodes branch November 15, 2023 09:38