Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid batch api to carto create table #1731

Merged
merged 3 commits into from
Apr 13, 2021

Conversation

Mmoncadaisla
Copy link
Contributor

Context

In this small PR from Support we aim to avoid an undesired behavior when using the to_carto function while a Batch API job is already running, by using the standard SQL API endpoint when possible.

If the table wouldn't exist in the target CARTO account, it'd be created through the _create_table_from_columns method which seems to be using the Batch API for what seems a light operation (create table + cartodbfy an empty table).

This could lead to an undesired behavior if there is an active long-time running job, as the referred operation would be enqueued.

Further details can be found in this CH story.

NOTE: We're unsure there's any other block where this could be implemented safely without reaching any timeout limitation, such as here, and please let us know if there is any approach that'd make more sense for this purpose 🙏🏼

PR changes

  • io/managers/context_manager.py: Use execute_query instead of execute_long_running_query in _create_table_from_columns
  • tests/unit/io/managers/test_context_manager.py: Check that execute_query is ran instead of execute_long_running_query in test_copy_from

@Mmoncadaisla Mmoncadaisla requested a review from Jesus89 April 8, 2021 09:51
@Mmoncadaisla Mmoncadaisla self-assigned this Apr 8, 2021
Copy link
Member

@Jesus89 Jesus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jgoizueta, is there any inconvenience creating tables without the Batch API?

@jgoizueta
Copy link
Contributor

🤔 I see no problem in principle, LGTM.

@Mmoncadaisla
Copy link
Contributor Author

Mmoncadaisla commented Apr 13, 2021

Thank you both!!

@Jesus89 it is our understanding that the _truncate_and_drop_add_columns function would need to be executed through the Batch API.

However, do you think it'd make sense to implement this approach in the _truncate_table function? We've been able to perform a successful test with this approach on a ~4 GB (11M rows) dataset (size according to the Dashboard).

def _truncate_table(self, table_name, schema, cartodbfy):
        log.debug('TRUNCATE table "{}"'.format(table_name))
        query = 'BEGIN; {truncate}; {cartodbfy}; COMMIT;'.format(
            truncate=_truncate_table_query(table_name),
            cartodbfy=_cartodbfy_query(table_name, schema) if cartodbfy else '')
        self.execute_query(query)
>>> gdf = cartoframes.read_carto('select * from table_name limit 5')
>>> cartoframes.to_carto(gdf, 'table_name', if_exists='replace')
Success! Data uploaded to table "table_name" correctly
'table_name'

@Jesus89
Copy link
Member

Jesus89 commented Apr 13, 2021

It's a quick operation (truncate_table) so I think it could use the single SQL API, yes.

@Mmoncadaisla
Copy link
Contributor Author

Added the change for the _truncate_table function, could you please take a look at it? 🙏🏼

@Jesus89
Copy link
Member

Jesus89 commented Apr 13, 2021

👍

@Mmoncadaisla Mmoncadaisla merged commit a8fab58 into develop Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants