Change iterrows method for index attribute in row data generation #1706
Context
This small PR from Support makes a minor change to the `_compute_copy_data` function, which is used by `_copy_from` and, in turn, by `to_carto`, to improve performance when dealing with datasets that are large in terms of rows and columns.

The function currently uses the `pandas.DataFrame.iterrows()` method, which returns both the row index and a Series containing the column values, but only the index is used afterward.

Further context can be found in this CH story.
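
To illustrate the point above, here is a minimal sketch, not the actual `_compute_copy_data` code. The DataFrame and the two helper functions are hypothetical and only show that iterating over `DataFrame.index` yields the same labels as taking the first element of each `iterrows()` pair, without building a Series per row.

```python
import pandas as pd

# Hypothetical stand-in data, not the DataFrame handled by the library.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}, index=[10, 20, 30])


def labels_via_iterrows(frame):
    # iterrows() builds a Series for every row, even though
    # only the index label is kept here.
    return [index for index, _ in frame.iterrows()]


def labels_via_index(frame):
    # Iterating over frame.index yields the same labels
    # without materializing a Series per row.
    return [index for index in frame.index]


labels_iterrows = labels_via_iterrows(df)
labels_index = labels_via_index(df)
print(labels_iterrows == labels_index)
```

Row values can still be looked up by label inside the loop when needed, so the change only removes work that was being discarded.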
PR changes
This PR contains one file modification:
`_compute_copy_data` function

Detected potential improvement
After performing a test with a 100,000 x 10 (rows x cols) dummy DataFrame, it seems that there could be a timing difference.
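
A comparison of this kind could be reproduced roughly as follows. This is a sketch under assumptions: the random data, the column names, and the two loop functions are made up for illustration and only measure the iteration itself, not the full `_compute_copy_data` logic. Timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Dummy DataFrame with the dimensions mentioned above (100,000 rows x 10 columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((100_000, 10)),
    columns=[f"col_{i}" for i in range(10)],
)


def loop_iterrows(frame):
    # Current approach: each step also builds a Series that is thrown away.
    count = 0
    for index, _ in frame.iterrows():
        count += 1
    return count


def loop_index(frame):
    # Proposed approach: iterate over the index labels only.
    count = 0
    for index in frame.index:
        count += 1
    return count


t_iterrows = timeit.timeit(lambda: loop_iterrows(df), number=1)
t_index = timeit.timeit(lambda: loop_index(df), number=1)
print(f"iterrows: {t_iterrows:.3f}s, index: {t_index:.3f}s")
```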
Moreover, a single `to_carto` test performed against the mmoncada account, using a DataFrame of 722,720 rows x 172 columns, returned the following results:
A) With index instead of `iterrows()`
B) With the current `iterrows()`