feat: logging during ingest #219

Merged: michalc merged 1 commit into main from feat/adding-login-during-ingest on May 30, 2024

@Mohizurkhan commented May 29, 2024

In some cases people think pg-bulk-ingest is either doing nothing during ingest, or loading all the data into memory. This adds a bit more logging to make it clearer that data is going into the database, and that this happens interleaved with fetching data from the source (thanks to the magic of iterables/generators).
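The interleaving comes from rows being pulled lazily: a generator only runs far enough to produce the next row the consumer asks for, so fetch and ingest log lines alternate instead of the whole source being read up front. The sketch below illustrates this in plain Python. `fetch_from_source` and `ingest_rows` are hypothetical stand-ins (they are not pg-bulk-ingest's API), the "pages" are made-up data, and the log interval is fixed at 100 rows for simplicity.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple[int, str]


def fetch_from_source(num_pages: int, page_size: int, log: Callable[[str], None]) -> Iterator[Row]:
    """Hypothetical source that fetches rows lazily, one page at a time."""
    for page in range(num_pages):
        # This line only runs when the consumer asks for the first row of the page
        log(f"Fetched page {page} from source")
        for i in range(page_size):
            row_id = page * page_size + i
            yield (row_id, f"value-{row_id}")


def ingest_rows(rows: Iterable[Row], log_every: int, log: Callable[[str], None]) -> int:
    """Hypothetical stand-in for the database write: consumes rows one at a time."""
    log("Ingesting from source into the database...")
    num_rows = 0
    for _row in rows:
        # A real ingest would write _row to the database here
        num_rows += 1
        if num_rows % log_every == 0:
            log(f"Ingested {num_rows} rows...")
    log(f"Ingested {num_rows} rows in total")
    return num_rows


if __name__ == "__main__":
    ingest_rows(fetch_from_source(num_pages=3, page_size=100, log=print), log_every=100, log=print)
```

Running it prints each "Fetched page N from source" line just before the corresponding "Ingested ... rows..." line, showing that at most one page of the source is being worked through at any moment.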

The somewhat complex logic for deciding when to log is an attempt to make the logging suitable for both small and large numbers of rows without per-ingest configuration, while keeping the noise in the logs down. It logs every 100 rows initially, and the interval grows tenfold at each power of ten (every 1,000 rows from 1,000, every 10,000 from 10,000, and so on) until it reaches every million rows later in the ingest. For example, it could log:

```text
Ingesting from source into the database...
Ingested 100 rows...
Ingested 200 rows...
Ingested 300 rows...
Ingested 400 rows...
Ingested 500 rows...
Ingested 600 rows...
Ingested 700 rows...
Ingested 800 rows...
Ingested 900 rows...
Ingested 1000 rows...
Ingested 2000 rows...
Ingested 3000 rows...
Ingested 4000 rows...
Ingested 5000 rows...
Ingested 6000 rows...
Ingested 7000 rows...
Ingested 8000 rows...
Ingested 9000 rows...
Ingested 10000 rows...
Ingested 20000 rows...
Ingested 30000 rows...
Ingested 40000 rows...
Ingested 50000 rows...
Ingested 60000 rows...
Ingested 70000 rows...
Ingested 80000 rows...
Ingested 90000 rows...
Ingested 100000 rows...
Ingested 200000 rows...
Ingested 300000 rows...
Ingested 400000 rows...
Ingested 500000 rows...
Ingested 600000 rows...
Ingested 700000 rows...
Ingested 800000 rows...
Ingested 900000 rows...
Ingested 1000000 rows...
Ingested 2000000 rows...
Ingested 3000000 rows...
Ingested 4000000 rows...
Ingested 5000000 rows...
Ingested 6000000 rows...
Ingested 7000000 rows...
Ingested 8000000 rows...
Ingested 9000000 rows...
Ingested 10000000 rows...
Ingested 11000000 rows...
Ingested 12000000 rows...
Ingested 12312312 rows in total
```
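The cadence in the example log can be reproduced by one small rule: log when the row count is a multiple of the largest power of ten at or below it, with that interval capped at one million. Below is a minimal Python sketch of that rule. It is inferred from the example output rather than taken from the PR's diff, and `should_log` and `logged_counts` are hypothetical names.

```python
from typing import List


def should_log(num_rows: int) -> bool:
    """Decide whether to log after num_rows rows have been ingested.

    Assumed rule (inferred from the example log, not the PR's code): the
    interval is the largest power of ten <= num_rows, capped at 1,000,000,
    and nothing is logged before the first 100 rows.
    """
    if num_rows < 100:
        return False
    # Largest power of ten not exceeding num_rows, e.g. 2500 -> 1000
    interval = 10 ** (len(str(num_rows)) - 1)
    # Cap so that very large ingests log every million rows
    interval = min(interval, 1_000_000)
    return num_rows % interval == 0


def logged_counts(total_rows: int) -> List[int]:
    """Row counts at which an "Ingested N rows..." line would be emitted."""
    return [n for n in range(1, total_rows + 1) if should_log(n)]
```

With this rule, a 2,500-row ingest logs at 100, 200, ..., 900, 1000 and 2000. For the 12,312,312-row run above, `should_log` is true at 11,000,000 and 12,000,000 but not at the final count, which is reported separately as "in total". Tying the interval to the current count rather than to a configured setting is what lets one rule suit both tiny and very large ingests.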

@Mohizurkhan requested a review from a team as a code owner May 29, 2024 15:42
@joshwong-cs changed the title from "feat: Adding in login during ingest" to "feat: Add logging to ingest" May 29, 2024
@michalc force-pushed the feat/adding-login-during-ingest branch from b51e6c9 to 2608380 May 30, 2024 08:03
@michalc changed the title from "feat: Add logging to ingest" to "feat: logging during ingest" May 30, 2024
@michalc force-pushed the feat/adding-login-during-ingest branch from 2608380 to 92e9ab7 May 30, 2024 08:05
@michalc force-pushed the feat/adding-login-during-ingest branch from 92e9ab7 to 556a57c May 30, 2024 08:07

Co-authored-by: Tash Boyse <57753415+nboyse@users.noreply.github.com>
Co-authored-by: Michal Charemza <michal@charemza.name>
Co-authored-by: Mohizur Khan <mohizurkhan@digital.trade.gov.uk>
Co-authored-by: Josh Wong <166488409+joshwong-cs@users.noreply.github.com>
@michalc force-pushed the feat/adding-login-during-ingest branch from 556a57c to a927012 May 30, 2024 08:10
@michalc merged commit a8648f6 into main May 30, 2024 (120 checks passed)
@michalc deleted the feat/adding-login-during-ingest branch May 30, 2024 08:13