Database dumps do not work on large databases #59

Open
Brayden opened this issue Dec 29, 2024 · 7 comments · May be fixed by #76
Labels: 💎 Bounty, bug (Something isn't working)

Comments

Brayden (Member) commented Dec 29, 2024

Describe the bug
If you try to use any of the database dump endpoints (SQL, CSV, or JSON), the data is loaded entirely into memory and then written out as a dump file. We should investigate enhancements that allow a database of any size to be exported. The current size limit for Durable Objects is 1GB, with 10GB coming in the future, so operate under the assumption that we might be attempting to dump a 10GB database into a .sql file.

Another consideration: because Durable Objects execute operations synchronously, we may need to allow for "breathing intervals". For example, we could let the export operation run for 5 seconds, then take 5 seconds off if other requests are queued, and then let it pick up again. The goal here is to prevent locking the database for long periods of time.
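
To make the breathing-interval idea concrete, here is a minimal sketch. These are my assumptions rather than existing StarbaseDB code: the table name, chunk size, timing constants, and the writeChunk/rowsToInserts helpers are placeholders.

// Sketch only: time-boxed export loop with "breathing" pauses.
// Assumes a Durable Object SqlStorage handle and a caller-supplied writeChunk().
const CHUNK_ROWS = 1000
const WORK_MS = 5_000 // export for ~5 seconds...
const REST_MS = 5_000 // ...then yield for ~5 seconds so queued requests can run

async function dumpWithBreathing(
    sql: SqlStorage,
    writeChunk: (text: string) => Promise<void>
): Promise<void> {
    let offset = 0
    while (true) {
        const start = Date.now()
        // Work phase: keep exporting until the time budget is used up.
        while (Date.now() - start < WORK_MS) {
            const rows = sql
                .exec('SELECT * FROM my_table LIMIT ? OFFSET ?', CHUNK_ROWS, offset)
                .toArray()
            if (rows.length === 0) return
            await writeChunk(rowsToInserts('my_table', rows))
            offset += rows.length
        }
        // Rest phase: awaiting here lets other queued requests interleave.
        await new Promise((resolve) => setTimeout(resolve, REST_MS))
    }
}

// Naive row-to-INSERT serializer, for illustration only.
function rowsToInserts(table: string, rows: Record<string, unknown>[]): string {
    return rows
        .map((row) => {
            const cols = Object.keys(row).join(', ')
            const vals = Object.values(row)
                .map((v) =>
                    v === null
                        ? 'NULL'
                        : typeof v === 'number'
                          ? String(v)
                          : `'${String(v).replace(/'/g, "''")}'`
                )
                .join(', ')
            return `INSERT INTO ${table} (${cols}) VALUES (${vals});`
        })
        .join('\n')
}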

But this then poses the questions:

  1. How do we continue operations that need more than 30 seconds to work?
  2. Where is the data stored as it's being created? (R2, S3, something else)?
  3. How do we deliver that dump information to the user after it's completed?

To Reproduce
Steps to reproduce the behavior:

  1. Hit the /export/dump endpoint on a large database
  2. The request will eventually fail when the 30-second response window closes

Run the following command in Terminal (replace the URL with yours). If your operation exceeds 30 seconds, you should see a failed network response instead of a dump file.

curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump' \
--header 'Authorization: Bearer ABC123' \
--output database_dump.sql

If you can't create a large enough test database, feel free to add code that sleeps for 29 seconds before the /export/dump functional code proceeds; you should see the same failure.
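
If it helps, a hypothetical shim along those lines; only the artificial 29-second delay matters, and the handler name is made up:

// Hypothetical test shim: simulate a slow dump by sleeping ~29 seconds
// before the real /export/dump logic runs, to trip the 30-second window.
async function handleExportDump(request: Request): Promise<Response> {
    await new Promise((resolve) => setTimeout(resolve, 29_000)) // artificial delay
    // ...the existing /export/dump code would continue here...
    return new Response('dump placeholder', { status: 200 })
}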

Expected behavior
As a user, I would expect all of the specified data to be dumped without an error and without partial results. Where the dump ends up for the user to access if the operation takes more than 30 seconds is up for discussion. Ideally, if it takes less than 30 seconds, it could be returned as our cURL above works today (downloading the file from the response of the origin request), but perhaps after the timeout it continues on and uploads the file to a destination the user can access afterwards?

Proposed Solution:

  1. For backups, require an R2 binding
  2. Have a .sql file created in R2 with a filename like dump_20240101-170000.sql, where that represents 2024-01-01 17:00:00
  3. Create the file and continuously append new chunks to it until reaching the end
  4. We may need to utilize a DO alarm to continue the work after X time if a timeout occurs, and mark where the process currently is in internal memory so it can pick up and continue (see the sketch after this list)
  5. Provide a callback URL that is invoked when the operation finally completes so users can build custom logic to notify themselves (e.g. Email, Slack, etc.)
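
A rough sketch of steps 2-4, assuming an R2 binding named BACKUP_BUCKET and a Durable Object with SQLite storage. The exportNextChunk helper and the stored state shape are placeholders, and a real implementation would also have to respect R2's multipart part-size constraints.

// Sketch only: chunked dump to R2 via multipart upload, continued by DO alarms.

// Hypothetical helper: reads the next slice of rows and renders them as SQL text.
declare function exportNextChunk(
    sql: SqlStorage,
    offset: number
): { chunkSql: string; rowsRead: number; done: boolean }

interface DumpState {
    key: string
    uploadId: string
    part: number
    offset: number
    parts: R2UploadedPart[]
}

export class BackupDO {
    constructor(
        private ctx: DurableObjectState,
        private env: { BACKUP_BUCKET: R2Bucket }
    ) {}

    // Kick off a dump: create the R2 multipart upload and schedule the first slice.
    async startDump(): Promise<string> {
        const stamp = new Date()
            .toISOString()
            .replace(/[-:]/g, '')
            .replace('T', '-')
            .slice(0, 15) // e.g. 20240101-170000
        const key = `dump_${stamp}.sql`
        const upload = await this.env.BACKUP_BUCKET.createMultipartUpload(key)
        const state: DumpState = { key, uploadId: upload.uploadId, part: 1, offset: 0, parts: [] }
        await this.ctx.storage.put('dump', state)
        await this.ctx.storage.setAlarm(Date.now())
        return key
    }

    // Each alarm exports one slice, uploads it as a part, and re-arms itself.
    async alarm(): Promise<void> {
        const state = await this.ctx.storage.get<DumpState>('dump')
        if (!state) return
        const upload = this.env.BACKUP_BUCKET.resumeMultipartUpload(state.key, state.uploadId)

        const { chunkSql, rowsRead, done } = exportNextChunk(this.ctx.storage.sql, state.offset)
        state.parts.push(await upload.uploadPart(state.part, chunkSql))
        state.offset += rowsRead
        state.part += 1

        if (done) {
            await upload.complete(state.parts)
            await this.ctx.storage.delete('dump')
            // Optionally: POST to a user-provided callback URL here (step 5).
        } else {
            await this.ctx.storage.put('dump', state)
            await this.ctx.storage.setAlarm(Date.now() + 1_000) // breathe, then continue
        }
    }
}
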
Brayden added the bug label on Dec 29, 2024

Brayden (Member, Author) commented Jan 24, 2025

/bounty $250

algora-pbc bot commented Jan 24, 2025

💎 $250 bounty • Outerbase (YC W23)

Steps to solve:

  1. Start working: Comment /attempt #59 with your implementation plan
  2. Submit work: Create a pull request including /claim #59 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to outerbase/starbasedb!


Attempt | Started (GMT+0) | Solution
🟢 @onyedikachi-david | Jan 25, 2025, 5:26:58 AM | #76
🟢 @BenraouaneSoufiane | Jan 27, 2025, 3:54:09 PM | WIP

b4s36t4 commented Jan 24, 2025

Hi @Brayden, instead of using the DO alarm API, could we use queues if the database is greater than a certain size (say 200 or 500)?

My suggestion is to attach Cloudflare Queues to new private endpoints, where the consumer and producer are a single Worker that exports the data into R2 for a certain time, say 20 to 30 seconds, depending on usage limits. This way the queue can keep track of the last record exported, and there is no blocking of any request/response.

We could depend on the DO's databaseSize property to trigger the queue, or something like that: https://developers.cloudflare.com/durable-objects/api/sql-storage/#databasesize
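
Roughly what I have in mind, assuming a Queue binding named EXPORT_QUEUE whose consumer is this same Worker; the size threshold and message shape are placeholders:

// Sketch only: enqueue large exports instead of dumping them inline.
type ExportMessage = { kind: 'export-dump'; offset: number }

const LARGE_DB_BYTES = 500 * 1024 * 1024 // roughly 500 MB, placeholder threshold

// Producer side (inside the Durable Object): decide based on databaseSize.
async function maybeEnqueueExport(
    ctx: DurableObjectState,
    env: { EXPORT_QUEUE: Queue<ExportMessage> }
): Promise<void> {
    if (ctx.storage.sql.databaseSize > LARGE_DB_BYTES) {
        await env.EXPORT_QUEUE.send({ kind: 'export-dump', offset: 0 })
    }
}

// Consumer side (same Worker): export one time-boxed slice to R2, then
// re-enqueue a continuation message with the next offset until finished.
export default {
    async queue(
        batch: MessageBatch<ExportMessage>,
        env: { EXPORT_QUEUE: Queue<ExportMessage> }
    ): Promise<void> {
        for (const msg of batch.messages) {
            // ...export roughly 20-30 seconds' worth of rows starting at msg.body.offset,
            // append them to the R2 object, then send a follow-up message if needed...
            msg.ack()
        }
    },
}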

Brayden (Member, Author) commented Jan 24, 2025

Hey @b4s36t4!

Instead of using the DO alarm API, I think queues are also an acceptable approach. Just to talk through it a bit more so I fully understand how you're thinking of approaching it, a couple of questions:

  • When you mention attaching to new private endpoints, would this be a new Worker altogether? Could it instead be a Starbase plugin, perhaps? Here's a blog post outlining how they work, and some examples are also included in the repo (https://www.outerbase.com/blog/extensibility-with-plugins/)
  • When you say we can base it on whether the database size is greater than some defined number (e.g. 200 or 500), would we fetch the database size at some interval to see if the difference in size since the last export meets the criteria and auto-run the backup?
  • Could we allow users to define something like "backup daily" or "backup weekly" and trigger the export at that time? (A rough cron-trigger sketch is at the end of this comment.)

Thanks for looking into this!
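
On the "backup daily" / "backup weekly" idea, one possible shape (purely an assumption; the STARBASE_DO binding name and internal route are made up) would be a Workers Cron Trigger, with the schedule defined in wrangler.toml (e.g. crons = ["0 3 * * *"] for daily at 03:00 UTC):

// Sketch only: scheduled handler that asks the Durable Object to start a dump.
interface Env {
    STARBASE_DO: DurableObjectNamespace
}

export default {
    async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext) {
        const id = env.STARBASE_DO.idFromName('default')
        const stub = env.STARBASE_DO.get(id)
        // Fire-and-forget: let the DO run (or enqueue) the export.
        ctx.waitUntil(stub.fetch('https://starbase/export/dump', { method: 'POST' }))
    },
}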

onyedikachi-david commented Jan 25, 2025

/attempt #59

Algora profile: @onyedikachi-david has completed 14 bounties from 7 projects (TypeScript, Python, JavaScript & more)

onyedikachi-david linked a pull request on Jan 25, 2025 that will close this issue

algora-pbc bot commented Jan 25, 2025

💡 @onyedikachi-david submitted a pull request that claims the bounty. You can visit your bounty board to reward.

BenraouaneSoufiane commented Jan 27, 2025

/attempt #59 May I ask you for a Cloudflare paid account, because I don't have one? (It may be resolvable without a Cloudflare account by following your guidance.)
