Enterprise Backup schedule stopped working when server restarted after a period #8

Open
akshay-mahakalkar opened this issue Dec 14, 2022 · 9 comments

Comments

@akshay-mahakalkar

akshay-mahakalkar commented Dec 14, 2022

Details

Version: 3.0.42
Enterprise agent 3.0.42

Problem statement
I scheduled a full database backup every 3 hours and an incremental backup every hour. Backups ran as expected, but I had to shut down the environment for the weekend. When the environment was started again, the backup got stuck and could not proceed.
After I restarted the OrientDB server again, the schedule showed the old date and time instead of the latest.
image
Checking the records in the OBackupLog class:
image
The previously scheduled tasks never finished.

I am not using Studio to schedule backups. I created a backups.json file and put my schedule and configuration there.

{
    "backups": [
        {
            "modes": {"FULL_BACKUP":{"when":"0 0 /3 * * ?"},"INCREMENTAL_BACKUP":{"when":"0 /1 * * * ?"}},
            "retentionDays": 30,
            "dbName": "DemoDB",
            "directory": "/orientdb/backup",
            "uuid": "0053ac60-9de1-48f8-af0f-0266c0d8e501",
            "enabled": true
        }
    ]
}
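
For readers checking the schedule, it can help to read the `when` values field by field. The sketch below assumes the expressions follow the Quartz-style six-field layout (seconds, minutes, hours, day of month, month, day of week), which the trailing `?` suggests but this thread does not confirm. The helper names `describe_cron` and `describe_schedules` are hypothetical, and the code only labels fields; it does not evaluate the schedule.

```python
import json

# Assumed Quartz-style field order; not confirmed anywhere in this thread.
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Label each whitespace-separated field of a six-field cron expression."""
    parts = expression.split()
    if len(parts) != len(QUARTZ_FIELDS):
        raise ValueError(
            f"expected {len(QUARTZ_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(QUARTZ_FIELDS, parts))


# Trimmed copy of the backups.json posted above (same modes and dbName).
CONFIG = json.loads("""
{
  "backups": [
    {
      "modes": {
        "FULL_BACKUP": {"when": "0 0 /3 * * ?"},
        "INCREMENTAL_BACKUP": {"when": "0 /1 * * * ?"}
      },
      "dbName": "DemoDB"
    }
  ]
}
""")


def describe_schedules(config):
    """Map (dbName, mode) to the labelled cron fields of that mode."""
    result = {}
    for backup in config["backups"]:
        for mode, settings in backup["modes"].items():
            result[(backup["dbName"], mode)] = describe_cron(settings["when"])
    return result


for key, fields in describe_schedules(CONFIG).items():
    print(key, fields)
```

Under that assumed field order, the `/3` step of the full backup sits in the hours field, while the `/1` step of the incremental backup sits in the minutes field, which may be worth comparing against the intended "every hour".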

Please help
@Laa

@tglman
Contributor

tglman commented Dec 14, 2022

Hi,

Thanks for reporting this. Do you have a thread dump from when the backup got stuck? It would be really useful for checking what was going on.

Regards

@akshay-mahakalkar
Author

akshay-mahakalkar commented Dec 14, 2022

Hi @tglman ,

I am not sure I have one. If you can tell me how to collect it, I can upload it here.
To add to my issue:

  1. I changed the schedule and also deleted the records from #13:247 to #13:242. The logs showed the new schedule, but the backup is still stuck. There is no error in the logs for the backup operation:
2022-12-14 14:41:19:290 INFO  [OVariableParser.resolveVariables] Property not found: distributed [orientechnologies]
2022-12-14 14:41:19:295 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2022-12-14 14:41:19:436 INFO  Scheduled [INCREMENTAL_BACKUP] task : 0053ac60-9de1-48f8-af0f-0266c0d8e501. Next execution will be Wed Dec 14 14:45:00 UTC 2022 [OBackupTask]
2022-12-14 14:41:19:524 INFO  OrientDB Studio available at http://10.244.3.40:2480/studio/index.html [OServer]
2022-12-14 14:41:19:524 INFO  OrientDB Server is active v3.0.42 - Veloce (build 2cabb46c9581572b7f46724864f02d9c688070c5, branch UNKNOWN). [OServer]

The OBackupLog class remains the same, and the task never moves to Finished.
image

  2. There is a backup.ibl file in my previous backup folder.
    image
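
One way to check whether the scheduler is holding a stale date is to pull the next execution time out of the "Scheduled ..." log line and compare it with the time the line was written. The sketch below is derived only from the single log line quoted above; the pattern, the helper name `parse_scheduled_line`, and the assumption that the log timestamp is in the same zone (UTC) as the announced execution time are all illustrative, not taken from OrientDB documentation.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Pattern inferred from the one "Scheduled ..." line in the excerpt above.
SCHEDULED = re.compile(
    r"^(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\d{3}\s+\w+\s+"
    r"Scheduled \[(?P<mode>\w+)\] task : (?P<uuid>[0-9a-f-]+)\. "
    r"Next execution will be \w{3} (?P<mon>\w{3}) (?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) UTC (?P<year>\d{4})"
)


def parse_scheduled_line(line):
    """Return mode, uuid, log time and next run time, or None if no match."""
    m = SCHEDULED.match(line)
    if m is None:
        return None
    # Assumption: the log timestamp uses the same zone as the announced run.
    logged = datetime.strptime(m["logged"], "%Y-%m-%d %H:%M:%S")
    hour, minute, second = (int(part) for part in m["time"].split(":"))
    next_run = datetime(int(m["year"]), MONTHS[m["mon"]], int(m["day"]),
                        hour, minute, second)
    return {"mode": m["mode"], "uuid": m["uuid"],
            "logged": logged, "next_run": next_run}
```

If `next_run` is earlier than `logged` right after a restart, the schedule was not advanced past the shutdown window; in the excerpt above the next run is a few minutes in the future, so the schedule itself looks current.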

@tglman
Contributor

tglman commented Dec 14, 2022

Hi,

OrientDB is just a Java process, so you can use jstack, VisualVM, or any similar tool to get a thread dump. This may help: https://examples.javacodegeeks.com/java-thread-dump/

@tglman
Contributor

tglman commented Dec 14, 2022

Also, if you could get multiple thread dumps, especially around the time the backup is supposed to be running, it would be very helpful.
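
Taking several dumps a few seconds apart can be scripted. The following is a minimal sketch that shells out to the JDK's `jstack <pid>` and writes each dump to its own file; the function names, file naming, and the injectable `run` and `sleep` parameters are choices made for this example, and it assumes `jstack` is on the `PATH` and can attach to the server process.

```python
import subprocess
import time
from pathlib import Path


def jstack_command(pid):
    """Command line for a single thread dump of the given JVM process."""
    return ["jstack", str(pid)]


def collect_thread_dumps(pid, count, interval_seconds, out_dir,
                         run=subprocess.run, sleep=time.sleep):
    """Write `count` thread dumps into out_dir, pausing between them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index in range(count):
        # check=True raises if jstack exits with a non-zero status.
        result = run(jstack_command(pid), capture_output=True, text=True, check=True)
        path = out_dir / f"thread-dump-{index + 1}.txt"
        path.write_text(result.stdout)
        written.append(path)
        if index < count - 1:
            sleep(interval_seconds)
    return written


# Example call (not executed here): five dumps, 10 seconds apart, for PID 9.
# collect_thread_dumps(9, count=5, interval_seconds=10, out_dir="dumps")
```

Starting the collection shortly before the next scheduled execution time gives dumps that bracket the moment the backup should start.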

@akshay-mahakalkar
Author

akshay-mahakalkar commented Dec 14, 2022

Hi,
Here is the dump file. I collected it with the jcmd 9 Thread.print command. I hope it helps.
thread-dump.txt

@tglman
Contributor

tglman commented Dec 15, 2022

Hi,

Looking at the thread dump, it looks like nothing is running, so the backup does not seem to be running at all. Do you see any error in the server log or similar? It feels like the backup is failing to start for some reason.
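
A quick way to repeat this check on a dump is to list the threads whose header or stack frames mention a keyword such as "backup" or "timer". The sketch below assumes the usual HotSpot text layout (a header line starting with a quoted thread name, followed by state and frame lines, with a blank line between threads). The sample dump is made up for illustration and is not taken from the attached thread-dump.txt.

```python
def parse_threads(dump_text):
    """Split a HotSpot-style thread dump into (name, lines) entries."""
    threads = []
    current = None
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # Header line: the thread name is the first quoted string.
            name = line[1:].split('"', 1)[0]
            current = (name, [line])
            threads.append(current)
        elif not line.strip():
            current = None  # a blank line ends the current thread entry
        elif current is not None:
            current[1].append(line)
    return threads


def threads_mentioning(dump_text, keyword):
    """Names of threads whose header or frames contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [name for name, lines in parse_threads(dump_text)
            if any(keyword in line.lower() for line in lines)]


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = '''"main" #1 prio=5 os_prio=0 tid=0x01 nid=0x02 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at java.lang.Object.wait(Native Method)

"Timer-0" #12 daemon prio=5 os_prio=0 tid=0x03 nid=0x04 in Object.wait()
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.util.TimerThread.mainLoop(Timer.java:552)
'''

print(threads_mentioning(SAMPLE, "backup"))
print(threads_mentioning(SAMPLE, "timer"))
```

An empty result for "backup" across dumps taken around the scheduled time would be consistent with the backup never starting, although an idle timer thread on its own does not show why.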

@akshay-mahakalkar
Author

Hi @tglman ,
No, there is no error message in the database logs. In the attached dump, I do see some threads related to the timer. Do you think it could be pointing to the same issue?

@tglman
Contributor

tglman commented Dec 20, 2022

Hi,

I just noticed that you are running the 3.0.x series. We no longer do hotfix releases for that series. Have you considered updating to the most recent version, 3.2.x?

Regards

@akshay-mahakalkar
Author

Hi @tglman,

It is a little difficult to update the existing database to 3.2.13, as multiple schema-level changes may be required on the application side. I tested an upgrade to the latest version, but it broke the copy of my database.
One option I tried was to clean the OSystem database and start the server again. With this, the backup started working again. For now we need to rely on 3.0.42, and I will consider an upgrade in the future.

Thanks
