Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebuild-indexes cmd #7753

Closed
wants to merge 17 commits into from
Closed

Conversation

guillep2k
Copy link
Member

@guillep2k guillep2k commented Aug 5, 2019

This PR adds the following command:

gitea [global options] rebuild-indexes {--repositories|--issues|--all}
  • If Gitea is running, the command will make an api call and perform the operation online.
  • If Gitea is not running (e.g. cannot start because the indexes are corrupt, like in Gitea 1.9.3 fails to start #8412), the command will just mark the indexes obsolete and they will be rebuilt by Gitea at startup.

Old, outdated description

So, this is the direction I'm taking to implement the rebuild-indexes cmd mentioned in #7733:

gitea [global options] rebuild-indexes {--repositories|--issues|--all}

Only the --repositories option is currently implemented. It's a little bit convoluted because the indexes are normally updated in background on the web server. I didn't want to cause any kind of concurrency problem, so I've decided for a private api route.

I figured a per-repository request would make sense for very big installations, as the requests will go very quickly for the first setting.Indexer.UpdateQueueLength (20) repos. After that it will have to wait in the queue to advance. All in all, there's a risk for the requests to timeout. I'd like your input on how to handle that.

Please feel free to criticize. If you guys like where I'm going, I'll implement the --issues option.

@codecov-io
Copy link

codecov-io commented Aug 6, 2019

Codecov Report

Merging #7753 into master will decrease coverage by 0.07%.
The diff coverage is 10.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7753      +/-   ##
==========================================
- Coverage   41.79%   41.71%   -0.08%     
==========================================
  Files         497      500       +3     
  Lines       65639    65801     +162     
==========================================
+ Hits        27433    27450      +17     
- Misses      34689    34835     +146     
+ Partials     3517     3516       -1
Impacted Files Coverage Δ
modules/private/rebuild_indexes.go 0% <0%> (ø)
modules/indexer/repo.go 66.66% <0%> (-3.57%) ⬇️
modules/indexer/issues/db.go 54.16% <0%> (-4.93%) ⬇️
modules/indexer/issues/bleve.go 60% <0%> (-1.54%) ⬇️
routers/private/rebuild_indexes.go 0% <0%> (ø)
cmd/rebuild_indexes.go 0% <0%> (ø)
models/repo_indexer.go 64.25% <10.52%> (-3.96%) ⬇️
routers/private/internal.go 23.63% <100%> (+2.88%) ⬆️
modules/indexer/indexer.go 44.73% <100%> (-10.53%) ⬇️
modules/indexer/issues/indexer.go 49.05% <26%> (-9.96%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 628f9da...1c4af80. Read the comment docs.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Aug 6, 2019
@lafriks lafriks added type/enhancement An improvement of existing functionality pr/wip This PR is not ready for review labels Aug 6, 2019
@guillep2k guillep2k changed the title WIP: Rebuild-indexes cmd Rebuild-indexes cmd Aug 8, 2019
@guillep2k
Copy link
Member Author

I've implemented the issues index rebuild that was previously missing. The PR is now complete (not WIP) except requests by the managers, of course.
My thoughts: it's a lot of code for such a "simple" functionality. Perhaps it would have been better to make an offline command (i.e. one that requires that gitea is not running) that just deletes the index files (bevle) or records (db).

@lunny lunny removed the pr/wip This PR is not ready for review label Aug 8, 2019
@silverwind
Copy link
Member

silverwind commented Aug 11, 2019

Edit: Nevermind, it seems to work, I was missing my GITEA_CUSTOM in the command invocation.

@silverwind
Copy link
Member

silverwind commented Aug 11, 2019

So, some feedback:

I get a lot of this while running against a pretty much idle server:

Server too busy; backing off...
Server too busy; backing off...
Server too busy; backing off...

Not sure exactly why's that but on a server hosting like 15 repos, it ran like 10 minutes til completion with a lot of those "too busy" in between which seems quite long. Maybe drop the per-repo thing and delete the index files and force-reinitialize bleve? If possible, it should only take a few seconds.

The fact that his has to run while gitea is running on another thread seems non-intuitive to me. In fact I was caught off guard above when I forgot to set GITEA_CUSTOM for the command invocation, so it did not find the config. How about we move that command to the admin web ui only?

Edit: Also, what's interesting is that when I run the rebuild command a second time, it now completes almost instantly, not sure why that is.

@guillep2k
Copy link
Member Author

So, some feedback:

I get a lot of this while running against a pretty much idle server:

Server too busy; backing off...
Server too busy; backing off...
Server too busy; backing off...

Not sure exactly why's that but on a server hosting like 15 repos, it ran like 10 minutes til completion with a lot of those "too busy" in between which seems quite long. Maybe drop the per-repo thing and delete the index files and force-reinitialize bleve? If possible, it should only take a few seconds.

The fact that his has to run while gitea is running on another thread seems non-intuitive to me. In fact I was caught off guard above when I forgot to set GITEA_CUSTOM for the command invocation, so it did not find the config. How about we move that command to the admin web ui only?

Edit: Also, what's interesting is that when I run the rebuild command a second time, it now completes almost instantly, not sure why that is.

@silverwind Please see my response to Lunny about this option being more future proof.

Perhaps it took different times because the indexes got in fact different content, like with/without the camel case tokenizer? Also, it may seem overkill to do it per repo if you have 15 very big ones, If you have 500 small ones may make more sense.

Anyway, based on you guys' feedback, I think I'll change the strategy:

  • Make the rebuild-indexes cmd global, meaning it will delete all indexes at once and it will require gitea to be down.
  • Postpone the "per-repository" capability for a future version, when it makes more sense.

If you guy agree, I'll drop this PR and start another.

@silverwind
Copy link
Member

require gitea to be down

If it cannot be done at runtime, I guess there's not much sense to have the command in first place, may as well instruct users to stop gitea, manually delete index files and start it back up. Thought my preference would generally be to bump the index version on every change that affects indexing, that way, users will probably never have to manually touch index files.

@guillep2k
Copy link
Member Author

@silverwind I re-engineered the logic to support both offline and online rebuilds. It also uses the version metadata to force the rebuild, so it's much faster (although I've dropped the per-repository rebuild).

I think this version makes more sense an it's more robust.

@sapk sapk mentioned this pull request Sep 5, 2019
5 tasks
@guillep2k guillep2k mentioned this pull request Oct 10, 2019
7 tasks
@stale
Copy link

stale bot commented Dec 10, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.

@stale stale bot added the issue/stale label Dec 10, 2019
@stale stale bot removed the issue/stale label Dec 25, 2019
@lafriks lafriks added this to the 1.12.0 milestone Jan 2, 2020
@6543
Copy link
Member

6543 commented Feb 25, 2020

can you resolve conflicts pleace :)

@stale
Copy link

stale bot commented Apr 25, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.

@stale stale bot added the issue/stale label Apr 25, 2020
@6543
Copy link
Member

6543 commented Apr 25, 2020

ping @Stale

@stale stale bot removed the issue/stale label Apr 25, 2020
@guillep2k
Copy link
Member Author

I'm closing this, I don't think it's the correct way of handling it anymore. We could add the drop indexes offline to the doctor some other time.

A command line utility for rebuilding the indexes online doesn't make much sense since the #1 problem we have with indexes is that they become corrupt and prevent Gitea from starting in the first place, rendering the command useless. If an online rebuild is ever needed for other reasons we should add an UI option instead.

@guillep2k guillep2k closed this May 3, 2020
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants