Fix issues indexer document mapping #25619

Merged
merged 7 commits into from
Jul 4, 2023

Conversation

wolfogre
Member

@wolfogre wolfogre commented Jul 1, 2023

Fix regression of #5363 (so long ago).

The old code defined a document mapping for issueIndexerDocType and assigned that type to BleveIndexerData. (BleveIndexerData was renamed to IndexerData in #25174, with no other changes.) But the old code never actually used BleveIndexerData; it wrote the index with an anonymous struct type. Bleve falls back to its default auto-mapping for any struct it does not recognize, so the indexer still worked. This means the custom document mapping was always dead code.

The custom document mapping is not useless: it can reduce index storage. This PR brings it back and disables the default mapping to prevent this from happening again. Because IndexerData (formerly BleveIndexerData) has JSON tags, and bleve uses them first, the field name in the mapping should be repo_id instead of RepoID.

I did a test to compare the storage size before and after this, with about 3k real comments that were migrated from some public repos.

Before:

[ 160]  .
├── [  42]  index_meta.json
├── [  13]  rupture_meta.json
└── [ 128]  store
    ├── [6.9M]  00000000005d.zap
    └── [256K]  root.bolt

After:

[ 160]  .
├── [  42]  index_meta.json
├── [  13]  rupture_meta.json
└── [ 128]  store
    ├── [3.5M]  000000000065.zap
    └── [256K]  root.bolt

It saves about half the storage space.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 1, 2023
@pull-request-size pull-request-size bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 1, 2023
@wolfogre wolfogre added type/bug issue/regression Indicates a previously functioning feature or behavior that has broken or regressed after a change backport/v1.20 This PR should be backported to Gitea 1.20 labels Jul 1, 2023
@wolfogre wolfogre added this to the 1.21.0 milestone Jul 1, 2023
@pull-request-size pull-request-size bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 1, 2023
@KN4CK3R
Member

KN4CK3R commented Jul 1, 2023

Does this migrate the existing index or does it reindex everything?

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jul 1, 2023
@wolfogre
Member Author

wolfogre commented Jul 3, 2023

Does this migrate the existing index or does it reindex everything?

It will reindex everything. We have no way to migrate indexes, for either bleve or Elasticsearch.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jul 4, 2023
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Jul 4, 2023
@silverwind silverwind enabled auto-merge (squash) July 4, 2023 08:46
@silverwind silverwind merged commit 9958642 into go-gitea:main Jul 4, 2023
@GiteaBot
Contributor

GiteaBot commented Jul 4, 2023

I was unable to create a backport for 1.20. @wolfogre, please send one manually. 🍵

go run ./contrib/backport 25619
...  // fix git conflicts if any
go run ./contrib/backport --continue

@GiteaBot GiteaBot added backport/manual No power to the bots! Create your backport yourself! and removed reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. labels Jul 4, 2023
zjjhot added a commit to zjjhot/gitea that referenced this pull request Jul 5, 2023
* giteaofficial/main: (22 commits)
  [skip ci] Updated translations via Crowdin
  Replace `interface{}` with `any` (go-gitea#25686)
  Several fixes for mobile UI (go-gitea#25634)
  Add elapsed time on debug for slow git commands (go-gitea#25642)
  some less naked returns (go-gitea#25682)
  Prevent duplicate image loading (go-gitea#25675)
  Add unit test for `HashAvatar` (go-gitea#25662)
  Fix the nil pointer when assigning issues to projects (go-gitea#25665)
  Actions list enhancements (go-gitea#25601)
  Fix issues indexer document mapping (go-gitea#25619)
  Fix show more for image on diff page (go-gitea#25672)
  Prevent SVG shrinking (go-gitea#25652)
  Log the real reason when authentication fails (but don't show the user) (go-gitea#25414)
  Add unit test for repository collaboration (go-gitea#25640)
  Fix UI misalignment on user setting page (go-gitea#25629)
  [skip ci] Updated translations via Crowdin
  Correct translation wrong format (go-gitea#25643)
  Add direct serving of package content (go-gitea#25543)
  Fix bug when change user name (go-gitea#25637)
  Make "cancel" buttons have proper type in modal forms (go-gitea#25618)
  ...
@wolfogre
Member Author

wolfogre commented Jul 5, 2023

I was unable to create a backport for 1.20. @wolfogre, please send one manually. 🍵

go run ./contrib/backport 25619
...  // fix git conflicts if any
go run ./contrib/backport --continue

This PR is based on #25174, which is a refactor that hasn't been backported to v1.20, so it's difficult to backport this one. I think it's OK to leave the old code in v1.20.

@wolfogre wolfogre removed backport/manual No power to the bots! Create your backport yourself! backport/v1.20 This PR should be backported to Gitea 1.20 labels Jul 5, 2023
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Oct 2, 2023
Labels
issue/regression Indicates a previously functioning feature or behavior that has broken or regressed after a change lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. type/bug

6 participants