Refactor indexer #25174

Merged on Jun 23, 2023 (54 commits)
Changes from 45 commits
Commits
22777af
fix: mark stats
wolfogre Jun 8, 2023
534945b
feat: bleve issue basic
wolfogre Jun 8, 2023
15d6247
feat: elasticsearch issue
wolfogre Jun 8, 2023
afde928
fix: better composite
wolfogre Jun 8, 2023
94a5736
feat: db issue
wolfogre Jun 8, 2023
242095f
fix: inner
wolfogre Jun 8, 2023
e3190e6
fix: move gitea_bleve
wolfogre Jun 8, 2023
3dbc2e4
feat: code indexer
wolfogre Jun 8, 2023
3a2873a
feat: new IndexerHolder
wolfogre Jun 8, 2023
5748343
fix: remove generic
wolfogre Jun 8, 2023
e609a36
feat: use holder in code indexer
wolfogre Jun 8, 2023
c43676e
fix: be safe with nil indexer
wolfogre Jun 8, 2023
72b1a39
fix: Get never return nil
wolfogre Jun 8, 2023
34d136d
chore: meilisearch
wolfogre Jun 9, 2023
94a6620
fix: spell of Elasticsearch
wolfogre Jun 9, 2023
7e033b9
chore: remove useless code for DB
wolfogre Jun 9, 2023
f7aff40
feat: bleve mapping
wolfogre Jun 9, 2023
b6f20d3
fix: elasticsearch with version
wolfogre Jun 9, 2023
a1ce99b
feat: base32
wolfogre Jun 9, 2023
d062296
fix: esRepoIndexerLatestVersion
wolfogre Jun 9, 2023
4f1d3b7
fix: split issues
wolfogre Jun 9, 2023
a29ecee
fix: split codes
wolfogre Jun 9, 2023
cc6d0aa
chore: issues/internal
wolfogre Jun 9, 2023
a1d48a6
fix: indexer_internal for code
wolfogre Jun 9, 2023
1bc8d3e
fix: meilisearch
wolfogre Jun 9, 2023
ecfbf77
Merge branch 'main' into feature/refactor_indexer
wolfogre Jun 9, 2023
419ea3f
fix: test
wolfogre Jun 9, 2023
afc4337
fix: format code
wolfogre Jun 9, 2023
ff0ab3d
fix: rewrite ping
wolfogre Jun 13, 2023
136fa3f
fix: use context
wolfogre Jun 13, 2023
d2707da
fix: check all old versions index
wolfogre Jun 13, 2023
77d788b
fix: VersionedIndexName
wolfogre Jun 13, 2023
45c26bc
feat: meilisearch index version
wolfogre Jun 13, 2023
aa7a795
test: fix
wolfogre Jun 14, 2023
01f6235
feat: support version 0
wolfogre Jun 14, 2023
847dac1
chore: update comments
wolfogre Jun 14, 2023
fd48e00
fix: give up IndexerHolder
wolfogre Jun 14, 2023
b36dbac
test: fix
wolfogre Jun 14, 2023
425bf87
Merge branch 'main' into feature/refactor_indexer
wolfogre Jun 14, 2023
8882f2e
fix: return unhandled data when index failed
wolfogre Jun 14, 2023
54a0195
docs: update config
wolfogre Jun 14, 2023
c5f4f9b
fix: requeue tasks
wolfogre Jun 14, 2023
9afc27a
Merge branch 'main' into feature/refactor_indexer
wolfogre Jun 14, 2023
15e4518
fix: format of meilisearch index
wolfogre Jun 14, 2023
e30d5c1
feat: warn before rebuild bleve index
wolfogre Jun 15, 2023
11fd0f9
fix: globalIndexer atomic
wolfogre Jun 18, 2023
0fb08ff
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 22, 2023
bf63432
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 22, 2023
5f2f088
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 22, 2023
3ef0f64
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 23, 2023
1c19169
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 23, 2023
c98a801
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 23, 2023
ebf89ad
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 23, 2023
a5a8244
Merge branch 'main' into feature/refactor_indexer
GiteaBot Jun 23, 2023
6 changes: 3 additions & 3 deletions custom/conf/app.example.ini
@@ -1337,10 +1337,10 @@ LEVEL = Info
;; Issue indexer storage path, available when ISSUE_INDEXER_TYPE is bleve
;ISSUE_INDEXER_PATH = indexers/issues.bleve ; Relative paths will be made absolute against _`AppWorkPath`_.
;;
;; Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch or meilisearch
;ISSUE_INDEXER_CONN_STR = http://elastic:changeme@localhost:9200
;; Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch (e.g. http://elastic:password@localhost:9200) or meilisearch (e.g. http://:apikey@localhost:7700)
;ISSUE_INDEXER_CONN_STR =
;;
;; Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch
;; Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch or meilisearch.
;ISSUE_INDEXER_NAME = gitea_issues
;;
;; Timeout the indexer if it takes longer than this to start.
6 changes: 3 additions & 3 deletions docs/content/doc/administration/config-cheat-sheet.en-us.md
@@ -459,15 +459,15 @@ relation to port exhaustion.
## Indexer (`indexer`)

- `ISSUE_INDEXER_TYPE`: **bleve**: Issue indexer type, currently supported: `bleve`, `db`, `elasticsearch` or `meilisearch`.
- `ISSUE_INDEXER_CONN_STR`: ****: Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch, or meilisearch. i.e. http://elastic:changeme@localhost:9200
- `ISSUE_INDEXER_NAME`: **gitea_issues**: Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch
- `ISSUE_INDEXER_CONN_STR`: ****: Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch (e.g. http://elastic:password@localhost:9200) or meilisearch (e.g. http://:apikey@localhost:7700)
- `ISSUE_INDEXER_NAME`: **gitea_issues**: Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch or meilisearch.
- `ISSUE_INDEXER_PATH`: **indexers/issues.bleve**: Index file used for issue search; available when ISSUE_INDEXER_TYPE is bleve and elasticsearch. Relative paths will be made absolute against _`AppWorkPath`_.

- `REPO_INDEXER_ENABLED`: **false**: Enables code search (uses a lot of disk space, about 6 times more than the repository size).
- `REPO_INDEXER_REPO_TYPES`: **sources,forks,mirrors,templates**: Repo indexer units. The items to index could be `sources`, `forks`, `mirrors`, `templates` or any combination of them separated by a comma. If empty then it defaults to `sources` only, as if you'd like to disable fully please see `REPO_INDEXER_ENABLED`.
- `REPO_INDEXER_TYPE`: **bleve**: Code search engine type, could be `bleve` or `elasticsearch`.
- `REPO_INDEXER_PATH`: **indexers/repos.bleve**: Index file used for code search.
- `REPO_INDEXER_CONN_STR`: ****: Code indexer connection string, available when `REPO_INDEXER_TYPE` is elasticsearch. i.e. http://elastic:changeme@localhost:9200
- `REPO_INDEXER_CONN_STR`: ****: Code indexer connection string, available when `REPO_INDEXER_TYPE` is elasticsearch. i.e. http://elastic:password@localhost:9200
- `REPO_INDEXER_NAME`: **gitea_codes**: Code indexer name, available when `REPO_INDEXER_TYPE` is elasticsearch

- `REPO_INDEXER_INCLUDE`: **empty**: A comma separated list of glob patterns (see https://github.com/gobwas/glob) to **include** in the index. Use `**.txt` to match any files with .txt extension. An empty list means include all files.
2 changes: 1 addition & 1 deletion modules/context/repo.go
@@ -593,7 +593,7 @@ func RepoAssignment(ctx *Context) (cancel context.CancelFunc) {

ctx.Data["RepoSearchEnabled"] = setting.Indexer.RepoIndexerEnabled
if setting.Indexer.RepoIndexerEnabled {
ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable()
ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable(ctx)
}

if ctx.IsSigned {
156 changes: 36 additions & 120 deletions modules/indexer/code/bleve.go → modules/indexer/code/bleve/bleve.go
@@ -1,14 +1,13 @@
// Copyright 2019 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package code
package bleve

import (
"bufio"
"context"
"fmt"
"io"
"os"
"strconv"
"strings"
"time"
@@ -17,12 +16,13 @@ import (
"code.gitea.io/gitea/modules/analyze"
"code.gitea.io/gitea/modules/charset"
"code.gitea.io/gitea/modules/git"
gitea_bleve "code.gitea.io/gitea/modules/indexer/bleve"
"code.gitea.io/gitea/modules/indexer/code/internal"
indexer_internal "code.gitea.io/gitea/modules/indexer/internal"
inner_bleve "code.gitea.io/gitea/modules/indexer/internal/bleve"
"code.gitea.io/gitea/modules/log"
"code.gitea.io/gitea/modules/setting"
"code.gitea.io/gitea/modules/timeutil"
"code.gitea.io/gitea/modules/typesniffer"
"code.gitea.io/gitea/modules/util"

"github.com/blevesearch/bleve/v2"
analyzer_custom "github.com/blevesearch/bleve/v2/analysis/analyzer/custom"
@@ -31,10 +31,8 @@ import (
"github.com/blevesearch/bleve/v2/analysis/token/lowercase"
"github.com/blevesearch/bleve/v2/analysis/token/unicodenorm"
"github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
"github.com/blevesearch/bleve/v2/index/upsidedown"
"github.com/blevesearch/bleve/v2/mapping"
"github.com/blevesearch/bleve/v2/search/query"
"github.com/ethantkoenig/rupture"
"github.com/go-enry/go-enry/v2"
)

@@ -59,38 +57,6 @@ func addUnicodeNormalizeTokenFilter(m *mapping.IndexMappingImpl) error {
})
}

// openBleveIndexer open the index at the specified path, checking for metadata
// updates and bleve version updates. If index needs to be created (or
// re-created), returns (nil, nil)
func openBleveIndexer(path string, latestVersion int) (bleve.Index, error) {
_, err := os.Stat(path)
if err != nil && os.IsNotExist(err) {
return nil, nil
} else if err != nil {
return nil, err
}

metadata, err := rupture.ReadIndexMetadata(path)
if err != nil {
return nil, err
}
if metadata.Version < latestVersion {
// the indexer is using a previous version, so we should delete it and
// re-populate
return nil, util.RemoveAll(path)
}

index, err := bleve.Open(path)
if err != nil && err == upsidedown.IncompatibleVersion {
// the indexer was built with a previous version of bleve, so we should
// delete it and re-populate
return nil, util.RemoveAll(path)
} else if err != nil {
return nil, err
}
return index, nil
}

// RepoIndexerData data stored in the repo indexer
type RepoIndexerData struct {
RepoID int64
@@ -111,8 +77,8 @@ const (
repoIndexerLatestVersion = 6
)

// createBleveIndexer create a bleve repo indexer if one does not already exist
func createBleveIndexer(path string, latestVersion int) (bleve.Index, error) {
// generateBleveIndexMapping generates a bleve index mapping for the repo indexer
func generateBleveIndexMapping() (mapping.IndexMapping, error) {
docMapping := bleve.NewDocumentMapping()
numericFieldMapping := bleve.NewNumericFieldMapping()
numericFieldMapping.IncludeInAll = false
@@ -147,42 +113,28 @@ func createBleveIndexer(path string, latestVersion int) (bleve.Index, error) {
mapping.AddDocumentMapping(repoIndexerDocType, docMapping)
mapping.AddDocumentMapping("_all", bleve.NewDocumentDisabledMapping())

indexer, err := bleve.New(path, mapping)
if err != nil {
return nil, err
}

if err = rupture.WriteIndexMetadata(path, &rupture.IndexMetadata{
Version: latestVersion,
}); err != nil {
return nil, err
}
return indexer, nil
return mapping, nil
}

var _ Indexer = &BleveIndexer{}
var _ internal.Indexer = &Indexer{}

// BleveIndexer represents a bleve indexer implementation
type BleveIndexer struct {
indexDir string
indexer bleve.Index
// Indexer represents a bleve indexer implementation
type Indexer struct {
inner *inner_bleve.Indexer
indexer_internal.Indexer // do not composite inner_bleve.Indexer directly to avoid exposing too much
}

// NewBleveIndexer creates a new bleve local indexer
func NewBleveIndexer(indexDir string) (*BleveIndexer, bool, error) {
indexer := &BleveIndexer{
indexDir: indexDir,
// NewIndexer creates a new bleve local indexer
func NewIndexer(indexDir string) *Indexer {
inner := inner_bleve.NewIndexer(indexDir, repoIndexerLatestVersion, generateBleveIndexMapping)
return &Indexer{
Indexer: inner,
inner: inner,
}
created, err := indexer.init()
if err != nil {
indexer.Close()
return nil, false, err
}
return indexer, created, err
}

func (b *BleveIndexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserError, batchReader *bufio.Reader, commitSha string,
update fileUpdate, repo *repo_model.Repository, batch *gitea_bleve.FlushingBatch,
func (b *Indexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserError, batchReader *bufio.Reader, commitSha string,
update internal.FileUpdate, repo *repo_model.Repository, batch *inner_bleve.FlushingBatch,
) error {
// Ignore vendored files in code search
if setting.Indexer.ExcludeVendored && analyze.IsVendor(update.Filename) {
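
The new `Indexer` struct in the hunk above embeds the `indexer_internal.Indexer` interface and keeps the concrete `*inner_bleve.Indexer` in a separate unexported field, with the comment that embedding the concrete type directly would expose too much. The Go sketch below shows that pattern in isolation. All names (`basic`, `engine`, `wrapper`, `RawHandle`) are hypothetical stand-ins and do not reflect the actual method sets in Gitea.

```go
package main

import (
	"errors"
	"fmt"
)

// basic is the narrow interface callers are meant to see. Hypothetical.
type basic interface {
	Ping() error
	Close()
}

// engine is a concrete type with more methods than basic. Hypothetical.
type engine struct{ closed bool }

func (e *engine) Ping() error {
	if e.closed {
		return errors.New("engine is closed")
	}
	return nil
}

func (e *engine) Close() { e.closed = true }

// RawHandle stands in for low-level access that should stay internal.
func (e *engine) RawHandle() string { return "low-level handle" }

// wrapper embeds the interface, so only Ping and Close are promoted.
// The concrete engine stays reachable through the unexported inner field.
type wrapper struct {
	inner *engine
	basic
}

func newWrapper() *wrapper {
	e := &engine{}
	return &wrapper{inner: e, basic: e}
}

func main() {
	w := newWrapper()
	fmt.Println("ping before close:", w.Ping()) // promoted via the embedded interface
	fmt.Println("inner access:", w.inner.RawHandle())
	// w.RawHandle() would not compile: RawHandle is not in basic's method set.
	w.Close()
	fmt.Println("ping after close:", w.Ping())
}
```

Had `wrapper` embedded `*engine` directly, `RawHandle` would have been promoted as well and become part of the wrapper's public surface.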
@@ -227,7 +179,7 @@ func (b *BleveIndexer) addUpdate(ctx context.Context, batchWriter git.WriteClose
if _, err = batchReader.Discard(1); err != nil {
return err
}
id := filenameIndexerID(repo.ID, update.Filename)
id := internal.FilenameIndexerID(repo.ID, update.Filename)
return batch.Index(id, &RepoIndexerData{
RepoID: repo.ID,
CommitID: commitSha,
@@ -237,50 +189,14 @@ func (b *BleveIndexer) addUpdate(ctx context.Context, batchWriter git.WriteClose
})
}

func (b *BleveIndexer) addDelete(filename string, repo *repo_model.Repository, batch *gitea_bleve.FlushingBatch) error {
id := filenameIndexerID(repo.ID, filename)
func (b *Indexer) addDelete(filename string, repo *repo_model.Repository, batch *inner_bleve.FlushingBatch) error {
id := internal.FilenameIndexerID(repo.ID, filename)
return batch.Delete(id)
}

// init init the indexer
func (b *BleveIndexer) init() (bool, error) {
var err error
b.indexer, err = openBleveIndexer(b.indexDir, repoIndexerLatestVersion)
if err != nil {
return false, err
}
if b.indexer != nil {
return false, nil
}

b.indexer, err = createBleveIndexer(b.indexDir, repoIndexerLatestVersion)
if err != nil {
return false, err
}

return true, nil
}

// Close close the indexer
func (b *BleveIndexer) Close() {
log.Debug("Closing repo indexer")
if b.indexer != nil {
err := b.indexer.Close()
if err != nil {
log.Error("Error whilst closing the repository indexer: %v", err)
}
}
log.Info("PID: %d Repository Indexer closed", os.Getpid())
}

// Ping does nothing
func (b *BleveIndexer) Ping() bool {
return true
}

// Index indexes the data
func (b *BleveIndexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *repoChanges) error {
batch := gitea_bleve.NewFlushingBatch(b.indexer, maxBatchSize)
func (b *Indexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *internal.RepoChanges) error {
batch := inner_bleve.NewFlushingBatch(b.inner.Indexer, maxBatchSize)
if len(changes.Updates) > 0 {

// Now because of some insanity with git cat-file not immediately failing if not run in a valid git directory we need to run git rev-parse first!
@@ -308,14 +224,14 @@ func (b *BleveIndexer) Index(ctx context.Context, repo *repo_model.Repository, s
}

// Delete deletes indexes by ids
func (b *BleveIndexer) Delete(repoID int64) error {
func (b *Indexer) Delete(_ context.Context, repoID int64) error {
query := numericEqualityQuery(repoID, "RepoID")
searchRequest := bleve.NewSearchRequestOptions(query, 2147483647, 0, false)
result, err := b.indexer.Search(searchRequest)
result, err := b.inner.Indexer.Search(searchRequest)
if err != nil {
return err
}
batch := gitea_bleve.NewFlushingBatch(b.indexer, maxBatchSize)
batch := inner_bleve.NewFlushingBatch(b.inner.Indexer, maxBatchSize)
for _, hit := range result.Hits {
if err = batch.Delete(hit.ID); err != nil {
return err
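
Both `Index` and `Delete` in this file route their writes through `NewFlushingBatch(..., maxBatchSize)`. The general idea of such a batch is to buffer operations and write them out whenever the buffer reaches a size limit, with one final flush for the remainder. The Go sketch below illustrates that idea only. `flushingBatch` and its callback are hypothetical and are not Gitea's `FlushingBatch`, which wraps a bleve index.

```go
package main

import "fmt"

// flushingBatch buffers document IDs and hands them to flush in chunks.
// Hypothetical type for illustration.
type flushingBatch struct {
	maxSize int
	pending []string
	flush   func(ids []string) error
}

func newFlushingBatch(maxSize int, flush func(ids []string) error) *flushingBatch {
	return &flushingBatch{maxSize: maxSize, flush: flush}
}

// Delete queues an ID and flushes once maxSize entries are pending.
func (b *flushingBatch) Delete(id string) error {
	b.pending = append(b.pending, id)
	if len(b.pending) >= b.maxSize {
		return b.Flush()
	}
	return nil
}

// Flush sends whatever is pending and resets the buffer.
func (b *flushingBatch) Flush() error {
	if len(b.pending) == 0 {
		return nil
	}
	err := b.flush(b.pending)
	b.pending = nil
	return err
}

// flushSizes queues every ID with the given limit and reports the size of each flush.
func flushSizes(ids []string, maxSize int) ([]int, error) {
	var sizes []int
	batch := newFlushingBatch(maxSize, func(chunk []string) error {
		sizes = append(sizes, len(chunk))
		return nil
	})
	for _, id := range ids {
		if err := batch.Delete(id); err != nil {
			return nil, err
		}
	}
	// The final flush picks up entries left below the size limit.
	return sizes, batch.Flush()
}

func main() {
	sizes, err := flushSizes([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(sizes, err)
}
```

The caller is responsible for the last `Flush`, since the automatic flush only fires when the buffer is full.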
@@ -326,7 +242,7 @@ func (b *BleveIndexer) Delete(repoID int64) error {

// Search searches for files in the specified repo.
// Returns the matching file-paths
func (b *BleveIndexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) {
func (b *Indexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*internal.SearchResult, []*internal.SearchResultLanguages, error) {
var (
indexerQuery query.Query
keywordQuery query.Query
@@ -379,14 +295,14 @@ func (b *BleveIndexer) Search(ctx context.Context, repoIDs []int64, language, ke
searchRequest.AddFacet("languages", bleve.NewFacetRequest("Language", 10))
}

result, err := b.indexer.SearchInContext(ctx, searchRequest)
result, err := b.inner.Indexer.SearchInContext(ctx, searchRequest)
if err != nil {
return 0, nil, nil, err
}

total := int64(result.Total)

searchResults := make([]*SearchResult, len(result.Hits))
searchResults := make([]*internal.SearchResult, len(result.Hits))
for i, hit := range result.Hits {
startIndex, endIndex := -1, -1
for _, locations := range hit.Locations["Content"] {
@@ -405,11 +321,11 @@ func (b *BleveIndexer) Search(ctx context.Context, repoIDs []int64, language, ke
if t, err := time.Parse(time.RFC3339, hit.Fields["UpdatedAt"].(string)); err == nil {
updatedUnix = timeutil.TimeStamp(t.Unix())
}
searchResults[i] = &SearchResult{
searchResults[i] = &internal.SearchResult{
RepoID: int64(hit.Fields["RepoID"].(float64)),
StartIndex: startIndex,
EndIndex: endIndex,
Filename: filenameOfIndexerID(hit.ID),
Filename: internal.FilenameOfIndexerID(hit.ID),
Content: hit.Fields["Content"].(string),
CommitID: hit.Fields["CommitID"].(string),
UpdatedUnix: updatedUnix,
@@ -418,15 +334,15 @@ func (b *BleveIndexer) Search(ctx context.Context, repoIDs []int64, language, ke
}
}

searchResultLanguages := make([]*SearchResultLanguages, 0, 10)
searchResultLanguages := make([]*internal.SearchResultLanguages, 0, 10)
if len(language) > 0 {
// Use separate query to go get all language counts
facetRequest := bleve.NewSearchRequestOptions(facetQuery, 1, 0, false)
facetRequest.Fields = []string{"Content", "RepoID", "Language", "CommitID", "UpdatedAt"}
facetRequest.IncludeLocations = true
facetRequest.AddFacet("languages", bleve.NewFacetRequest("Language", 10))

if result, err = b.indexer.Search(facetRequest); err != nil {
if result, err = b.inner.Indexer.Search(facetRequest); err != nil {
return 0, nil, nil, err
}

@@ -436,7 +352,7 @@ func (b *BleveIndexer) Search(ctx context.Context, repoIDs []int64, language, ke
if len(term.Term) == 0 {
continue
}
searchResultLanguages = append(searchResultLanguages, &SearchResultLanguages{
searchResultLanguages = append(searchResultLanguages, &internal.SearchResultLanguages{
Language: term.Term,
Color: enry.GetColor(term.Term),
Count: term.Count,
30 changes: 0 additions & 30 deletions modules/indexer/code/bleve_test.go

This file was deleted.
