gitea

Commit Graph

Author	SHA1	Message	Date
Jason Song	9958642502	Fix issues indexer document mapping (#25619 ) Fix regression of #5363 (so long ago). The old code definded a document mapping for `issueIndexerDocType`, and assigned it to `BleveIndexerData` as its type. (`BleveIndexerData` has been renamed to `IndexerData` in #25174, but nothing more.) But the old code never used `BleveIndexerData`, it wrote the index with an anonymous struct type. Nonetheless, bleve would use the default auto-mapping for struct it didn't know, so the indexer still worked. This means the custom document mapping was always dead code. The custom document mapping is not useless, it can reduce index storage, this PR brings it back and disable default mapping to prevent it from happening again. Since `IndexerData`(`BleveIndexerData`) has JSON tags, and bleve uses them first, so we should use `repo_id` as the field name instead of `RepoID`. I did a test to compare the storage size before and after this, with about 3k real comments that were migrated from some public repos. Before: ```text [ 160] . ├── [ 42] index_meta.json ├── [ 13] rupture_meta.json └── [ 128] store ├── [6.9M] 00000000005d.zap └── [256K] root.bolt ``` After: ```text [ 160] . ├── [ 42] index_meta.json ├── [ 13] rupture_meta.json └── [ 128] store ├── [3.5M] 000000000065.zap └── [256K] root.bolt ``` It saves about half the storage space. --------- Co-authored-by: Giteabot <teabot@gitea.io>	1 year ago
Jason Song	375fd15fbf	Refactor indexer (#25174 ) Refactor `modules/indexer` to make it more maintainable. And it can be easier to support more features. I'm trying to solve some of issue searching, this is a precursor to making functional changes. Current supported engines and the index versions: \| engines \| issues \| code \| \| - \| - \| - \| \| db \| Just a wrapper for database queries, doesn't need version \| - \| \| bleve \| The version of index is 2 \| The version of index is 6 \| \| elasticsearch \| The old index has no version, will be treated as version 0 in this PR \| The version of index is 1 \| \| meilisearch \| The old index has no version, will be treated as version 0 in this PR \| - \| ## Changes ### Split Splited it into mutiple packages ```text indexer ├── internal │ ├── bleve │ ├── db │ ├── elasticsearch │ └── meilisearch ├── code │ ├── bleve │ ├── elasticsearch │ └── internal └── issues ├── bleve ├── db ├── elasticsearch ├── internal └── meilisearch ``` - `indexer/interanal`: Internal shared package for indexer. - `indexer/interanal/[engine]`: Internal shared package for each engine (bleve/db/elasticsearch/meilisearch). - `indexer/code`: Implementations for code indexer. - `indexer/code/internal`: Internal shared package for code indexer. - `indexer/code/[engine]`: Implementation via each engine for code indexer. - `indexer/issues`: Implementations for issues indexer. ### Deduplication - Combine `Init/Ping/Close` for code indexer and issues indexer. - ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to `internal.IndexHolder`.~ Remove it, use dummy indexer instead when the indexer is not ready. - Duplicate two copies of creating ES clients. - Duplicate two copies of `indexerID()`. ### Enhancement - [x] Support index version for elasticsearch issues indexer, the old index without version will be treated as version 0. - [x] Fix spell of `elastic_search/ElasticSearch`, it should be `Elasticsearch`. - [x] Improve versioning of ES index. We don't need `Aliases`: - Gitea does't need aliases for "Zero Downtime" because it never delete old indexes. - The old code of issues indexer uses the orignal name to create issue index, so it's tricky to convert it to an alias. - [x] Support index version for meilisearch issues indexer, the old index without version will be treated as version 0. - [x] Do "ping" only when `Ping` has been called, don't ping periodically and cache the status. - [x] Support the context parameter whenever possible. - [x] Fix outdated example config. - [x] Give up the requeue logic of issues indexer: When indexing fails, call Ping to check if it was caused by the engine being unavailable, and only requeue the task if the engine is unavailable. - It is fragile and tricky, could cause data losing (It did happen when I was doing some tests for this PR). And it works for ES only. - Just always requeue the failed task, if it caused by bad data, it's a bug of Gitea which should be fixed. --------- Co-authored-by: Giteabot <teabot@gitea.io>	1 year ago
wxiaoguang	6f9c278559	Rewrite queue (#24505 ) # ⚠️ Breaking Many deprecated queue config options are removed (actually, they should have been removed in 1.18/1.19). If you see the fatal message when starting Gitea: "Please update your app.ini to remove deprecated config options", please follow the error messages to remove these options from your app.ini. Example: ``` 2023/05/06 19:39:22 [E] Removed queue option: `[indexer].ISSUE_INDEXER_QUEUE_TYPE`. Use new options in `[queue.issue_indexer]` 2023/05/06 19:39:22 [E] Removed queue option: `[indexer].UPDATE_BUFFER_LEN`. Use new options in `[queue.issue_indexer]` 2023/05/06 19:39:22 [F] Please update your app.ini to remove deprecated config options ``` Many options in `[queue]` are are dropped, including: `WRAP_IF_NECESSARY`, `MAX_ATTEMPTS`, `TIMEOUT`, `WORKERS`, `BLOCK_TIMEOUT`, `BOOST_TIMEOUT`, `BOOST_WORKERS`, they can be removed from app.ini. # The problem The old queue package has some legacy problems: * complexity: I doubt few people could tell how it works. * maintainability: Too many channels and mutex/cond are mixed together, too many different structs/interfaces depends each other. * stability: due to the complexity & maintainability, sometimes there are strange bugs and difficult to debug, and some code doesn't have test (indeed some code is difficult to test because a lot of things are mixed together). * general applicability: although it is called "queue", its behavior is not a well-known queue. * scalability: it doesn't seem easy to make it work with a cluster without breaking its behaviors. It came from some very old code to "avoid breaking", however, its technical debt is too heavy now. It's a good time to introduce a better "queue" package. # The new queue package It keeps using old config and concept as much as possible. * It only contains two major kinds of concepts: * The "base queue": channel, levelqueue, redis * They have the same abstraction, the same interface, and they are tested by the same testing code. * The "WokerPoolQueue", it uses the "base queue" to provide "worker pool" function, calls the "handler" to process the data in the base queue. * The new code doesn't do "PushBack" * Think about a queue with many workers, the "PushBack" can't guarantee the order for re-queued unhandled items, so in new code it just does "normal push" * The new code doesn't do "pause/resume" * The "pause/resume" was designed to handle some handler's failure: eg: document indexer (elasticsearch) is down * If a queue is paused for long time, either the producers blocks or the new items are dropped. * The new code doesn't do such "pause/resume" trick, it's not a common queue's behavior and it doesn't help much. * If there are unhandled items, the "push" function just blocks for a few seconds and then re-queue them and retry. * The new code doesn't do "worker booster" * Gitea's queue's handlers are light functions, the cost is only the go-routine, so it doesn't make sense to "boost" them. * The new code only use "max worker number" to limit the concurrent workers. * The new "Push" never blocks forever * Instead of creating more and more blocking goroutines, return an error is more friendly to the server and to the end user. There are more details in code comments: eg: the "Flush" problem, the strange "code.index" hanging problem, the "immediate" queue problem. Almost ready for review. TODO: * [x] add some necessary comments during review * [x] add some more tests if necessary * [x] update documents and config options * [x] test max worker / active worker * [x] re-run the CI tasks to see whether any test is flaky * [x] improve the `handleOldLengthConfiguration` to provide more friendly messages * [x] fine tune default config values (eg: length?) ## Code coverage: ![image](https://user-images.githubusercontent.com/2114189/236620635-55576955-f95d-4810-b12f-879026a3afdf.png)	2 years ago
sillyguodong	34399cfd7a	Make issue and code search support camel case (#22829 ) Fixes #22714 ### Changes: 1. Add a token filter which named "camelCase" between custom unicode token filter and "to_lower" token filter when add custom analyzer. ### Notice: If users want this feature to work, they should delete folder under {giteaPath}/data/indexers and restart application. Then application will create a new IndexMapping. ### Screenshots: ![image](https://user-images.githubusercontent.com/33891828/217715692-c18c41f2-57a1-4727-861c-470935c8e0c8.png) ### Others: I originally attempted to give users the ability to configure the "token_filters" in the "app.ini" file. But I found that if users does not strictly follow a right order to register "token_filters", they won't get the expected results. I think it is difficult to ask users to do this. So I finally give up this idea. --------- Co-authored-by: Jason Song <i@wolfogre.com> Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>	2 years ago
flynnnnnnnnnn	e81ccc406b	Implement FSFE REUSE for golang files (#21840 ) Change all license headers to comply with REUSE specification. Fix #16132 Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github> Co-authored-by: John Olheiser <john.olheiser@gmail.com>	2 years ago
delvh	0ebb45cfe7	Replace all instances of fmt.Errorf(%v) with fmt.Errorf(%w) (#21551 ) Found using `find . -type f -name '.go' -print -exec vim {} -c ':%s/fmt\.Errorf(\(.\)%v\(.*\)err/fmt.Errorf(\1%w\2err/g' -c ':wq' \;` Co-authored-by: 6543 <6543@obermui.de> Co-authored-by: Andrew Thornton <art27@cantab.net> Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>	2 years ago
Lauris BH	8038610a42	Automatically pause queue if index service is unavailable (#15066 ) * Handle keyword search error when issue indexer service is not available * Implement automatic disabling and resume of code indexer queue	3 years ago
6543	54e9ee37a7	format with gofumpt (#18184 ) * gofumpt -w -l . * gofumpt -w -l -extra . * Add linter * manual fix * change make fmt	3 years ago
Lunny Xiao	43262226db	Fix data race in bleve indexer (#16474 ) * Fix data race in bleve indexer	3 years ago
Lauris BH	f5abe2f563	Upgrade blevesearch dependency to v2.0.1 (#14346 ) * Upgrade blevesearch dependency to v2.0.1 * Update rupture to v1.0.0 * Fix test	4 years ago
Lauris BH	0a3c3357f3	Sort issue search results by revelance (#14353 )	4 years ago
zeripath	74bd9691c6	Re-attempt to delete temporary upload if the file is locked by another process (#12447 ) Replace all calls to os.Remove/os.RemoveAll by retrying util.Remove/util.RemoveAll and remove circular dependencies from util. Fix #12339 Signed-off-by: Andrew Thornton <art27@cantab.net> Co-authored-by: silverwind <me@silverwind.io>	4 years ago
Lunny Xiao	5dbf36f356	Issue search support elasticsearch (#9428 ) * Issue search support elasticsearch * Fix lint * Add indexer name on app.ini * add a warnning on SearchIssuesByKeyword * improve code	5 years ago
zeripath	55cd33e124	Stop various tests from adding to the source tree (#9515 ) Instead of just adding test generated files to .gitignore prevent them from being produced in the first place. Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>	5 years ago
Brad Albright	887a8fe242	Allow cross-repository dependencies on issues (#7901 ) * in progress changes for #7405, added ability to add cross-repo dependencies * removed unused repolink var * fixed query that was breaking ci tests; fixed check in issue dependency add so that the id of the issue and dependency is checked rather than the indexes * reverted removal of string in local files becasue these are done via crowdin, not updated manually * removed 'Select("issue.")' from getBlockedByDependencies and getBlockingDependencies based on comments in PR review changed getBlockedByDependencies and getBlockingDependencies to use a more xorm-like query, also updated the sidebar as a result * simplified the getBlockingDependencies and getBlockedByDependencies methods; changed the sidebar to show the dependencies in a different format where you can see the name of the repository * made some changes to the issue view in the dependencies (issue name on top, repo full name on separate line). Change view of issue in the dependency search results (also showing the full repo name on separate line) * replace call to FindUserAccessibleRepoIDs with SearchRepositoryByName. The former was hardcoded to use isPrivate = false on the repo search, but this code needed it to be true. The SearchRepositoryByName method is used more in the code including on the user's dashboard * some more tweaks to the layout of the issues when showing dependencies and in the search box when you add new dependencies * added Name to the RepositoryMeta struct * updated swagger doc * fixed total count for link header on SearchIssues * fixed indentation * fixed aligment of remove icon on dependencies in issue sidebar * removed unnecessary nil check (unnecessary because issue.loadRepo is called prior to this block) * reverting .css change, somehow missed or forgot that less is used * updated less file and generated css; updated sidebar template with styles to line up delete and issue index * added ordering to the blocked by/depends on queries * fixed sorting in issue dependency search and the depends on/blocks views to show issues from the current repo first, then by created date descending; added a "all cross repository dependencies" setting to allow this feature to be turned off, if turned off, the issue dependency search will work the way it did before (restricted to the current repository) * re-applied my swagger changes after merge * fixed split string condition in issue search * changed ALLOW_CROSS_REPOSITORY_DEPENDENCIES description to sound more global than just the issue dependency search; returning 400 in the cross repo issue search api method if not enabled; fixed bug where the issue count did not respect the state parameter * when adding a dependency to an issue, added a check to make sure the issue and dependency are in the same repo if cross repo dependencies is not enabled * updated sortIssuesSession call in PullRequests, another commit moved this method from pull.go to pull_list.go so I had to re-apply my change here * fixed incorrect setting of user id parameter in search repos call	5 years ago
Lunny Xiao	830ae61456	Refactor issue indexer (#5363 )	6 years ago

2 Commits (00dbba7f4266032a2b91b760e7c611950ffad096)