Backport #24900Fix#24896
If users set different languages by `linguist-language`, the `stats` map
could be: `java: 100, Java: 200`.
Language stats are stored as case-insensitive in database and there is a
unique key.
So, the different language names should be merged to one unique name:
`Java: 300`
Backport #23920 by @ChristopherHX
Remove the misbehaving function and call
Repository.GetFilesChangedBetween instead.
Fixes#23919
---
~~_TODO_ test this~~ `Repository.getFilesChanged` seems to be only used
by Gitea Actions, but a similar function already exists
**Update** I tested this change and the issue is gone.
Co-authored-by: ChristopherHX <christopher.homberger@web.de>
Backport #23606 by @wxiaoguang
Reference:
https://github.com/go-gitea/gitea/issues/22578#issuecomment-1444180053
Credits to @tdesveaux , thank you very much for catching the problem. If
you'd like to open a PR, feel free to replace this one.
Git reports fatal errors for ambiguous arguments:
```
fatal: ambiguous argument 'refs/a...refs/b': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
```
So the `--` separator is necessary in some cases.
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Backport #22451 by @philip-peterson
This PR adds support for reflogs on all repositories. It does this by
adding a global configuration entry.
Implements #14865
Signed-off-by: Philip Peterson <philip.c.peterson@gmail.com>
Co-authored-by: Philip Peterson <philip-peterson@users.noreply.github.com>
Backport #22568
The merge and update branch code was previously a little tangled and had
some very long functions. The functions were not very clear in their
reasoning and there were deficiencies in their logging and at least one
bug in the handling of LFS for update by rebase.
This PR substantially refactors this code and splits things out to into
separate functions. It also attempts to tidy up the calls by wrapping
things in "context"s. There are also attempts to improve logging when
there are errors.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: zeripath <art27@cantab.net>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: delvh <dev.lh@web.de>
Backport #23272
The code for GetFilesChangedBetween uses `git diff --name-only
base..head` to get the names of files changed between base and head
however this forgets that git will escape certain values.
This PR simply switches to use `-z` which has the `NUL` character as the
separator.
Ref https://github.com/go-gitea/gitea/pull/22568#discussion_r1123138096
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: zeripath <art27@cantab.net>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Backport #23157
This arose out of #22451; it seems we are checking using non-global
settings to see if a config value is set, in order to decide whether to
call another global(-indeed) configuration command. This PR changes it
so that both the check and the set are for global configuration.
Co-authored-by: Philip Peterson <philip-peterson@users.noreply.github.com>
Close #23027
`git commit` message option _only_ supports 4 formats (well, only ....):
* `"commit", "-m", msg`
* `"commit", "-m{msg}"` (no space)
* `"commit", "--message", msg`
* `"commit", "--message={msg}"`
The long format with `=` is the best choice, and it's documented in `man
git-commit`:
`-m <msg>, --message=<msg> ...`
ps: I would suggest always use long format option for git command, as
much as possible.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
!fixup https://github.com/go-gitea/gitea/pull/22177
The only place this function is used so far is in
findReadmeFileInEntries(), so the only visible effect of this oversight
was in an obscure README-related corner: if the README was in a
subfolder and was a symlink that pointed up, as in .github/README.md ->
../docs/old/setup.md, the README would fail to render when FollowLinks()
hit the nil ptree. This makes the ptree non-nil and thus repairs it.
This code was copy-pasted at some point. Revisit it to reunify it.
~~Doing that then encouraged simplifying the types of a couple of
related functions.~~
~~As a follow-up, move two helper functions, `isReadmeFile()` and
`isReadmeFileExtension()`, intimately tied to `findReadmeFile()`, in as
package-private.~~
Signed-off-by: Nick Guenther <nick.guenther@polymtl.ca>
Creating a new buffered reader for every part of the blame can miss
lines, as it will read and buffer bytes that the next buffered reader
will not get.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
During the refactoring of the git module, I found there were some
strange operations. This PR tries to fix 2 of them
1. The empty argument `--` in repo_attribute.go, which was introduced by
#16773. It seems unnecessary because nothing else would be added later.
2. The complex git service logic in repo/http.go.
* Before: the `hasAccess` only allow `service == "upload-pack" ||
service == "receive-pack"`
* After: unrelated code is removed. No need to call ToTrustedCmdArgs
anymore.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
This PR follows #21535 (and replace #22592)
## Review without space diff
https://github.com/go-gitea/gitea/pull/22678/files?diff=split&w=1
## Purpose of this PR
1. Make git module command completely safe (risky user inputs won't be
passed as argument option anymore)
2. Avoid low-level mistakes like
https://github.com/go-gitea/gitea/pull/22098#discussion_r1045234918
3. Remove deprecated and dirty `CmdArgCheck` function, hide the `CmdArg`
type
4. Simplify code when using git command
## The main idea of this PR
* Move the `git.CmdArg` to the `internal` package, then no other package
except `git` could use it. Then developers could never do
`AddArguments(git.CmdArg(userInput))` any more.
* Introduce `git.ToTrustedCmdArgs`, it's for user-provided and already
trusted arguments. It's only used in a few cases, for example: use git
arguments from config file, help unit test with some arguments.
* Introduce `AddOptionValues` and `AddOptionFormat`, they make code more
clear and simple:
* Before: `AddArguments("-m").AddDynamicArguments(message)`
* After: `AddOptionValues("-m", message)`
* -
* Before: `AddArguments(git.CmdArg(fmt.Sprintf("--author='%s <%s>'",
sig.Name, sig.Email)))`
* After: `AddOptionFormat("--author='%s <%s>'", sig.Name, sig.Email)`
## FAQ
### Why these changes were not done in #21535 ?
#21535 is mainly a search&replace, it did its best to not change too
much logic.
Making the framework better needs a lot of changes, so this separate PR
is needed as the second step.
### The naming of `AddOptionXxx`
According to git's manual, the `--xxx` part is called `option`.
### How can it guarantee that `internal.CmdArg` won't be not misused?
Go's specification guarantees that. Trying to access other package's
internal package causes compilation error.
And, `golangci-lint` also denies the git/internal package. Only the
`git/command.go` can use it carefully.
### There is still a `ToTrustedCmdArgs`, will it still allow developers
to make mistakes and pass untrusted arguments?
Generally speaking, no. Because when using `ToTrustedCmdArgs`, the code
will be very complex (see the changes for examples). Then developers and
reviewers can know that something might be unreasonable.
### Why there was a `CmdArgCheck` and why it's removed?
At the moment of #21535, to reduce unnecessary changes, `CmdArgCheck`
was introduced as a hacky patch. Now, almost all code could be written
as `cmd := NewCommand(); cmd.AddXxx(...)`, then there is no need for
`CmdArgCheck` anymore.
### Why many codes for `signArg == ""` is deleted?
Because in the old code, `signArg` could never be empty string, it's
either `-S[key-id]` or `--no-gpg-sign`. So the `signArg == ""` is just
dead code.
---------
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
- Remove code that isn't being used.
Found this is my stash from a few weeks ago, not sure how I found this
in the first place.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Provide a new type to make it easier to parse a ref name.
Actually, it's picked up from #21937, to make the origin PR lighter.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Change all license headers to comply with REUSE specification.
Fix#16132
Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
Although git does expect that author names should be of the form: `NAME
<EMAIL>` some users have been able to create commits with: `<EMAIL>`
Fix#21900
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: Lauris BH <lauris@nix.lv>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Fix#20456
At some point during the 1.17 cycle abbreviated refishs to issue
branches started breaking. This is likely due serious inconsistencies in
our management of refs throughout Gitea - which is a bug needing to be
addressed in a different PR. (Likely more than one)
We should try to use non-abbreviated `fullref`s as much as possible.
That is where a user has inputted a abbreviated `refish` we should add
`refs/heads/` if it is `branch` etc. I know people keep writing and
merging PRs that remove prefixes from stored content but it is just
wrong and it keeps causing problems like this. We should only remove the
prefix at the time of
presentation as the prefix is the only way of knowing umambiguously and
permanently if the `ref` is referring to a `branch`, `tag` or `commit` /
`SHA`. We need to make it so that every ref has the appropriate prefix,
and probably also need to come up with some definitely unambiguous way
of storing `SHA`s if they're used in a `ref` or `refish` field. We must
not store a potentially
ambiguous `refish` as a `ref`. (Especially when referring a `tag` -
there is no reason why users cannot create a `branch` with the same
short name as a `tag` and vice versa and any attempt to prevent this
will fail. You can even create a `branch` and a
`tag` that matches the `SHA` pattern.)
To that end in order to fix this bug, when parsing issue templates check
the provided `Ref` (here a `refish` because almost all users do not know
or understand the subtly), if it does not start with `refs/` add the
`BranchPrefix` to it. This allows people to make their templates refer
to a `tag` but not to a `SHA` directly. (I don't think that is
particularly unreasonable but if people disagree I can make the `refish`
be checked to see if it matches the `SHA` pattern.)
Next we need to handle the issue links that are already written. The
links here are created with `git.RefURL`
Here we see there is a bug introduced in #17551 whereby the provided
`ref` argument can be double-escaped so we remove the incorrect external
escape. (The escape added in #17551 is in the right place -
unfortunately I missed that the calling function was doing the wrong
thing.)
Then within `RefURL()` we check if an unprefixed `ref` (therefore
potentially a `refish`) matches the `SHA` pattern before assuming that
is actually a `commit` - otherwise is assumed to be a `branch`. This
will handle most of the problem cases excepting the very unusual cases
where someone has deliberately written a `branch` to look like a `SHA1`.
But please if something is called a `ref` or interpreted as a `ref` make
it a full-ref before storing or using it. By all means if something is a
`branch` assume the prefix is removed but always add it back in if you
are using it as a `ref`. Stop storing abbreviated `branch` names and
`tag` names - which are `refish` as a `ref`. It will keep on causing
problems like this.
Fix#20456
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lauris BH <lauris@nix.lv>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
The doctor check `storages` currently only checks the attachment
storage. This PR adds some basic garbage collection functionality for
the other types of storage.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
There was a bug introduced in #21352 due to a change of behaviour caused
by #19280. This causes a panic on running the default doctor checks
because the panic introduced by #19280 assumes that the only way
opts.StdOut and opts.Stderr can be set in RunOpts is deliberately.
Unfortunately, when running a git.Command the provided RunOpts can be
set, therefore if you share a common set of RunOpts these two values can
be set by the previous commands.
This PR stops using common RunOpts for the commands in that doctor check
but secondly stops RunCommand variants from changing the provided
RunOpts.
Signed-off-by: Andrew Thornton <art27@cantab.net>
A lot of our code is repeatedly testing if individual errors are
specific types of Not Exist errors. This is repetitative and unnecesary.
`Unwrap() error` provides a common way of labelling an error as a
NotExist error and we can/should use this.
This PR has chosen to use the common `io/fs` errors e.g.
`fs.ErrNotExist` for our errors. This is in some ways not completely
correct as these are not filesystem errors but it seems like a
reasonable thing to do and would allow us to simplify a lot of our code
to `errors.Is(err, fs.ErrNotExist)` instead of
`package.IsErr...NotExist(err)`
I am open to suggestions to use a different base error - perhaps
`models/db.ErrNotExist` if that would be felt to be better.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: delvh <dev.lh@web.de>
After some discussion, introduce a new slice `brokenArgs` to make
`gitCmd.Run()` return errors if any dynamic argument is invalid.
Co-authored-by: delvh <dev.lh@web.de>
We should only log CheckPath errors if they are not simply due to
context cancellation - and we should add a little more context to the
error message.
Fix#20709
Signed-off-by: Andrew Thornton <art27@cantab.net>
Close#20315 (fix the panic when parsing invalid input), Speed up #20231 (use ls-tree without size field)
Introduce ListEntriesRecursiveFast (ls-tree without size) and ListEntriesRecursiveWithSize (ls-tree with size)
Using `append(args, strings.Fields(arg)...)` is dangerous, it may
generate incorrect results.
For example: `arg1 "the dangerous"` will be splitted to 3 arguments:
`arg1`, `"the`, `dangerous"`. In some cases the incorrect arguments may
lead to security problems.
This fixes#5709 and #17316 by changing the order of listed branches
and tags to show the ones with latest commits atop.
It's achieved with changing underlying "show-ref" git command with
"for-each-ref" as suggested in https://stackoverflow.com/a/5188364
Also, it's passing format string so the output matches "show-ref"
command output.
close#5709close#17316
A testing cleanup.
This pull request replaces `os.MkdirTemp` with `t.TempDir`. We can use the `T.TempDir` function from the `testing` package to create temporary directory. The directory created by `T.TempDir` is automatically removed when the test and all its subtests complete.
This saves us at least 2 lines (error check, and cleanup) on every instance, or in some cases adds cleanup that we forgot.
Reference: https://pkg.go.dev/testing#T.TempDir
```go
func TestFoo(t *testing.T) {
// before
tmpDir, err := os.MkdirTemp("", "")
require.NoError(t, err)
defer os.RemoveAll(tmpDir)
// now
tmpDir := t.TempDir()
}
```
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
When setting.Git.DisablePartialClone is set to false then the web server will add filter support to web http. It does this by using`-c` command arguments but this will not work on gitea serv as the upload-pack and receive-pack commands do not support this.
Instead we move these options into the .gitconfig instead.
Fix#20400
Signed-off-by: Andrew Thornton <art27@cantab.net>
When migrating add several more important sanity checks:
* SHAs must be SHAs
* Refs must be valid Refs
* URLs must be reasonable
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: techknowlogick <matti@mdranta.net>
* Set no-tags in git fetch on compare
In the compare endpoint the git fetch is restricted to a certain branch however,
this does not completely prevent tag acquisition/pollution as git fetch will collect
any tags on that branch.
This causes pollution of the tag namespace and could cause confusion by users.
This PR adds `--no-tags` to the `git fetch` call.
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Update modules/git/repo_compare.go
* Update modules/git/repo_compare.go
Signed-off-by: Andrew Thornton <art27@cantab.net>
The use of `--follow` makes getting these commits very slow on large repositories
as it results in searching the whole commit tree for a blob.
Now as nice as the results of `--follow` are, I am uncertain whether it is really
of sufficient importance to keep around.
Fix#20764
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
* merge `CheckLFSVersion` into `InitFull` (renamed from `InitWithSyncOnce`)
* remove the `Once` during git init, no data-race now
* for doctor sub-commands, `InitFull` should only be called in initialization stage
Co-authored-by: zeripath <art27@cantab.net>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
This enables git.Command's Run to optionally use the given context directly so its deadline will be respected. Otherwise, it falls back to the previous behavior of using the supplied timeout or a default timeout value of 360 seconds.
repo's serviceRPC() calls now use the context's deadline (which is unset/unlimited) instead of the default 6-minute timeout. This means that large repo clones will no longer arbitrarily time out on the upload-pack step, and pushes can take longer than 6 minutes on the receive-pack step.
Fixes#20680
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
* Add latest commit's SHA to content response
- When requesting the contents of a filepath, add the latest commit's
SHA to the requested file.
- Resolves#12840
* Add swagger
* Fix NPE
* Fix tests
* Hook into LastCommitCache
* Move AddLastCommitCache to a common nogogit and gogit file
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Prevent NPE
Co-authored-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
The LastCommitCache code is a little complex and there is unnecessary
duplication between the gogit and nogogit variants.
This PR adds the LastCommitCache as a field to the git.Repository and
pre-creates it in the ReferencesGit helpers etc. There has been some
simplification and unification of the variant code.
Signed-off-by: Andrew Thornton <art27@cantab.net>