Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new [lfs_client].BATCH_SIZE and [server].LFS_MAX_BATCH_SIZE config settings. #32307

Merged
merged 3 commits into from
Oct 30, 2024

Conversation

rremer
Copy link
Contributor

@rremer rremer commented Oct 21, 2024

This contains two backwards-compatible changes:
* in the lfs http_client, the number of lfs oids requested per batch is loaded from lfs_client#BATCH_SIZE and defaulted to the previous value of 20
* in the lfs server/service, the max number of lfs oids allowed in a batch api request is loaded from server#LFS_MAX_BATCH_SIZE and defaults to 'nil' which equates to the previous behavior of 'infinite'

This fixes #32306

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Oct 21, 2024
@pull-request-size pull-request-size bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 21, 2024
@github-actions github-actions bot added modifies/go Pull requests that update Go code docs-update-needed The document needs to be updated synchronously labels Oct 21, 2024
@rremer
Copy link
Contributor Author

rremer commented Oct 21, 2024

fyi I ignored two failures under test-backend from modules/zstd/zstd_test.go as they are also failing on main and appear unrelated. My internal branch which is behind main by a few weeks has all tests passing.

Also, if there is a separate docs site I should be updating with the new gitea config, I would appreciate a pointer to where that lives.

@lunny
Copy link
Member

lunny commented Oct 22, 2024

You can send doc pull request here https://gitea.com/gitea/docs

@rremer
Copy link
Contributor Author

rremer commented Oct 22, 2024

related doc pr: https://gitea.com/gitea/docs/pulls/83

@lunny
Copy link
Member

lunny commented Oct 22, 2024

And maybe we should split lfs server configurations and lfs client configurations.

@rremer
Copy link
Contributor Author

rremer commented Oct 22, 2024

Fair enough @lunny , although it looks like the built-in lfs server doesn't have any constraint on batch size today: https://github.com/go-gitea/gitea/blob/main/services/lfs/server.go#L186

If you're in agreement, the way I'd go about this is:

  1. change my proposed LFS_BATCH_SIZE to LFS_CLIENT_BATCH_SIZE
  2. add a new LFS_SERVER_MAX_BATCH_SIZE config, and consume it early-on in the BatchRequest parsing of services/lfs/server.go, returning HTTP 413 Content Too Large if the slice length was greater than max batch size git lfs reference doc at the bottom

If that sounds good, I'll update this branch with all that.

@lunny
Copy link
Member

lunny commented Oct 22, 2024

Fair enough @lunny , although it looks like the built-in lfs server doesn't have any constraint on batch size today: main/services/lfs/server.go#L186

If you're in agreement, the way I'd go about this is:

1. change my proposed `LFS_BATCH_SIZE` to `LFS_CLIENT_BATCH_SIZE`

2. add a new `LFS_SERVER_MAX_BATCH_SIZE` config, and consume it early-on in the BatchRequest parsing of services/lfs/server.go, returning HTTP 413 Content Too Large if the slice length was greater than max batch size [git lfs reference doc at the bottom](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md)

If that sounds good, I'll update this branch with all that.

I prefer to put the new configuration under a new section [lfs_client].

@rremer
Copy link
Contributor Author

rremer commented Oct 23, 2024

That's a bit confusing to me. All the other configs for lfs are under [server] today, but just to represent the options on the table here (please correct me if I misunderstood your suggestion):

[server]
LFS_START_SERVER = true
LFS_MAX_BATCH_SIZE = X

[lfs_client]
LFS_BATCH_SIZE = 20

And what I was proposing:

[server]
LFS_START_SERVER = true
LFS_CLIENT_BATCH_SIZE = 20
LFS_SERVER_MAX_BATCH_SIZE = X

We could make the argument for a config refactoring though, where we have [server.lfs] and [server.lfs_client], but to maintain backwards-compatibility that would add some cruft to modules/setting/lfs.go.

@lunny
Copy link
Member

lunny commented Oct 23, 2024

That's a bit confusing to me. All the other configs for lfs are under [server] today, but just to represent the options on the table here (please correct me if I misunderstood your suggestion):

[server]
LFS_START_SERVER = true
LFS_MAX_BATCH_SIZE = X

[lfs_client]
LFS_BATCH_SIZE = 20

And what I was proposing:

[server]
LFS_START_SERVER = true
LFS_CLIENT_BATCH_SIZE = 20
LFS_SERVER_MAX_BATCH_SIZE = X

We could make the argument for a config refactoring though, where we have [server.lfs] and [server.lfs_client], but to maintain backwards-compatibility that would add some cruft to modules/setting/lfs.go.

Put all lfs configuration items under [server] is a legacy problem, I think it should be moved to [lfs] or [server.lfs] some day. So creating a new section for lfs client is better.

@rremer
Copy link
Contributor Author

rremer commented Oct 23, 2024

I thought that's what you were getting at, okay so what if we put new configs for lfs under a combined section, like:

[server]
LFS_START_SERVER = true

[lfs.server]
LFS_SERVER_MAX_BATCH_SIZE = X

[lfs.client]
LFS_CLIENT_BATCH_SIZE = 20

That would keep this pr simple without adding more config debt. In a separate PR I could extend the new lfs config class to handle old/new config sections and add warnings for their potential future deprecations.

@rremer rremer force-pushed the configurable-lfs-batch-size branch from 129b5d2 to 1fef084 Compare October 23, 2024 21:49
@pull-request-size pull-request-size bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 23, 2024
@github-actions github-actions bot removed the docs-update-needed The document needs to be updated synchronously label Oct 23, 2024
@rremer rremer force-pushed the configurable-lfs-batch-size branch from 1fef084 to f695949 Compare October 23, 2024 21:51
@rremer rremer changed the title Add new LFS_BATCH_SIZE configuration setting. Add new [lfs.server] and [lfs.client] config settings for batch sizes Oct 23, 2024
@rremer rremer force-pushed the configurable-lfs-batch-size branch from f695949 to d314dff Compare October 23, 2024 21:52
@rremer
Copy link
Contributor Author

rremer commented Oct 23, 2024

Okay, I went with this (very similar to what we agreed upon above, but removing the duplicate LFS_ prefix):

[server]
LFS_START_SERVER = true

[lfs.server]
MAX_BATCH_SIZE = X

[lfs.client]
BATCH_SIZE = 20

I added a test for these defaults, and opened up #32332 to track the refactor/migration of the lfs configs under [server] to [lfs.server].

EDIT: apologies for the all the force-pushes, found a spelling error in a comment and forgot to actually implement the server-side response when MAX_BATCH_SIZE was breached. This PR is ready for review again.

@rremer rremer force-pushed the configurable-lfs-batch-size branch from d314dff to b6afdc2 Compare October 23, 2024 22:04
modules/setting/lfs.go Outdated Show resolved Hide resolved
@rremer rremer force-pushed the configurable-lfs-batch-size branch from b6afdc2 to d01a505 Compare October 23, 2024 22:27
@@ -179,6 +179,12 @@ func BatchHandler(ctx *context.Context) {
return
}

batchSize := len(br.Objects)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LFSClient.BatchSize hasn't been used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's used the line below actually batchSize > setting.LFSServer.MaxBatchSize, but you're right that it's only used once. I was considering whether to put a debug line to print the batch size the client requested. Would you prefer:

  • leave as-is
  • add such a debug line
  • remove this var and just get the length in-line for that same check

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the lfs.client and the configuration items if no place use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I think I probably don't want a log line, so I've moved this inline. We can add it back if we want logging later.

@rremer rremer force-pushed the configurable-lfs-batch-size branch 2 times, most recently from b47a030 to 9de1d58 Compare October 24, 2024 20:58
@lunny lunny added the type/enhancement An improvement of existing functionality label Oct 24, 2024
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Oct 24, 2024
@lunny lunny added this to the 1.23.0 milestone Oct 24, 2024
@wxiaoguang
Copy link
Contributor

The config system becomes more messy:

[server].LFS_MAX_FILE_SIZE
[lfs].STORAGE_TYPE
[lfs.client].BATCH_SIZE
[lfs.server].MAX_BATCH_SIZE

Would they cause problems for maintainers and site admin?

@rremer
Copy link
Contributor Author

rremer commented Oct 25, 2024

@wxiaoguang would you prefer:

[server].LFS_MAX_FILE_SIZE
[lfs].STORAGE_TYPE
[lfs].CLIENT_BATCH_SIZE
[lfs].SERVER_MAX_BATCH_SIZE

Or something else? I'll let you and @lunny hash it out, although I think we're all in agreement that we don't want more lfs-specific configs under [server]?

@wxiaoguang
Copy link
Contributor

I have no clear idea at the moment and I don't mean against it. Just a question.

@lunny
Copy link
Member

lunny commented Oct 25, 2024

@wxiaoguang would you prefer:

[server].LFS_MAX_FILE_SIZE
[lfs].STORAGE_TYPE
[lfs].CLIENT_BATCH_SIZE
[lfs].SERVER_MAX_BATCH_SIZE

Or something else? I'll let you and @lunny hash it out, although I think we're all in agreement that we don't want more lfs-specific configs under [server]?

Maybe we [lfs] means [lfs.server] so we don't need [lfs.server]. Ideally, all configurations related to lfs server should be under [lfs] and all configurations related to lfs client should be under [lfs.client].

@wxiaoguang
Copy link
Contributor

Maybe we [lfs] means [lfs.server] so we don't need [lfs.server]. Ideally, all configurations related to lfs server should be under [lfs] and all configurations related to lfs client should be under [lfs.client].

NO. [lfs] means LFS Storage, a mess. That's why I think new config options should be carefully designed. (again, just an opinion, no blocker)

@lunny
Copy link
Member

lunny commented Oct 25, 2024

Maybe we [lfs] means [lfs.server] so we don't need [lfs.server]. Ideally, all configurations related to lfs server should be under [lfs] and all configurations related to lfs client should be under [lfs.client].

NO. [lfs] means LFS Storage, a mess. That's why I think new config options should be carefully designed. (again, just an opinion, no blocker)

But [lfs] can also have other options rather than storage configurations.

@rremer
Copy link
Contributor Author

rremer commented Oct 25, 2024

I've tried to summarize two of the suggested organizations of this config in #32332 just so we can visualize what they look like.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Oct 26, 2024

NO. [lfs] means LFS Storage, a mess. That's why I think new config options should be carefully designed. (again, just an opinion, no blocker)

But [lfs] can also have other options rather than storage configurations.

No. Because the lfs section was never used like that, it was only used for "storage". Without a careful design there would be new regressions/bugs and more messy (there have been). The lfs section is a heavy technical debit in history, it should not be used for any other new purpose.

See below Details how the lfs section is used:

image

I've tried to summarize two of the suggested organizations of this config in #32332 just so we can visualize what they look like.

Renaming config items involves a lot of backwards compatibility work ...... I am not sure if there would be some people spending time on it and make it right.

And keep in mind that it can't use [lfs.client] if there is a [lfs] section because the INI system has annoying "overriding" behavior. See the TestConfigProviderBehaviors test.

The only clear approach in my mind is [server].LFS_MAX_BATCH_SIZE + [lfs_client].BATCH_SIZE.


(that's just my opinion, maybe it doesn't need to hurry to do any change before there is a consensus)

…g settings.

This contains two backwards-compatible changes:
* in the lfs http_client, the number of lfs oids requested per batch is loaded from lfs_client#BATCH_SIZE and defaulted to the previous value of 20
* in the lfs server/service, the max number of lfs oids allowed in a batch api request is loaded from server#LFS_MAX_BATCH_SIZE and defaults to 'nil' which equates to the previous behavior of 'infinite'

This fixes go-gitea#32306

Signed-off-by: Royce Remer <royceremer@gmail.com>
@rremer rremer force-pushed the configurable-lfs-batch-size branch from 379c752 to 02f9dd4 Compare October 29, 2024 16:41
@rremer rremer changed the title Add new [lfs.server] and [lfs.client] config settings for batch sizes Add new [lfs_client].BATCH_SIZE and [server].LFS_MAX_BATCH_SIZE config settings. Oct 29, 2024
@rremer
Copy link
Contributor Author

rremer commented Oct 29, 2024

Updated this branch and https://gitea.com/gitea/docs/pulls/83 with @wxiaoguang's preference of:

[server]
LFS_MAX_BATCH_SIZE

[lfs_client]
BATCH_SIZE

Should we then close/never #32332 ? Seems like we're committing to not refactoring the lfs configs and keeping it all under [server] going forward?

modules/setting/lfs.go Outdated Show resolved Hide resolved
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Oct 30, 2024
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Oct 30, 2024
@lunny lunny enabled auto-merge (squash) October 30, 2024 05:18
@lunny lunny merged commit c60e4dc into go-gitea:main Oct 30, 2024
26 checks passed
@GiteaBot GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Oct 30, 2024
silverwind added a commit to silverwind/gitea that referenced this pull request Oct 30, 2024
* origin/main: (21 commits)
  Fix toAbsoluteLocaleDate and add more tests (go-gitea#32387)
  Respect UI.ExploreDefaultSort setting again (go-gitea#32357)
  Fix absolute-date (go-gitea#32375)
  Fix undefined errors on Activity page (go-gitea#32378)
  Add new [lfs_client].BATCH_SIZE and [server].LFS_MAX_BATCH_SIZE config settings. (go-gitea#32307)
  remove unused call to $.HeadRepo in view_title template (go-gitea#32317)
  Fix clean tmp dir (go-gitea#32360)
  Optimize branch protection rule loading (go-gitea#32280)
  Suggestions for issues (go-gitea#32327)
  Migrate vue components to setup (go-gitea#32329)
  Fix db engine (go-gitea#32351)
  Refactor the DB migration system slightly (go-gitea#32344)
  Fix broken image when editing comment with non-image attachments (go-gitea#32319)
  Fix disable 2fa bug (go-gitea#32320)
  Upgrade rollup to 4.24.0 (go-gitea#32312)
  Upgrade vue to 3.5.12 (go-gitea#32311)
  Make admins adhere to branch protection rules (go-gitea#32248)
  Prevent from submitting issue/comment on uploading (go-gitea#32263)
  Add warn log when deleting inactive users (go-gitea#32318)
  Add `DISABLE_ORGANIZATIONS_PAGE` and `DISABLE_CODE_PAGE` settings for explore pages and fix an issue related to user search (go-gitea#32288)
  ...
zjjhot added a commit to zjjhot/gitea that referenced this pull request Oct 31, 2024
* giteaofficial/main:
  Fix suggestions for issues (go-gitea#32380)
  refactor: remove redundant err declarations (go-gitea#32381)
  Fix the missing menu in organization project view page (go-gitea#32313)
  Fix toAbsoluteLocaleDate and add more tests (go-gitea#32387)
  Respect UI.ExploreDefaultSort setting again (go-gitea#32357)
  Fix absolute-date (go-gitea#32375)
  Fix undefined errors on Activity page (go-gitea#32378)
  Add new [lfs_client].BATCH_SIZE and [server].LFS_MAX_BATCH_SIZE config settings. (go-gitea#32307)
  remove unused call to $.HeadRepo in view_title template (go-gitea#32317)
  Fix clean tmp dir (go-gitea#32360)
  Optimize branch protection rule loading (go-gitea#32280)
  Suggestions for issues (go-gitea#32327)
  Migrate vue components to setup (go-gitea#32329)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-update-needed The document needs to be updated synchronously lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/go Pull requests that update Go code size/M Denotes a PR that changes 30-99 lines, ignoring generated files. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LFS batch size should be configurable
4 participants