Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FINAL: Git storage option feature #1279

Open
wants to merge 112 commits into
base: main
Choose a base branch
from

Conversation

begonaguereca
Copy link
Contributor

@begonaguereca begonaguereca commented Oct 8, 2024

Summary:

This PR introduces the ability for the GEI CLI to upload archives to GitHub-owned storage using multipart uploads. To enable this, a new CLI option, --use-github-storage, has been added. This option allows users to explicitly specify that their archives should be uploaded to GitHub's managed storage instead of their own.

Key Features:

Multipart Upload Support: Leveraging multipart upload for reliable and efficient large file uploads. This ensures the robustness of the uploads, minimizing the chances of failure during transmission. We're investigating libraries to handle multipart uploads, but for now, this feature is not rolled by hand unless necessary.

New Option: --use-github-storage:

This option is required to upload archives to GitHub-owned storage and will be hidden until GitHub-owned storage reaches general availability (GA).
By default, archives are still uploaded to user-specified storage locations unless the --use-github-storage flag is explicitly set.

Motivation:

As discussed during our EDR, we want to allow users to utilize GitHub-owned storage as an option without making it the default. This approach lets us roll out GitHub-managed storage in stages and gather feedback while providing an explicit way to opt-in to this feature.

Check in BBS MigrateRepoCommandArgs removed, relating to changes in this PR: #1057


Take it out for a 🚗 :

*** Remember to export env variables***

export GH_SOURCE_PAT=<<PAT>>
export GH_PAT=<<PAT>> 

Migrating from GitHub Enterprise Server:

dotnet run --project src/gei/gei.csproj -- migrate-repo --github-source-org valet-testing --source-repo integration-tests --github-target-org octoshift-staging --target-repo YOU_ADD_NEW_NAME --use-github-storage true --ghes-api-url https://octoshift-ghe.westus2.cloudapp.azure.com/api/v3 --verbose

Successful migration_id: RM_kgHaACRlZjNiMmExMy02ZTRkLTQ0NWYtOTc4Yy05YTk1MTQ5NzE2OTk

Migrating from GitHub Enterprise Server- Multi part upload:

dotnet run --project src/gei/gei.csproj -- migrate-repo --github-source-org github-owned-storage --source-repo large-1-gib --github-target-org import-testing --target-repo YOU_ADD_NEW_NAME --use-github-storage true --ghes-api-url https://octoshift-ghe.westus2.cloudapp.azure.com/api/v3 --verbose

Successful migration_id: RM_kgDaACQ0MzZmN2RjMS0wNjk4LTRlYjctYmMzNy0wMTAxNGRhOTQ3YTU


Migrating from BBS:

Source repo: https://test-bbs-o.githubapp.com/projects/IM/repos/codeowners-test/browse
You'll need to add yourself to the pinhole firewall at https://portal.azure.com/?wa=wsignin1.0#@githubazure.onmicrosoft.com/resource/subscriptions/7c45ec63-636b-445b-8185-54accbc0190d/resourceGroups/octoshift-test-bbs/providers/Microsoft.Compute/virtualMachines/bbs-test/networkSettings
Here's the SSH key (and other creds) for our bbs-test-o instance: https://start.1password.com/open/i?a=LKXPAKKGYNBF7IOHIU6VPDFEU4&v=akrfryl3erhcqdcvgiqqtvh254&i=23dceluq4vdvpptr7qen3wcbiu&h=github.1password.com

dotnet run --project src/bbs2gh/bbs2gh.csproj -- migrate-repo --bbs-project IM --bbs-repo codeowners-test --ssh-user bbs --ssh-private-key /Users/begonaguereca/.ssh/id_rsa_bbs_octoshift --use-github-storage true --bbs-server-url https://test-bbs-o.githubapp.com --github-org octoshift-staging --github-repo ADD_NEW_NAME --verbose

Successful migration_id: RM_kgHaACRhODI4NWRmZS0yNWM3LTQ1YjEtYTJhOC1jYmZjNTkxZDE0Mjk


Closes: https://github.ghe.com/github/octoshift/issues/8969

  • Did you write/update appropriate tests
  • Release notes updated (if appropriate)
  • Appropriate logging output
  • Issue linked
  • Docs updated (or issue created)
  • New package licenses are added to ThirdPartyNotices.txt (if applicable)

@begonaguereca begonaguereca changed the title Git storage option Git storage option feature Oct 8, 2024
@begonaguereca begonaguereca changed the title Git storage option feature FINAL: Git storage option feature Oct 8, 2024
Copy link

github-actions bot commented Oct 8, 2024

Unit Test Results

825 tests   825 ✅  22s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit 1478850.

♻️ This comment has been updated with latest results.

begonaguereca and others added 22 commits October 25, 2024 14:15
Co-authored-by: Arin Ghazarian <aringhazarian@github.com>
Co-authored-by: Arin Ghazarian <aringhazarian@github.com>
Co-authored-by: Arin Ghazarian <aringhazarian@github.com>
Co-authored-by: Arin Ghazarian <aringhazarian@github.com>
…ests

Github Owned Storage integration tests
Co-authored-by: Arin Ghazarian <aringhazarian@github.com>
Co-authored-by: Arin Ghazarian <aringhazarian@github.com>
@begonaguereca begonaguereca marked this pull request as ready for review October 30, 2024 18:55
Comment on lines +85 to +88
catch (Exception ex)
{
throw new OctoshiftCliException("Failed during multipart upload.", ex);
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.

try
{
var (responseContent, headers) = await _client.PostWithFullResponseAsync(uploadUrl, body);

Check warning

Code scanning / CodeQL

Useless assignment to local variable Warning

This assignment to
responseContent
is useless, since its value is never read.
Comment on lines +107 to +110
catch (Exception ex)
{
throw new OctoshiftCliException("Failed to start upload.", ex);
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.
try
{
// Make the PATCH request and retrieve headers
var (responseContent, headers) = await _client.PatchWithFullResponseAsync(nextUrl, content);

Check warning

Code scanning / CodeQL

Useless assignment to local variable Warning

This assignment to
responseContent
is useless, since its value is never read.
Comment on lines +127 to +130
catch (Exception ex)
{
throw new OctoshiftCliException("Failed to upload part.", ex);
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.
Comment on lines +140 to +143
catch (Exception ex)
{
throw new OctoshiftCliException("Failed to complete upload.", ex);
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.
// Using a MemoryStream as a valid stream implementation
using var archiveContent = new MemoryStream(new byte[] { 1, 2, 3 });
var expectedUri = "gei://archive/123456";
var jsonResponse = $"{{ \"uri\": \"{expectedUri}\" }}";

Check warning

Code scanning / CodeQL

Useless assignment to local variable Warning test

This assignment to
jsonResponse
is useless, since its value is never read.
Copy link

Code Coverage

Package Line Rate Branch Rate Complexity Health
gei 80% 71% 540
ado2gh 84% 78% 627
Octoshift 87% 76% 1302
bbs2gh 79% 74% 666
Summary 83% (7009 / 8396) 75% (1575 / 2094) 3135

Copy link
Collaborator

@ArinGhazarian ArinGhazarian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at the integration tests and the reason they are failing is because the generate-script commands for gei and bbs2gh don't support the --use-github-storage option yet! What I recommend is to revert the INT test changes for now and add them back (in a separate PR) once we add the option to the generate-script commands.

@@ -117,7 +120,7 @@ await retryPolicy.Retry(async () =>
_tokens.Add("AWS_ACCESS_KEY_ID", Environment.GetEnvironmentVariable("AWS_ACCESS_KEY_ID"));
_tokens.Add("AWS_SECRET_ACCESS_KEY", Environment.GetEnvironmentVariable("AWS_SECRET_ACCESS_KEY"));
var awsBucketName = Environment.GetEnvironmentVariable("AWS_BUCKET_NAME");
archiveUploadOptions = $" --aws-bucket-name {awsBucketName} --aws-region {AWS_REGION}";
archiveUploadOptions = $" --aws-bucket-name {awsBucketName} --aws-region {AWS_REGION} --use-github-storage {useGithubStorage}";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of things here:

  1. --use-github-storage is not yet a supported option for the generate-script commands so this ultimately won't work until we add the option.
  2. Even if the GitHub storage option was supported by generate-script this still wouldn't work because both AWS and GitHub storage options are provided at the same time which should be a validation error.

[InlineData("http://e2e-bbs-7-21-9-win-2019.eastus.cloudapp.azure.com:7990", false, true, false)]
[InlineData("http://e2e-bbs-8-5-0-linux-2204.eastus.cloudapp.azure.com:7990", true, false, false)]
[InlineData("http://e2e-bbs-8-5-0-linux-2204.eastus.cloudapp.azure.com:7990", false, false, true)]
public async Task Basic(string bbsServer, bool useSshForArchiveDownload, bool useAzureForArchiveUpload, bool useGithubStorage)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I recommend adding an enum for the upload options as a class member (nested enum) right after all private readonly member declarations:

public enum ArchiveUploadOption { AzureStorage, AwsS3, GithubStorage };

Then change the method signature and the inline data like the following:

[InlineData("http://e2e-bbs-8-5-0-linux-2204.eastus.cloudapp.azure.com:7990", true,  ArchiveUploadOption.AzureStorage)]
[InlineData("http://e2e-bbs-7-21-9-win-2019.eastus.cloudapp.azure.com:7990", false, ArchiveUploadOption.AzureStorage)]
[InlineData("http://e2e-bbs-8-5-0-linux-2204.eastus.cloudapp.azure.com:7990", true, ArchiveUploadOption.AwsS3)]
[InlineData("http://e2e-bbs-8-5-0-linux-2204.eastus.cloudapp.azure.com:7990", true, ArchiveUploadOption.GithubStorage)]
public async Task Basic(string bbsServer, bool useSshForArchiveDownload, bool useAzureForArchiveUpload, bool useGithubStorage)

@@ -83,7 +87,7 @@ await retryPolicy.Retry(async () =>
});

await _targetHelper.RunGeiCliMigration(
$"generate-script --github-source-org {githubSourceOrg} --github-target-org {githubTargetOrg} --ghes-api-url {GHES_API_URL} --download-migration-logs", _tokens);
$"generate-script --github-source-org {githubSourceOrg} --github-target-org {githubTargetOrg} --ghes-api-url {GHES_API_URL} --use-github-storage {useGithubStorage} --download-migration-logs", _tokens);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of false for useGithubStorge it shouldn't add the option at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants