Cannot scan self-hosted (private) GitLab repositories #3696

Closed
mwager opened this issue Nov 28, 2023 · 19 comments · Fixed by #3819
Labels
gitlab Issue related to Scorecard's GitLab client kind/bug Something isn't working

Comments

@mwager

mwager commented Nov 28, 2023

Describe the bug
Cannot scan self-hosted (private) GitLab repositories

Reproduction steps
Steps to reproduce the behavior:

  1. See screenshot

Shouldn't this be possible according to your blog post?

You can run Scorecard today on a GitLab.com (or self-hosted GitLab) repository by running Scorecard as you normally would

https://openssf.org/blog/2023/08/28/openssf-scorecard-launches-v4-12-with-support-for-gitlab/

Thank you!

[screenshot]

@mwager mwager added the kind/bug Something isn't working label Nov 28, 2023
@spencerschrock spencerschrock added the gitlab Issue related to Scorecard's GitLab client label Nov 28, 2023
@spencerschrock
Contributor

Our code tries to create a GitLab client first, and then falls back to GitHub.

repo, makeRepoError = glrepo.MakeGitlabRepo(repoURI)
if repo != nil && makeRepoError == nil {
	repoClient, makeRepoError = glrepo.CreateGitlabClient(ctx, repo.Host())
}
if makeRepoError != nil || repo == nil {
	repo, makeRepoError = ghrepo.MakeGithubRepo(repoURI)
	if makeRepoError != nil {
		return repo,
			nil,
			nil,
			nil,
			nil,
			fmt.Errorf("error making github repo: %w", makeRepoError)
	}
	repoClient = ghrepo.CreateGithubRepoClient(ctx, logger)
}

Since it's falling back to GitHub, there must have been some error creating the GitLab client, which we unfortunately aren't surfacing.

My guess is that this has to do with auth tokens, since we don't pass one in here:

// IsValid implements Repo.IsValid.
func (r *repoURL) IsValid() error {
	if strings.Contains(r.host, "gitlab.") {
		return nil
	}
	if strings.EqualFold(r.host, "github.com") {
		return fmt.Errorf("%w: %s", errInvalidGitlabRepoURL, r.host)
	}
	client, err := gitlab.NewClient("", gitlab.WithBaseURL(fmt.Sprintf("%s://%s", r.scheme, r.host)))
	if err != nil {
		return sce.WithMessage(err,
			fmt.Sprintf("couldn't create gitlab client for %s", r.host),
		)
	}

@spencerschrock
Contributor

If you're able to try building locally, can you try this patch?

diff --git a/checker/client.go b/checker/client.go
index 1b5d28a3..d9570773 100644
--- a/checker/client.go
+++ b/checker/client.go
@@ -61,6 +61,7 @@ func GetClients(ctx context.Context, repoURI, localURI string, logger *log.Logge
 	}
 
 	if makeRepoError != nil || repo == nil {
+		fmt.Println(makeRepoError)
 		repo, makeRepoError = ghrepo.MakeGithubRepo(repoURI)
 		if makeRepoError != nil {
 			return repo,
diff --git a/clients/gitlabrepo/repo.go b/clients/gitlabrepo/repo.go
index 65a44783..1a371ed6 100644
--- a/clients/gitlabrepo/repo.go
+++ b/clients/gitlabrepo/repo.go
@@ -97,7 +97,7 @@ func (r *repoURL) String() string {
 
 // IsValid implements Repo.IsValid.
 func (r *repoURL) IsValid() error {
-	if strings.Contains(r.host, "gitlab.") {
+	if strings.Contains(r.host, "gitlab") {
 		return nil
 	}
 

@mwager
Author

mwager commented Nov 30, 2023

Hey, thanks for the fast answer! I tried to build it, but I'm not very familiar with Go and got a lot of errors... I guess you'd be faster at getting the makeRepoError printed.

Your patch makes sense to me; the URL looks like this:

https://foo.bar.com/gitlab/org/repo

@mwager
Author

mwager commented Nov 30, 2023

@spencerschrock When building or running on the machine (CentOS) that can access the private GitLab instance, I just get this strange error, and I can't find any useful info on how to fix it:

[screenshot]

[screenshot]

I tried adding your print statement on my personal laptop, but of course the instance isn't reachable from there:

[screenshot]

If you know how to fix this asm error, I could try again.

@spencerschrock
Contributor

If you know how to fix this asm error I could try again

Hmm, I can't say I've seen that error before. A quick search suggests something is preventing you from running the asm binary: maybe antivirus software, maybe not all of the Go binaries are on your PATH, or maybe there are conflicting versions of Go installed?

Maybe check that you see what you expect when running:

which go

Depending on the results, you can try setting your GOROOT and PATH. Note that which go prints the path of the go binary itself (for a standard tarball install, something like .../go/bin/go), so GOROOT is two directories up:

export GOROOT="$(dirname "$(dirname "$(which go)")")"
export PATH="$GOROOT/bin:$PATH"

Another alternative is compiling on your laptop and transferring the binary to the CentOS machine (if possible).

@mwager
Author

mwager commented Dec 11, 2023

I fixed it by installing a fresh Go binary and running Scorecard directly.

It seems like there is an issue reaching the GitLab instance, but a strange "invalid character <" error is thrown.

[screenshot]

However, when I try wget, the connection works:

[screenshot]

Update: I exported a valid GitLab token before running Scorecard, as mentioned in the docs: export GITLAB_AUTH_TOKEN=glpat-xxxx...

@spencerschrock
Contributor

Seems like there is an issue reaching the gitlab instance, but strange error regarding "invalid character <" is thrown.

Thanks, I think this error helps confirm that the GitLab server is responding with HTML (possibly a 404, since Scorecard isn't using the user-provided token for this call). The library we use is trying to parse the HTML as a JSON response, which leads to the invalid character message.

Update: I exported a valid gitlab token

Just to confirm, exporting the token didn't fix it? If you want to try something while we discuss a fix, I would try replacing "" in this line with os.Getenv("GITLAB_AUTH_TOKEN"), in order to send your PAT to your instance:

client, err := gitlab.NewClient("", gitlab.WithBaseURL(fmt.Sprintf("%s://%s", r.scheme, r.host)))

Resulting in:

client, err := gitlab.NewClient(os.Getenv("GITLAB_AUTH_TOKEN"), gitlab.WithBaseURL(fmt.Sprintf("%s://%s", r.scheme, r.host)))

@raghavkaul Thoughts on solving this? My suggested patch in the other comment (#3696 (comment)) wouldn't be enough, since the "gitlab" part of the URL is in the path, not the host. We could try a GL_HOST env var, similar to what we do with GH_HOST in the GitHub version of IsValid()?

Or there's the fix to the "liveness" check I mentioned in this comment. But are there situations where we should worry about doing this, in terms of sending a PAT to an instance it may not correspond to? Then again, I'm not sure any solution would prevent that, since the actual analysis part of the code will send the token anyway.
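The host-versus-path distinction is easy to check with net/url. This sketch applies the same strings.Contains(host, "gitlab") heuristic as the patched IsValid() to both URL shapes from this thread; hostLooksLikeGitLab is a hypothetical helper, not Scorecard code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostLooksLikeGitLab mimics the patched IsValid() heuristic:
// it only inspects the host portion of the URL, never the path.
func hostLooksLikeGitLab(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return strings.Contains(u.Host, "gitlab"), nil
}

func main() {
	for _, raw := range []string{
		"https://gitlab.com/org/repo",
		"https://foo.bar.com/gitlab/org/repo",
	} {
		u, err := url.Parse(raw)
		if err != nil {
			panic(err)
		}
		ok, _ := hostLooksLikeGitLab(raw)
		fmt.Printf("host=%q path=%q matches=%v\n", u.Host, u.Path, ok)
	}
	// host="gitlab.com" path="/org/repo" matches=true
	// host="foo.bar.com" path="/gitlab/org/repo" matches=false
}
```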

@mwager
Author

mwager commented Dec 12, 2023

Oh yes, I thought this error reminded me of something :)

I tried your patch passing the token as the first parameter to NewClient(), but it does not solve the issue:

[screenshot]

@spencerschrock
Contributor

Hmm, I wonder if the API link is different from what we're expecting:

So with gitlab.com, the API link is: https://gitlab.com/api/v4/ and then the rest of the path, for example: https://gitlab.com/api/v4/projects
But based on your screenshots, your instance is at https://foo.com/gitlab/, so I'm guessing the api link is different.
Maybe something like: https://foo.com/gitlab/api/v4/?

Does https://foo.com/gitlab/api/v4/projects resolve for you (after replacing foo.com with your domain)?

If so, the problematic code is here instead, where we just add api/v4/ to the end of whatever the host is:

if repo != nil && makeRepoError == nil {
	repoClient, makeRepoError = glrepo.CreateGitlabClient(ctx, repo.Host())
}
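A sketch of the difference: building the API base from the host alone drops the /gitlab prefix, while building it from the full instance URL keeps it. Both helpers are hypothetical illustrations, not the code inside glrepo.CreateGitlabClient.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// apiBaseFromHost reproduces the failing behaviour: only the host is
// kept, so a path prefix such as /gitlab is lost.
func apiBaseFromHost(scheme, host string) string {
	return fmt.Sprintf("%s://%s/api/v4/", scheme, host)
}

// apiBaseFromInstance keeps whatever path prefix the instance URL has
// and appends api/v4/ after it.
func apiBaseFromInstance(instanceURL string) (string, error) {
	u, err := url.Parse(instanceURL)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/api/v4/"
	return u.String(), nil
}

func main() {
	fmt.Println(apiBaseFromHost("https", "foo.com")) // https://foo.com/api/v4/

	base, err := apiBaseFromInstance("https://foo.com/gitlab/")
	if err != nil {
		panic(err)
	}
	fmt.Println(base) // https://foo.com/gitlab/api/v4/
}
```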

@mwager
Author

mwager commented Dec 20, 2023

Hi,

yes https://foo.com/gitlab/api/v4/projects resolves for me!

I added this code:

[screenshot]

repoURI is something like https://foo.com/gitlab/someorg/someproject

Still the same error. The token is exported.

Maybe you are also missing the projects part of the path, as documented here: https://docs.gitlab.com/ee/api/api_resources.html#project-resources

This curl command works for me:

[screenshot]

So the URL has to be https://foo.com/gitlab/api/v4/projects/namespace%2Fname

See this: https://stackoverflow.com/questions/54717065/get-the-id-of-gitlab-project-via-gitlab-api-or-gitlab-cli

id: The ID or URL-encoded path of the project
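Following the GitLab docs quoted above, the :id segment can be the project path with its slash percent-encoded. A minimal sketch using url.PathEscape, which escapes / as %2F; projectURL is a hypothetical helper and the base URL is the example instance from this thread.

```go
package main

import (
	"fmt"
	"net/url"
)

// projectURL builds the single-project endpoint, using the URL-encoded
// "namespace/name" path as the project :id.
func projectURL(apiBase, projectPath string) string {
	// PathEscape treats its input as one path segment, so "/" becomes %2F.
	return apiBase + "projects/" + url.PathEscape(projectPath)
}

func main() {
	fmt.Println(projectURL("https://foo.com/gitlab/api/v4/", "ssdlc/scorecard-scanner"))
	// https://foo.com/gitlab/api/v4/projects/ssdlc%2Fscorecard-scanner
}
```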

Let me know if I can help testing or anything.

@spencerschrock
Contributor

yes https://foo.com/gitlab/api/v4/projects resolves for me!
See this curl command works for me:

I think we've found the issue. Your code patch was close, but repo.Host() only returns the (sub)domain portion of the URL (foo.com in this case), which doesn't include the gitlab part of your URL. So you still ended up with https://foo.com/api/v4/ instead of https://foo.com/gitlab/api/v4/.

I've pushed a quick workaround to a branch in my fork, where the only patch compared to main is 22bc273.

Can you give that a shot? Make sure you set the GL_HOST environment variable to foo.com/gitlab/, for example:

GL_HOST=foo.com/gitlab/ go run main.go --repo https://foo.com/gitlab/ssdlc/scorecard-scanner --format json

If that works, I'll clean it up before sending a PR.

@spencerschrock
Contributor

spencerschrock commented Dec 28, 2023

And if that complains about "couldn't reach gitlab instance", it probably needs to be combined with the token patch above:

host := r.host
if h := os.Getenv("GL_HOST"); h != "" {
	// avoid duplication of the scheme when constructing baseURL below
	host = strings.TrimPrefix(h, r.scheme+"://")
}
baseURL := fmt.Sprintf("%s://%s", r.scheme, host)
client, err := gitlab.NewClient(os.Getenv("GITLAB_AUTH_TOKEN"), gitlab.WithBaseURL(baseURL))
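The GL_HOST handling in the snippet above can be isolated as a pure function to see which base URL it produces. This is a hypothetical refactor for illustration only: it takes the GL_HOST value as a parameter instead of calling os.Getenv, and leaves out the token.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveBaseURL follows the snippet above: a non-empty glHost overrides
// the host parsed from the repo URL, and a leading "scheme://" is trimmed
// so the scheme is not duplicated in the result.
func resolveBaseURL(scheme, repoHost, glHost string) string {
	host := repoHost
	if glHost != "" {
		host = strings.TrimPrefix(glHost, scheme+"://")
	}
	return fmt.Sprintf("%s://%s", scheme, host)
}

func main() {
	fmt.Println(resolveBaseURL("https", "foo.com", ""))                        // https://foo.com
	fmt.Println(resolveBaseURL("https", "foo.com", "foo.com/gitlab/"))         // https://foo.com/gitlab/
	fmt.Println(resolveBaseURL("https", "foo.com", "https://foo.com/gitlab/")) // https://foo.com/gitlab/
}
```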

@mwager
Author

mwager commented Dec 29, 2023

I tried your patch, including the token patch above. It looks better now: it uses the host including /gitlab, but then it also tries to fetch gitlab/ssdlc/scorecard-scanner instead of just ssdlc/scorecard-scanner.

Did I do something wrong?

[screenshot]

@spencerschrock
Contributor

Tried your patch, incl the token patch above. Looks better now, now it is using the host including /gitlab but then also tries to fetch gitlab/ssdlc/scorecard-scanner instead of just ssdlc/scorecard-scanner

Did I do smt wrong?

Nope, my patch was just an incomplete solution, but we've confirmed the issue. I'll need to think about how to handle this. It would probably involve changes to how we parse the repo:

func (r *repoURL) parse(input string) error {

I might have some time later this week to poke around again.

@spencerschrock
Contributor

spencerschrock commented Jan 19, 2024

Can you give it another shot with the changes I pushed to the same branch today? (Sorry for all the "try this"; I don't have a setup to test on.)

I made sure to add the token changes too, so hopefully the branch HEAD (d3d5ba6) works.

If it doesn't, give HEAD~1 (22d788c) a try too.

@mwager
Author

mwager commented Jan 23, 2024

No problem, I would be very glad to help :)

I pulled your repo and ran git checkout fix/gitlab-host-path.

Looks good!

[screenshot]

It throws an error after the results, though:

[screenshot]

@spencerschrock
Contributor

Throws an error after the results

In this particular case, I think the fix is already in main (I haven't pulled updates into this branch since I forked it ~1 month ago). I'll work on getting this up as a PR later this week.

@mwager
Author

mwager commented Jan 24, 2024

Nice, let me know if I can help review & test the PR...

@mwager
Author

mwager commented Feb 1, 2024

Thank you so much! 🫶🏻

2 participants