add support for using customized HTTP headers in download_file #3472

Merged
merged 19 commits into from
Feb 17, 2021

Louwrensth
Contributor

Here is an attempt to provide some customization of the HTTP headers sent by EB when downloading sources. Previously it was hardcoded with the User-Agent and Accept header fields.

In our setup we can use it to add a header field Authorization: Bearer <tokenhash> to permit downloading the sources from a private git server that supports tokens.

Any HTTP header field can be supplied via the string value of a new option --http-header-fields=str.
If the string is a path to an existing file, its content will be used instead. This is useful to keep sensitive data out of the logs.
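
A minimal sketch of the idea described above, not the code from this PR: a value that names an existing file is replaced by that file's contents, and the resulting `Name: value` lines are attached to a standard-library `urllib.request.Request`. The helper names `load_header_fields` and `build_request`, and the example URL, are hypothetical.

```python
import os
import urllib.request


def load_header_fields(value):
    """Return header field lines from `value`, or from the file it names."""
    if os.path.isfile(value):
        # Reading from a file keeps the secret off the command line.
        with open(value) as handle:
            value = handle.read()
    return [line.strip() for line in value.splitlines() if line.strip()]


def build_request(url, value):
    """Build a Request for `url` carrying the header fields given by `value`."""
    headers = {}
    for field in load_header_fields(value):
        name, sep, field_value = field.partition(':')
        if sep:  # skip lines that are not 'Name: value' pairs
            headers[name.strip()] = field_value.strip()
    return urllib.request.Request(url, headers=headers)


# Hypothetical URL and token; nothing is sent until urlopen() is called.
request = build_request("https://git.example.org/src.tar.gz",
                        "Authorization: Bearer <tokenhash>")
print(request.get_header("Authorization"))
```

Passing a file path instead of the literal field means the token never appears in the shell history or in the option value itself.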

Please let me know how to improve this attempt.

For one it may not be so good that the HTTP headers are sent to any host listed in the sources...

@SimonPinches

@boegel, @Louwrensth has kindly provided a first attempt at the changes we would need to allow authentication to our protected repositories. Please let us have any feedback and we can try to refine further. Thanks.

@boegel boegel added this to the next release (4.3.1) milestone Oct 16, 2020
@boegel boegel changed the title from "Customize HTTP headers" to "add support for using customized HTTP headers in download_file" Oct 16, 2020
@boegel boegel modified the milestones: next release (4.3.1), 4.x Oct 16, 2020
Member

@boegel boegel left a comment

Looks good overall, but a couple of important suggestions to look into.

We definitely need to cover this properly in the tests too, let us know if you need any help with that.

easybuild/tools/filetools.py (outdated, resolved)
easybuild/tools/filetools.py (outdated, resolved)
easybuild/tools/filetools.py (outdated, resolved)
@boegel
Member

boegel commented Oct 16, 2020

For one it may not be so good that the HTTP headers are sent to any host listed in the sources...

That's a very good point...

The only way I see to prevent that from happening is to add support for something like --http-header-fields-url-pattern where you can specify a pattern for URLs, and then only use the custom HTTP headers for URLs that match that pattern?
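
The pattern-based filtering suggested here could look roughly like the sketch below. It assumes a mapping from URL pattern to a list of header fields, and the function name `headers_for_url` and all hostnames are hypothetical. Patterns are treated as regular expressions and matched with `re.search`, which is an assumption about the intended semantics.

```python
import re


def headers_for_url(url, urlpat_headers):
    """Collect the header fields whose URL pattern matches `url`."""
    selected = []
    for pattern, fields in urlpat_headers.items():
        # re.search matches anywhere in the URL, not only at the start.
        if re.search(pattern, url):
            selected.extend(fields)
    return selected


# Hypothetical rules: only the private server should receive the token.
rules = {
    r"git\.example\.org": ["Authorization: Bearer <tokenhash>"],
    r"nomatch\.com": ["X-Other: 1"],
}
print(headers_for_url("https://git.example.org/repo/archive.tar.gz", rules))
print(headers_for_url("https://ftp.gnu.org/gnu/gzip/gzip-1.10.tar.gz", rules))
```

With this approach a download from any host that matches no pattern gets no custom header fields at all, which addresses the concern about leaking credentials to arbitrary source hosts.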

@Louwrensth
Contributor Author

Louwrensth commented Oct 19, 2020 via email

@Louwrensth Louwrensth force-pushed the customize_http_headers branch 2 times, most recently from 4464157 to 317316c Compare October 23, 2020 23:27
@Louwrensth
Contributor Author

Is it a potential problem that values can not include : characters?
I just saw that according to rfc7230#section-3.2.6 there is not supposed to be a colon in the field value (some other delimiters are excluded as well). The reason I included the hf.count(":") == 1 test was to ignore lines that do not have any colon, i.e. lines that are not a valid key: value pair (and that dict(...) will barf on). Nevertheless, thanks: I didn't know you could do split(':', 1). Will push later.

Update: my statement is not true. Consider this header, for example: Referer: http://www.example.com/ Colons are clearly acceptable.

I'm using a double colon to separate the URL pattern from the header field. I think it is hardly ever used in HTTP header fields, but I could be wrong. Plus, I'm using split('::', 1), so what remains as a bug is the case where a user has '::' in their field and did not specify a URL pattern on that line. Acceptable limitation?
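
The splitting rules discussed in this comment can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `parse_header_line` is a hypothetical helper that splits once on `::` for the URL pattern and once on `:` for the field name, so that colons inside the value survive.

```python
def parse_header_line(line):
    """Split 'urlpat::Name: value' into (urlpat, name, value).

    Returns None for lines without a colon; urlpat is None if absent.
    """
    urlpat = None
    if '::' in line:
        # Only the first '::' separates the URL pattern from the field.
        urlpat, line = line.split('::', 1)
    if ':' not in line:
        return None  # not a 'Name: value' pair
    # maxsplit=1 keeps any further colons as part of the value.
    name, value = line.split(':', 1)
    return urlpat, name.strip(), value.strip()


referer = "Referer: http://www.example.com/"
print(referer.count(":"))  # more than one colon, so a count == 1 check rejects it
print(parse_header_line("gnu.org::" + referer))
print(parse_header_line("User-Agent: EasyBuild"))
# The remaining limitation: '::' inside a field without a URL pattern.
print(parse_header_line("X-Note: a::b: c"))
```

The last call shows the limitation mentioned above: the text before `::` is taken as the URL pattern even though it was meant to be part of the header field.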

Todo: make unit tests...

@easybuilders easybuilders deleted a comment from boegelbot Oct 25, 2020
@Louwrensth Louwrensth force-pushed the customize_http_headers branch from 317316c to fd9b7e3 Compare October 28, 2020 12:47
@Louwrensth
Contributor Author

Not sure why this failure appears. Any idea?

I was trying to run the unit tests myself, but for some reason I get the error "Toolchain system not found, available toolchains: GCCcore". I guess my config is corrupted?

I installed an EasyBuild/4.3.0 using eb --pretend, loaded the module and then issued:

$ export TEST_EASYBUILD_MODULES_TOOL=EnvironmentModules TEST_EASYBUILD_MODULE_SYNTAX=Tcl
$ python -m test.framework.options

@SimonPinches : are you able to run the framework tests on our HPC?

@boegel
Member

boegel commented Nov 12, 2020

@Louwrensth Others (cc @deniskristak) have reported similar issues, but we haven't been able to reproduce that problem...

Can you provide some more information about your setup?
Which Python version are you using here, what does eb --show-config produce, what is listed in $PYTHONPATH, etc.?

Also, what does the following command produce?

python -c "from easybuild.tools.toolchain.utilities import search_toolchain; import easybuild.toolchains; print(easybuild.toolchains.__path__); print(search_toolchain('system')[0])"

@Louwrensth
Contributor Author

Thanks for checking in on this! I guess I'm seeing something opposite of code rot. I tried to provide you with the info after restarting my environment, and I think the problem has disappeared :) Maybe due to some system changes that were rolling out (without my knowledge)? I had restarted my environment before too...

Anyhow, now I seem to be able to reproduce the same error as the CI agent:

[vandell@sdcc-login02 easybuild-framework]$ m use ~/easybuildinstall/modules/all
[vandell@sdcc-login02 easybuild-framework]$ m load EasyBuild
[vandell@sdcc-login02 easybuild-framework]$ export TEST_EASYBUILD_MODULES_TOOL=EnvironmentModules TEST_EASYBUILD_MODULE_SYNTAX=Tcl
[vandell@sdcc-login02 easybuild-framework]$ python -m test.framework.options test_http_header_fields_urlpat
Filtered CommandLineOptionsTest tests using 'test_http_header_fields_urlpat', retained 1/114 tests: test_http_header_fields_urlpat
F
======================================================================
FAIL: test_http_header_fields_urlpat (__main__.CommandLineOptionsTest)
Test use of --http-header-fields-urlpat.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ITER/vandell/GIT/easybuilders/easybuild-framework/test/framework/options.py", line 2413, in test_http_header_fields_urlpat
    self.assertTrue(test_applied_hdr_regex.search(stdout))
AssertionError: None is not true

----------------------------------------------------------------------
Ran 1 test in 0.696s

FAILED (failures=1)

I am new to testing with EB. Does this message mean that the particular regex search failed to find matches on stdout? It seems so.
I copied some snippets from other test functions and tried:

        stdout, stderr = self._run_mock_eb(args, do_build=True, raise_error=True, testing=False)

on the gzip easyconfig, with some arguments including --http-header-fields-urlpat.

Any hints to debug this would be welcome! Otherwise I will just try printing stuff and see what comes out.
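
To illustrate why the assertion above reports "None is not true": a compiled regex's `search` returns a match object on success and `None` on failure, and `assertTrue(None)` fails with exactly that message. The sketch below uses a hypothetical helper `header_was_applied` and made-up output lines, not the real output of `eb`.

```python
import re


def header_was_applied(stdout, header_name):
    """Return the match for a line mentioning `header_name`, or None if absent."""
    # re.M lets ^ and $ anchor at every line of the captured output.
    regex = re.compile(r"^.*%s.*$" % re.escape(header_name), re.M)
    return regex.search(stdout)


# Made-up captured output, for illustration only.
sample_stdout = "== fetching files...\nCustom HTTP header field set: X-Test-Header\n"

match = header_was_applied(sample_stdout, "X-Test-Header")
print(match.group(0))
print(header_was_applied(sample_stdout, "X-Missing-Header"))
```

Printing the captured stdout next to the regex pattern is usually the quickest way to see which of the two does not look as expected.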

@Louwrensth
Contributor Author

@boegel : I found a way to reproduce the error Toolchain system not found, available toolchains: GCCcore, just change out of the easybuild-framework directory:
https://gist.github.com/Louwrensth/d4f3eef01e941ba8ba5bd904cc63d115

Maybe this error is trivial to you now. Please let me know if you need me to do any test beyond this.

I don't quite understand why some check runs fail and others don't...

@Louwrensth
Contributor Author

@migueldiascosta please have a look at this update, I've tried to replace the false positive tests with true positive tests and to add the file inclusion if it was set in the header field value... Hope I caught them all. Many thanks for your review.

@easybuilders easybuilders deleted a comment from boegelbot Feb 8, 2021
@easybuilders easybuilders deleted a comment from boegelbot Feb 8, 2021
easybuild/tools/filetools.py (outdated, resolved)
easybuild/tools/filetools.py (outdated, resolved)
@easybuilders easybuilders deleted a comment from boegelbot Feb 13, 2021
@boegel boegel modified the milestones: 4.x, 4.3.3 (next release) Feb 13, 2021
Member

@boegel boegel left a comment

@Louwrensth I would still like to get this in for the next EasyBuild release.

I think the suggestions I made are relatively minor; there's clearly a lot of work behind this already!
The security part of it (avoiding that secrets get logged) looks OK, that was a big concern for me in an earlier iteration of this.

Please let us know if you'll be able to go through this again in the short term, since we would like to push out EasyBuild v4.3.3 soon...
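
The logging concern mentioned above can be illustrated with a small sketch. It assumes that only the header field name is written to the log while the value is withheld. The names `describe_header` and `log_headers` and the logger name are hypothetical, and this is not claimed to be how the PR does it.

```python
import logging

log = logging.getLogger("download_sketch")  # hypothetical logger name


def describe_header(field):
    """Return a log-safe description of a 'Name: value' header field."""
    name, sep, _value = field.partition(':')
    if not sep:
        # Without a colon we cannot tell name from secret, so log neither.
        return "<malformed header field omitted>"
    return "%s: <value omitted from log>" % name.strip()


def log_headers(fields):
    """Log which custom header fields are set, without their values."""
    for field in fields:
        log.debug("Custom HTTP header field set: %s", describe_header(field))


print(describe_header("Authorization: Bearer secret-token"))
```

Logging the name alone still tells the user that the header was picked up, which is usually enough for debugging.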

easybuild/tools/filetools.py (resolved)
easybuild/tools/filetools.py (outdated, resolved)
easybuild/tools/filetools.py (outdated, resolved)
test/framework/options.py (resolved)
easybuild/tools/filetools.py (resolved)
self.assertEqual({urlgnu: ["%s:%s" % (hdragent, valagent)]}, urlpat_headers)

# Case B: urlpat has another urlpat: retain deepest level
args = "%s::%s::%s::%s:%s" % (urlgnu, urlgnu, urlex, hdragent, valagent)
Member

Just wondering: what's the use case for nesting of URL patterns?

Contributor Author

Well, it is not really useful on the same line, but it is there to facilitate the case where another URL pattern is specified in a file that is read. If recursive calls to several files occur, it should pick up the deepest-level URL pattern for the header field.

Member

OK, makes sense.
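
The "deepest level wins" behaviour explained in this thread can be sketched as a small recursive parser. This is an illustration under assumptions, not the function from the PR: `parse_urlpat_headers` is a hypothetical name, a value that names an existing file is read line by line, and each `::` encountered replaces the URL pattern inherited from the outer level.

```python
import os


def parse_urlpat_headers(arg, urlpat=None, result=None):
    """Map URL patterns to header fields, keeping the innermost pattern."""
    result = {} if result is None else result
    if os.path.isfile(arg):
        # A file may hold several lines, each optionally with its own pattern.
        with open(arg) as handle:
            for line in handle.read().splitlines():
                if line.strip():
                    parse_urlpat_headers(line.strip(), urlpat, result)
        return result
    if '::' in arg:
        # A pattern found here overrides the one passed in from outside.
        inner_pat, rest = arg.split('::', 1)
        return parse_urlpat_headers(rest, inner_pat, result)
    if urlpat is None:
        raise ValueError("no URL pattern given for header field: %s" % arg)
    result.setdefault(urlpat, []).append(arg)
    return result


# Mirrors 'Case B' from the test: three patterns on one line, the last one wins.
print(parse_urlpat_headers("gnu.org::gnu.org::example.com::User-Agent:agent"))
```

Because the pattern is passed down and only overridden when a new one appears, a file that lists plain header fields inherits the pattern given on the command line, while a file that names its own pattern takes precedence.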

args = list(common_args)
args.extend([
'--http-header-fields-urlpat=gnu.org::%s:%s' % (testdohdr, testdoval),
'--http-header-fields-urlpat=nomatch.com::%s:%s' % (testdonthdr, testdontval),
Member

Should also check with single --http-header-fields-urlpat that has both entries, comma-separated?

Contributor Author

I tried that, but decided not to use the strlist style option, because the comma could be used as a field value. The separation character is \n, but that is awkward to use on the command line. The user can still use the option twice or more times if needed, or just use a file to specify an array of fields... So there is no support for --http-header-fields-urlpat=a,b,c at the moment. Is it ok?

Member

There's a good technical reason not to support it, so fine by me :)
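
The trade-off agreed on here (repeat the option instead of splitting on commas) can be shown with the standard library's `argparse` as a stand-in. EasyBuild has its own option handling, so this is only an analogy, and `parse_cli` is a hypothetical helper.

```python
import argparse


def parse_cli(argv):
    """Collect every --http-header-fields-urlpat occurrence, values untouched."""
    parser = argparse.ArgumentParser()
    # action='append' gathers repeated options without splitting on commas.
    parser.add_argument('--http-header-fields-urlpat', action='append', default=[])
    return parser.parse_args(argv).http_header_fields_urlpat


values = parse_cli([
    '--http-header-fields-urlpat=gnu.org::Accept: text/html,application/xml',
    '--http-header-fields-urlpat=nomatch.com::X-Test: 1',
])
print(values)
```

The first value contains a comma that belongs to the header field value, which is the case a comma-separated list option would split incorrectly.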

test/framework/options.py (outdated, resolved)
test/framework/options.py (outdated, resolved)
test/framework/options.py (outdated, resolved)
@Louwrensth
Contributor Author

Please let us know if you'll be able to go through this again in the short term, since we would like to push out EasyBuild v4.3.3 soon...

Yes, I think I will be able to manage your good suggestions tonight.
Many thanks for the review.

Would you recommend that I squash the commits after this?

@boegel
Member

boegel commented Feb 15, 2021

Would you recommend that I squash the commits after this?

No, please don't; that makes re-review of the PR harder for us. Don't worry too much about keeping the commit history "clean". :)

Member

@boegel boegel left a comment

lgtm

Thanks a lot for your efforts on this @Louwrensth, and @migueldiascosta for helping out with the reviewing!

@boegel boegel dismissed migueldiascosta’s stale review February 17, 2021 20:12

suggestions are taken into account

@boegel boegel merged commit d28ad15 into easybuilders:develop Feb 17, 2021
4 participants