-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for using customized HTTP headers in download_file #3472
add support for using customized HTTP headers in download_file #3472
Conversation
@boegel, @Louwrensth has kindly provided a first attempt at the changes we would need to allow authentication to our protected repositories. Please let us have any feedback and we can try to refine further. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, but a couple of important suggestions to look into.
We definitely need to cover this properly in the tests too, let us know if you need any help with that.
That's a very good point... The only way I see to prevent that from happening is to add support for something like |
Thanks very much for your fast and useful inspection.
We definitely need to cover this properly in the tests too, let us know if you need any help with that.
I will try to add some tests myself first, I'll learn and use what I can from [#3248](#3248) and related.
Is it a potential problem that values can not include : characters?
I just saw that according to [rfc7230#section-3.2.6](https://tools.ietf.org/html/rfc7230#section-3.2.6) there is not supposed to be a colon in the field value (also some other delimiters are excluded).
The reason why I included the ``hf.count(":") == 1`` test was to ignore lines that do not have any colon, i.e. they are not a valid key: value pair (and ``dict(...)`` will barf)
Nevertheless Thanks, I didn't know you could do ``split(':', 1)``.
Will push later.
|
Thanks, yes, I was thinking along those lines.
It would be best if we can have multiple of something like --http-header-fields-url-pattern where each URL pattern has it's own custom header string/file... not quite sure if I can find a similar option to borrow code from, maybe --amend=param_a=value_a (i.e. something like --http-header-fields=urlpat=headerfile that can be used multiple times)?
(When installing a lot of software via --robot I think it is fairly possible that you would want to define custom headers for different hosts at once)
Please let me know if you have a good suggestion here.
|
4464157
to
317316c
Compare
Update: my statement is not true. Consider this header, for example: I'm using a double colon to separate the URL pattern from the header field, I think it is hardly used in HTTP header fields, but I could be wrong. Plus, I'm using Todo: make unit tests... |
317316c
to
fd9b7e3
Compare
Not sure why this failure appears. Any idea? I was trying to run the unit tests myself, but for some reason I get the error "Toolchain system not found, available toolchains: GCCcore" I guess my config is corrupted? I installed an EasyBuild/4.3.0 using
@SimonPinches : are you able to run the framework tests on our HPC? |
@Louwrensth Others (cc @deniskristak) have reported similar issues, but we haven't been able to reproduce that problem... Can you provide some more information about your setup? Also, what does the following command produce?
|
Thanks for checking in on this! I guess I'm seeing something opposite of code rot. I tried to provide you with the infos after restarting my environment. I think the problem as disappeared :) Maybe due to some system changes that were rolling out (without my knowledge)? I had restarted my environment before too... Anyhow, now I seem to be able to reproduce the same error as the CI agent:
I am new to testing with EB, does this message mean that the particular regex search failed to find matches on
on the Any hints to debug this would be welcome! otherwise I will just try printing stuff and see what comes out. |
@boegel : I found a way to reproduce the error Maybe this error is trivial to you now. Please let me know if you need me to do any test beyond this. I don't quite understand why some check runs fail and others don't... |
add true positive tests on header and file inclusion
@migueldiascosta please have a look at this update, I've tried to replace the false positive tests with true positive tests and to add the file inclusion if it was set in the header field value... Hope I caught them all. Many thanks for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Louwrensth I would still like to get this in for the next EasyBuild release.
I think the suggestions I made a relatively minor, there's clearly a lot of work behind this already!
The security part of it (avoiding that secrets get logged) looks OK, that was a big concern for me in an earlier iteration of this.
Please let us know if you'll be able to go through this again in the short term, since we would like to push out EasyBuild v4.3.3 soon...
self.assertEqual({urlgnu: ["%s:%s" % (hdragent, valagent)]}, urlpat_headers) | ||
|
||
# Case B: urlpat has another urlpat: retain deepest level | ||
args = "%s::%s::%s::%s:%s" % (urlgnu, urlgnu, urlex, hdragent, valagent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering: what's the use case for nesting of URL patterns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it is not really useful on the same line, but it to facilitate the case where another URL pattern is specified in a file that is read. If recursive calls to several files occur, it should pick up the deepest level url pattern for the header field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, makes sense.
args = list(common_args) | ||
args.extend([ | ||
'--http-header-fields-urlpat=gnu.org::%s:%s' % (testdohdr, testdoval), | ||
'--http-header-fields-urlpat=nomatch.com::%s:%s' % (testdonthdr, testdontval), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also check with single --http-header-fields-urlpat
that has both entries, comma-separated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, but decided not to use the strlist
style option, because the comma could be used as a field value. The separation character is \n
, but that is awkward to use on the command line. The user can still use the option twice or more times if needed, or just use a file to specify an array of fields... So there is no support for --http-header-fields-urlpat=a,b,c
at the moment. Is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a good technical reason not to support it, so fine by me :)
Yes, I think I will be able to manage your good suggestions tonight. Would you recommend that I will the squash the commits after this? |
No please don't, that makes re-review of the PR harder for us. Don't worry about keeping commit history "clean" too much. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks a lot for your efforts on this @Louwrensth, and @migueldiascosta for helping out with the reviewing!
suggestions are taken into account
Here is an attempt to provide some customization of the HTTP headers sent by EB when downloading sources. Previously it was hardcoded with the
User-Agent
andAccept
header fields.In our setup we can use it to add a header field
Authorization: Bearer <tokenhash>
to permit downloading the sources from a private git server that supports tokens.Any HTTP header field can be supplied via the string value of a new option
--http-header-fields=str
.If the string is a path to an existing file, it's content will be used instead. This is useful to prevent the logs exposing sensitive data.
Please let me know how to improve this attempt.
For one it may not be so good that the HTTP headers are sent to any host listed in the sources...