Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude some commands from online check. #1514

Closed

Conversation

j-bennet
Copy link
Contributor

@j-bennet j-bennet commented Jun 8, 2022

Some of the commands don't need Rally to be online.

Fixes #1506.

@j-bennet j-bennet marked this pull request as draft June 8, 2022 23:16
@j-bennet j-bennet force-pushed the j-bennet/1506-offline-commands branch 2 times, most recently from 9dab8fb to 753bd55 Compare June 9, 2022 00:00
Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic that you're tackling this, thanks! Three notes:

  • This breaks an integration test that was relying on esrally list tracks to test the internet connection behavior.
  • Maybe we don't need tests for this in two places, but I think keeping the unit test to completely test the logic is a nice touch. Up to you. If you keep the test however, please mock the net.has_internet_connection call too, or running unit tests will get three seconds slower for me, which would be ironic as I run tests more than offline Rally commands.
  • Is this still a draft? Do you expect review on drafts?

@j-bennet
Copy link
Contributor Author

j-bennet commented Jun 9, 2022

@pquentin This is a draft because I wanted to get all tests passing before requesting a review. In fact, I'd rather remove the integration tests and replace them with unit tests where we can, in this case we can. Integration tests are slow.

@j-bennet
Copy link
Contributor Author

j-bennet commented Jun 9, 2022

@pquentin

Maybe we don't need tests for this in two places

What do you think about removing some of the integration tests, because the new unit test replaces them? I think that these can go away:

  • rally/it/proxy_test.py

    Lines 72 to 94 in 28eea6c

    def test_run_with_direct_internet_connection(cfg, http_proxy, fresh_log_file):
    assert it.esrally(cfg, "list tracks") == 0
    assert_log_line_present(fresh_log_file, "Connecting directly to the Internet")
    @it.rally_in_mem
    def test_anonymous_proxy_no_connection(cfg, http_proxy, fresh_log_file):
    env = dict(os.environ)
    env["http_proxy"] = http_proxy.anonymous_url
    assert process.run_subprocess_with_logging(it.esrally_command_line_for(cfg, "list tracks"), env=env) == 0
    assert_log_line_present(fresh_log_file, f"Connecting via proxy URL [{http_proxy.anonymous_url}] to the Internet")
    # unauthenticated proxy access is prevented
    assert_log_line_present(fresh_log_file, "No Internet connection detected. Specify --offline")
    @it.rally_in_mem
    def test_authenticated_proxy_user_can_connect(cfg, http_proxy, fresh_log_file):
    env = dict(os.environ)
    env["http_proxy"] = http_proxy.authenticated_url
    assert process.run_subprocess_with_logging(it.esrally_command_line_for(cfg, "list tracks"), env=env) == 0
    assert_log_line_present(fresh_log_file, f"Connecting via proxy URL [{http_proxy.authenticated_url}] to the Internet")
    # authenticated proxy access is allowed
    assert_log_line_present(fresh_log_file, "Detected a working Internet connection")
  • rally/it/basic_test.py

    Lines 39 to 43 in 28eea6c

    def test_run_without_http_connection(cfg):
    cmd = it.esrally_command_line_for(cfg, "list races")
    output = process.run_subprocess_with_output(cmd, {"http_proxy": "http://invalid"})
    expected = "No Internet connection detected. Specify --offline"
    assert expected in "\n".join(output)

@j-bennet j-bennet marked this pull request as ready for review June 9, 2022 23:11
@j-bennet j-bennet requested a review from pquentin June 9, 2022 23:12
@j-bennet
Copy link
Contributor Author

j-bennet commented Jun 9, 2022

@pquentin Tests now pass, net.has_internet_connection is mocked, and I marked the PR as ready for review.

You'll notice that I removed test_run_without_http_connection in it/basic_tests.py, because the new unit test covers that.

I didn't remove it/proxy_test.py, because it seems like a lot of work went into these integration tests, but TBH, I think they are not necessary. What do you think?

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Unfortunately, digging deeper it turns out that many commands do need Internet connection in special cases. I mentioned this inline. (To be honest I'm not sure why we need to look before leaping, we could just try the command and report a connection error if it failed due to a bad connection...)

Regarding unit tests vs. integration tests, this is highly subjective. A few comments:

  1. The effort made for those integration tests is not a consideration, code is code, and removing code is always nice.
  2. But then, integration tests actually test the functionality end-to-end, eg. they exercise the has_internet_connection function that is otherwise mocked
  3. We also don't have any unit test that test the proxy case?
  4. And yes, it tests are so slow that I don't run them locally.

So I would tend towards leaving proxy_test.py. What do you think?

@@ -51,6 +51,8 @@
from esrally.tracker import tracker
from esrally.utils import console, convert, io, net, opts, process, versions

OFFLINE_SUBCOMMANDS = ("compare", "list", "start", "stop", "generate")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is complicated:

  • start can pull Docker images,
  • stop writes to the metrics store which can be a remote cluster.
  • list races and list tracks support fetching the data from a remote metrics cluster
  • list tracks, list cars and list elasticsearch-plugins need to fetch data from GitHub

So the commands that are truly offline are: compare, list telemetry, list pipelines, and generate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks, that's good to know. I'll make changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I struggled with this one. I'll think about it some more.

@@ -898,6 +900,21 @@ def configure_reporting_params(args, cfg):
cfg.add(config.Scope.applicationOverride, "reporting", "numbers.align", args.report_numbers_align)


def ensure_internet_connection(args, cfg, logger):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I'm not really happy with the name, there are two cases where no Internet connection is needed at all, but I haven't found a better name! 🤷

@j-bennet
Copy link
Contributor Author

j-bennet commented Jun 13, 2022

To be honest I'm not sure why we need to look before leaping, we could just try the command and report a connection error if it failed due to a bad connection...

Same here, I'm starting to wonder if we need this check at all. It feels like we should be able to do without this check, or at least, if it's necessary for some reason, do it lazily as needed. I'm going to look closer at the code and figure out what's the point of the offline mode. There's probably something I'm not seeing.

@j-bennet j-bennet force-pushed the j-bennet/1506-offline-commands branch from 8be8c3c to 53de26b Compare June 13, 2022 21:52
@j-bennet
Copy link
Contributor Author

j-bennet commented Jun 13, 2022

@pquentin

  1. But then, integration tests actually test the functionality end-to-end, eg. they exercise the has_internet_connection function that is otherwise mocked

Good point.

  1. We also don't have any unit test that test the proxy case?

Ah. In that case, we should definitely keep them.

So I would tend towards leaving proxy_test.py. What do you think?

I agree.

I looked through the code to figure out what the offline mode actually does. A couple of things:

  1. RallyRepository (teams or tracks) won't attempt to update itself.
  2. A custom TrackProcessor with a downloader will raise an error when download is called.

The offline feature was designed for corporate environments, where benchmarking clients are not able to access the Internet.

I don't see a good reason to keep the method net.check_internet_connection. By default, it checks if we have connection to github.com, or user can assign a custom probing.url in config, for example if their teams and tracks repos are hosted on the internal network.

But why an extra check? We might as well try and fail when updating these repos.

I think the only purpose of the extra check was to automatically set Rally to offline mode, even when the user didn't specify it, but since this functionality was removed in #1503, we don't need to do this anymore.

At this point, I think I would:

  1. remove probing.url and net.check_internet_connection
  2. if offline mode is specified, rally shouldn't attempt to update repos, whether they are internal or not (and in any case, there's no good way to tell if they're internal)
  3. if offline mode is not specified, rally should try updating repos, and fail on error.

What do you think? This would change the PR, but it makes more sense to me.

Edited: Here is the alternative PR: #1517

@pquentin
Copy link
Member

I do prefer #1517, but asked other to chime in first.

@pquentin
Copy link
Member

Closing as #1517 is indeed the way to go.

@pquentin pquentin closed this Jun 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rally checks for internet connection even when it's not needed
2 participants