Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

demo: cockroach demo attempts to obtain a temporary license upon startup and enables telemetry #40273

Merged
merged 1 commit into from
Aug 30, 2019

Conversation

rohany
Copy link
Contributor

@rohany rohany commented Aug 27, 2019

There is a wider discussion/design going on to serve licenses in
a more general way in the future, so this commit is not aiming
for a future-proof design, but instead an MVP to allow users to
demo enterprise features within cockroach demo.

We are not concerned about offline usage of enterprise features
as users can obtain a license and enable features manually using SET.

Fixes #40222.

Release note (cli change): cockroach demo attempts to contact a license
server to obtain a temporary license. cockroach demo now enables
telemetry for the demo cluster. This feature can be opted out of by
setting the COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING environment variable
(https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).

@rohany rohany requested review from dt, jordanlewis and a team August 27, 2019 23:14
@rohany rohany requested a review from a team as a code owner August 27, 2019 23:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

pkg/cli/demo.go Show resolved Hide resolved
pkg/cli/demo.go Outdated Show resolved Hide resolved
@rohany
Copy link
Contributor Author

rohany commented Aug 28, 2019

RFAL

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, the number one and two rules we've had for 4 years is 1) "no phoning home without informing the user" and 2) "no phoning home without a way for the user to opt out".

Are you sure this implementation as-is is a good idea? What was the rationale? (tip: the rationale must be in the commit message)

Maybe @bdarnell has an opinion on this.

My opinion is that we should hardcode a license string that's good for just the demo session, with no phoning home whatsoever.

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @rohany)


pkg/cli/demo.go, line 224 at r2 (raw file):

		return ``, ``, cleanup, err
	}
	defer db.Close()

You're leaking the db object across the entire function. Was that intentional?


pkg/cli/demo.go, line 230 at r2 (raw file):

		license, err := getLicense()
		if err == nil {
			if _, err := db.Exec(`SET CLUSTER SETTING cluster.organization = ` + lex.EscapeSQLString(demoOrg)); err != nil {

IIRC SET CLUSTER SETTING accepts placeholders, so you can pass a proper SQL argument here

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation decision was based off of some offline discussions with David Taylor, Jordan and Roland -- the end goal was to provide an API endpoint specifically for the demo, and then record some telemetry using the licenses given out there.

This issue with a license string that is just for the demo is that what is stopping users from just taking that license and using it on real clusters?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jordanlewis, and @knz)


pkg/cli/demo.go, line 236 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Probably would be a good idea to log here.

Done.


pkg/cli/demo.go, line 224 at r2 (raw file):

Previously, knz (kena) wrote…

You're leaking the db object across the entire function. Was that intentional?

Yes -- I don't see a reason in making another db object for just setting the cluster. Future changes I'm making will also be using this db object, and I don't want to keep creating db connections for each of the settings that might need the db.


pkg/cli/demo.go, line 230 at r2 (raw file):

Previously, knz (kena) wrote…

IIRC SET CLUSTER SETTING accepts placeholders, so you can pass a proper SQL argument here

Ok, will fix.

@knz
Copy link
Contributor

knz commented Aug 28, 2019

This implementation decision was based off of some offline discussions with David Taylor, Jordan and Roland -- the end goal was to provide an API endpoint specifically for the demo, and then record some telemetry using the licenses given out there.

  1. this discussion + the resulting rationale must be recorded in writing, preferably in the commit message, and ultimately in docs

  2. the rules still stand - inform the user + give an opt-out

This issue with a license string that is just for the demo is that what is stopping users from just taking that license and using it on real clusters?

There's nothing that stops the user from querying the same URL manually and get a string they can reuse in their real cluster. There is no real security here!!

Look the security of the license key is a complete red herring.

What you want (= what seems to motivate this work) is telemetry, not key confidentiality.

If that's indeed what you want, you can set up proper telemetry (I believe the test server used by demo disables telemetry by default, but you could plumb a flag that keeps it enabled and even perhaps increase the reporting frequency).

This will in turn will give the user the opportunity to disable said telemetry using the standard cluster setting and/or environment variable.

@knz
Copy link
Contributor

knz commented Aug 28, 2019

the resulting rationale must be recorded in writing, preferably in the commit message, and ultimately in docs

This reminds me, you must expand the release note and explain that this is really recording telemetry and explain (in the release note) how to opt out.

@bdarnell
Copy link
Contributor

There's nothing that stops the user from querying the same URL manually and get a string they can reuse in their real cluster. There is no real security here!!

Exactly. I'd rather not use a license key at all here - just enable enterprise features for demo in the same way we do for unit tests. If we want to get telemetry for demo usage, do that separately instead of as a side effect of a license key.

@knz
Copy link
Contributor

knz commented Aug 28, 2019

I'd rather not use a license key at all here - just enable enterprise features for demo in the same way we do for unit tests. If we want to get telemetry for demo usage, do that separately instead of as a side effect of a license key.

We've been discussing this in the original issue at #40222 and @dt convinced me that 1) it makes the code simpler and 2) it's interesting to use a license key for other reasons (simplicity of managing what gets enabled etc).

If you wish we can resume/continue the conversation about design and motivation on the original issue #40222 instead of this PR.

Assuming this PR continues in the same direction, here are the remaining points for the work:

  • activate telemetry in demo like in start (currently disabled - that was an oversight)

  • gate the auto-license code behind the same check as used to enable telemetry

  • explain in the --help text for demo that this is happening and suggest/remind the user how to opt out

  • document in the commit message what is going on:

    • that there is a wider project to do self-serve licenses in a different, more general way in the future, and therefore we are not aiming for a future-proof design here
    • that we're not concerned about offline usage because it's still possible to request a trial license separately and enable it manually via SET
    • the release note should explain that demo now enables telemetry
    • the release note should explain that demo now (attempts to) contacts a license server
    • the release note should remind the reader how to opt out (with perhaps a link to the relevant doc page which we have already)

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We enable enterprise features in unit tests without a license key - how does that work? It still seems like it ought to be simpler to do something similar here but maybe I'm just not familiar enough with how this part works.

If it's not feasible to make this work like unit tests, I'm OK with it as long as it's gated and documented similarly to other telemetry usage.

pkg/cli/demo.go Outdated
return ``, ``, cleanup, err
}
} else {
log.Warningf(ctx, "error when attempting to acquire demo license: %+v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a message like "enterprise features are not enabled in this session".

pkg/cli/demo.go Outdated
@@ -95,6 +103,25 @@ func init() {
}
}

func getLicense() (string, error) {
client := &http.Client{
Timeout: time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should definitely have a fairly short timeout (or be asynchronous?) so that you don't get blocked if you're trying to use it offline, but I think 1 second is too short (consider the number of round trips in the TCP and TLS handshakes coming from Australia. This consideration would of course change if we made our reg service global :))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd push against it being asynchronous -- I think it is better for the cluster to a little slow on starting up than a few partition commands failing at the start while the license is being requested.

Copy link
Member

@dt dt Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't want to wait much more than 1, maybe 2s if I'm offline though.

I think we could actually get away with async as long as the window where behavior can change is <5s, just since the likelihood someone actually managing to enter and try to run an enterprise command that soon, while non-zero, is small enough that the expected value of confusion caused would be small too.

I wouldn't try to do anything fancier though -- 20.1 should have a more full-featured solution for this and stuff beyond demo so whatever is simplest MVP should be our goal for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I updated the code to perform the request asynchronously.

@dt
Copy link
Member

dt commented Aug 28, 2019

I objected to using the same boolean that unit tests use for uniformity/maintainability.

Unit tests typically test things in relative isolation -- testing just one feature -- so the fact that the special-case isn't the real thing is usually OK -- for the purpose of that test is is good enough. In demo we're expecting a real user, who can go around and interact with ~anything, and my worry is that if we mock out the license incompletely, they might stumble on something we missed e.g. we might add a piece of code in the admin UI that, if enterprise features are detected as on, assumes it can access license.type, but then NPEs in demo where it was never tested.

That is why I'd prefer that we just use a real, normal license installed in the normal place, instead of a special-purpose bool. How we get that license I think it separable question that has significant discussion over on #40222

@rohany
Copy link
Contributor Author

rohany commented Aug 28, 2019

Thank you for the reviews and all the discussion. I've updated the PR and commit message based on all the comments. PTAL.

Regarding telemetry, we check it using the 'COCKROACH_SKIP_UPDATE_CHECK' environement variable, but our docs mention something else -- https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html. What is the right thing?

@rohany rohany changed the title demo: Try to get a temporary license upon demo startup demo: cockroch demo attempts to obtain a temporary license upon startup Aug 28, 2019
@rohany rohany changed the title demo: cockroch demo attempts to obtain a temporary license upon startup demo: cockroch demo attempts to obtain a temporary license upon startup and enables telemetry Aug 28, 2019
@rohany rohany force-pushed the demo-license branch 2 times, most recently from ce612c1 to 1291c5f Compare August 28, 2019 23:25
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thanks for all the changes.

pkg/cli/demo.go Outdated Show resolved Hide resolved
pkg/cli/demo.go Outdated Show resolved Hide resolved
pkg/cli/demo.go Outdated Show resolved Hide resolved
pkg/cli/demo.go Outdated Show resolved Hide resolved
pkg/cli/demo.go Outdated Show resolved Hide resolved
pkg/cli/demo.go Outdated Show resolved Hide resolved
pkg/cli/demo.go Show resolved Hide resolved
@knz
Copy link
Contributor

knz commented Aug 29, 2019

Please also add at least one or two tests to confirm that the telemetry and licensing are effectively disabled under the advertised conditions.

@rohany
Copy link
Contributor Author

rohany commented Aug 29, 2019

A mail discussion reminded me to ask you to check whether the cockroach binary still builds with "make buildoss" I.e. with CCL code left out (I don't recall if the cluster setting can be set without ccl code). (I'm away from the computer so I'll complete the review tomorrow)

Yeah, I ran into some problems with that and testing, so I moved around some code to separate this license acquisition into the cliccl package.

@rohany
Copy link
Contributor Author

rohany commented Aug 29, 2019

Actually, I'm running into some problems with writing some tests for this. Is it possible to have the CLI tests run using the CCL binary instead of the OSS binary?

@bdarnell
Copy link
Contributor

The simplest way to test this may be to use the TCL interactive_test framework. This uses a CCL binary: https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/interactive_tests/test_demo.tcl

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, and @rohany)


pkg/ccl/cliccl/demo.go, line 33 at r5 (raw file):

	}
	// Send some extra information to the endpoint.
	params := fmt.Sprintf("?type=demo&version=%s&clusterid=%s", build.VersionPrefix(), clusterID.String())

You'll need a healthy dose of escaping on some of these strings.


pkg/ccl/cliccl/demo.go, line 51 at r5 (raw file):

func getAndApplyLicense(db *gosql.DB, clusterID uuid.UUID, org string) (bool, error) {
	license, err := getLicense(clusterID)
	if err != nil {
  1. I may recommend to print out the error instead of forgetting it entirely.

  2. just FYI (no change needed here) it is generally beneficial to further narrow down the case of ignored error to the things that are exactly symptomatic of a failed license request. A connection error in the http client would be an ignorable error, however if the connection succeeds but the server returns a non-OK http error code perhaps you want a more explicit error. But then if the HTTP code was OK but you get a short read (again suggesting a network problem) that can also be ignored. All this is subtle stuff, but generally in the long term it pays off to not fully drop errors on the floor - some are really indicative of a programming bug in CockroachDB (or one of the license servers).


pkg/cli/demo.go, line 49 at r4 (raw file):

Previously, knz (kena) wrote…

attempts to connect to a Cockroach Labs server to obtain a temporary enterprise license

(otherwise it's not clear there's a remote roundtrip available.)

Is it possible that there are some trailing spaces remaining in this string? (Not sure, the reviewable output confuses me)


pkg/cli/demo.go, line 55 at r5 (raw file):

temporary enterprise license for demoing enterprise features and enable 
telemetry back to Cockroach Labs. In order to disable this behavior, set the 
environment variable "COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING".

@dt was that the env var we're settling on for this release?


pkg/cli/demo.go, line 227 at r5 (raw file):

				success, err := GetAndApplyLicense(db, s.ClusterID(), demoOrg)
				// TODO (rohany): How to report this error and exit when license
				//  acquisition is performed asynchronously?

If this wasn't meant to be asynchronous, you'd use an error channel.

However for here it's a bit more subtle. I would recommend this:

  1. edit cli.go: Main() to extract the error handling under the call to Run() up to and including os.Exit() into a separate exitWithError(...) function. Call it there.
  2. call that function here too.

pkg/cli/demo.go, line 232 at r5 (raw file):

				}
				if !success {
					msg := "Unable to acquire demo license. Enterprise features are not enabled in this session.\n"

const msg =


pkg/cli/interactive_tests/test_demo_telemetry.tcl, line 15 at r5 (raw file):

send "alter table vehicles partition by list (city) (partition p1 values in ('nyc'));\n"
# expect that it failed, as no license was requested.
eexpect "use of partitions requires an enterprise license"

add:

interrupt
eexpect eof

(to clean up the process)

then end_test at the end

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, and @rohany)


pkg/ccl/cliccl/demo.go, line 21 at r5 (raw file):

	"github.com/cockroachdb/cockroach/pkg/cli"
	"github.com/cockroachdb/cockroach/pkg/util/uuid"
	"github.com/pkg/errors"

github.com/cockroachdb/errors

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, @knz, and @rohany)


pkg/ccl/cliccl/demo.go, line 33 at r5 (raw file):

Previously, knz (kena) wrote…

You'll need a healthy dose of escaping on some of these strings.

oh, yeah, sorry, my comment was pseudo-code: to do this for real we should use the net/url.Values{}, .Set each of them and then .Encode() for the actual string instead of doing any sprintf'ing/escaping ourselves.

There is a wider discussion/design going on to serve licenses in
a more general way in the future, so this commit is not aiming
for a future-proof design, but instead an MVP to allow users to
demo enterprise features within `cockroach demo`.

We are not concerned about offline usage of enterprise features
as users can obtain a license and enable features manually using SET.

Fixes cockroachdb#40222.

Release note (cli change): cockroach demo attempts to contact a license
server to obtain a temporary license. cockroach demo now enables
telemetry for the demo cluster. This feature can be opted out of by
setting the `COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING` environment variable
(https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).
@rohany
Copy link
Contributor Author

rohany commented Aug 30, 2019

RFAL!

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but please also get a greenlight from david

Reviewed 4 of 4 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, and @rohany)

@rohany rohany requested a review from dt August 30, 2019 19:42
@rohany rohany changed the title demo: cockroch demo attempts to obtain a temporary license upon startup and enables telemetry demo: cockroach demo attempts to obtain a temporary license upon startup and enables telemetry Aug 30, 2019
@rohany
Copy link
Contributor Author

rohany commented Aug 30, 2019

Thanks for all the reviews!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 30, 2019

Build failed

@rohany
Copy link
Contributor Author

rohany commented Aug 30, 2019

bors r+

craig bot pushed a commit that referenced this pull request Aug 30, 2019
40273: demo: cockroach demo attempts to obtain a temporary license upon startup and enables telemetry r=rohany a=rohany

There is a wider discussion/design going on to serve licenses in
a more general way in the future, so this commit is not aiming
for a future-proof design, but instead an MVP to allow users to
demo enterprise features within `cockroach demo`.

We are not concerned about offline usage of enterprise features
as users can obtain a license and enable features manually using SET.

Fixes #40222.

Release note (cli change): cockroach demo attempts to contact a license
server to obtain a temporary license. cockroach demo now enables
telemetry for the demo cluster. This feature can be opted out of by
setting the `COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING` environment variable
(https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig
Copy link
Contributor

craig bot commented Aug 30, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

demo: add enterprise license that automatically applies to cockroach demo
6 participants