sqlproxyccl: the reference script run_proxy.sh is outdated #71385

Closed
knz opened this issue Oct 11, 2021 · 9 comments · Fixed by #71760
Labels
A-multitenancy Related to multi-tenancy C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@knz
Contributor

knz commented Oct 11, 2021

I was trying to test things out with #71248 and I ran into the following issues:

  • mt start-sql now needs a --store flag separate from the server's (since it uses disk storage now)
  • the proxy no longer supports --target-addr
  • --logtostderr is no longer a valid logging flag
  • the proxy attempts to start an HTTP listener on port 8080; this fails because the KV server is already listening there, and I could not find a way to configure a different port

It would also be useful if the script were updated to use separate certs directories, to reflect realistic deployment usage.
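
For reference, a sketch of what I'd expect the updated proxy invocation to look like, based on the current flags (unverified; addresses, ports, and cert paths are illustrative):

```
# Sketch: --routing-rule replaces --target-addr, --log replaces
# --logtostderr, and --listen-metrics moves the HTTP listener off :8080
# (assuming the :8080 listener is the metrics endpoint).
cockroach mt start-proxy \
    --routing-rule="127.0.0.1:46257" \
    --listen-addr=127.0.0.1:56257 \
    --listen-metrics=:8081 \
    --listen-cert=$CERTSDIR/node.crt \
    --listen-key=$CERTSDIR/node.key \
    --log="{sinks: {stderr: {filter: info}}}" \
    --skip-verify
```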

@knz knz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 11, 2021
@jaylim-crl
Collaborator

mt start-sql now needs a separate --store flag from the server (since it uses disk storage now)

Is it necessary to have a separate --store flag? When we start the single KV node, we don't pass in a custom store either, so we'll default to disk storage.

Ack on the rest.

cc: @darinpp re: proxy does not support --target-addr any more

@knz
Contributor Author

knz commented Oct 11, 2021

Is it necessary to have a separate --store flag?

Yes, because both processes are started from the same current directory, so they both try to share the same cockroach-data sub-directory, which is invalid.

So we need either separate --store flags or separate current directories.
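
For example, a sketch with distinct store paths (paths are illustrative; the other flags match the script later in this thread):

```
# Sketch: give each process its own store directory so neither claims
# the shared ./cockroach-data default.
cockroach start-single-node \
    --host 127.0.0.1 --certs-dir=$CERTSDIR \
    --store=path=./kv-data &

cockroach mt start-sql \
    --tenant-id 123 --kv-addrs 127.0.0.1:26257 --sql-addr 127.0.0.1:46257 \
    --certs-dir=$CERTSDIR \
    --store=path=./tenant-data
```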

@jaylim-crl
Collaborator

Good point. Also, see the commit message here, which might be helpful: 6d024ae

@knz
Contributor Author

knz commented Oct 11, 2021

Thanks for the link, but I don't know how to use the test directory with the tenant 123 that's defined in the run_proxy script.

@knz
Contributor Author

knz commented Oct 11, 2021

OK, I think the directory auto-creates tenants.
However, I am seeing the following error in the SQL proxy logs:

I211011 03:49:32.017739 122 ccl/sqlproxyccl/tenant/directory.go:263  [client=127.0.0.1:31797,cluster=dim-dog,tenant=123] 104  error initializing tenant {123}: rpc error: code = Unavailable desc = connection closed
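
For reference, the tenant can also be created explicitly against the KV node with the same call used elsewhere in this thread (whether that is what this particular gRPC error points at is unverified):

```
cockroach sql --host 127.0.0.1 --certs-dir=$CERTSDIR \
    -e 'SELECT crdb_internal.create_tenant(123);'
```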

@knz knz added the A-multitenancy Related to multi-tenancy label Oct 11, 2021
@jaylim-crl
Collaborator

OK, here are some steps without the tenant directory (I'm not too familiar with the test directory server). This only works with a single tenant. I'll look into a script that reflects realistic deployment usage sometime this week:

COCKROACH=${1:-'./cockroach-darwin'}

# Create certificates.
export CERTSDIR=<update>
export COCKROACH_CA_KEY=$CERTSDIR/ca.key
$COCKROACH cert create-ca --certs-dir=$CERTSDIR
$COCKROACH cert create-client root --certs-dir=$CERTSDIR
$COCKROACH cert create-node 127.0.0.1 localhost --certs-dir=$CERTSDIR
$COCKROACH mt cert create-tenant-client 123 --certs-dir=$CERTSDIR
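
# NB: start-single-node, mt start-sql, and mt start-proxy below all run in
# the foreground; run each in its own terminal or background it with '&'.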

# Start KV layer. (:26257)
$COCKROACH start-single-node \
    --host 127.0.0.1 \
    --log="{sinks: {stderr: {filter: info}}}" \
    --certs-dir=$CERTSDIR

# Create tenant
$COCKROACH sql \
    --host 127.0.0.1 \
    -e 'select crdb_internal.create_tenant(123);' \
    --certs-dir=$CERTSDIR

# Spawn tenant SQL server (:46257) pointing at KV layer (:26257)
COCKROACH_TRUST_CLIENT_PROVIDED_SQL_REMOTE_ADDR=true $COCKROACH mt start-sql \
    --tenant-id 123 \
    --kv-addrs 127.0.0.1:26257 \
    --sql-addr 127.0.0.1:46257 \
    --log="{sinks: {stderr: {filter: info}}}" \
    --store=type=mem,size=640MiB \
    --certs-dir=$CERTSDIR

# Create user on tenant (proxy does not forward client certs)
$COCKROACH sql \
    --port 46257 \
    -e "create user bob with password 'builder';" \
    --certs-dir=$CERTSDIR

# Start the proxy, forwarding all requests to the SQL tenant.
$COCKROACH mt start-proxy \
    --routing-rule="127.0.0.1:46257" \
    --listen-metrics=:8081 \
    --log="{sinks: {stderr: {filter: info}}}" \
    --listen-addr 127.0.0.1:56257 \
    --listen-cert $CERTSDIR/node.crt \
    --listen-key $CERTSDIR/node.key \
    --skip-verify

# Connect to the tenant.
$COCKROACH sql \
    --url "postgresql://bob:builder@127.0.0.1:56257/tenant-cluster-2.defaultdb/?sslmode=require" \
    --certs-dir=$CERTSDIR

@knz
Contributor Author

knz commented Oct 11, 2021

We need to split the certs directory into three, to:

  1. highlight that node.crt is only available on KV nodes
  2. ensure that mt start-sql can start without node certs
  3. ensure that the SQL proxy can start with a server cert separate from node.crt
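
Roughly (a sketch; directory names are illustrative, and the CA wiring between the directories is glossed over):

```
# 1. KV nodes: the only directory that contains node.crt.
export COCKROACH_CA_KEY=./certs-kv/ca.key
cockroach cert create-ca --certs-dir=./certs-kv
cockroach cert create-node 127.0.0.1 localhost --certs-dir=./certs-kv

# 2. Tenant SQL server: tenant client cert only, no node certs.
cockroach mt cert create-tenant-client 123 --certs-dir=./certs-tenant

# 3. SQL proxy: its own CA and server cert, separate from node.crt.
cockroach cert create-ca --certs-dir=./certs-proxy \
    --ca-key=./certs-proxy/proxy-ca.key
```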

@knz
Contributor Author

knz commented Oct 11, 2021

This will need an acceptance test, presumably via a Tcl test.

@jaylim-crl
Collaborator

It looks like the test directory won't work with certs:

process := &Process{SQL: sql.Addr(), FakeLoad: 0.01}
args := []string{
    "mt", "start-sql", "--kv-addrs=127.0.0.1:26257", "--idle-exit-after=30s",
    fmt.Sprintf("--sql-addr=%s", sql.Addr().String()),
    fmt.Sprintf("--http-addr=%s", http.Addr().String()),
    fmt.Sprintf("--tenant-id=%d", tenantID),
    "--insecure",
}

That will need to be updated as well if we want something that reflects a realistic deployment.
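
Presumably the spawned command would need something like this instead (a sketch; how the certs dir gets plumbed through the test directory server is an assumption, but mt start-sql itself accepts --certs-dir, as in the script above):

```
# Sketch: replace --insecure with a tenant certs directory.
# $TENANT_CERTS_DIR is a placeholder; addresses and ports illustrative.
cockroach mt start-sql \
    --kv-addrs=127.0.0.1:26257 --idle-exit-after=30s \
    --sql-addr=127.0.0.1:46257 --http-addr=127.0.0.1:8082 \
    --tenant-id=123 \
    --certs-dir="$TENANT_CERTS_DIR"
```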

@andy-kimball andy-kimball assigned darinpp and unassigned jaylim-crl Oct 12, 2021
darinpp added a commit to darinpp/cockroach that referenced this issue Oct 20, 2021
The pkg/ccl/sqlproxyccl/run_proxy.sh script sets up a multi-instance environment
with a host server, proxy server and a test directory server that can
start tenant processes on demand. Some recent changes prevent the script
from running due to the issues described in cockroachdb#71385.
This PR does the following:
- changes the script so it uses secure connections between all processes
- uses separate CAs when possible
- adds a cert directory option to the test directory server so tenant
  processes can be started with the correct set of certs
- adds a KV addrs flag to the test directory so the tenant processes can
  target the correct host server
- changes the capturing of the tenant processes' output to continuously
  flow into the test directory server log. Previously that happened only
  when the tenant process terminated.
- adds a store argument to the tenant process
- removes the --logtostderr proxy option
- changes the proxy http listen port to not interfere with default
  cockroach db port
- minor fix for the "tenant client cert not found" message to show the
  directory that is being searched for the cert

Fixes cockroachdb#71385

Release note: None
darinpp added a commit to darinpp/cockroach that referenced this issue Oct 21, 2021
darinpp added a commit to darinpp/cockroach that referenced this issue Dec 7, 2021
The pkg/ccl/sqlproxyccl/run_proxy_(in)direct_crdb.sh scripts set up a
multi-instance environment with a host server, a proxy server, and a test
directory server that can start tenant processes on demand.

This PR does the following:
- changes the script so it uses secure connections between all processes
- uses separate CAs when possible
- adds a cert directory option to the test directory server so tenant
  processes can be started with the correct set of certs
- adds a KV addrs flag to the test directory so the tenant processes can
  target the correct host server
- changes the capturing of the tenant processes' output to continuously
  flow into the test directory server log. Previously that happened only
  when the tenant process terminated.
- adds a store argument to the tenant process
- removes the --logtostderr proxy option
- changes the proxy http listen port to not interfere with default
  cockroach db port
- minor fix for the "tenant client cert not found" message to show the
  directory that is being searched for the cert
- separates the run_proxy.sh script into two scripts:
  run_proxy_direct_crdb and run_proxy_indirect_crdb. The first relies
  on the test directory server to launch tenant SQL processes by
  using the cockroach executable directly. The second does so by
  executing a shell script, run_tenant. Using a shell script allows
  better customization of the tenant startup and just-in-time cert
  generation, and allows multiple tenants (vs. just one tenant in the
  first case).
- adds a trap to the shell scripts to shut down all processes started by
  the script
- the scripts now wait for a key press, then shut down and clean up
Fixes cockroachdb#71385

Release note: None
craig bot pushed a commit that referenced this issue Dec 7, 2021
71760: sqlproxyccl: fix the run proxy sample script r=darinpp a=darinpp


Fixes #71385

Release note: None

```
$ ./run_proxy_indirect_crdb.sh 
Create node CA and node cert
Create client CA
Create tenant CA
Create client tenant root cert
Start KV layer
Create client host root cert
Start test directory server
Start the sql proxy server (with self signed client facing cert)
All files are in /tmp/tenant-test-Wgl3
To connect to a specific tenant (123 for example):
  cockroach sql --url="postgresql://root:secret@127.0.0.1:55506?sslmode=require&sslrootcert=a&options=--cluster%3Dtenant-cluster-123"
Press any key to shutdown all processes and cleanup.

initiating graceful shutdown of server
server drained and shutdown completed
server shutdown completed
cleaning up...
```
```
cockroach sql --url="postgresql://root:secret@127.0.0.1:55506?sslmode=require&sslrootcert=a&options=--cluster%3Dtenant-cluster-123" -e 'select 1'
  ?column?
------------
         1
(1 row)


Time: 1ms

```

73398: tracing: optimize span.Finish r=andreimatei a=andreimatei

Don't read the clock unnecessarily and also use the more efficient
timeutil.Since(t) when reading it is necessary.

This clock reading was showing up in profiles when running with tracing
enabled, as confirmed by a micro-benchmark:

Tracer_StartSpanCtx/opts=real-32                  696ns ± 1%     593ns ± 1%  -14.74%  (p=0.008 n=5+5)

Release note: None

73443: stmtdiagnostics: version-gate recent conditional diagnostics r=yuzefovich a=yuzefovich

We recently merged a change to support conditional statement diagnostics
which required a migration to the corresponding system table to include
a couple of new columns. However, we forgot to add a version gate for
the queries reading from that table, and as a result in a mixed
21.2-22.1 cluster, before the corresponding migration has completed, the
stmt diagnostics became broken. This is now fixed. I manually tested
that the fix does work and filed an issue to add the corresponding
operations to the mixed version roachtest.

Release note: None (no stable release with this bug)

73474: cli: fix Tracer init for multi-tenant r=andreimatei a=andreimatei

Before this patch, a sql pod process ended up using two Tracers:
one created by the Stopper, and another one created by the serverCfg.
These two Tracers were sometimes both used within a single trace, which
is illegal - the multi-tenant process would thus crash if tracing was
enabled. This patch fixes the issue by overriding the Stopper's tracer
early in the sql pod's initialization. This is also how the setup of a
regular server works.

It's messy that we have two Tracers even temporarily, but refactoring
the initialization sufficiently to avoid that seems hard. I'll probably
do something better in the future, though, because it's itching me.

Release note: None

73476: kvclient: fix span capture in heartbeat interceptor r=andreimatei a=andreimatei

The txnHeartbeater was capturing the tracing span of the first writing
request in a txn, holding on to it for a second, and then using it to
fork a child span. Since the parent span was long closed by then,
creating the child was a use-after-Finish. This has been tolerated, as
all use-after-Finishes have been tolerated, but I'm cracking down on them
because of a tracing agenda. Even if this kind of fork from a finished
span were allowed, it still probably wouldn't be a good idea to hold on
to the parent's memory for so long for very little benefit (the benefits
of the parent-child relationship after a fork are kinda theoretical,
particularly if the parent has finished by the time the child starts).

So, this patch makes the heartbeat loop's span be a root.

Release note: None

73528: kv: don't set CanForwardReadTimestamp on implicit->explicit EndTxn transition r=andreimatei a=nvanbenschoten

This commit removes logic that was setting the `CanForwardReadTimestamp` flag on the batch header of the EndTxn request in `makeTxnCommitExplicitLocked` to whatever the flag was set to in the batch that implicitly committed the txn. This did not make sense, as we don't expect an implicit->explicit EndTxn to forward the commit timestamp. In fact, such behavior would be a major problem, as the transaction commit timestamp should not change during this transition.

This was not being set for a great reason. It was originally added in cd37f94 when we fixed a bug that was allowing these requests to avoid checking for retryable errors. I checked out that commit and determined that it seems to have been necessary to avoid hitting retryable errors on the implicit->explicit transition in the case when the initial staging record had avoided a txn retry error on [this code path](https://github.com/cockroachdb/cockroach/blob/cd37f94f097abb605907ff4b41b29f2e8e361f01/pkg/storage/batcheval/cmd_end_transaction.go#L407). The use of the flag was necessary because on that path, we weren't telling the client about the refresh. We were just accepting its Txn with a split read timestamp and write timestamp but not returning an updated Transaction proto. This was later rationalized in 901d87f. In that later commit, we consolidated server-side refreshes and ensured that the `BatchResponse.Txn` of a successful `EndTxn` reflected the post-refresh state. So setting the flag in `makeTxnCommitExplicitLocked` has not been necessary since, because this post-refresh proto is what we attach to the implicit->explicit EndTxn request.

Co-authored-by: Darin Peshev <darinp@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 5ff7660 Dec 7, 2021