sqlproxyccl: the reference script run_proxy.sh is outdated #71385

Closed
knz opened this issue Oct 11, 2021 · 9 comments · Fixed by #71760
Labels
A-multitenancy Related to multi-tenancy C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@knz
Contributor

knz commented Oct 11, 2021

I was trying to test things out with #71248 and I ran into the following issues:

  • mt start-sql now needs a --store flag separate from the server's (since it uses disk storage now)
  • the proxy no longer supports --target-addr
  • --logtostderr is no longer a valid logging flag
  • the proxy attempts to start an HTTP listener on port 8080; this fails because the KV server is already listening there, and I could not find a way to configure a different port

It would also be useful if the script were updated to use separate certs directories, to reflect realistic deployment usage.
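
For reference, a sketch of what I'd expect the updated proxy invocation to look like, based on the current flags (unverified; addresses, ports, and cert paths are illustrative):

```
# Sketch: --routing-rule replaces --target-addr, --log replaces
# --logtostderr, and --listen-metrics moves the HTTP listener off :8080
# (assuming the :8080 listener is the metrics endpoint).
cockroach mt start-proxy \
    --routing-rule="127.0.0.1:46257" \
    --listen-addr=127.0.0.1:56257 \
    --listen-metrics=:8081 \
    --listen-cert=$CERTSDIR/node.crt \
    --listen-key=$CERTSDIR/node.key \
    --log="{sinks: {stderr: {filter: info}}}" \
    --skip-verify
```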

@knz knz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 11, 2021
@jaylim-crl
Collaborator

mt start-sql now needs a separate --store flag from the server (since it uses disk storage now)

Is it necessary to have a separate --store flag? When we start the single KV node, we don't pass in a custom store either, so we'll default to disk storage.

Ack on the rest.

cc: @darinpp re: proxy does not support --target-addr any more

@knz
Contributor Author

knz commented Oct 11, 2021

Is it necessary to have a separate --store flag?

Yes, because both processes are started from the same current directory, so they both try to share the same cockroach-data sub-directory, which is invalid.

So we need either separate --store flags or separate current directories.
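
For example, a sketch with distinct store paths (paths are illustrative; the other flags match the script later in this thread):

```
# Sketch: give each process its own store directory so neither claims
# the shared ./cockroach-data default.
cockroach start-single-node \
    --host 127.0.0.1 --certs-dir=$CERTSDIR \
    --store=path=./kv-data &

cockroach mt start-sql \
    --tenant-id 123 --kv-addrs 127.0.0.1:26257 --sql-addr 127.0.0.1:46257 \
    --certs-dir=$CERTSDIR \
    --store=path=./tenant-data
```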

@jaylim-crl
Collaborator

Good point. Also, see the commit message here, which might be helpful: 6d024ae

@knz
Contributor Author

knz commented Oct 11, 2021

Thanks for the link, but I don't know how to use the test directory with the tenant 123 that's defined in the run_proxy script.

@knz
Contributor Author

knz commented Oct 11, 2021

OK, I think the directory auto-creates tenants.
However, I am seeing the following error in the SQL proxy logs:

I211011 03:49:32.017739 122 ccl/sqlproxyccl/tenant/directory.go:263  [client=127.0.0.1:31797,cluster=dim-dog,tenant=123] 104  error initializing tenant {123}: rpc error: code = Unavailable desc = connection closed
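
For reference, the tenant can also be created explicitly against the KV node with the same call used elsewhere in this thread (whether that is what this particular gRPC error points at is unverified):

```
cockroach sql --host 127.0.0.1 --certs-dir=$CERTSDIR \
    -e 'SELECT crdb_internal.create_tenant(123);'
```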

@knz knz added the A-multitenancy Related to multi-tenancy label Oct 11, 2021
@jaylim-crl
Collaborator

OK, here are some steps without the tenant directory (I'm not too familiar with the test directory server). This only works with a single tenant. I'll look into a script that reflects realistic deployment usage sometime this week:

COCKROACH=${1:-'./cockroach-darwin'}

# Create certificates.
export CERTSDIR=<update>
export COCKROACH_CA_KEY=$CERTSDIR/ca.key
$COCKROACH cert create-ca --certs-dir=$CERTSDIR
$COCKROACH cert create-client root --certs-dir=$CERTSDIR
$COCKROACH cert create-node 127.0.0.1 localhost --certs-dir=$CERTSDIR
$COCKROACH mt cert create-tenant-client 123 --certs-dir=$CERTSDIR
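
# NB: start-single-node, mt start-sql, and mt start-proxy below all run in
# the foreground; run each in its own terminal or background it with '&'.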

# Start KV layer. (:26257)
$COCKROACH start-single-node \
    --host 127.0.0.1 \
    --log="{sinks: {stderr: {filter: info}}}" \
    --certs-dir=$CERTSDIR

# Create tenant
$COCKROACH sql \
    --host 127.0.0.1 \
    -e 'select crdb_internal.create_tenant(123);' \
    --certs-dir=$CERTSDIR

# Spawn tenant SQL server (:46257) pointing at KV layer (:26257)
COCKROACH_TRUST_CLIENT_PROVIDED_SQL_REMOTE_ADDR=true $COCKROACH mt start-sql \
    --tenant-id 123 \
    --kv-addrs 127.0.0.1:26257 \
    --sql-addr 127.0.0.1:46257 \
    --log="{sinks: {stderr: {filter: info}}}" \
    --store=type=mem,size=640MiB \
    --certs-dir=$CERTSDIR

# Create user on tenant (proxy does not forward client certs)
$COCKROACH sql \
    --port 46257 \
    -e "create user bob with password 'builder';" \
    --certs-dir=$CERTSDIR

# Start the proxy, forwarding all requests to the SQL tenant.
$COCKROACH mt start-proxy \
    --routing-rule="127.0.0.1:46257" \
    --listen-metrics=:8081 \
    --log="{sinks: {stderr: {filter: info}}}" \
    --listen-addr 127.0.0.1:56257 \
    --listen-cert $CERTSDIR/node.crt \
    --listen-key $CERTSDIR/node.key \
    --skip-verify

# Connect to the tenant.
$COCKROACH sql \
    --url "postgresql://bob:builder@127.0.0.1:56257/tenant-cluster-2.defaultdb/?sslmode=require" \
    --certs-dir=$CERTSDIR

@knz
Contributor Author

knz commented Oct 11, 2021

We need to split the certs directory into three, to:

  1. highlight that node.crt is only available on KV nodes
  2. ensure that mt start-sql can start without node certs
  3. ensure that the SQL proxy can start with a server cert separate from node.crt
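
Roughly (a sketch; directory names are illustrative, and the CA wiring between the directories is glossed over):

```
# 1. KV nodes: the only directory that contains node.crt.
export COCKROACH_CA_KEY=./certs-kv/ca.key
cockroach cert create-ca --certs-dir=./certs-kv
cockroach cert create-node 127.0.0.1 localhost --certs-dir=./certs-kv

# 2. Tenant SQL server: tenant client cert only, no node certs.
cockroach mt cert create-tenant-client 123 --certs-dir=./certs-tenant

# 3. SQL proxy: its own CA and server cert, separate from node.crt.
cockroach cert create-ca --certs-dir=./certs-proxy \
    --ca-key=./certs-proxy/proxy-ca.key
```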

@knz
Contributor Author

knz commented Oct 11, 2021

This will need an acceptance test, presumably via a Tcl test.

@jaylim-crl
Collaborator

It looks like the test directory won't work with certs:

process := &Process{SQL: sql.Addr(), FakeLoad: 0.01}
args := []string{
    "mt", "start-sql", "--kv-addrs=127.0.0.1:26257", "--idle-exit-after=30s",
    fmt.Sprintf("--sql-addr=%s", sql.Addr().String()),
    fmt.Sprintf("--http-addr=%s", http.Addr().String()),
    fmt.Sprintf("--tenant-id=%d", tenantID),
    "--insecure",
}

That will need to be updated as well if we want something that reflects a realistic deployment.
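
Presumably the spawned command would need something like this instead (a sketch; how the certs dir gets plumbed through the test directory server is an assumption, but mt start-sql itself accepts --certs-dir, as in the script above):

```
# Sketch: replace --insecure with a tenant certs directory.
# $TENANT_CERTS_DIR is a placeholder; addresses and ports illustrative.
cockroach mt start-sql \
    --kv-addrs=127.0.0.1:26257 --idle-exit-after=30s \
    --sql-addr=127.0.0.1:46257 --http-addr=127.0.0.1:8082 \
    --tenant-id=123 \
    --certs-dir="$TENANT_CERTS_DIR"
```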

@andy-kimball andy-kimball assigned darinpp and unassigned jaylim-crl Oct 12, 2021
darinpp added a commit to darinpp/cockroach that referenced this issue Oct 20, 2021
The pkg/ccl/sqlproxyccl/run_proxy.sh script sets up a multi-instance environment
with a host server, proxy server and a test directory server that can
start tenant processes on demand. Some recent changes prevent the script
from running due to the issues described in cockroachdb#71385.
This PR does the following:
- changes the script so it uses secure connections between all processes
- uses separate CAs when possible
- adds a cert directory option to the test directory server so tenant
  processes can be started with the correct set of certs
- adds a KV addrs flag to the test directory so the tenant processes can
  target the correct host server
- changes the capturing of the tenant processes' output to continuously
  flow into the test directory server log. Previously that happened only
  when the tenant process terminated.
- adds a store argument to the tenant process
- removes the --logtostderr proxy option
- changes the proxy http listen port to not interfere with default
  cockroach db port
- minor fix for the "tenant client cert not found" message to show the
  directory that is being searched for the cert

Fixes cockroachdb#71385

Release note: None
darinpp added a commit to darinpp/cockroach that referenced this issue Oct 21, 2021
darinpp added a commit to darinpp/cockroach that referenced this issue Dec 7, 2021
The pkg/ccl/sqlproxyccl/run_proxy_(in)direct_crdb.sh scripts set up a
multi-instance environment with a host server, a proxy server, and a test
directory server that can start tenant processes on demand.

This PR does the following:
- changes the script so it uses secure connections between all processes
- uses separate CAs when possible
- adds a cert directory option to the test directory server so tenant
  processes can be started with the correct set of certs
- adds a KV addrs flag to the test directory so the tenant processes can
  target the correct host server
- changes the capturing of the tenant processes' output to continuously
  flow into the test directory server log. Previously that happened only
  when the tenant process terminated.
- adds a store argument to the tenant process
- removes the --logtostderr proxy option
- changes the proxy http listen port to not interfere with default
  cockroach db port
- minor fix for the "tenant client cert not found" message to show the
  directory that is being searched for the cert
- separates the run_proxy.sh script into two scripts:
  run_proxy_direct_crdb and run_proxy_indirect_crdb. The first relies
  on the test directory server to launch tenant SQL processes by
  using the cockroach executable directly. The second does so by
  executing a shell script, run_tenant. Using a shell script allows
  better customization of the tenant startup and just-in-time cert
  generation, and allows multiple tenants (vs. just one tenant in the
  first case).
- adds a trap to the shell scripts to shut down all processes started by
  the script
- the scripts now wait for a key press, then shut down and clean up
Fixes cockroachdb#71385

Release note: None
craig bot pushed a commit that referenced this issue Dec 7, 2021
71760: sqlproxyccl: fix the run proxy sample script r=darinpp a=darinpp


Fixes #71385

Release note: None

```
$ ./run_proxy_indirect_crdb.sh 
Create node CA and node cert
Create client CA
Create tenant CA
Create client tenant root cert
Start KV layer
Create client host root cert
Start test directory server
Start the sql proxy server (with self signed client facing cert)
All files are in /tmp/tenant-test-Wgl3
To connect to a specific tenant (123 for example):
  cockroach sql --url="postgresql://root:secret@127.0.0.1:55506?sslmode=require&sslrootcert=a&options=--cluster%3Dtenant-cluster-123"
Press any key to shutdown all processes and cleanup.

initiating graceful shutdown of server
server drained and shutdown completed
server shutdown completed
cleaning up...
```
```
cockroach sql --url="postgresql://root:secret@127.0.0.1:55506?sslmode=require&sslrootcert=a&options=--cluster%3Dtenant-cluster-123" -e 'select 1'
  ?column?
------------
         1
(1 row)


Time: 1ms

```

73398: tracing: optimize span.Finish r=andreimatei a=andreimatei

Don't read the clock unnecessarily and also use the more efficient
timeutil.Since(t) when reading it is necessary.

This clock reading was showing up in profiles when running with tracing
enabled, as confirmed by a micro-benchmark:

Tracer_StartSpanCtx/opts=real-32                  696ns ± 1%     593ns ± 1%  -14.74%  (p=0.008 n=5+5)

Release note: None

73443: stmtdiagnostics: version-gate recent conditional diagnostics r=yuzefovich a=yuzefovich

We recently merged a change to support conditional statement diagnostics
which required a migration to the corresponding system table to include
a couple of new columns. However, we forgot to add a version gate for
the queries reading from that table, and as a result in a mixed
21.2-22.1 cluster, before the corresponding migration has completed, the
stmt diagnostics became broken. This is now fixed. I manually tested
that the fix does work and filed an issue to add the corresponding
operations to the mixed version roachtest.

Release note: None (no stable release with this bug)

73474: cli: fix Tracer init for multi-tenant r=andreimatei a=andreimatei

Before this patch, a sql pod process ended up using two Tracers:
one created by the Stopper, and another one created by the serverCfg.
These two Tracers were sometimes both used within a single trace, which
is illegal - the multi-tenant process would thus crash if tracing was
enabled. This patch fixes the issue by overriding the Stopper's tracer
early in the sql pod's initialization. This is also how the setup of a
regular server works.

It's messy that we have two Tracers even temporarily, but refactoring
the initialization sufficiently to avoid that seems hard. I'll probably
do something better in the future, though, because it's itching me.

Release note: None

73476: kvclient: fix span capture in heartbeat interceptor r=andreimatei a=andreimatei

The txnHeartbeater was capturing the tracing span of the first writing
request in a txn, holding on to it for a second, and then using it to
fork a child span. Since the parent span was long closed by then,
creating the child was a use-after-Finish. This has been tolerated, as
all use-after-Finishes have been tolerated, but I'm cracking down on them
because of a tracing agenda. Even if this kind of fork from a finished
span were allowed, it still probably wouldn't be a good idea to hold on
to the parent's memory for so long for very little benefit (the benefits
of the parent-child relationship after a fork are kinda theoretical,
particularly if the parent has finished by the time the child starts).

So, this patch makes the heartbeat loop's span be a root.

Release note: None

73528: kv: don't set CanForwardReadTimestamp on implicit->explicit EndTxn transition r=andreimatei a=nvanbenschoten

This commit removes logic that was setting the `CanForwardReadTimestamp` flag on the batch header of the EndTxn request in `makeTxnCommitExplicitLocked` to whatever the flag was set to in the batch that implicitly committed the txn. This did not make sense, as we don't expect an implicit->explicit EndTxn to forward the commit timestamp. In fact, such behavior would be a major problem, as the transaction commit timestamp should not change during this transition.

This was not being set for a great reason. It was originally added in cd37f94 when we fixed a bug that was allowing these requests to avoid checking for retryable errors. I checked out that commit and determined that it seems to have been necessary to avoid hitting retryable errors on the implicit->explicit transition in the case when the initial staging record had avoided a txn retry error on [this code path](https://github.com/cockroachdb/cockroach/blob/cd37f94f097abb605907ff4b41b29f2e8e361f01/pkg/storage/batcheval/cmd_end_transaction.go#L407). The use of the flag was necessary because on that path, we weren't telling the client about the refresh. We were just accepting its Txn with a split read timestamp and write timestamp but not returning an updated Transaction proto. This was later rationalized in 901d87f. In that later commit, we consolidated server-side refreshes and ensured that the `BatchResponse.Txn` of a successful `EndTxn` reflected the post-refresh state. So setting the flag in `makeTxnCommitExplicitLocked` has not been necessary since, because this post-refresh proto is what we attach to the implicit->explicit EndTxn request.

Co-authored-by: Darin Peshev <darinp@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 5ff7660 Dec 7, 2021