Close the connection after sending tls alerts in the queue #1323

Merged
merged 5 commits into from
Sep 6, 2019

Conversation

Contributor

@avbelov23 avbelov23 commented Jul 24, 2019

#1308

  • Close the socket after sending all pending data on TCP_CLOSE_WAIT and on internal errors.

  • Close the connection after sending the fatal TLS alerts and close_notify alerts that are in the queue.

  • Do not encrypt the TTLS_SERVER_FINISHED record on send, because it is already encrypted in ttls_write_finished().

  • Send a close_notify alert in response to a received close_notify alert.

Contributor

@krizhanovsky krizhanovsky left a comment

It seems there are no locking or other issues such as reference counting, but I'm wondering why you don't replace ss_linkerror() with ss_close() for TCP_CLOSE_WAIT in ss_tcp_state_change(): if we receive a FIN, we should still send our data.

@avbelov23 avbelov23 force-pushed the avb-1308 branch 3 times, most recently from 9d3944d to ee55cd2 Compare July 30, 2019 16:48
Contributor

@krizhanovsky krizhanovsky left a comment

Reviewed once more and I see no issues with the code. However, I didn't review for locks and reference counting, only TCP stuff. So another review with deep attention to locking and reference counting is required.

Also please make sure that all functional tests pass and there are no deadlocks, stalled sockets and other anomalies in dmesg.

A backport to 0.6 is required.

Contributor

@krizhanovsky krizhanovsky left a comment

Reviewed once more and deeper. I paid attention to locking and reference counting this time and didn't find any issues there. However, there are other issues, some of which were discussed today.

The only difference between ss_close() and ss_linkerror() is the callback that gets called: connection_error or connection_drop. Both callbacks are the same on the client side (though this might change in the future). But with this change we break server connection failover: with the ss_tcp_data_ready() change, if there is an error and ss_tcp_process_data() returns a bad code, we don't try to reestablish the connection to the server. Similarly, with the ss_tcp_state_change() change, if a server closes the connection (e.g. for a system restart), we won't reconnect to it.
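To make the failover concern concrete, here is a minimal, self-contained model. It is not Tempesta FW code: every `fo_*` name is hypothetical, and the sketch assumes that the link-error path ends in the connection_error callback (which schedules a reconnect for server connections) while the close path ends in connection_drop (which only releases the connection).

```c
#include <stdbool.h>

/* Hypothetical connection model; not a Tempesta FW structure. */
struct fo_conn {
	bool is_server;       /* connection to a backend server */
	bool reconnect_armed; /* a reconnect attempt was scheduled */
	bool released;        /* connection resources were released */
};

/* Assumed behaviour of the error hook: failover for server connections. */
static void fo_connection_error(struct fo_conn *c)
{
	if (c->is_server)
		c->reconnect_armed = true;
	c->released = true;
}

/* Assumed behaviour of the drop hook: release only, no reconnect. */
static void fo_connection_drop(struct fo_conn *c)
{
	c->released = true;
}

/* Stand-ins for the two close paths discussed in the review. */
static void fo_linkerror(struct fo_conn *c) { fo_connection_error(c); }
static void fo_close(struct fo_conn *c)     { fo_connection_drop(c); }
```

Under these assumptions, replacing the link-error path with the close path everywhere means a server connection is released without a reconnect being armed, which is the regression described above.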

We do send alerts with SS_F_CONN_CLOSE: ttls_send_alert() -> ttls_write_record() -> __ttls_send_record() -> ttls_send_cb() -> tfw_tls_send() sets SS_F_CONN_CLOSE for ss_send(). Not all alerts must lead to closing the connection, but according to RFC 5246 7.2.2, transmission of a fatal alert must be followed by closing the connection.

The same section says:

"Upon transmission or receipt of a fatal alert message, both parties immediately close the connection."

Section 7.2.1 says (see also the TLS truncation attack discussion):

"each party is required to send a close_notify alert before closing the write side of the connection. The other party MUST respond with a close_notify alert of its own and close down the connection immediately, discarding any pending writes. It is not required for the initiator of the close to wait for the responding close_notify alert before closing the read side of the connection."

This means that we do not comply with the RFC, since ttls_handle_alert() doesn't reply with a close_notify message. We should set Conn_Stop on receiving such alerts. This must be fixed, and a functional test for connection closing, ensuring that we do send close_notify in response to fatal and close_notify alerts, must be added to tempesta-test.
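As a rough illustration of the two quoted RFC passages, the sketch below decides how to react to a received alert. The alert level and description values are the ones defined in RFC 5246 section 7.2; the `rx_*` names are hypothetical and unrelated to ttls_handle_alert(). The sketch follows only the quoted RFC text, so it does not model replying with close_notify to a fatal alert.

```c
#include <stdbool.h>

/* Alert levels and the close_notify description from RFC 5246, 7.2. */
enum { RX_LEVEL_WARNING = 1, RX_LEVEL_FATAL = 2 };
enum { RX_DESC_CLOSE_NOTIFY = 0 };

/* What the receiver should do after processing an incoming alert. */
struct rx_reaction {
	bool reply_close_notify; /* answer with our own close_notify */
	bool close_connection;   /* close down the connection */
};

static struct rx_reaction
rx_on_alert(unsigned char level, unsigned char desc)
{
	struct rx_reaction r = { false, false };

	if (desc == RX_DESC_CLOSE_NOTIFY) {
		/* 7.2.1: respond with close_notify and close immediately. */
		r.reply_close_notify = true;
		r.close_connection = true;
	} else if (level == RX_LEVEL_FATAL) {
		/* 7.2.2: both parties close on a fatal alert. */
		r.close_connection = true;
	}
	/* Other warning-level alerts do not force a close in this sketch. */
	return r;
}
```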

The RFC allows us to set Conn_Stop when sending a fatal or close_notify alert: there is no sense in reading further records if we failed to process the current one. In this case we just send the responses for the previous records, send the alert and close the TCP connection.
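The ordering described here (stop reading, flush what is already queued, then close) can be sketched with a toy send queue. All `stop_*` names are hypothetical; the `stop` field only plays the role that Conn_Stop has in the discussion, and the real socket layer is of course asynchronous.

```c
#include <stdbool.h>
#include <stddef.h>

#define STOP_QUEUE_MAX 8

enum stop_item { STOP_ITEM_RESPONSE, STOP_ITEM_ALERT };

/* Toy connection with a bounded send queue. */
struct stop_conn {
	enum stop_item queue[STOP_QUEUE_MAX];
	size_t queued; /* number of items placed in the queue */
	size_t sent;   /* number of items already transmitted */
	bool stop;     /* stands in for Conn_Stop */
	bool closed;   /* TCP connection closed */
};

/* Queue an item; queuing an alert stops further record processing. */
static bool stop_enqueue(struct stop_conn *c, enum stop_item it)
{
	if (c->closed || c->queued == STOP_QUEUE_MAX)
		return false;
	c->queue[c->queued++] = it;
	if (it == STOP_ITEM_ALERT)
		c->stop = true;
	return true;
}

/* New incoming records are accepted only while not stopped. */
static bool stop_accepts_records(const struct stop_conn *c)
{
	return !c->stop && !c->closed;
}

/* Transmit everything queued so far, then close if stop was requested. */
static void stop_flush(struct stop_conn *c)
{
	while (c->sent < c->queued)
		c->sent++;
	if (c->stop)
		c->closed = true;
}
```

The point of the model is that the responses queued before the alert still go out, and the close happens only after the alert itself has been sent.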

Also don't forget to enable tests from tempesta-test/tests_disabled.json marked by #1308.

@avbelov23 avbelov23 force-pushed the avb-1308 branch 2 times, most recently from 5050fa9 to adf359b Compare August 3, 2019 07:06
@krizhanovsky
Contributor

Just saw kernel oops on CI http://93.115.28.191:4010/#/builders/5/builds/810/steps/21/logs/stdio:

[  479.832011] ------------[ cut here ]------------
[  479.834442] kernel BUG at /root/tempesta/tempesta/tempesta_fw/http.c:3058!
[  479.837687] invalid opcode: 0000 [#1] SMP PTI
[  479.840053] Modules linked in: tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tempesta_lib(O) sha256_ssse3 sha512_ssse3 sha512_generic ccm fuse ata_generic intel_rapl sb_edac crct10dif_pclmul crc32_pclmul joydev ghash_clmulni_intel ata_piix cirrus ttm libata drm_kms_helper xen_netfront intel_rapl_perf psmouse pcspkr drm scsi_mod i2c_piix4 floppy button ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crc32c_intel aesni_intel evdev xen_blkfront aes_x86_64 crypto_simd cryptd glue_helper serio_raw [last unloaded: tempesta_lib]
[  479.860163] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O    4.14.0-tempesta-amd64 #1 Debian 4.14.32-tfw6-1
[  479.864413] Hardware name: Xen HVM domU, BIOS 4.8.5 01/11/2019
[  479.867288] task: ffff9799007b9e00 task.stack: ffffb6ff80698000
[  479.870208] RIP: 0010:tfw_http_msg_process_generic+0x968/0xd00 [tempesta_fw]
[  479.873531] RSP: 0018:ffff97990f103970 EFLAGS: 00010216
[  479.876275] RAX: 0000000000000007 RBX: ffff9799019215f0 RCX: 0000000000000010
[  479.879614] RDX: ffff979904029000 RSI: ffff97990f103b50 RDI: ffff979904029600
[  479.882933] RBP: ffff9798b1682020 R08: 0000000000000001 R09: 0000000000000000
[  479.886277] R10: 0000000000000000 R11: ffff9798b16b6140 R12: ffff97990f103b50
[  479.889603] R13: 0000000000000000 R14: 0000000000000000 R15: ffff979904029600
[  479.892912] FS:  0000000000000000(0000) GS:ffff97990f100000(0000) knlGS:0000000000000000
[  479.896496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  479.899417] CR2: 00007ff8c31215d8 CR3: 000000005c20a005 CR4: 00000000001606e0
[  479.902747] Call Trace:
[  479.904583]  <IRQ>
[  479.906282]  ? generic_gcmaes_decrypt+0x5f/0x80 [aesni_intel]
[  479.909139]  ? ttls_decrypt+0x297/0x580 [tempesta_tls]
[  479.911828]  tfw_http_msg_process+0x9e/0xe0 [tempesta_fw]
[  479.914610]  ? pg_skb_alloc+0x253/0x470
[  479.916842]  __gfsm_fsm_exec+0x56/0x90 [tempesta_fw]
[  479.919446]  ? skb_split+0x1ff/0x2e0
[  479.921594]  tfw_gfsm_move+0x132/0x160 [tempesta_fw]
[  479.924197]  tfw_tls_msg_process+0x1de/0x360 [tempesta_fw]
[  479.926930]  __gfsm_fsm_exec+0x56/0x90 [tempesta_fw]
[  479.929535]  tfw_connection_recv+0x4e/0x70 [tempesta_fw]
[  479.932232]  ? tfw_connection_send+0x30/0x30 [tempesta_fw]
[  479.935009]  ss_tcp_process_data+0x1db/0x440 [tempesta_fw]
[  479.937729]  ss_tcp_data_ready+0x43/0x90 [tempesta_fw]
[  479.940413]  tcp_rcv_established+0x4d2/0x570
[  479.942795]  tcp_v4_do_rcv+0x129/0x1d0
[  479.944992]  tcp_v4_rcv+0x947/0xa50
[  479.947092]  ip_local_deliver_finish+0x9a/0x1c0
[  479.949529]  ip_local_deliver+0x6b/0xe0
[  479.951714]  ? tcp_v4_early_demux+0x112/0x150
[  479.954011]  ? ip_rcv_finish+0x17a/0x400
[  479.956197]  ip_rcv+0x289/0x3c0
[  479.958134]  ? inet_del_offload+0x40/0x40
[  479.960302]  __netif_receive_skb_core+0x84f/0xb30
[  479.962713]  ? process_backlog+0xa3/0x160
[  479.964849]  process_backlog+0xa3/0x160
[  479.966961]  net_rx_action+0x28e/0x3f0
[  479.969026]  __do_softirq+0x10f/0x2a8
[  479.971078]  irq_exit+0xae/0xb0
[  479.972953]  xen_evtchn_do_upcall+0x2c/0x40
[  479.975072]  xen_hvm_callback_vector+0x7d/0x90
[  479.977231]  </IRQ>
[  479.978667] RIP: 0010:native_safe_halt+0x2/0x10
[  479.980787] RSP: 0018:ffffb6ff8069beb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
[  479.983908] RAX: ffffffffbc897e60 RBX: ffff9799007b9e00 RCX: 0000000000000000
[  479.986907] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  479.989835] RBP: 0000000000000001 R08: 00000000593296c0 R09: ffff979901a3ef00
[  479.992813] R10: 0000000000000000 R11: 0000013cfadf1cfb R12: ffff9799007b9e00
[  479.995780] R13: ffff9799007b9e00 R14: 0000000000000000 R15: 0000000000000000
[  479.998696]  ? __sched_text_end+0x3/0x3
[  480.000585]  default_idle+0x1a/0xf0
[  480.002339]  do_idle+0x16e/0x1f0
[  480.004029]  cpu_startup_entry+0x6f/0x80
[  480.005937]  start_secondary+0x1a9/0x200
[  480.007837]  secondary_startup_64+0xa5/0xb0
[  480.009798] Code: 00 00 49 8b 95 80 00 00 00 48 8d b5 c8 00 00 00 e8 6e 86 fe ff 84 c0 0f 84 d4 fd ff ff f0 41 80 a5 a9 00 00 00 fd e9 c6 fd ff ff <0f> 0b 83 f8 03 75 32 41 b8 01 00 00 00 b9 01 00 00 00 48 c7 c2 
[  480.016945] RIP: tfw_http_msg_process_generic+0x968/0xd00 [tempesta_fw] RSP: ffff97990f103970
[  480.020320] ---[ end trace 8c443069c4c3736c ]---
[  480.022479] Kernel panic - not syncing: Fatal exception in interrupt
[  480.025191] Kernel Offset: 0x3b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

@vankoven
Contributor

vankoven commented Aug 5, 2019

Just saw kernel oops

It doesn't seem to be related to this PR: I've got the same on the master branch, #1283 (comment)

Contributor

@vankoven vankoven left a comment

To me, passing a callback function into ss_close() looks ugly. This callback is already stored inside TfwConnection, but you pass it a second time via ss_close(). It becomes very unclear what the proper way to close the connection is: call the hook, or pass a function as an argument and call it.

If the intent is to control which callback is used at the ss_close() step, maybe we should introduce some connection state and leave only one connection_error() callback? That callback would then be responsible for checking the connection state and choosing between the drop and error routines.
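A minimal sketch of this suggestion, with hypothetical `one_cb_*` names and no relation to the real TfwConnection layout: the close function takes no callback argument, and a single hook inspects a state field to pick the routine.

```c
/* Hypothetical connection states used to pick the close routine. */
enum one_cb_state {
	ONE_CB_ACTIVE,   /* unexpected failure: recovery is wanted */
	ONE_CB_SHUTDOWN, /* intentional close: just drop */
};

struct one_cb_conn {
	enum one_cb_state state;
	int error_handled; /* times the error routine ran */
	int dropped;       /* times the drop routine ran */
};

/* The only hook: it chooses between the error and drop routines. */
static void one_cb_connection_error(struct one_cb_conn *c)
{
	switch (c->state) {
	case ONE_CB_SHUTDOWN:
		c->dropped++;
		break;
	case ONE_CB_ACTIVE:
	default:
		c->error_handled++;
		break;
	}
}

/* Closing never takes a callback argument; the stored hook decides. */
static void one_cb_close(struct one_cb_conn *c)
{
	one_cb_connection_error(c);
}
```

With this shape, a caller that wants a plain drop sets the state before closing, and every close goes through the same entry point.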

tls/ttls.c Outdated
@@ -1334,7 +1334,21 @@ ttls_send_alert(TlsCtx *tls, unsigned char lvl, unsigned char msg)
io->alert[0] = lvl;
io->alert[1] = msg;

if (msg == TTLS_ALERT_MSG_CLOSE_NOTIFY)
if (msg == TTLS_ALERT_MSG_UNEXPECTED_MESSAGE ||
Contributor

Why not use a switch-case statement here?

do { \
ss_do_close(sk); \
bh_unlock_sock(sk); \
SS_CALL_GUARD_EXIT(connection_drop, sk); \
Contributor

What is the reason for dropping the SS_CALL_GUARD_EXIT call?

Contributor Author

@avbelov23 avbelov23 Aug 6, 2019

Yes. I did not have time to push the fix.

@avbelov23 avbelov23 force-pushed the avb-1308 branch 3 times, most recently from 46fb8ab to 3728df5 Compare August 13, 2019 14:26
@avbelov23 avbelov23 force-pushed the avb-1308 branch 4 times, most recently from 30c1546 to a274cd7 Compare August 20, 2019 09:32
static void
tfw_sock_clnt_error(struct sock *sk)
{
tfw_sock_clnt_do_drop(sk, "connection error");
Contributor

The tfw_sock_clnt_do_drop() function was introduced so that the connection_error and connection_drop callbacks could call the same function, but only one of them exists now, so we don't need a separate tfw_sock_clnt_do_drop() function.

Contributor

From the current PR state I can say that this was addressed.

tls/ttls.c Outdated
close = true;
break;
Contributor

It is good practice to provide a default branch even if no action is required there.
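For illustration, a switch over alert descriptions with an explicit default branch could look like the sketch below. The numeric values are the AlertDescription codes from RFC 5246 section 7.2, but the `DEMO_*` names and the particular set of alerts that set `close` are assumptions made for the example, not the list used in this PR.

```c
#include <stdbool.h>

/* AlertDescription codes from RFC 5246, 7.2 (names are illustrative). */
enum {
	DEMO_ALERT_CLOSE_NOTIFY       = 0,
	DEMO_ALERT_UNEXPECTED_MESSAGE = 10,
	DEMO_ALERT_BAD_RECORD_MAC     = 20,
	DEMO_ALERT_HANDSHAKE_FAILURE  = 40,
	DEMO_ALERT_NO_RENEGOTIATION   = 100,
};

/* Decide whether sending this alert must be followed by a close. */
static bool demo_alert_closes_conn(unsigned char msg)
{
	bool close = false;

	switch (msg) {
	case DEMO_ALERT_CLOSE_NOTIFY:
	case DEMO_ALERT_UNEXPECTED_MESSAGE:
	case DEMO_ALERT_BAD_RECORD_MAC:
	case DEMO_ALERT_HANDSHAKE_FAILURE:
		close = true;
		break;
	default:
		/* Nothing to do: other alerts keep the connection open. */
		break;
	}
	return close;
}
```

The empty default branch documents that the remaining alerts were considered and intentionally left without action.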

@@ -1962,6 +1962,7 @@ tfw_http_conn_init(TfwConn *conn)
static int
tfw_http_conn_close(TfwConn *conn, bool sync)
{
SS_CONN_TYPE(conn->sk) |= Conn_Shutdown;
Contributor

Why is this flag set on every tfw_http_conn_close() (tfw_connection_close()) call? The flag is not needed for client connections, only for server ones, to indicate that the server connection is to be dropped and does not need to be re-established.

We discussed the flag in a private chat, and it looked much better than passing connection_error/connection_drop pointers as arguments to ss_close(). But now the Conn_Shutdown flag duplicates the meaning of the TFW_CONN_B_DEL flag, while the implementations of the two flags are completely different, so collisions are possible.

Contributor

@i-rinat i-rinat left a comment

Looks fine, I guess? I'm not sure, those are parts I hoped I'd never touch, so I can easily miss obvious issues.

Also, there are some comments I'd like you to look at.

flags |= SS_F_ENCRYPT;
TFW_CONN_TYPE(&conn->cli_conn) |= Conn_Stop;
}
if (ttls_xfrm_ready(tls) && !was_encrypt)
Contributor

Having both ttls_xfrm_ready() and was_encrypt looks redundant and wrong.

We already know which records we have to encrypt and which not. In the T_FSM_STATE(TTLS_SERVER_FINISHED) we change tls->state before __ttls_send_record() is called, so here tls->state is a bit off. But order can be changed. Here is what I'm talking about:

diff --git a/tls/tls_srv.c b/tls/tls_srv.c
index 58699482..b9e2cb7f 100644
--- a/tls/tls_srv.c
+++ b/tls/tls_srv.c
@@ -2262,19 +2262,20 @@ ttls_handshake_finished(TlsCtx *tls)
 	}
 	T_FSM_STATE(TTLS_SERVER_FINISHED) {
 		if ((r = ttls_write_finished(tls, &sgt, &p)))
 			return r;
 		CHECK_STATE(TLS_HEADER_SIZE + TTLS_HS_FINISHED_BODY_LEN);
+		sg_mark_end(&sgt.sgl[sgt.nents - 1]);
+		r = __ttls_send_record(tls, &sgt, false, false);
 		/*
 		 * In case of session resuming, invert the client and server
 		 * ChangeCipherSpec messages order.
 		 */
 		tls->state = tls->hs->resume
 			     ? TTLS_CLIENT_CHANGE_CIPHER_SPEC
 			     : TTLS_HANDSHAKE_WRAPUP;
-		sg_mark_end(&sgt.sgl[sgt.nents - 1]);
-		return __ttls_send_record(tls, &sgt, false, true);
+		return r;
 	}
 	}
 	T_FSM_FINISH(r, tls->state);
 
 	/* If we exit here, then something went wrong. */
diff --git a/tls/ttls.c b/tls/ttls.c
index 5041ce56..1b0c6e89 100644
--- a/tls/ttls.c
+++ b/tls/ttls.c
@@ -242,11 +242,12 @@ EXPORT_SYMBOL(ttls_register_bio);
  */
 bool
 ttls_xfrm_ready(TlsCtx *tls)
 {
 	return tls->state >= TTLS_CLIENT_FINISHED
-	       && tls->state != TTLS_SERVER_CHANGE_CIPHER_SPEC;
+	       && tls->state != TTLS_SERVER_CHANGE_CIPHER_SPEC
+	       && tls->state != TTLS_SERVER_FINISHED;
 }
 EXPORT_SYMBOL(ttls_xfrm_ready);
 
 #if defined(TTLS_CLI_C)
 static int

I think, it should work. And then was_encrypt can be removed altogether.

Contributor Author

Contributor

Well, maybe. But anyway, tls->state should be enough to figure out if a record needs to be encrypted. Perhaps we'll need a new predicate for that.
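One possible shape for such a predicate is sketched below. It mirrors the condition from the diff above, but the enum is a simplified, assumed ordering of the late handshake states and the `enc_*` names are hypothetical, so it only shows the idea of deriving the decision from the state alone.

```c
#include <stdbool.h>

/* Simplified, assumed ordering of late handshake states. */
enum enc_state {
	ENC_CLIENT_CHANGE_CIPHER_SPEC,
	ENC_CLIENT_FINISHED,
	ENC_SERVER_CHANGE_CIPHER_SPEC,
	ENC_SERVER_FINISHED,
	ENC_HANDSHAKE_WRAPUP,
	ENC_HANDSHAKE_OVER,
};

/*
 * True if a record sent in this state still has to be encrypted by the
 * send path. The two server states are excluded, following the diff
 * above: ChangeCipherSpec goes out in clear and Finished is assumed to
 * be encrypted already when it is written.
 */
static bool enc_record_needs_encryption(enum enc_state s)
{
	return s >= ENC_CLIENT_FINISHED
	       && s != ENC_SERVER_CHANGE_CIPHER_SPEC
	       && s != ENC_SERVER_FINISHED;
}
```

With a predicate like this, the send path would not need a separate was_encrypt flag, provided the state is still at the server Finished step when that record is sent.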

@avbelov23 avbelov23 force-pushed the avb-1308 branch 2 times, most recently from fe28860 to 5582ff7 Compare August 28, 2019 15:43
Contributor

@vankoven vankoven left a comment

Looks good to me now.

Contributor

@krizhanovsky krizhanovsky left a comment

Good to merge, but cleanups are required.
