Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom SslContext option added to ApnsClientBuilder #1086

Closed
wants to merge 1 commit into from

Conversation

kkolyan
Copy link

@kkolyan kkolyan commented Oct 28, 2024

  1. we use a multiple APNS clients with different credentials
  2. we periodically create new one-shot clients to send a single notifications in order to check credentials.

du to the facts mentioned above, SslContext initialisation takes considerable amount of CPU. The goal of this change is to reduce it by sharing SslContext between clients (we use signingKey to authenticate clients, so seems like it's correct to share SslContext).

@jchambers
Copy link
Owner

Hello, and thank you for the contribution!

I don't think I have a clear understanding of your use case. How many ApnsClient instances are you building that SslContext construction is a significant part of your CPU load?

@kkolyan
Copy link
Author

kkolyan commented Oct 29, 2024

Hi. Thanks for quick reply!
Okay, please let me describe our case in details.

We normally use about 100 ApnsClient instances at the same time (there are 8 signing keys, but each one is used different applications with different topics, so about 100 combinations in total).

The thing we want to do is a periodic verification that all our ApnsClients are configured properly (credentials are up to date, topics are allowed, etc). The reason is because the company is big and misconfiguration (in APNS admin console) happens. The way we go is to send dummy notifications with a short period (1 minute is short enough) to each combination of singing key and topic.

If such misconfiguration happen, we want to know in advance (before main ApnsClient refreshed authentication of its connections), so we want this verification to be authenticated from scratch, without reusing of temporary tokens or any kind of session caching.
The first idea was to assign tokenExpiration with some very little value, but reading docs about it we thought that's bad idea, because:

  1. it doesn't affect TLS auth. we don't use TLS auth either, but there is a risk to miss the fact that our health check is false-positive if we decide to use TLS in future.
  2. docs for setTokenExpiration method explicitly states that tokenExpiration change is not recommended unless specific conditions (not our case). So using tokenExpiration for our goal seems a misuse of Pushy library, and software tends to misbehave when it's misused.

So we're going with approach to create multiple single-shot ApnsClient during each health check act to guarantee that every verification notification is sent with new authentication. And that activity (~100 checks per minute) takes about 0.35 CPU core in average continuously, which is almost the same as each our node consumes to send normal notifications. Profiling shows that 70-90% of time we spend in io.netty.handler.ssl.SslContextBuilder#build. Seems like sharing SslContext will solve it, since we don't use TLS auth (and in this place we probably will not miss it if we start use TLS).

@jchambers
Copy link
Owner

Thanks for the explanation! That does help me understand the problem a little better, but frankly, something's not adding up for me. You wrote:

And that activity (~100 checks per minute) takes about 0.35 CPU core in average continuously, which is almost the same as each our node consumes to send normal notifications.

It sounds like you think this time is dominated by constructed SslContext instances. For the sake of argument, let's assume that all of the CPU time is going to SslContext initialization. If that were true, I think that would mean that we could construct about 300 SslContext instances per minute (or 5/second) while completely saturating a CPU core. That seems… extremely slow to me. I wrote a quick benchmark to check how long it takes to construct an SslContext the same way we'd build one if we were using a signing key:

package com.eatthepath.pushy.apns;

import io.netty.handler.codec.http2.Http2SecurityUtil;
import io.netty.handler.ssl.*;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import javax.net.ssl.SSLException;

@State(Scope.Thread)
public class SslContextBenchmark {

  @Benchmark
  public SslContext buildSslContext() throws SSLException {
    final SslContextBuilder sslContextBuilder = SslContextBuilder.forClient()
        .sslProvider(SslProvider.OPENSSL)
        .ciphers(Http2SecurityUtil.CIPHERS, SupportedCipherSuiteFilter.INSTANCE);

    return sslContextBuilder.build();
  }
}

…and on my humble laptop, I'm generating upwards of 50,000 SslContext instances per second on a single core:

Benchmark                             Mode  Cnt      Score     Error  Units
SslContextBenchmark.buildSslContext  thrpt   25  52587.568 ± 232.103  ops/s

That's four orders of magnitude off from what we'd expect from your numbers. I don't think SslContext construction is your main problem, and I'd encourage you to do more detailed profiling to identify the problem.

On top of that, I think it's fair to say that your specific use case is uncommon, and reusing SslContext instances may have unexpected consequences for users who are new to the space. With all of those things in mind, I must respectfully decline this contribution. All the same, thank you for your work!

@jchambers jchambers closed this Oct 31, 2024
@kkolyan
Copy link
Author

kkolyan commented Oct 31, 2024

So I see, you have two points:

  1. the connection of my problem with SslContext CPU costs looks doubtful
  2. the case is not common enough to address it in mainstream.

if the point 2 is enough for you to close this discussion, then ok, I have to address it somehow on my side.

But if the power of point 2 is enough only in a shadow of point 1, then please read my arguments.

Cost of SslContext creation depends on specific algorithms it use and its parameters. Probably, I should've said it earlier, but we use use FIPS-ed JVM due to customer security requirements, so the set of cryptographic algorithms and its parameters (key lengths for instance) is different from the ones you use in the mentioned benchmark. And probably, the one we use makes SslContext creation much more slower that default ones you used.

It sounds like you think this time is dominated by constructed SslContext instances.

Yeah, because profiling showed it.

That's the flamegraph:
image

That's CPU usage of this host:
image

The logs where we see that SslContext creation requested ~100 times per minute:
image

@jchambers
Copy link
Owner

the connection of my problem with SslContext CPU costs looks doubtful

Well, more precisely, I'd say you're having a problem that's not common or easy to reproduce. From your flamegraph, it looks like Bouncycastle is spending an enormous amount of time… decrypting a trust store, it looks like? Perhaps there's a way to address that problem separately.

It sounds like you have a good understanding of the problem, though:

we use use FIPS-ed JVM due to customer security requirements, so the set of cryptographic algorithms and its parameters (key lengths for instance) is different from the ones you use in the mentioned benchmark. And probably, the one we use makes SslContext creation much more slower that default ones you used.

But, yes:

the case is not common enough to address it in mainstream.

…you're correct that this is my main point. It sounds like you have an uncommon use case (spawning lots of single-shot clients) combined with an uncommon problem (a higher-than-usual CPU load when loading trust stores). It's entirely possible that you're the only Pushy user who would benefit from the feature you're requesting, and it would come at a cost of complexity and some potential security risk to other users. For those reasons, I still think it makes sense to leave this change out of Pushy.

I recognize that this may be disappointing for you, but I'm confident it's the right move for Pushy as a whole. I wish you luck in addressing the challenge by other means, though!

@kkolyan
Copy link
Author

kkolyan commented Nov 5, 2024

Okay, thanks for considering my arguments and explanation of your point.

@kkolyan
Copy link
Author

kkolyan commented Nov 5, 2024

I recognize that this may be disappointing for you

I worried when I thought you unintentionally silenced me by closing PR... :)

Regarding the problem - my only concern was to be sure that all arguments were considered. And they were and I completely respect your decision. Thanks for your time and attention!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants