
Supporting alternative base64/json encoder/decoder #168

Open
mattrobenolt opened this issue Feb 17, 2022 · 9 comments · May be fixed by #301

Comments

@mattrobenolt

Would you be open to supporting alternative base64 implementations? The one we've been using is https://github.com/segmentio/asm/tree/main/base64

It's significantly faster than the stdlib encoding, and the shortcuts it takes don't seem like they would be a problem here.

Ideally, if the token parsers were abstracted to accept an Encoding interface, we could pass in our own, with the default behavior falling back to encoding/base64.

In a similar vein, the same can be said for JSON codecs. The stdlib is known for sub-optimal performance here. Swapping encoding/json for something like fastjson (https://github.com/valyala/fastjson) would be nice.

I'm new to this project and haven't dug in much, so I don't know all the consequences of this or how to gauge it yet, but I know for us internally, we've adopted these codec alternatives for packages we control.

@oxisto
Collaborator

oxisto commented Feb 17, 2022

In theory, that could be an interesting option to explore. However, because this library is used by many users, we basically have two major design principles (we should maybe write them down somewhere :)):

  • Zero (0, nil, null) external dependencies
  • Strict adherence to a non-API breaking way of adding features (within the same major version)

So, if we find a way that is non-breaking (and defaults to the stdlib), has no dependencies on the mentioned projects (which should be doable if we introduce interfaces here), I would not strongly object to it. Personally, I do not see a major use case, though. I don't have any hard numbers here, but my guess is that the most expensive operation is the crypto part, so I am not sure how large the performance gain really is. I do get the idea of having the other libraries in your project and keeping this consistent.

Testing it would be a little tricky, though, without importing the other libraries, and we cannot really make any guarantees that everything works with external libraries.

Would you mind drafting up a PR?

@mattrobenolt
Author

I can look through it. The nice part about working through interfaces here is that we wouldn't need to add any external dependencies.

We can just define a Base64Encoder interface and a JSONEncoder interface, and as long as libraries conform to the expected methods, they could be plugged in without this package strictly depending on them.

I agree that the crypto bits would likely outweigh this, but if you're operating on the order of tens of thousands of requests per second, these encoding/decoding wins are likely a measurable percentage gain.

I'll probably take a pass and see what a theoretical implementation would look like, and throw together a benchmark to prove or disprove that this would be worthwhile.

@oxisto
Collaborator

oxisto commented Feb 17, 2022

I had a quick look, starting with the base64 part, and I fear that this is not easily done. The main encoding is done in Token.SigningString via EncodeSegment (which then calls base64.RawURLEncoding.EncodeToString). Unfortunately, EncodeSegment is exported (although we want to deprecate the export), so changing it is not that easy. One way I tried is to use a function type and store it in the Token struct as a field.

```diff
diff --git a/token.go b/token.go
index 09b4cde..d6a576e 100644
--- a/token.go
+++ b/token.go
@@ -7,7 +7,6 @@ import (
 	"time"
 )
 
-
 // DecodePaddingAllowed will switch the codec used for decoding JWTs respectively. Note that the JWS RFC7515
 // states that the tokens will utilize a Base64url encoding with no padding. Unfortunately, some implementations
 // of JWT are producing non-standard tokens, and thus require support for decoding. Note that this is a global
@@ -26,6 +25,9 @@ var TimeFunc = time.Now
 // Header of the token (such as `kid`) to identify which key to use.
 type Keyfunc func(*Token) (interface{}, error)
 
+type EncodeSegmentFunc func([]byte) string
+type DecodeSegmentFunc func(string) ([]byte, error)
+
 // Token represents a JWT Token.  Different fields will be used depending on whether you're
 // creating or parsing/verifying a token.
 type Token struct {
@@ -35,6 +37,8 @@ type Token struct {
 	Claims    Claims                 // The second segment of the token
 	Signature string                 // The third segment of the token.  Populated when you Parse a token
 	Valid     bool                   // Is the token valid?  Populated when you Parse/Verify a token
+
+	b64enc EncodeSegmentFunc
 }
 
 // New creates a new Token with the specified signing method and an empty map of claims.
@@ -76,15 +80,19 @@ func (t *Token) SigningString() (string, error) {
 	var err error
 	var jsonValue []byte
 
+	if t.b64enc == nil {
+		t.b64enc = EncodeSegment
+	}
+
 	if jsonValue, err = json.Marshal(t.Header); err != nil {
 		return "", err
 	}
-	header := EncodeSegment(jsonValue)
+	header := t.b64enc(jsonValue)
 
 	if jsonValue, err = json.Marshal(t.Claims); err != nil {
 		return "", err
 	}
-	claim := EncodeSegment(jsonValue)
+	claim := t.b64enc(jsonValue)
 
 	return strings.Join([]string{header, claim}, "."), nil
 }
```

This could then be configured by NewWithClaims and functional options that we explored elsewhere as well.

The problematic part is decoding. Decoding is done in 7 places (1x for the header, 1x for the claims and 5x for the individual signatures). Header and claims are probably easy because they are part of the Parser struct, which already has functional options. More tricky (or maybe close to impossible) is the signature part, where you would need to extend its Verify method. Not impossible, but not a one-liner either.

@dcalsky
Contributor

dcalsky commented Mar 4, 2022

After fine-tuning the structure of Token, I benchmarked the SigningString method with the std JSON encoder swapped for bytedance/sonic, which shows a significant benefit and is likely to improve performance for both parsing and signing.

Promising no breaking changes, I'll raise a PR to implement this idea.

```go
func BenchmarkSigningString(t *testing.B) {
	token := &jwt.Token{
		Raw:       data.fields.Raw,
		Method:    data.fields.Method,
		Header:    data.fields.Header,
		Claims:    data.fields.Claims,
		Signature: data.fields.Signature,
		Valid:     data.fields.Valid,
	}
	t.Run("std", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_, _ = token.SigningString()
		}
	})

	t.Run("sonic", func(b *testing.B) {
		token.JsonEncFunc = sonic.Marshal
		for i := 0; i < b.N; i++ {
			_, _ = token.SigningString()
		}
	})
}
```

Benchmark:

```
goos: darwin
goarch: amd64
pkg: github.com/golang-jwt/jwt/v4
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkStdSignString
BenchmarkStdSignString/std
BenchmarkStdSignString/std-12         	  670904	      1545 ns/op
BenchmarkStdSignString/sonic
BenchmarkStdSignString/sonic-12       	 1538072	       766.3 ns/op
```

@oxisto
Collaborator

oxisto commented Feb 21, 2023

#278 is a first step in that direction. While we will probably not solve this particular issue with v5.0.0, my goal is to get #278 accepted for v5.0.0, since it breaks some of the API but lays the groundwork to implement the rest without breaking the API again.

@oxisto
Collaborator

oxisto commented Mar 31, 2023

@mattrobenolt @dcalsky Now that this should be possible with v5 (at least for b64), would any of you be interested in drafting a PR?

@dcalsky
Contributor

dcalsky commented Apr 2, 2023

> @mattrobenolt @dcalsky Now that this should be possible with v5 (at least for b64), would any of you be interested in drafting a PR?

I'd like to. I'll submit the PR after ending my vacation.

@orangeswim

Hi everyone. Does this issue need a new PR to keep it moving for a custom JSON encoder/decoder?
The echo repo did something similar: labstack/echo#1880

@oxisto
Collaborator

oxisto commented Oct 14, 2024

> Hi everyone. Does this issue need a new PR to keep it moving for a custom JSON encoder/decoder? The echo repo did something similar: labstack/echo#1880

It is actually already part of #301. Progress on this is slow for two reasons:

  • The PR has since been reworked (largely by me) from the original author's contributions, but some minor things (mostly documentation and cleanup) are still missing, and it is a little unclear who is actually doing that (or should do that), probably me 🤣
  • @mfridman was not 100 % convinced of the benefits of this feature in contrast to the internal reworking that we need to do; I wanted to do some follow-up investigation into whether this has any detrimental effects (e.g. on performance), but I also didn't do that (yet)
