Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Inconsistent error behavior between Resolve<T> and ResolveAll during flagd runtime #1481

Open
cupofcat opened this issue Dec 18, 2024 · 7 comments
Labels
bug Something isn't working Needs Triage This issue needs to be investigated by a maintainer

Comments

@cupofcat
Copy link

Observed behavior

For targeting rules that can return different types based on the context (which does not conform to schema but is allowed by flagd with just a warning):

  • ResolveAll checks whether the type of the evaluated variant matches the type of the defaultVariant
  • Resolve<T> checks whether the type of the evaluated variant matches <T> (defaultVariant does not matter)

Expected Behavior

I would expect the behavior to be consistent. If defaultVariant is used to infer the flag type, then Resolve<T> methods should also check whether <T> matches the type of defaultVariant and not the type of the evaluated variant.

Steps to reproduce

For the following flag config:

{
    "$schema": "https://flagd.dev/schema/v0/flags.json",
    "flags": {
      "is-enabled": {
        "defaultVariant": "off",
        "state": "ENABLED",
        "targeting": {
          "if": [
            {
              "<": [
                {
                  "%": [
                    {
                      "var": "request_id"
                    },
                    1000
                  ]
                },
                100
              ]
            },
            "on",
            "off"
          ]
        },
        "variants": {
          "on": 1,
          "off": false
        }
      }
    }
  }

flagd returns the following:

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveBoolean" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 42}}' -H "Content-Type: application/json"
{"code":"invalid_argument","message":"Type mismatch error"}%

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveBoolean" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 420}}' -H "Content-Type: application/json"
{"value":false, "reason":"TARGETING_MATCH", "variant":"off", "metadata":{}}%

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveInt" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 42}}' -H "Content-Type: application/json"
{"value":"1", "reason":"TARGETING_MATCH", "variant":"on", "metadata":{}}%

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveInt" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 420}}' -H "Content-Type: application/json"
{"code":"invalid_argument","message":"Type mismatch error"}%

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveAll" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 42}}' -H "Content-Type: application/json"
{"flags":{"is-enabled":{"reason":"ERROR", "variant":"on", "boolValue":false}}}%

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveAll" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 420}}' -H "Content-Type: application/json"
{"flags":{"is-enabled":{"reason":"TARGETING_MATCH", "variant":"off", "boolValue":false}}}%

When changing the defaultVariant in the config:

{
    "$schema": "https://flagd.dev/schema/v0/flags.json",
    "flags": {
      "is-enabled": {
        "defaultVariant": "on",
...
      }
    }
  }

ResolveAll responses:

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveAll" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 42}}' -H "Content-Type: application/json"
{"flags":{"is-enabled":{"reason":"TARGETING_MATCH", "variant":"on", "doubleValue":1}}}%

curl -X POST "http://localhost:8013/flagd.evaluation.v1.Service/ResolveAll" \
  -d '{"flagKey":"is-enabled","context":{"request_id": 420}}' -H "Content-Type: application/json"
{"flags":{"is-enabled":{"reason":"ERROR", "variant":"off", "doubleValue":0}}}%
@cupofcat cupofcat added bug Something isn't working Needs Triage This issue needs to be investigated by a maintainer labels Dec 18, 2024
@cupofcat
Copy link
Author

One way to fix this would be to modify resolve[T constraints](...) to do a type check:

if metadata[TypeMetadataKey] != reflect.TypeOf(*new(T)) {
	return value, variant, reason, metadata, errors.New(model.TypeMismatchErrorCode)
}

This would require adding the flag type to metadata, possibly during evaluation in evaluateVariant():

if value, ok := flag.Variants[flag.DefaultVariant]; ok {
	metadata[TypeMetadataKey] = reflect.TypeOf(value)
} else {
	je.Logger.ErrorWithID(reqID, fmt.Sprintf("Error inferring type for flag %s. defaultVariant (%s) not found in variants (%v)",
		flagKey, flag.DefaultVariant, flag.Variants))
	return "", flag.Variants, model.ErrorReason, metadata, errors.New(model.InvalidContextCode)
}

My question is, can we do that or would that be considered a breaking change and should be added as a new configuration option?

@aepfli
Copy link
Member

aepfli commented Dec 18, 2024

This is a somewhat interesting question; on the one hand, this flag configuration violates the JSON schema. So, this is a user error in a certain way, but it can happen, and we should (even if we might come to the conclusion it is intended) do something.

I am currently torn here a little bit. What our expected behavior is.

I think that the resolve<T>(...)-API is pretty solid, and in the end, requesting an Integer means that I want to have an Integer, and I checked against this, not against the type of the default variant. This would also mean we would need to adapt all the implementations of our in-process evaluators, as this should be consistent, regardless of whether RPC or in-process evaluation. But I do see your point, and I agree that consistency should be one of our big goals.

I want to ask if we should also evaluate whether a change in ResolveAll's behavior could reduce the pain or solve this problem. Maybe generalizing the return value and letting the implementor decide how he wants to use this type could be feasible.

Sadly, I am not profound enough with the resolveAll endpoint (aka little to no usage) to give some valuable input here.
But we should consider both sides here to find the best suitable solution, and maybe, after all, we could see this as a user error.

What if we take the route of information and use the metadata, hooks, special events, and logs to inform about inconsistencies?

@beeme1mr
Copy link
Member

Perhaps we could support a flag type in the flag definition. We're doing something similar in the code gen schema here. This would have to be optional, but we could use it to validate all the values during initialization.

@cupofcat
Copy link
Author

I do think that adding a flag type to the schema is something we should also do. However, as @beeme1mr says, it will have to be optional and so there still will be users who don't have it in their flag config. We could (and IMHO should) also add an option for stricter validation during flag sync (erroring rather than warning) but that also will not be universally used. And so, we still should think about (and hopefully improve) the current state.

@aepfli I think one challenge with Resolve<T>, from the perspective of the developer (user of the flagd provider) today is that it can start throwing an error without any changes to their application logic. Just because of a slight change to context, or because someone else added a new branch to the config and made a mistake. I would expect the defaultVariant to be much more stable. It does make some sense to use the default to infer the type and check against that.

I guess the question is, do we expect a lot of users today to actually rely on the behavior that different branches of the flag can evaluate to different types? Would anyone actually want a flag that can be either true or 42, depending on a context? If no, let's make it consistent. If yes, let's add a configuration option to control the behavior of when errors are thrown.

WDYT?

@cupofcat
Copy link
Author

I also started the second issue about the config validation behavior - #1487

@beeme1mr
Copy link
Member

I guess the question is, do we expect a lot of users today to actually rely on the behavior that different branches of the flag can evaluate to different types? Would anyone actually want a flag that can be either true or 42, depending on a context? If not, let's make it consistent. If yes, let's add a configuration option to control the behavior of when errors are thrown.

No, we don't want users to be able to have different types in the same flag configuration. I'm in favor of making this consistent. If we add the type to the schema, we can perform more sophisticated type checks during validation (and in the JSON schema itself). We could then treat this as a user opting into a more strict type-checking behavior.

Perhaps we could start by updating the flagd schema to include this new property. We may be able to leverage some of the work @anghelflorinm did in the cli schema.

Any concerns with what I outlined above? @toddbaert @aepfli @lukas-reining @cupofcat

toddbaert added a commit to open-feature/community that referenced this issue Dec 19, 2024
@cupofcat has been working on flagd and flagd providers:

- open-feature/go-sdk-contrib#598
- open-feature/flagd#1481
- open-feature/flagd#1487


Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
@toddbaert
Copy link
Member

I guess the question is, do we expect a lot of users today to actually rely on the behavior that different branches of the flag can evaluate to different types? Would anyone actually want a flag that can be either true or 42, depending on a context? If not, let's make it consistent. If yes, let's add a configuration option to control the behavior of when errors are thrown.

No, we don't want users to be able to have different types in the same flag configuration. I'm in favor of making this consistent. If we add the type to the schema, we can perform more sophisticated type checks during validation (and in the JSON schema itself). We could then treat this as a user opting into a more strict type-checking behavior.

Perhaps we could start by updating the flagd schema to include this new property. We may be able to leverage some of the work @anghelflorinm did in the cli schema.

Any concerns with what I outlined above? @toddbaert @aepfli @lukas-reining @cupofcat

No objections to adding the flag type to the schema, but if I'm following the discussion so far, I think we also agree we should "make things consistent" between resolve<T>() and resolveAll() in terms of validating against the default (as well as against T); I think we can do this now, without waiting for the schema change. Please correct me if I'm wrong.

We should also spec this and add a spec for it to the gherkin: https://github.com/open-feature/flagd-testbed/blob/main/gherkin/flagd-json-evaluator.feature#L117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Needs Triage This issue needs to be investigated by a maintainer
Projects
None yet
Development

No branches or pull requests

4 participants