Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting up traffic splitting of service #210

Closed
maximilien opened this issue Jun 27, 2019 · 15 comments · Fixed by #345
Closed

Support setting up traffic splitting of service #210

maximilien opened this issue Jun 27, 2019 · 15 comments · Fixed by #345
Assignees
Labels
kind/feature New feature or request
Milestone

Comments

@maximilien
Copy link
Contributor

maximilien commented Jun 27, 2019

With implementations of kn route in the works (issues #205, #201, #204), it's perhaps time to think about how to allow users to set traffic splitting on their services.

Not only as potentially a kn route update ... with percentage values for different revisions, but also whether other types of traffic routing splitting is warranted.

I know @duglin has ideas on "feature flag traffic splitting" so let's use this issue to discuss the client UX. Perhaps the MVP and the v2 and nice to have.

@duglin
Copy link
Contributor

duglin commented Jun 27, 2019

I'd prefer if users didn't think about traffic splitting as a "route" manipulation, because it's not. It's a modification to a KnService resource. As an example, let's say I have this KnService:

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: echo
spec:
  template:
    spec:
      containers:
        - image: duglin/echo
  traffic:
  - tag: old
    revisionName: echo-v1
    percent: 50
  - tag: new
    revisionName: echo-v2
    percent: 50
  - tag: latest
    latestRevision: true
    percent: 0

I'd like to discuss how this would appear on a kn service command - probably under update but only if the syntax fits, if not then a new verb might be needed.

Just some brainstorming/ramblings....

  • there are 3 pieces of information that are associated with each possible revisions/route: tag, revisionName and percent
  • revisionName is the only one that's (sort of) static because it acts as the unique identifier for the revision we're talking about for that traffic split/route
  • that leads me to think that each traffic split would need to be identified/configured by at least a single word value (revisionName), and at most a triple word value
  • something like: echo-v1 vs echo-v1:50 vs echo-v1:50:old
  • the reason I think things need to be merged into a single string is because the user will need to specify all possible routes on a single command line - and trying to do some sort of correlation of values across args feels like it'll be a challenge and messy for people.

So, with that, the above KnService update command might look like:
kn service update echo --traffic echo-v1:50:old --traffic echo-v2:50:new --traffic latest:0:latest

Note that I think the general syntax for the arg would be:
revisionName[:[percent][:tag]]

  • if percent is absent then it defaults to zero, just like serving does
  • if tag is absent then there is not default value and there is no custom URL for that revision
  • if you want to specify a tag w/o a percent then you can just do revision::tag. I thought about it just being revision:tag and calculating whether tag was really a tag or a percent based on whether it was an integer but i didn't think that was safe or a good UX. I think being explicit feels better.
  • I put tag at the end because between percent and tag I'm guessing tag might be left off more often than percent (even when zero, I suspect a lot of people will still want to put 0 just to be clear about things), but that's just a guess.

For clarity, other examples:

  • echo-v1 -> revision echo-v1 gets no traffic and no custom URL
  • echo-v1: -> same as previous
  • echo-v1:0 -> same as previous
  • echo-v1:100 -> revision echo-v1 gets 100% of the traffic, no custom URL
  • echo-v1:100: -> same as previous
  • echo-v1:100:echo-old -> revision echo-v1 get 100% of traffic, custom URL starts with echo-old
  • echo-v1::echo-old -> revision echo-v1 gets no traffic, custom URL starts with echo-old

@sixolet
Copy link
Contributor

sixolet commented Jul 1, 2019

First some commentary on what's here.

  • echo-v1 -> revision echo-v1 gets no traffic and no custom URL

The semantics here suggests this should be removed from the traffic block entirely, but this seems a confusing way to specify that.

  • echo-v1: -> same as previous
  • echo-v1:0 -> same as previous

This would be a clearer way to specify removing something from traffic.

The traffic assignment command surface is subject to a variety of challenges.

Some goals you might come to the task with:

  • Ramp up/down echo-v3 to 20%, adjusting other traffic to accommodate
  • Give echo-v3 the tag candidate, without otherwise changing traffic
  • Give echo-v3 the tag candidate, and 2% of traffic adjusting other traffic to accommodate
  • Give whatever has the tag candidate 10% of traffic, adjusting the traffic on whatever has the tag current to accommodate.
  • Change the tag for echo-v3 from candidate to current.
  • Remove the tag current from echo-v3.
  • Remove echo-v3 from the traffic assignments entirely, adjusting other traffic to accommodate.
  • Set the traffic assignments to a particular distribution.

There are two ways to go here — one in which you have to plan out the resulting traffic distribution on paper and put every element in your command line just right, and one in which you only have to specify what you're doing. The former is more explicit, the latter is less error-prone (because you're less likely to forget that you had a revision tagged, or had a revision in your list of revisions that should get some traffic, etc)

Your suggestion above can do a lot of the above in the "only specify what you're doing" way, given some modifications:
[revisionName][:[percent][:tag]]

  • Don't change anything about any un-mentioned traffic entries.
  • If revisionName is absent, tag is required, and refers to an existing entry in traffic by tag.
  • If percent is rest, this indicates "assign however much traffic causes the total to be 100%".
  • If percent is not present, do not change percent.
  • If tag is not present, do not change the entry's tag
  • If tag is blank, clear the entry's tag.

So those scenarios:

  • Ramp up/down echo-v3 to 20%, adjusting other traffic to accommodate
    --traffic echo-v3:20 --traffic echo-v2:rest
  • Give echo-v3 the tag candidate, without otherwise changing traffic
    --traffic echo-v3::candidate
  • Give echo-v3 the tag candidate, and 2% of traffic adjusting other traffic to accommodate
    --traffic echo-v3:2:candidate --traffic echo-v2:rest
  • Give whatever has the tag candidate 10% of traffic, adjusting the traffic on whatever has the tag current to accommodate.
    --traffic :10:candidate --traffic :rest:current
  • Change the tag for echo-v3 from candidate to current.
    --traffic echo-v3::current
  • Remove the tag current from echo-v3.
    --traffic echo-v3::
  • Remove echo-v3 from the traffic assignments entirely, adjusting other traffic to accommodate.
    --traffic echo-v3:0: --traffic echo-v2:rest # would leave in the list with a tag w/o trailing colon
  • Set the traffic assignments to a particular distribution.
    --traffic echo-v1:50:old --traffic echo-v2:50:new --traffic latest:0:latest

WHEW.

@maximilien
Copy link
Contributor Author

maximilien commented Jul 2, 2019

Nice update @sixolet. I have a team member here in Beijing looking at summarizing the discussions on "traffic splitting" into a proposal docs. Let me tag her to make sure she includes your comments.

CC: @zhanggbj

@duglin
Copy link
Contributor

duglin commented Jul 2, 2019

My concern with basically providing a delta is:

  • it assumes people know (for sure) what's out there already - which is risky
  • or at least it requires people to do a GET to see what's there - which is combersome
  • it assumes no one will edit the state of things between the GET and the delta - see first one

I actually disagree that delta are less error prone, I think it's far harder for people to calculate a delta instead of being explicit. This is why in many cases I've seen people provide the default value for flags instead of letting the defaults kicks in - it allows people to know for sure what's going on and that gives them a sense of comfort and control.

If we were talking about modifying one control knob (by saying "increase this by X") and it didn't have any impact on any other knobs then I think deltas make some sense. But in this case we're dealing with a potentially long list of routing paths and asking the user to calculate this delta doesn't feel trivial to me when as they change one knob they need to make sure all of the others are modified appropriately. Plus look at the list of rules you had to specify in order to explain what will happen based on the absence or presence of things - that's non-trivial. In my proposal absence means what people expect... you get nothing for that field.

@duglin
Copy link
Contributor

duglin commented Jul 2, 2019

to be clear... I'm not against some sort of mixture of the two, but i do think the delta side of it needs to be limited in scope otherwise we'll scare people with the amount of processing rules and delta calculations we're asking of them

@sixolet
Copy link
Contributor

sixolet commented Jul 2, 2019 via email

@sixolet
Copy link
Contributor

sixolet commented Jul 2, 2019 via email

@duglin
Copy link
Contributor

duglin commented Jul 2, 2019

The read-modify-write thing you're worrying about is going to be prevented by a resource version field, when done by the client.

it doesn't because you're thinking of it from a Kube perspective. I'm thinking of it from a user perspective. Yes we can prevent Kube from complaining but that won't stop the case of someone thinking one revision is at 10% when in reality it's at 90% because someone else just changed it - and not knowing this information could have changed how the user does their delta.

@sixolet sixolet added this to the v0.8.0 milestone Jul 9, 2019
@sixolet sixolet added the kind/feature New feature or request label Jul 11, 2019
@navidshaikh
Copy link
Collaborator

navidshaikh commented Jul 18, 2019

The number of combinations and their rules can be simplified if we revisit the two operations being performed, i.e. (1) doing actual traffic split (b) assign the custom traffic tags.

Roland and I spent some time on this and proposing UX for this in doc
https://docs.google.com/document/d/1ud0_Hoax4Y785fPrD_Pa9y1IufOxwBM5Tu7AZtI75PQ/

PTAL.

@zhanggbj
Copy link

Hi,

Based on the discussion above, I consolidated and drafted a proposal about traffic splitting, blue/green deployment and etc. And had an initial discussion with @duglin and @maximilien today, I would like to share the proposal and we can discuss further. Any comment is welcomed. Thanks!
(Some items marked with WIP need further investigations.)

https://docs.google.com/document/d/1oxbFqIUPEjLdTSikjNVVYGuYVeX4oHX9w1eX1kL8pxs/edit?usp=sharing

There may be some overlap with the proposal above but we can figure out later after discussions:)

@navidshaikh
Copy link
Collaborator

Thanks for the feedback on the doc

https://docs.google.com/document/d/1ud0_Hoax4Y785fPrD_Pa9y1IufOxwBM5Tu7AZtI75PQ/

Writing down the feedback below:

  1. Use entry point kn service set-target instead of kn service update. Why?

    • Having explicit set-traffic verb avoid a long list of options/flags in service update command and docs
    • set verb implies replacing existing values and using whats given and I think this is the user experience we want.
    • update on the other hand might imply that keep what's existing and append/amend whats given (that's how it is in service update)
    • Having explicit verb set-traffic should work with number of arguments and not requiring repetitive flags like --traffic
  2. Remove % from the traffic split command

  3. Rename verb kn revision tag to something near to service as the object to operate on is Service.

    • chose kn service tag-target (as the tag is set on target and not on revision)
  4. Have support for use case where a single revision might have multiple tags present in traffic block, meaning

    traffic:
     - percent: 50
       revisionName: echo-v1
       tag: prod
     - percent: 50
       revisionName: echo-v1
       tag: stage
    

Opened PR #285 as traffic splitting doc to help ease reviewing.

Examples after incorporating the feedback:

  1. Tag revisions echo-v1 and echo-v2 as stable and staging
kn service tag-target echo-v1:stable echo-v2:staging
  1. Ramp up/down revision echo-v3 to 20%, adjusting other traffic to accommodate:
kn service set-traffic svc1 echo-v3=20 echo-v2=*
  1. Give revision echo-v3 the tag candidate, without otherwise changing any traffic split:
kn service tag-target echo-v3:candidate
  1. Give echo-v3 the tag candidate, and 2% of traffic adjusting other traffic to go to revision echo-v2:
kn service tag-target echo-v3:candidate
kn service set-traffic svc1 candidate=2 echo-v2=*
  1. Give whatever revision has the tag candidate 10% of the traffic, adjusting the traffic on whatever revision has the tag current to accommodate:
kn service set-traffic svc1 candidate=10 current=*
  1. Update the tag for echo-v3 from candidate to current:
kn service untag-target echo-v3:candidate
kn service tag-target echo-v3 current
  1. Remove the tag current from echo-v3:
kn service untag-target echo-v3:current
  1. Remove echo-v3 from the traffic assignments entirely, adjusting echo-v2 to fill up:
kn service set-traffic svc1 echo-v3=0 echo-v2=*  #echo-v2 gets 100% traffic portion here
kn service untag-target echo-v3:current
  1. Tag revision echo-v1 as stable and current with 50-50% traffic split
kn service tag-target echo-v1:stable echo-v1:current
kn service set-traffic svc1 stable=50 current=50

@duglin
Copy link
Contributor

duglin commented Jul 23, 2019

See: #285 (comment) for why searching for traffic blocks isn't possible.

@navidshaikh
Copy link
Collaborator

I'd like to work on this, already got some code in place, anyone else would like to collaborate feel free to comment or just ping on slack.

/assign @navidshaikh

@zhanggbj
Copy link

I also would like to work on this item and collaborate with @navidshaikh.
/assign @zhanggbj

I think the voting for differences for the two proposals is almost done. After there's an agreement, we can break them down to specific items and pick up and work on them. Thanks!

@toVersus
Copy link
Contributor

I also would like to work together, @navidshaikh @zhanggbj.
/assign @toVersus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants