Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zync routing #145

Merged
merged 7 commits into from
Jun 14, 2019
Merged

Zync routing #145

merged 7 commits into from
Jun 14, 2019

Conversation

miguelsorianod
Copy link
Contributor

@miguelsorianod miguelsorianod commented Jun 11, 2019

This PR adds the new DeploymentConfig zync-que intented to replace WildcardRouter.
For the moment in the PR the 'zync' DeploymentConfig has not been renamed to zync-puma. If we decide to do so we should document the migration in the upgrade procedure because an existing DeploymentConfig cannot be renamed.
This PR also contains the expected needed permissions to interact with the OpenShift API. A specific ServiceAccount, Role and RoleBinding has been created for zync-que DeploymentConfig for those permissions.

This PR also updates zync and zync-que to have prometheus-related annotations

@miguelsorianod miguelsorianod requested review from mikz and eguzki June 11, 2019 10:28
@miguelsorianod
Copy link
Contributor Author

I've deployed 3scale with this changes and at the pods level they seem to start correctly (deploymentconfigs in ready status).

Logs I see when starting the zync-que pod:

[2019-06-11 08:30:16] INFO  WEBrick 1.3.1
[2019-06-11 08:30:16] INFO  ruby 2.4.5 (2018-10-18) [x86_64-linux]
[2019-06-11 08:30:16] INFO  WEBrick::HTTPServer#start: pid=1 port=9394
Que waiting for jobs...
172.17.0.1 - - [11/Jun/2019:08:30:27 +0000] "GET /metrics HTTP/1.1" 200 - 0.0265
172.17.0.1 - - [11/Jun/2019:08:30:37 +0000] "GET /metrics HTTP/1.1" 200 - 0.0596 // periodically repeated

Also I've verified that the Prometheus url and ports respond correctly:

sh-4.2$ curl 127.0.0.1:9394/metrics -Lvso /dev/null
* About to connect() to 127.0.0.1 port 9394 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 9394 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:9394
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4
< Server: WEBrick/1.3.1 (Ruby/2.4.5/2018-10-18)
< Date: Tue, 11 Jun 2019 08:31:15 GMT
< Content-Length: 1473
< Connection: Keep-Alive
<
{ [data not shown]

and in zync:

sh-4.2$ curl -Lvso /dev/null 127.0.0.1:9393/metrics
* About to connect() to 127.0.0.1 port 9393 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 9393 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:9393
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4
< Transfer-Encoding: chunked
<
{ [data not shown]
* Connection #0 to host 127.0.0.1 left intact

Although at infrastructure level seems to work correctly I think we should test the new expected functionalities at application level including the verification of the correct OpenShift permissions assigne to the zync-que ServiceAccount.

cc @mikz for this.

@mikz
Copy link
Contributor

mikz commented Jun 11, 2019

If this is going to work, we probably can remove all existing Route objects (except backend) and let zync create them.

@miguelsorianod
Copy link
Contributor Author

ok we can look at it.
I assume you are saying that zync would automatically create the routes for apicast-staging and apicast-production that are being created for the default tenant and api that is created by system when deploying.

Would Zync create the master route too? and the system-developer and system-admin routes that are created by default? What about the services related to them?

Anyway I think we should try to reproduce the new functionality before merging this so we can know that at least it does the expected basic tasks correctly. Can you tell me the steps needed to reproduce it?

@mikz
Copy link
Contributor

mikz commented Jun 11, 2019

@miguelsorianod exactly, it should be able to create routes for master and the default tenant. It does not create services, those need to be created by the deploy.

I'll deploy it and verify it myself.

@miguelsorianod
Copy link
Contributor Author

ok thank you for the information.

I'll update the PR removing the route creation for:
system master route
system admin default tenant route
system provider default tenant route
apicast-staging and apicast-production default tenant routes

and I'll keep the Service creation for them.

@miguelsorianod
Copy link
Contributor Author

I have removed the previously mentioned routes from the 3scale deployment as this should be managed by Zync now.

@eguzki
Copy link
Member

eguzki commented Jun 11, 2019

@mikz why should a component like zync have platform specific logic like creating routes? This logic should be implemented at templates or operator

@mikz
Copy link
Contributor

mikz commented Jun 11, 2019

@eguzki because routes need to be created when new Tenants / APIs are created. Operator reconciles k8s objects, not data from MySQL database of System.

edit:
Zync is a component that reliably replicates state from porta to external systems (IDPs in the beginning). Turns out that k8s is just another system. It was intended to be used as the means of porta<>apisonator synchronization. And in the future for removing data from external providers like payment gateways.

@eguzki
Copy link
Member

eguzki commented Jun 11, 2019

@mikz Then leave k8s objects to templates and operator and zync takes care of MySQL database of System. That's the idea, isn't it?

@mikz
Copy link
Contributor

mikz commented Jun 11, 2019

@eguzki ok.

  • Operator will query system and reconcile all objects (admin portal, dev portal and apicast routes)
  • Operator must not destroy system by too many calls. So simple pool is not enough. Zync was made to be push + pull model for that reason.
  • It must be supported configuration on all OCP clusters, using templates or operator in the next release.
  • Feature freeze is on Friday.

I'm ok with it if you are :)

@eguzki
Copy link
Member

eguzki commented Jun 11, 2019

If this is all about admin portal, dev portal and apicast routes, then it should be scheduled for the coming releases. Obviously, we do not have time to have it ready on Friday.

@mikz
Copy link
Contributor

mikz commented Jun 12, 2019

@miguelsorianod you can also remove apicast-wildcard-router and relevant services.

@miguelsorianod
Copy link
Contributor Author

@miguelsorianod you can also remove apicast-wildcard-router and relevant services.

Yes, I have that done in another PR #135. Let me know if this PR works and then we can merge that other PR I referenced and test this one with the updated contents of everything

@mikz
Copy link
Contributor

mikz commented Jun 12, 2019

With a few fixes, the following routes are created.

Screenshot 2019-06-12 at 16 25 49

@mikz
Copy link
Contributor

mikz commented Jun 12, 2019

And with master account set to self_managed
Screenshot 2019-06-12 at 17 04 08

@miguelsorianod
Copy link
Contributor Author

miguelsorianod commented Jun 12, 2019

I already merged the PR #135 that removes wildcardrouter. You can update and check whether everything works together correctly

@miguelsorianod miguelsorianod force-pushed the zync-routing branch 2 times, most recently from d62fd42 to 423c806 Compare June 12, 2019 19:38
@guicassolato
Copy link
Contributor

@miguelsorianod exactly, it should be able to create routes for master and the default tenant

I'm not sure if this is a good idea. If the system-zync integration fails in the deployment, then the only way to access the application will be by first using the Rails console to fix whatever is broken with the zync job or in zync itself to only then get the routes created?

@mikz
Copy link
Contributor

mikz commented Jun 13, 2019

@guicassolato if the integration is broken you'll have to fix it first, before you get access to the UI, yes.

I don't really see it as an issue. It should not be broken. Better fail early, than leaving you in some half-working state.

Zync will only manage routes created by zync, so if there are default routes they won't be deleted when default tenant, for example, changes domain.

@miguelsorianod miguelsorianod force-pushed the zync-routing branch 2 times, most recently from 2e04fe5 to b3f95ad Compare June 13, 2019 17:35
- apiGroups:
- extensions
resources:
- ingresses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for those anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@miguelsorianod miguelsorianod changed the title [WIP] Zync routing Zync routing Jun 14, 2019
@miguelsorianod
Copy link
Contributor Author

I deployed the template with specific branches that contain the changes needed at system and zync level, and I see the default routes are created.
I also created an additional service in the default tenant and an additional tenant, and the following routes are created:

msoriano@localhost:~/go/src/github.com/3scale/3scale-operator/pkg/3scale/amp/component (zync-routing)$ oc get routes
NAME                         HOST/PORT                                                             PATH      SERVICES             PORT      TERMINATION     WILDCARD
backend                      backend-3scale.msoriano.192.168.99.100.nip.io                                   backend-listener     http      edge/Allow      None
zync-3scale-api-4vn46        api-3scale-apicast-production.msoriano.192.168.99.100.nip.io                    apicast-production   gateway   edge/Redirect   None
zync-3scale-api-5nbph        api-3scale-apicast-staging.msoriano.192.168.99.100.nip.io                       apicast-staging      gateway   edge/Redirect   None
zync-3scale-api-6fj8f        api-anothertenant-apicast-staging.msoriano.192.168.99.100.nip.io                apicast-staging      gateway   edge/Redirect   None
zync-3scale-api-ggrsd        anotherapi-3scale-apicast-staging.msoriano.192.168.99.100.nip.io                apicast-staging      gateway   edge/Redirect   None
zync-3scale-api-nk2m6        api-anothertenant-apicast-production.msoriano.192.168.99.100.nip.io             apicast-production   gateway   edge/Redirect   None
zync-3scale-api-ttlhz        anotherapi-3scale-apicast-production.msoriano.192.168.99.100.nip.io             apicast-production   gateway   edge/Redirect   None
zync-3scale-master-82fmh     master.msoriano.192.168.99.100.nip.io                                           system-master        http      edge/Redirect   None
zync-3scale-provider-2jhsz   3scale-admin.msoriano.192.168.99.100.nip.io                                     system-provider      http      edge/Redirect   None
zync-3scale-provider-6zx9f   anothertenant.msoriano.192.168.99.100.nip.io                                    system-developer     http      edge/Redirect   None
zync-3scale-provider-gwmjw   anothertenant-admin.msoriano.192.168.99.100.nip.io                              system-provider      http      edge/Redirect   None
zync-3scale-provider-xx7sv   3scale.msoriano.192.168.99.100.nip.io                                           system-developer     http      edge/Redirect   None

@miguelsorianod
Copy link
Contributor Author

miguelsorianod commented Jun 14, 2019

Before merging this PR the following would have to be done:

After merging we should rebuild the nightly images so anyone using the nightly images has a working deployment.
From this point on, if anyone wants to deploy using 2.5 productized images should use the 2.5 tagged release, otherwise no routes will be created at all because the logic for the creation is on the images and 2.5 are old images

@guicassolato
Copy link
Contributor

@guicassolato if the integration is broken you'll have to fix it first, before you get access to the UI, yes.

I don't really see it as an issue. It should not be broken. Better fail early, than leaving you in some half-working state.

Zync will only manage routes created by zync, so if there are default routes they won't be deleted when default tenant, for example, changes domain.

You have a point, @mikz. Integration should work as a Swiss clock watch. We can't deny though that this increases the dependency between components, particularly of system relatively to zync, a relationship at least in Saas recently proven yet not to be as mature as we want.

Why can't zync try to create the route and, if it's already there, start managing the updates? Why does it have to be only for the routes created by zync itself in the first place?

I think having the seed routes (master's, 1st provider's and 1st developer's) created by default in the deployment, as we currently do, is safer and ensures system working OOO in a typical deployment without further coupling between the components at this critical stage. Apart from covering a significant percentage of the use cases – after all not all customers will create a second tenant right away –, it also guarantees that master can access Sidekiq Web Console including to verify possible pending zync jobs.

Maybe this is just my opinion and I'll leave it for the record.

@mikz
Copy link
Contributor

mikz commented Jun 14, 2019

Customers can change the APIcast domain and create new APIs, so that dependency is not going away, ever. They'll just spot something does not work when they start using it.

Yes, zync in still in early phase, because no one went and fixed those performance issues. I already merged some de-duplication of events, that should improve performance and more changes should be coming in the future. This is critical component and will have to be treated as such.

Zync does create route with random name, so it can first create new one and then cleanup the old ones. That makes it quite easy to work around concurrent access to the k8s API, as we are never modifying objects, just creating new ones and deleting old ones.

No one should be forced to login to the master console to see if there are pending zync jobs. Those metrics should be exposed to prometheus.

I'm for using zync for all domains because it makes it exactly one way to do things. No logic necessary, no ifs, no exceptions, no surprises. This just has to work simply because everyone will create an API.

@mikz
Copy link
Contributor

mikz commented Jun 14, 2019

3scale/porta#844 is merged

Zync is now responsible of automatically creating them.
In Kubernetes, when a new role is created the user/serviceaccount
that creates the role has to have the same permissions as the role
that it wants to create, even if he does not directly uses them.
The documentation link explaining this:
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#privilege-escalation-prevention-and-bootstrapping
There's an 'escalate' permission that might allow us to prevent
this but is only compatible with Kubernetes 1.12 or newer
and at this moment we want to maintain compatibility with
at least Kubernetes 1.11.
@codeclimate
Copy link

codeclimate bot commented Jun 14, 2019

Code Climate has analyzed commit 66eac16 and detected 4 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 1
Duplication 3

View more on Code Climate.

@eguzki
Copy link
Member

eguzki commented Jun 14, 2019

@mikz @miguelsorianod raised a concern I share.

Main question is, if route creation is delegated to zync, how is customer supposed to manage certificates??

Some context about this: on 3scale < 2.6, customer could install their certficiates on wilcard route or individual routes for each tenant created. Manual step but full flexibility. On 3scale 2.6 where are those certificates installed? in the routes created by zync after being created?

Furthermore, if customer wants to upgrade from 2.5 with installed certificates to 2.6, where wildcardrouter and wildcard route are gone, where is the customer supposed to install the certificates??

@miguelsorianod
Copy link
Contributor Author

miguelsorianod commented Jun 14, 2019

Did we officially support the client manually adding a custom certificate in the wildcard route?

What I know is that with the template this option is/was NOT supported in an automated way at template deployment time.

To make it clear: If a client for some reason has an existing environment where it has manually added a custom SSL certificate in the wildcard route after deploying the template (I don't know if we officially support that client done by the client), when upgrading from 2.5 to 2.6 that certificate will STOP to exist and thus the SSL validation with the names specified in that certificate

@mikz
Copy link
Contributor

mikz commented Jun 14, 2019

Do we have documentation about setting certificates to 3scale routes? Or to the wildcard router?

We are relying on the OpenShift Router to be configured correctly. I haven't seen any customization to the Route objects.

Was this supported scenario in the first place? Sure, those objects are there and anyone could've modified them. But we can't care about any possible modification of those objects.

I don't see any easy way how to define certificates for individual routes. Possibly by some Secrets and matching labels, but that would be up for definition. Or by admission/mutating webhook in k8s mutate incoming Route objects.

But customizing generated routes was not requested or a scenario I've seen documented/used. Hence it is not implemented. Suggestions on how to do it (if it is required by product) much appreciated.

@miguelsorianod
Copy link
Contributor Author

I'm not convinced on what's our stance on the lose of the certificate if we apply the upgrade procedure from 2.5 to 2.6 as described (recreating individual routes with zync, removing wildcardrouter and wildcard route).

I'll ping product team in our existing Jira issue to know our official position in this regard and how to proceed / what message to give in that regard

For OpenShift 4.1 there's no Wildcard Routes support right now so I think independently on our concern about losing wildcard routes in existing OpenShift 3.x installations we'll improve the situation in 4.1 letting zync manage the creation of routes so we can add this feature anyway.

@miguelsorianod miguelsorianod merged commit d7c4fa3 into master Jun 14, 2019
@eguzki eguzki deleted the zync-routing branch June 14, 2019 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants