Seeing slow helm chart deployment #1597

Closed
MitchellGerdisch opened this issue May 28, 2021 · 4 comments · Fixed by pulumi/pulumi#7175
Labels: impact/performance (Something is slower than expected), kind/bug (Some behavior is incorrect or out of spec), resolution/fixed (This issue was fixed)

Comments

@MitchellGerdisch

Deploying the gloo-ee Helm chart on an EKS cluster with a set of gloo values specified takes hours to complete.
The delay occurs in both pulumi preview and pulumi up, and it occurs before any resource updates are displayed.
If the gloo values are commented out of the values file, the deployment is speedy (under 2 minutes).

Expected behavior

Be able to handle custom values for the chart in a timely fashion.

Current behavior

Takes hours to process the gloo custom values before deploying the chart components.

Steps to reproduce

  1. Launch an EKS cluster. Using the defaults for an eks cluster launched via the eks package is sufficient. Make sure it returns a stack output named kubeconfig.

  2. Use the attached files (remove the .txt) to launch the gloo-ee chart on the eks cluster from step 1.

  3. Test with the values file where the gloo: section is commented out and notice it deploys quickly.

  4. Uncomment the gloo: section and notice that it takes a long time to process before deploying the components.

Context (Environment)

Affected feature

MitchellGerdisch added the kind/bug label on May 28, 2021
@lukehoban
Contributor

This is hanging (or just taking a really long time) in Output.from_input.

The call that hangs is:

    def to_json(self):
        return pulumi.Output.from_input(self.__dict__).apply(
            lambda x: json.dumps(x, default=lambda o: {k: v for (k, v) in o.__dict__.items() if v is not None}))
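For readers unfamiliar with the `default=` hook used in that lambda, here is a minimal standalone sketch of what it does. It does not use Pulumi at all; `FakeFetchOpts`, `drop_none_fields` and the repo URL are hypothetical stand-ins invented for illustration.

```python
import json


class FakeFetchOpts:
    """Hypothetical stand-in for an options object such as FetchOpts."""

    def __init__(self, repo=None, version=None):
        self.repo = repo
        self.version = version


def drop_none_fields(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    # Return the object's attributes as a dict, skipping unset (None) ones.
    return {k: v for k, v in obj.__dict__.items() if v is not None}


props = {
    "chart": "gloo-ee",
    "repo": None,  # plain dict entry: serialized as null, not dropped
    "fetch_opts": FakeFetchOpts(repo="https://example.invalid/charts"),
}

encoded = json.dumps(props, default=drop_none_fields)
print(encoded)
```

Note that the hook only filters `None` attributes of custom objects. `None` values that sit directly in a dict are serialized as `null`, which matches the `None` entries visible in the printed `__dict__`.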

The value of print(self.__dict__) there is this:

{'namespace': 'gloo-system', 'include_test_hook_resources': None, 'skip_crd_rendering': None, 'values': {'gloo': None, 'gatewayProxies': {'gatewayProxy': {'gatewaySettings': {'customHttpGateway': {'options': {'httpConnectionManagerSettings': {'tracing': {'verbose': True, 'requestHeadersForTags': ['x-user-id'], 'datadogConfig': {'clusterName': 'datadog_agent', 'service_name': 'envoy'}}}}}, 'customHttpsGateway': {'options': {'httpConnectionManagerSettings': {'tracing': {'verbose': True, 'requestHeadersForTags': ['x-user-id'], 'datadogConfig': {'clusterName': 'datadog_agent', 'service_name': 'envoy'}}}}}, 'options': {'accessLoggingService': {'accessLog': [{'fileSink': {'path': '/dev/stdout', 'jsonFormat': {'startTime': '%START_TIME(%Y/%m/%dT%H:%M:%S%z %s)%', 'requestType': '%REQ(:METHOD)%', 'requestPath': '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%', 'protocol': '%PROTOCOL%', 'duration': '%DURATION%', 'responseCode': '%RESPONSE_CODE%', 'upstreamCluster': '%UPSTREAM_CLUSTER%', 'requestSize': '%BYTES_RECEIVED%', 'responseSize': '%BYTES_SENT%', 'clientAddress': '%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%', 'userID': '%REQ(X-USER-ID)%'}}}]}}}, 'kind': {'deployment': None, 'replicas': 10, 'customEnv': [{'name': 'DD_ENV', 'value': 'env'}, {'name': 'DD_AGENT_HOST', 'valueFrom': {'fieldRef': {'fieldPath': 'status.hostIP'}}}]}, 'service': {'extraAnnotations': {'service.beta.kubernetes.io/aws-load-balancer-type': 'nlb', 'service.beta.kubernetes.io/aws-load-balancer-proxy-protocol': '*', 'service.beta.kubernetes.io/aws-load-balancer-access-log-enabled': 'true', 'service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval': '5', 'service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name': 'gloo-access-logs.env.pinecone.io'}}}}, 'tracing': {'provider': {'name': 'envoy.tracers.datadog', 'typed_config': {'@type': 'type.googleapis.com/envoy.config.trace.v3.DatadogConfig', 'collector_cluster': 'datadog_agent', 'service_name': 'envoy'}}, 'cluster': [{'name': 
'datadog_agent', 'connect_timeout': '1s', 'type': 'STRICT_DNS', 'lb_policy': 'ROUND_ROBIN', 'load_assignment': {'cluster_name': 'datadog_agent', 'endpoints': [{'lb_endpoints': [{'endpoint': {'address': {'socket_address': {'address': 'datadog-tracing.datadog', 'port_value': 8126}}}}]}]}}]}, 'discovery': {'enabled': False}, 'crds': {'create': True}, 'grafana': {'defaultInstallationEnabled': False}, 'prometheus': {'enabled': False}, 'observability': {'enabled': False}, 'apiServer': {'enable': False, 'enterprise': False}, 'settings': {'replaceInvalidRoutes': True, 'invalidConfigPolicy': {'replaceInvalidRoutes': True, 'invalidRouteResponseCode': 404, 'invalidRouteResponseBody': '{"message": "Not found"}'}}, 'global': {'extensions': {'extAuth': {'deployment': {'replicas': 20}, 'envoySidecar': True, 'standaloneDeployment': False}}}}, 'transformations': None, 'resource_prefix': None, 'api_versions': None, 'chart': 'gloo-ee', 'repo': None, 'version': 'v1.6.2', 'fetch_opts': <pulumi_kubernetes.helm.v3.helm.FetchOpts object at 0x10f070340>, 'release_name': 'glooe-helm-chart'}

It is not at all clear why Output.from_input would take hours to process this value.

@lukehoban
Contributor

Here's a reduced program that doesn't use Kubernetes at all - but still hangs for at least several minutes (might well be much longer):

import yaml
import pulumi
loaded_values = yaml.load(open('./values.yaml'), Loader=yaml.FullLoader)
pulumi.Output.from_input(loaded_values)

With the values.yaml file containing just:

gatewayProxies:
  gatewayProxy:
    gatewaySettings:
      options:
        accessLoggingService:
          accessLog:
            - fileSink:
                path: /dev/stdout
                jsonFormat:
                  startTime: "%START_TIME(%Y/%m/%dT%H:%M:%S%z %s)%"

The leaf node value "/dev/stdout" for example is visited hundreds of times by from_input in the first minute of execution.
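To gauge how deeply a values structure like this one nests, a small helper along the following lines could be used. It is only a sketch: `max_depth` is a hypothetical name, and the literal is a hand transcription of the values.yaml above rather than the result of `yaml.load`.

```python
def max_depth(value):
    """Return how many dict/list levels enclose the deepest leaf."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return 0  # scalars (str, int, bool, None) are leaves
    # An empty container still counts as one level.
    return 1 + max((max_depth(child) for child in children), default=0)


# Hand-transcribed from the reduced values.yaml (assumption: same shape
# as what yaml.load would produce for that file).
values = {
    "gatewayProxies": {
        "gatewayProxy": {
            "gatewaySettings": {
                "options": {
                    "accessLoggingService": {
                        "accessLog": [
                            {
                                "fileSink": {
                                    "path": "/dev/stdout",
                                    "jsonFormat": {
                                        "startTime": "%START_TIME(%Y/%m/%dT%H:%M:%S%z %s)%"
                                    },
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

print(max_depth(values))
```

Counted this way, the reduced file nests ten levels deep even though it holds only two leaf values. Nesting depth is the quantity the later comments identify as driving the slowdown.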

@lukehoban
Contributor

Output.from_input appears to be exponential in the depth of nested objects!

[Screenshot: Screen Shot 2021-05-29 at 1 55 28 PM]
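A toy model can show how this kind of blow-up arises. The sketch below is not the SDK's code: `toy_from_input` and `toy_all` are hypothetical stand-ins that only count calls. It assumes that each dict level converts its children once directly and then once more through the `all`-style helper.

```python
def nest(depth):
    """Build a chain of single-key dicts `depth` levels deep around one leaf."""
    value = "/dev/stdout"
    for _ in range(depth):
        value = {"k": value}
    return value


def count_calls(depth, eager):
    """Count toy_from_input calls needed to process a chain of `depth` dicts."""
    calls = 0

    def toy_all(*items):
        # Stand-in for an "all"-style combinator: converts every item.
        return [toy_from_input(item) for item in items]

    def toy_from_input(val):
        nonlocal calls
        calls += 1
        if isinstance(val, dict):
            values = list(val.values())
            if eager:
                # First traversal: convert children directly...
                values = [toy_from_input(v) for v in values]
            # ...second traversal: toy_all converts them again.
            return dict(zip(val.keys(), toy_all(*values)))
        return val

    toy_from_input(nest(depth))
    return calls


for depth in (1, 5, 10):
    print(depth, count_calls(depth, eager=True), count_calls(depth, eager=False))
```

In this model the eager variant makes 2^(depth+1) - 1 calls (2047 at depth 10), while leaving the recursion to `toy_all` alone makes depth + 1 calls (11 at depth 10). The real implementation involves outputs and awaitables, so the exact counts there will differ.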

@joeduffy
Member

Wow, that's wild! I couldn't help but look. It appears Output.all takes Input[T]s, so it's not clear why from_input also calls from_input recursively before invoking Output.all, which itself will call from_input recursively as well.

I'm sure there's some subtlety with unwrapping nested outputs or somesuch. However, simplifying from_input to the following causes your repro above, Luke, to drop to under 1s for me.

diff --git a/sdk/python/lib/pulumi/output.py b/sdk/python/lib/pulumi/output.py
index e7cf52faa..658320bf2 100644
--- a/sdk/python/lib/pulumi/output.py
+++ b/sdk/python/lib/pulumi/output.py
@@ -255,7 +255,7 @@ class Output(Generic[T]):
         if _types.is_input_type(typ):
             # Since Output.all works on lists early, serialize the class's __dict__ into a list of lists first.
             # Once we have a output of the list of properties, we can use an apply to re-hydrate it back as an instance.
-            items = [[k, Output.from_input(v)] for k, v in val.__dict__.items()]
+            items = val.__dict__.items()

             # pylint: disable=unnecessary-comprehension
             fn = cast(Callable[[List[Any]], T], lambda props: typ(**{k: v for k, v in props})) # type: ignore
@@ -265,15 +265,14 @@ class Output(Generic[T]):
         if isinstance(val, dict):
             # Since Output.all works on lists early, serialize this dictionary into a list of lists first.
             # Once we have a output of the list of properties, we can use an apply to re-hydrate it back into a dict.
-            dict_items = [[k, Output.from_input(v)] for k, v in val.items()]
+            dict_items = val.items()
             # type checker doesn't like returning a Dict in the apply callback
             fn = cast(Callable[[List[Any]], T], lambda props: {k: v for k, v in props}) # pylint: disable=unnecessary-comprehension
             return Output.all(*dict_items).apply(fn, True)

         if isinstance(val, list):
-            list_items: List[Union[Any, Awaitable[Any], Output[Any]]] = [Output.from_input(v) for v in val]
             # invariant: http://mypy.readthedocs.io/en/latest/common_issues.html#variance
-            output: Output[T] = cast(Output[T], Output.all(*list(list_items))) # type: ignore
+            output: Output[T] = cast(Output[T], Output.all(*list(val))) # type: ignore
             return output

         # If it's not an output, list, or dict, it must be known and not secret

lukehoban pushed a commit to pulumi/pulumi that referenced this issue May 31, 2021
These mutually recursive functions unintentionally had exponential complexity in nesting depth of objects, arg types and most likely arrays.

Remove the exponential complexity by avoiding direct recursion of `from_input` on itself, and relying on mutual recursion with `all` alone to reduce nested substructure.

Also simplify the implementation to aid readability.

Fixes pulumi/pulumi-kubernetes#1597
lukehoban pushed a commit to pulumi/pulumi that referenced this issue May 31, 2021
These mutually recursive functions unintentionally had exponential complexity in nesting depth of objects, arg types and most likely arrays.

Remove the exponential complexity by avoiding direct recursion of from_input on itself, and relying on mutual recursion with all alone to reduce nested substructure.

Also simplify the implementation to aid readability.

Fixes pulumi/pulumi-kubernetes#1597.
lukehoban pushed a commit to pulumi/pulumi that referenced this issue Jun 1, 2021
These mutually recursive functions unintentionally had exponential complexity in nesting depth of objects, arg types and most likely arrays.

Remove the exponential complexity by avoiding direct recursion of from_input on itself, and relying on mutual recursion with all alone to reduce nested substructure.

Also simplify the implementation to aid readability.

Fixes pulumi/pulumi-kubernetes#1597.
Fixes pulumi/pulumi-kubernetes#1425.
Fixes pulumi/pulumi-kubernetes#1372.
Fixes #3987.
pulumi-bot added the resolution/fixed label on Jun 1, 2021