Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 253: CDK Metadata v2 #254

Merged
merged 9 commits into from
Nov 26, 2020
Merged

RFC 253: CDK Metadata v2 #254

merged 9 commits into from
Nov 26, 2020

Conversation

rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Sep 24, 2020


By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license

@rix0rrr rix0rrr self-assigned this Sep 24, 2020
@rix0rrr rix0rrr mentioned this pull request Sep 24, 2020
7 tasks
Copy link
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have not gone through entirely. Comments so far...

text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
@rix0rrr rix0rrr requested a review from a team September 28, 2020 11:24
* Verify metrics are coming in
* Remove the old `Modules` field from the metadata resource.

# Appendix 1: Reference implementation for tree-encoding used above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't really understand what this code is doing, or why it's necessary in the RFC.

Copy link
Contributor Author

@rix0rrr rix0rrr Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is going to come in useful for the final implementation if it's someone other than me going to be doing it.

@rix0rrr rix0rrr requested review from a team and NetaNir September 30, 2020 07:54
@mergify mergify bot dismissed nija-at’s stale review September 30, 2020 07:55

Pull request has been modified.

Copy link
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments so far...

text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
Comment on lines 124 to 125
files like `index.ts`). If it is, take the file with the file path deepest
in the directory hierarchy. That approach is going to help us identify the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is deepest in the directory correct?

I'm a little concerned about the assumptions being made in this algorithm in terms of performance of these scans and jumping into the Nodejs internals. Are they guaranteed to be stable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems so. The API is declared "Stable": https://nodejs.org/api/modules.html

const richBucket = new RichBucket(this, 'RichBucket', { ... });
```

### Identifier components
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability: Lift this section further up in the document and merge it perhaps with "Collecting construct identifiers"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm yeah. I've been debating the order to introduce stuff here. I feel like I have a cyclic dependency between what I want to collect (specifically: submodule name) and why I decide that I need to collect that (it's because the only thing I can get from NodeJS reflection is the filename).

I guess outlining the subproblems we have to solve first is going to make this section easier to follow.

to obtain the module name", for example. This reliance on convention seems
brittle and unflexible. Instead, let's use a configuration file.

One possible candidate is the jsii manifest. It contains the right module names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the jsii manifest for monocdk-experiment and here's a snippet for eks Cluster and ecs Cluster -

      "name": "Cluster",
      "namespace": "aws_ecs",
      "properties": [
        {
          "docs": {
            "stability": "experimental",
            "summary": "The Amazon Resource Name (ARN) that identifies the cluster."
          },
          "immutable": true,
          "locationInModule": {
            "filename": "lib/aws-ecs/lib/cluster.ts",
            "line": 85
          },
          "name": "clusterArn",
          "overrides": "monocdk-experiment.aws_ecs.ICluster",
          "type": {
            "primitive": "string"
          }
        },
      "name": "Cluster",
      "namespace": "aws_eks",
      "properties": [
        {
          "docs": {
            "remarks": "This role also has `systems:master` permissions.",
            "stability": "experimental",
            "summary": "An IAM role with administrative permissions to create or update the cluster."
          },
          "immutable": true,
          "locationInModule": {
            "filename": "lib/aws-eks/lib/cluster.ts",
            "line": 685
          },
          "name": "adminRole",
          "type": {
            "fqn": "monocdk-experiment.aws_iam.Role"
          }
        },

It seems to me that from here we can derive the construct name (name property), module it is in (namespace property) and source file (locationInModule property).

Maybe we can update the JSII manifest to contain a locationInModule property for where the construct itself is, instead of individual properties.

What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locationInModule is not guaranteed to exist. Only current purpose is to generate links to the GitHub repo in the docs.

Using the JSII manifest limits us to JSII libraries. We might say "well always compile with jsii" but what if for some reason some sister team doesn't want to, but does want tracking?

The JSII manifests are ginormous (22M) and I'd rather avoid reading it if I can.

We would have to make assumptions on filenames that I'm not comfortable with. Today, locationInModule contains a source path (lib/aws-eks/lib/cluster.ts), while the runtime path (we still have to read!) is going to refer to the compilation target path /path/to/node_modules/aws-cdk-lib/lib/aws-eks/lib/cluster.js. So now we have to map .js to .ts.

To improve loading speed by a factor of 10 I'd want us to webpack the individual modules in the future (NodeJS takes a rather long time to load each and every individual source file), which means the jsii manifest will contain lib/aws-cdk/lib/cluster.ts but the runtime path we detect will look like aws-cdk/lib/aws-eks/index.js. Now we have to map that in addition.

I'd rather not tie these things together.

Copy link
Contributor

@nija-at nija-at Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if for some reason some sister team doesn't want to, but does want tracking?

I don't see a use case yet. This will be a nice-to-have, but we probably shouldn't index on this.

To improve loading speed by a factor of 10 I'd want us to webpack the individual modules in the future (NodeJS takes a rather long time to load each and every individual source file), which means the jsii manifest will contain lib/aws-cdk/lib/cluster.ts but the runtime path we detect will look like aws-cdk/lib/aws-eks/index.js. Now we have to map that in addition.

Isn't your current algorithm also subject to the same problems if we change how we pack our output artifacts (webpack or otherwise). Things like - determining module name from file path, using the deepest export - come to mind.


Between this comment and #254 (comment), an option to generate the necessary metadata at compile time or even pack time (so that the path-to-module mapping is consistent) feels worth exploring. Don't you think? At least called out as an alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just getting into the cdk-metadata.json section. Maybe this is what you're already doing that file.

rix0rrr and others added 3 commits October 6, 2020 13:24
Co-authored-by: Niranjan Jayakar <nija@amazon.com>
Co-authored-by: Niranjan Jayakar <nija@amazon.com>
@rix0rrr rix0rrr requested a review from nija-at October 6, 2020 13:17
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved

For every construct used in a Stack from a library containing a
`cdk-metadata.json` file, we will collect the library name, version and
qualified class name (using a generated lookup table to obtain submodule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're measuring constructs. class is just the impl.

Suggested change
qualified class name (using a generated lookup table to obtain submodule
constructs used from the library (using a generated lookup table to obtain submodule

text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
When we pick option 2, every stack is bound to end up with the constructs `Construct`
and `Resource` (resulting form example code as in `(a)` and `(b)`), leading to noise.

Nevertheless, we will still pick option 2 (following up the inheritance chain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it makes sense to follow up the inheritance chain and just filter out certain base classes like Construct and `Resource.

text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
to obtain the module name", for example. This reliance on convention seems
brittle and inflexible. Instead, let's use a configuration file.

#### jsii manifest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to an appendix of "alternatives considered" since it's just a discussion

text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
@rix0rrr
Copy link
Contributor Author

rix0rrr commented Oct 21, 2020

Points to discuss:

  • require jsii or not
  • hardcoded package list or open system
  • compression

  • Only customer-instantiated constructs? What about konstruk.AwsSqsLambda ?

mergify bot pushed a commit to aws/aws-cdk that referenced this pull request Nov 6, 2020
In v2, the `@aws-cdk/core` and `@aws-cdk/cx-api` modules are
private.
Update the tests so that it will only operate on v1. In v2, we will need
to design a new set of tests. This will be included in the re-design of
metadata collection - aws/aws-cdk-rfcs#254


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Show resolved Hide resolved
text/0253-cdk-metadata-v2.md Outdated Show resolved Hide resolved

export class SomeClass {
// Every class gets an additional field added to it
private static [vfqnSym] = 'module.SomeClass@1.2.3';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work for "pure" interfaces? This includes cases where we return a private type that implements a public interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not. This is already explained elsewhere, and is a regrettable problem.

We might conceivably add a feature to jsii-rtti to manually annotate an anonymous object instance. But that seems like a v2 concern.

Comment on lines +351 to +357
/**
* How many classes were skipped before finding an ancestor with type information
*
* This number will be 0 if the object itself was an instance of a class with
* type information.
*/
readonly classesSkipped: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use-case for this information? Smells of future-proofing?

Copy link
Contributor Author

@rix0rrr rix0rrr Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an inheritance chain that potentially looks like this:

Root -> ClassA -> ClassB -> (unknown class) -> (unknown class) -> INSTANCE
                     ↑
                     |
              this is the first one we can return

Just saying it's a ClassB is not accurate.

I'm not 100% sure if it makes sense to report ClassB in this case or simply not report anything. Having this information gives us the flexibility to easily switch between them.


The list of construct identifiers we have discovered may be sizeable. A rough
estimate of the maximum number of constructs used in a stack will be about
400 (maximum of 200 L1s with a corresponding L2 wrapper). We will therefore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to record both L2 and L1 in case the L2 is used? It actually makes sense to omit the L1 in this case I think so we can easily identify direct L1 usage (which is of particular interest for our construct library planning).

The simplest solution would be to omit L1s that are named Resource and that their patent construct inherits from the Resource base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been on the fence about this, and trying to come up with a more general rule that would also cover this (but failing to).

But I guess it makes sense to just special-case it, you're right.


Encoding may change the type (for `afx` the encoded value is a `string` but the decoded
value is a `string[]`).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern this might raise is that the metadata sent to aws will become opaque. As mitigation maybe we can emit an additional file to the cloud assembly (analytics.json) with the unencoded value. This way people will have visibility to it if they examine cdk.out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that, and I'm not opposed to it. I'm not convinced that it'll be necessary though, given that the behavior of the afx-encoded string will be such that if you paste it into bash by doing echo <afx-encoded-string-here>, bash brace expansion is going to reconstruct the full list.

It's not like it's a secret, we're publishing the spec out in the open.

But sure, if you feel it should be MORE readable than that, we can also emit the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will add a level of trust

text/0253-cdk-metadata-v2.md Show resolved Hide resolved
@rix0rrr rix0rrr merged commit cb62847 into master Nov 26, 2020
@rix0rrr rix0rrr deleted the huijbers/0253-cdk-metadata-v2 branch November 26, 2020 10:31
njlynch added a commit to aws/aws-cdk that referenced this pull request Mar 5, 2021
See CDK RFC 253 (aws/aws-cdk-rfcs#254) for background and details.

Currently -- if a user has not opted out -- an AWS::CDK::Metadata resource
is added to each generated stack template with details about each loaded module
and version that matches an Amazon-specific allow list.

This modules list is used to:

- Track what library versions customers are using so they can be contacted in
  the event of a severe (security) issue with a library.
- Get business metrics on the adoption of CDK and its libraries.

This modules list is sometimes inaccurate (a module may be loaded into memory
without actually being used) and too braod to support CDK v2.

This feature implements (mostly) the specification proposed in RFC 253 to
include metadata about what constructs are present in each stack, rather than
modules loaded into memory. The allow-list is still used to ensure only CDK/AWS
constructs are reported on.

Implementation notes:
- The format of the Analytics property has changed slightly since the RFC. See
  the service-side code for justification and latest spec.
- How to handle the jsii runtime information was left un-spec'd. I've chosen to
  create a psuedo-Construct to add to the list as the simplest solution.
- `runtime-info.test.ts` leaps through some serious hoops to work equally well
  for both v1 and v2, and to fail somewhat gracefully locally if `tsc` was used
  to compile the module instead of `jsii`. Critques of this approach welcome!
- I removed an annoyance from `resolve-version-lib.js` that produced error
  messages when running unit tests.
mergify bot pushed a commit to aws/aws-cdk that referenced this pull request Mar 10, 2021
…13423)

See CDK RFC 253 (aws/aws-cdk-rfcs#254) for background and details.

Currently -- if a user has not opted out -- an AWS::CDK::Metadata resource
is added to each generated stack template with details about each loaded module
and version that matches an Amazon-specific allow list.

This modules list is used to:

- Track what library versions customers are using so they can be contacted in
  the event of a severe (security) issue with a library.
- Get business metrics on the adoption of CDK and its libraries.

This modules list is sometimes inaccurate (a module may be loaded into memory
without actually being used) and too braod to support CDK v2.

This feature (mostly) implements the specification proposed in RFC 253 to
include metadata about what constructs are present in each stack, rather than
modules loaded into memory. The allow-list is still used to ensure only CDK/AWS
constructs are reported on.

Implementation notes:
- The format of the Analytics property has changed slightly since the RFC. See
  the service-side code for justification and latest spec.
- How to handle the jsii runtime information was left un-spec'd. I've chosen to
  create a psuedo-Construct to add to the list as the simplest solution.
- `runtime-info.test.ts` leaps through some serious hoops to work equally well
  for both v1 and v2, and to fail somewhat gracefully locally if `tsc` was used
  to compile the module instead of `jsii`. Critques of this approach welcome!
- I removed an annoyance from `resolve-version-lib.js` that produced error
  messages when running unit tests.


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants