fix(python): Handle Duplicative Import Names #5377

noanflaherty · 2024-12-10T00:27:40Z

Description

Sibling PR: #5365

The goal of this PR is to handle auto-aliasing of imports when there are naming conflicts.

Changes Made

Track ref name overrides on the Writer
Build up the map of overrides as part of PythonFile's write method

Testing

Unit tests added/updated

noanflaherty · 2024-12-10T00:28:13Z

generators/python-v2/ast/src/PythonFile.ts

@@ -44,47 +48,99 @@ export class PythonFile extends AstNode {
    }

    public write(writer: Writer): void {
+        const uniqueReferences = this.deduplicateReferences();
+
+        this.updateWriterRefNameOverrides({ writer, uniqueReferences });


This method is where the magic happens: within it, we add state to the Writer.

noanflaherty · 2024-12-10T00:28:24Z

generators/python-v2/ast/src/PythonFile.ts

        this.statements.forEach((statement, idx) => {
            statement.write(writer);
            writer.newLine();
            if (idx < this.statements.length - 1) {
                writer.newLine();
            }
        });
+
+        writer.unsetRefNameOverrides();


Wipe the state we set on the Writer, just to be safe.

noanflaherty · 2024-12-10T00:29:36Z

generators/python-v2/ast/src/__test__/__snapshots__/PythonFile.test.ts.snap

+"from .cars import Car as Car_1
+from .vehicles import Car as Car_2


I chose a _n suffix, but am open to suggestions here.

@dvargas92495 lmk if you have a preferred naming convention here.

During normal development, I usually do something module-dependent, e.g.:

from .cars import Car as CarsCar
from .vehicles import Car as VehiclesCar
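For comparison, the `_n` suffix scheme from the snapshot above can be sketched in a few lines. This is a minimal illustration, not the PR's code: `ImportRef` and `assignSuffixAliases` are hypothetical names, and the refs are assumed to be deduplicated already. Every bare name that appears more than once is suffixed `_1`, `_2`, ... in encounter order, which is why this scheme needs per-name usage counts rather than just a set of reserved names.

```typescript
// Hypothetical shape for illustration; not the PR's Reference class.
interface ImportRef {
  modulePath: string[]; // e.g. ["cars"] for `from .cars import Car`
  name: string; // e.g. "Car"
}

// Returns a map from complete path ("cars.Car") to the name used in generated code.
function assignSuffixAliases(refs: ImportRef[]): Map<string, string> {
  const completePath = (r: ImportRef): string => [...r.modulePath, r.name].join(".");

  // First pass: count how many (already deduplicated) refs share each bare name.
  const totals = new Map<string, number>();
  for (const ref of refs) {
    totals.set(ref.name, (totals.get(ref.name) ?? 0) + 1);
  }

  // Second pass: unique names are kept; shared names get `_1`, `_2`, ... in order.
  const seen = new Map<string, number>();
  const aliases = new Map<string, string>();
  for (const ref of refs) {
    if ((totals.get(ref.name) ?? 0) <= 1) {
      aliases.set(completePath(ref), ref.name);
      continue;
    }
    const n = (seen.get(ref.name) ?? 0) + 1;
    seen.set(ref.name, n);
    aliases.set(completePath(ref), `${ref.name}_${n}`);
  }
  return aliases;
}
```

With refs for `cars.Car` and `vehicles.Car`, the sketch yields `Car_1` and `Car_2`, matching the snapshot.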

noanflaherty · 2024-12-10T00:30:06Z

generators/python-v2/ast/src/Reference.ts

+        const nameOverride = writer.getRefNameOverride(this);
+        writer.write(nameOverride.name);


Here's the second of the two places where we use state that was added to the Writer. Now, a Reference object gets its name from the Writer during its own write method.
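To make the Writer-state approach from the three comments above concrete, here is a minimal sketch. `getRefNameOverride` and `unsetRefNameOverrides` are the method names used in the PR; `setRefNameOverrides`, `writeWithOverrides`, keying overrides by complete path, and falling back to the reference's own name are assumptions for illustration. Wrapping the write in `try`/`finally` is one way to guarantee the state is wiped even if a statement throws.

```typescript
// Hypothetical, simplified stand-ins for the PR's Writer and Reference classes.
interface NameOverride {
  name: string; // name to emit in code (alias or original)
  isAlias: boolean; // true when the import needs `as <name>`
}

class Writer {
  private buffer = "";
  private refNameOverrides: Map<string, NameOverride> | undefined;

  public write(text: string): void {
    this.buffer += text;
  }

  public toString(): string {
    return this.buffer;
  }

  // Hypothetical setter; in the PR the map is built inside PythonFile.write.
  public setRefNameOverrides(overrides: Map<string, NameOverride>): void {
    this.refNameOverrides = overrides;
  }

  public unsetRefNameOverrides(): void {
    this.refNameOverrides = undefined;
  }

  // Assumed fallback: no registered override means "use the reference's own name".
  public getRefNameOverride(ref: Reference): NameOverride {
    return this.refNameOverrides?.get(ref.getCompletePath()) ?? { name: ref.name, isAlias: false };
  }
}

class Reference {
  public readonly modulePath: string[];
  public readonly name: string;

  constructor(modulePath: string[], name: string) {
    this.modulePath = modulePath;
    this.name = name;
  }

  public getCompletePath(): string {
    return [...this.modulePath, this.name].join(".");
  }

  // The Writer, not the Reference, decides which name is emitted.
  public write(writer: Writer): void {
    writer.write(writer.getRefNameOverride(this).name);
  }
}

// Set the overrides, run the body, and always clear the state afterwards.
function writeWithOverrides(writer: Writer, overrides: Map<string, NameOverride>, body: () => void): void {
  writer.setRefNameOverrides(overrides);
  try {
    body();
  } finally {
    writer.unsetRefNameOverrides(); // runs even if body() throws
  }
}
```

A reference written inside `writeWithOverrides` picks up its alias; the same reference written afterwards falls back to its plain name, so no state leaks into later writes.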

noanflaherty · 2024-12-10T00:33:30Z

generators/python-v2/ast/src/PythonFile.ts

@@ -44,47 +48,99 @@ export class PythonFile extends AstNode {
    }

    public write(writer: Writer): void {
+        const uniqueReferences = this.deduplicateReferences();


I hope to get rid of the need for this method altogether later. Right now, the data structure we use for ref tracking allows for duplicative refs, but there's no reason it needs to. We can deduplicate upon adding a new ref.

In any case, this is just a refactor that moves existing logic into a helper so that its return value can be used in the two places directly below.
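Deduplicating on insertion, as suggested here, could look roughly like this. `ReferenceRegistry` and `RefKey` are hypothetical names; the sketch assumes two references are equivalent when their complete paths (module path plus name) match, and relies on `Map` preserving insertion order so generated output stays deterministic.

```typescript
// Hypothetical minimal reference shape; not the PR's Reference class.
interface RefKey {
  modulePath: string[];
  name: string;
}

class ReferenceRegistry<T extends RefKey> {
  // Map iteration follows insertion order, keeping output deterministic.
  private readonly byCompletePath = new Map<string, T>();

  private static completePath(ref: RefKey): string {
    return [...ref.modulePath, ref.name].join(".");
  }

  // Adds the reference unless an equivalent one is already registered.
  // Returns true if it was added, false if it was a duplicate.
  public add(ref: T): boolean {
    const key = ReferenceRegistry.completePath(ref);
    if (this.byCompletePath.has(key)) {
      return false;
    }
    this.byCompletePath.set(key, ref);
    return true;
  }

  public values(): T[] {
    return Array.from(this.byCompletePath.values());
  }
}
```

With this in place, a separate deduplication pass before writing would no longer be needed.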

noanflaherty · 2024-12-10T00:34:07Z

generators/python-v2/ast/src/PythonFile.ts

+    }
+
+    private getImportName({ writer, reference }: { writer: Writer; reference: Reference }): string {
+        const nameOverride = writer.getRefNameOverride(reference);


Here is one of two places where we use the state that was added to the Writer.
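A sketch of how the import statement itself might consume an override. `formatImportLine` is a hypothetical helper, not the PR's `getImportName`; it assumes overrides are keyed by complete path and carry `name` and `isAlias`, matching the `{ name: string; isAlias: boolean }` record type that appears in the diff further down.

```typescript
interface NameOverride {
  name: string;
  isAlias: boolean;
}

// Renders one relative import for a module, aliasing only the names that need it,
// e.g. `from .cars import Car as Car_1, Wheel`.
function formatImportLine(modulePath: string[], names: string[], overrides: Map<string, NameOverride>): string {
  const modulePrefix = modulePath.join(".");
  const importNames = names.map((name) => {
    const override = overrides.get(`${modulePrefix}.${name}`);
    return override?.isAlias ? `${name} as ${override.name}` : name;
  });
  return `from .${modulePrefix} import ${importNames.join(", ")}`;
}
```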


amckinney · 2024-12-10T18:02:41Z

generators/python-v2/ast/src/PythonFile.ts

+
+        // Build up a map of refs to their name overrides, keeping track of how many times we've seen a name as we go.
+        const completeRefPathsToNameOverrides: Record<string, { name: string; isAlias: boolean }> = {};
+        const nameUsageCounts: Record<string, number> = {};


Do we actually need to track the counts? We only care if it's used at all right? Could we just use reservedNames instead?

We do for the current naming convention scheme, but once I adapt to match Vargas's proposal, we won't need to.


amckinney

Stamping to unblock, but there are a few things I think we can improve here. I trust you to consider them before merging, and I'll leave it to you to make the judgement call for your use case.

amckinney · 2024-12-11T14:39:08Z

generators/python-v2/ast/src/PythonFile.ts

+        uniqueReferences
+    }: {
+        writer: Writer;
+        uniqueReferences: Map<string, { modulePath: ModulePath; references: Reference[]; referenceNames: Set<string> }>;


nit: Should we define an interface for this map value?
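A possible shape for the named interface (a follow-up commit in this PR is titled "Use named interface for UniqueReferenceValue"). `ModulePath` and `ReferenceLike` are simplified stand-ins for the PR's types, and `groupReferencesByModule` is a hypothetical illustration of how such a map could be built while dropping duplicate names within a module.

```typescript
// Hypothetical stand-ins; the PR's ModulePath and Reference types are not shown here.
type ModulePath = string[];

interface ReferenceLike {
  modulePath: ModulePath;
  name: string;
}

// Named interface replacing the inline map-value type.
interface UniqueReferenceValue {
  modulePath: ModulePath;
  references: ReferenceLike[];
  referenceNames: Set<string>;
}

// Groups references by module path, keeping one reference per (module, name) pair.
function groupReferencesByModule(refs: ReferenceLike[]): Map<string, UniqueReferenceValue> {
  const uniqueReferences = new Map<string, UniqueReferenceValue>();
  for (const ref of refs) {
    const key = ref.modulePath.join(".");
    let entry = uniqueReferences.get(key);
    if (entry === undefined) {
      entry = { modulePath: ref.modulePath, references: [], referenceNames: new Set<string>() };
      uniqueReferences.set(key, entry);
    }
    if (!entry.referenceNames.has(ref.name)) {
      entry.referenceNames.add(ref.name);
      entry.references.push(ref);
    }
  }
  return uniqueReferences;
}
```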

amckinney · 2024-12-11T14:40:52Z

generators/python-v2/ast/src/Reference.ts

-    public getFullyQualifiedModulePath(): string {
+    public getFullyQualifiedPath(): string {
        return this.modulePath.join(".");
    }
+
+    public getCompletePath(): string {
+        return `${this.getFullyQualifiedPath()}.${this.name}`;
+    }


nit: I would keep getFullyQualifiedModulePath for the module path and use getFullyQualifiedPath for the complete path (omitting the Module component implies it's for the reference itself), but feel free to ignore if you disagree. Not a big deal either way, and it's easy to change later (if ever).

amckinney · 2024-12-11T14:42:11Z

generators/python-v2/ast/src/__test__/__snapshots__/PythonFile.test.ts.snap

+class Car:
+    car = CarsCar()
+    automobile = AutomobilesCar()
+    vehicle = VehiclesAutomobilesCar()


amckinney · 2024-12-11T14:49:29Z

generators/python-v2/ast/src/core/utils.ts

+export function createPythonClassName(input: string): string {
+    // Handle empty input
+    if (!input) {
+        return "Class";
+    }
+
+    // Clean up the input string
+    let cleanedInput = input
+        .replace(/[^a-zA-Z0-9\s_-]/g, " ") // Replace special characters with spaces
+        .replace(/[-_\s]+/g, " ") // Replace hyphens, underscores and multiple spaces with single space
+        .trim(); // Remove leading/trailing spaces
+
+    // Handle numeric-only or empty string after cleanup
+    if (!cleanedInput || /^\d+$/.test(cleanedInput)) {
+        return "Class" + (cleanedInput || "");
+    }
+
+    // Handle strings starting with numbers
+    if (/^\d/.test(cleanedInput)) {
+        cleanedInput = "Class" + cleanedInput;
+    }
+
+    // Split into words and handle special cases
+    const words = cleanedInput
+        .split(/(?=[A-Z])|[-_\s]+/)
+        .filter((word) => word.length > 0)
+        .map((word) => {
+            // Fix any garbled text by splitting on number boundaries
+            return word.split(/(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)/).filter((w) => w.length > 0);
+        })
+        .flat();
+
+    // Process each word
+    return words
+        .map((word, index) => {
+            // If it's the first word and starts with a number, prepend "Class"
+            if (index === 0 && /^\d/.test(word)) {
+                return "Class" + word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
+            }
+            // Preserve words that are all uppercase and longer than one character
+            if (word.length > 1 && word === word.toUpperCase() && !/^\d+$/.test(word)) {
+                return word;
+            }
+            // Capitalize first letter, lowercase rest
+            return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
+        })
+        .join("");
+}


There's a lot of custom regex in here - could we have used some combination of lodash functions for the same effect? This might be as simple as just calling upperFirst(camelCase(name)) to generate PascalCase like this (plus potentially some other minor cleanups).

Would you mind looking into this a bit to see if we can remove this altogether?

The current implementation is one we use ourselves; it was the result of a lot of collaborating with Cursor until it passed all test cases. I'll make a timeboxed effort to see whether it can be simplified with lodash functions, but the default plan is to roll with it as is, relying on the thorough tests in this PR to catch regressions if it is refactored later.

amckinney · 2024-12-11T14:55:05Z

generators/python-v2/ast/src/PythonFile.ts

+                if (modulePathIdx < 0 || !module) {
+                    nameOverride = `_${nameOverride}`;
+                } else {
+                    nameOverride = `${createPythonClassName(module)}${nameOverride}`;


Note that if we don't use a simple _ strategy, the generated names can potentially get long unless you maintain a collision map and assign type names more carefully.

For example, imagine the following module name (which we actually sometimes see in practice):

from .resources.v1.api.vehicles.automobiles import Car as ResourcesV1ApiVehiclesAutomobilesCar

class Car:
    vehicle = ResourcesV1ApiVehiclesAutomobilesCar()

For what it's worth, we do this in the original Go generator but only prepend names as needed (until we find a unique name). Given that we traverse the statements in a deterministic order, the import alias assignment is always guaranteed to be unique. You can check out the method docs here, and this test case.

Perhaps I'm misunderstanding, but I believe the current implementation does prepend names only as needed until it comes up with one that's unique. It goes from right to left, adding one module to the name at a time, until it finds a unique name.

d'oh! I completely misread the implementation - that's on me. It might be worth adding a test case to confirm that the module paths are as succinct as possible, but only if it's straightforward for you. It looks like the cases in PythonFile.test.ts.snap all use the full module path.

Updating the test to make this clear makes sense. Will do!
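The right-to-left strategy discussed in this thread can be sketched as follows. This is an approximation under stated assumptions, not the PR's code: `AliasRef`, `assignModuleAliases`, and `toClassNamePrefix` are hypothetical, `toClassNamePrefix` is a simplified stand-in for `createPythonClassName`, and `reservedNames` models names already declared in the file (such as the local `class Car` in the snapshot). Because refs are visited in a deterministic order, the same input always produces the same aliases.

```typescript
// Hypothetical minimal shape; the PR's Reference class carries more information.
interface AliasRef {
  modulePath: string[]; // e.g. ["vehicles", "automobiles"]
  name: string; // e.g. "Car"
}

// Simplified stand-in for createPythonClassName: upper-cases the first letter only.
function toClassNamePrefix(segment: string): string {
  return segment.charAt(0).toUpperCase() + segment.slice(1);
}

// Returns complete path -> name to use in code. `reservedNames` holds names
// already declared in the file (e.g. a local `class Car`).
function assignModuleAliases(refs: AliasRef[], reservedNames: string[] = []): Map<string, string> {
  const completePath = (r: AliasRef): string => [...r.modulePath, r.name].join(".");
  const counts = new Map<string, number>();
  for (const ref of refs) {
    counts.set(ref.name, (counts.get(ref.name) ?? 0) + 1);
  }

  const taken = new Set<string>(reservedNames);
  const result = new Map<string, string>();

  // Unambiguous, unreserved names keep their original name.
  for (const ref of refs) {
    if (counts.get(ref.name) === 1 && !taken.has(ref.name)) {
      taken.add(ref.name);
      result.set(completePath(ref), ref.name);
    }
  }

  // Conflicting names: prepend module segments right to left until the alias is unique.
  for (const ref of refs) {
    const key = completePath(ref);
    if (result.has(key)) {
      continue;
    }
    let alias = ref.name;
    for (const segment of [...ref.modulePath].reverse()) {
      alias = toClassNamePrefix(segment) + alias;
      if (!taken.has(alias)) {
        break;
      }
    }
    // Segments exhausted: fall back to `_` prefixes, loosely like the PR's `_${nameOverride}` branch.
    while (alias === ref.name || taken.has(alias)) {
      alias = `_${alias}`;
    }
    taken.add(alias);
    result.set(key, alias);
  }
  return result;
}
```

With refs for `cars.Car`, `automobiles.Car`, and `vehicles.automobiles.Car` plus a reserved `Car`, this produces `CarsCar`, `AutomobilesCar`, and `VehiclesAutomobilesCar`, as in the snapshot; a `resources.v1.api.vehicles.automobiles.Car` next to `cars.Car` only becomes `AutomobilesCar`, which is the kind of succinctness the requested test case would confirm.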

noanflaherty added 2 commits December 9, 2024 19:19

Get it working

0deba8d

Optimize and simplify code

b0f58b1

noanflaherty commented Dec 10, 2024


noanflaherty requested a review from amckinney December 10, 2024 00:30

noanflaherty commented Dec 10, 2024


noanflaherty marked this pull request as ready for review December 10, 2024 00:35

noanflaherty requested a review from dsinghvi as a code owner December 10, 2024 00:35

amckinney reviewed Dec 10, 2024


noanflaherty added 5 commits December 10, 2024 14:37

Merge remote-tracking branch 'origin/main' into noa/handle-duplicative-names

86949d0

Address nits

9892bf1

Add util for generating python class names

6432ff1

Integrate and update tests

9dfa81c

Update test

ddf4a63

noanflaherty force-pushed the noa/handle-duplicative-names branch from 2430804 to ddf4a63 December 11, 2024 01:15

noanflaherty mentioned this pull request Dec 11, 2024

[Codegen] De-Duplicate Node Module Names vellum-ai/vellum-python-sdks#365

Merged

amckinney approved these changes Dec 11, 2024


noanflaherty added 3 commits December 11, 2024 12:09

Use named interface for UniqueReferenceValue

4a93b69

Rename to getFullyQualifiedModulePath

5d4658b

Add test case

6e269be

noanflaherty enabled auto-merge (squash) December 11, 2024 17:23

noanflaherty merged commit 7687e8b into main Dec 11, 2024
49 checks passed

noanflaherty deleted the noa/handle-duplicative-names branch December 11, 2024 17:27

noanflaherty mentioned this pull request Dec 12, 2024

fix(python): Fix Case of Multiple Star Imports #5394

Merged
