Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Handle Duplicative Import Names #5377

Merged
merged 10 commits into from
Dec 11, 2024

Conversation

noanflaherty
Copy link
Collaborator

Description

Sibling PR: #5365

The goal of this PR is to handle auto-aliasing of imports when there are naming conflicts.

Changes Made

  • Track ref name overrides on the Writer
  • Build up the map of overrides as part of PythonFile's write method

Testing

  • Unit tests added/updated

@@ -44,47 +48,99 @@ export class PythonFile extends AstNode {
}

public write(writer: Writer): void {
const uniqueReferences = this.deduplicateReferences();

this.updateWriterRefNameOverrides({ writer, uniqueReferences });
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is where the magic happens. Within it, we add state to Writer

this.statements.forEach((statement, idx) => {
statement.write(writer);
writer.newLine();
if (idx < this.statements.length - 1) {
writer.newLine();
}
});

writer.unsetRefNameOverrides();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wipe the state we had set on the writer just to be safe.

Comment on lines 116 to 117
"from .cars import Car as Car_1
from .vehicles import Car as Car_2
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose a _n suffix, but am open to suggestions here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvargas92495 lmk if you have a preferred naming convention here.

Copy link
Collaborator

@dvargas92495 dvargas92495 Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during normal development, I usually do something module dependent. eg

from .cars import Car as CarsCar
from .vehicles import Car as VehiclesCar

Comment on lines +46 to +47
const nameOverride = writer.getRefNameOverride(this);
writer.write(nameOverride.name);
Copy link
Collaborator Author

@noanflaherty noanflaherty Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the second (of 2) place where we use state that was added to the Writer. Now, a Reference object gets its name from the Writer during its own write method.

@@ -44,47 +48,99 @@ export class PythonFile extends AstNode {
}

public write(writer: Writer): void {
const uniqueReferences = this.deduplicateReferences();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to get rid of the need for this method altogether later. Right now, the data structure we use for ref tracking allows for duplicative refs, but there's no reason it needs to. We can deduplicate upon adding a new ref.

In any case, this is just a refactor to take existing logic that was there and move it into a helper so its return value can be used in two places directly below.

}

private getImportName({ writer, reference }: { writer: Writer; reference: Reference }): string {
const nameOverride = writer.getRefNameOverride(reference);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one of two places where we use the state that was added to the Writer.

@noanflaherty noanflaherty marked this pull request as ready for review December 10, 2024 00:35

// Build up a map of refs to their name overrides, keeping track of howmany times we've seen a name as we go.
const completeRefPathsToNameOverrides: Record<string, { name: string; isAlias: boolean }> = {};
const nameUsageCounts: Record<string, number> = {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to track the counts? We only care if it's used at all right? Could we just use reservedNames instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do for the current naming convention scheme, but once I adapt to match Vargas's proposal, we won't need to.

generators/python-v2/ast/src/Reference.ts Outdated Show resolved Hide resolved
generators/python-v2/ast/src/core/Writer.ts Outdated Show resolved Hide resolved
generators/python-v2/ast/src/PythonFile.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@amckinney amckinney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping to unblock, but there's a few things I think we can improve here. I trust you to consider them before merging, but I'll leave you to making a judgement call for your use case.

uniqueReferences
}: {
writer: Writer;
uniqueReferences: Map<string, { modulePath: ModulePath; references: Reference[]; referenceNames: Set<string> }>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we define an interface for this map value?

Comment on lines 70 to 77
public getFullyQualifiedModulePath(): string {
public getFullyQualifiedPath(): string {
return this.modulePath.join(".");
}

public getCompletePath(): string {
return `${this.getFullyQualifiedPath()}.${this.name}`;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would use getFullyQualifiedModulePath and getFullyQualifiedPath (since omitting the Module component implies its for the reference), but feel free to ignore if you disagree. Not a big deal either way, and it's easy to change later (if ever).

Comment on lines 121 to 124
class Car:
car = CarsCar()
automobile = AutomobilesCar()
vehicle = VehiclesAutomobilesCar()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

Comment on lines +1 to +48
export function createPythonClassName(input: string): string {
// Handle empty input
if (!input) {
return "Class";
}

// Clean up the input string
let cleanedInput = input
.replace(/[^a-zA-Z0-9\s_-]/g, " ") // Replace special characters with spaces
.replace(/[-_\s]+/g, " ") // Replace hyphens, underscores and multiple spaces with single space
.trim(); // Remove leading/trailing spaces

// Handle numeric-only or empty string after cleanup
if (!cleanedInput || /^\d+$/.test(cleanedInput)) {
return "Class" + (cleanedInput || "");
}

// Handle strings starting with numbers
if (/^\d/.test(cleanedInput)) {
cleanedInput = "Class" + cleanedInput;
}

// Split into words and handle special cases
const words = cleanedInput
.split(/(?=[A-Z])|[-_\s]+/)
.filter((word) => word.length > 0)
.map((word) => {
// Fix any garbled text by splitting on number boundaries
return word.split(/(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)/).filter((w) => w.length > 0);
})
.flat();

// Process each word
return words
.map((word, index) => {
// If it's the first word and starts with a number, prepend "Class"
if (index === 0 && /^\d/.test(word)) {
return "Class" + word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
}
// Preserve words that are all uppercase and longer than one character
if (word.length > 1 && word === word.toUpperCase() && !/^\d+$/.test(word)) {
return word;
}
// Capitalize first letter, lowercase rest
return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
})
.join("");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of custom regex in here - could we have used some combination of lodash functions for the same effect? This might be as simple as just calling upperFirst(camelCase(name)) to generate PascalCase like this (plus potentially some other minor cleanups).

Would you mind looking into this a bit to see if we can remove this altogether?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is one we use ourselves and was the result of a lot of collabing with cursor until it passed all test cases. I'll make a timebound effort to see if it can be simplified with lodash functions, but default plan will be to roll with this as is, relying on the thorough tests in this PR to protect against future refactors if they happen.

if (modulePathIdx < 0 || !module) {
nameOverride = `_${nameOverride}`;
} else {
nameOverride = `${createPythonClassName(module)}${nameOverride}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if we don't use a simple _ strategy, the generated names can potentially get long unless you maintain a collision map and assign type names more carefully.

For example, imagine the following module name (which we actually sometimes see in practice):

from .resources.v1.api.vehicles.automobiles import Car as ResourcesV1ApiVehiclesAutomobilesCar
class Car:
    vehicle = ResourcesV1ApiVehiclesAutomobilesCar()

For what it's worth, we do this in the original Go generator but only prepend names as needed (until we find a unique name). Given that we traverse the statements in a deterministic order, the import alias assignment is always guaranteed to be unique. You can check out the method docs here, and this test case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm misunderstanding, but I assert the current implementation does only prepend names as needed until it comes up with one that's unique. It goes from right to left, adding one module to the name at a time, until it finds that it's come up with a unique name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d'oh! I completely misread the implementation - that's on me. It might be worth adding a test case to confirm that the module paths are as succinct as possible, but only if it's straightforward for you. It looks like the cases in PythonFile.test.ts.snap all use the full module path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the test to make this clear makes sense. Will do!

@noanflaherty noanflaherty enabled auto-merge (squash) December 11, 2024 17:23
@noanflaherty noanflaherty merged commit 7687e8b into main Dec 11, 2024
49 checks passed
@noanflaherty noanflaherty deleted the noa/handle-duplicative-names branch December 11, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants