Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/sagemaker compiler #662

Merged
merged 9 commits into from
Nov 23, 2023
Merged

Feature/sagemaker compiler #662

merged 9 commits into from
Nov 23, 2023

Conversation

GeorgesLorre
Copy link
Collaborator

resolves: #661

msg,
)

def build_command(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this could be reused by other compilers and be abstracted more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would make it an internal function and rename it to get_build_command()


steps: t.List[t.Any] = []

with tempfile.TemporaryDirectory(dir=os.getcwd()) as tmpdirname:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the scripts are uploaded to s3 by sagemaker once we compile the pipeline that is why we save the files to a temp folder that only gets deleted at the end.

@GeorgesLorre
Copy link
Collaborator Author

So the coverage dropping is quite painful but the coverage overall is pretty bad for compiler.py. Once we have all the pieces in place (updated runners, abstracted command building) we can take another good look on how to test the compilers.

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Georges! Left a few comments.

output_path: the path where to save the Kubeflow pipeline spec.
instance_type: the instance type to use for the processing steps
(see: https://aws.amazon.com/ec2/instance-types/ for options).
role_arn: the role arn to use for the processing steps,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
role_arn: the role arn to use for the processing steps,
role_arn: the Amazon Resource Name role to use for the processing steps,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't very clear what "arn" is

Comment on lines 581 to 587
def compile(
self,
pipeline: Pipeline,
output_path: str,
instance_type: str = "ml.t3.medium",
role_arn: t.Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def compile(
self,
pipeline: Pipeline,
output_path: str,
instance_type: str = "ml.t3.medium",
role_arn: t.Optional[str] = None,
def compile(
self,
pipeline: Pipeline,
*,
output_path: str,
instance_type: str = "ml.t3.medium",
role_arn: t.Optional[str] = None,

)

if not role_arn:
role_arn = self.sagemaker.get_execution_role()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on what this does exactly?

]

# with dependencies
dependencies = ["component_2"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nitpick but would name it component_1. Right now it looks like the component is dependent on itself

component_name: str,
command: t.List[str],
directory: str,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a return type?

)
depends_on = [steps[-1]] if component["dependencies"] else []

script = self.generate_component_script(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename script to script_path. At first glance it seems like the function is returning the script string

msg,
)

def build_command(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would make it an internal function and rename it to get_build_command()

@@ -65,7 +65,7 @@ def run(
experiment = self.client.get_experiment(experiment_name=experiment_name)
except ValueError:
logger.info(
f"Defined experiment '{experiment_name}' not found. Creating new experiment"
f"defined experiment '{experiment_name}' not found. creating new experiment"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the previous formatting was ok

compiler = SagemakerCompiler()
command = ["echo", "hello world"]
with tmp_path_factory.mktemp("temp") as fn:
script = compiler.generate_component_script("component_1", command, fn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with the path

Suggested change
script = compiler.generate_component_script("component_1", command, fn)
script_path = compiler.generate_component_script("component_1", command, fn)

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Georges!

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgesLorre!

@GeorgesLorre GeorgesLorre merged commit c01dd02 into main Nov 23, 2023
5 of 6 checks passed
@GeorgesLorre GeorgesLorre deleted the feature/sagemaker-compiler branch November 23, 2023 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Sagemaker compiler that loops over fondant pipeline and creates a Sagemaker pipeline spec
4 participants