docs: modernize py dependencies docs and example (#32345)
* feat: update Python multifile docs

A more common approach to packaging a Python package is to use a
pyproject.toml file and a src directory (instead of a flat
directory). This change updates the documentation and examples
to match this way of packaging Python packages.

* fix: fix juliaset package path

* cleanup: move main file outside src

* docs: address feedback #32345

Add build-system to pyproject.toml.
Improve wording in documentation.
Add extra step when using custom images.

* fix: fix juliaset path

* nit: remove extra space

* lint: format setup.py

* nit: reorder entries in pyproject.toml

* update the description

---------

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
nitobuendia and tvalentyn committed Aug 29, 2024
1 parent a895469 commit 28f2d47
Showing 8 changed files with 91 additions and 42 deletions.
33 changes: 33 additions & 0 deletions sdks/python/apache_beam/examples/complete/juliaset/pyproject.toml
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[project]
+name = "juliaset"
+version = "0.0.1"
+description = "Julia set workflow package."
+
+# Configure the required packages and scripts to install.
+# Note that the Python Dataflow containers come with numpy already installed
+# so this dependency will not trigger anything to be installed unless a version
+# restriction is specified.
+dependencies = [
+"numpy"
+]
+
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"
26 changes: 8 additions & 18 deletions sdks/python/apache_beam/examples/complete/juliaset/setup.py
@@ -15,14 +15,16 @@
 # limitations under the License.
 #
 
-"""Setup.py module for the workflow's worker utilities.
+"""setup.py module for the pipeline package.
 
-All the workflow related code is gathered in a package that will be built as a
-source distribution, staged in the staging area for the workflow being run and
-then installed in the workers when they start running.
+In this example, the pipeline code is gathered in a package that can be built
+as a source distribution and installed on the workers. The package is defined
+in the pyproject.toml file. You can use the setup.py file to define
+configuration that needs to be determined programmatically, for example,
+custom commands to run when a package is installed.
 
-This behavior is triggered by specifying the --setup_file command line option
-when running the workflow for remote execution.
+You can install this package into the workers at runtime by using
+the --setup_file pipeline option.
 """
 
 # pytype: skip-file
@@ -107,19 +109,7 @@ def run(self):
 self.RunCustomCommand(command)
 
 
-# Configure the required packages and scripts to install.
-# Note that the Python Dataflow containers come with numpy already installed
-# so this dependency will not trigger anything to be installed unless a version
-# restriction is specified.
-REQUIRED_PACKAGES = [
-'numpy',
-]
-
 setuptools.setup(
-name='juliaset',
-version='0.0.1',
-description='Julia set workflow package.',
-install_requires=REQUIRED_PACKAGES,
 packages=setuptools.find_packages(),
 cmdclass={
 # Command class instantiated and run during pip install scenarios.
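The updated setup.py docstring reserves setup.py for configuration that must be determined programmatically, such as custom commands run when the package is installed. The context lines of the second hunk show a `run` method calling `self.RunCustomCommand(command)`. The body of that method is not visible in this diff, so the code below is only a hypothetical, standard-library sketch of what such a command runner can look like. `CUSTOM_COMMANDS`, `run_custom_command`, and `run_all` are illustrative names. The sketch omits wiring them into a `setuptools.Command` subclass.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical commands to run at install time. Using sys.executable keeps the
# sketch portable; a real setup.py might call system tools instead.
CUSTOM_COMMANDS: List[List[str]] = [
    [sys.executable, "-c", "print('Custom command worked!')"],
]


def run_custom_command(command: Sequence[str]) -> str:
    """Runs one command and returns its combined stdout and stderr.

    Raises RuntimeError on a non-zero exit code so that a failing step
    stops the install instead of being silently ignored.
    """
    print(f"Running command: {list(command)}")
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        check=False,
    )
    print(f"Command output: {result.stdout}")
    if result.returncode != 0:
        raise RuntimeError(
            f"Command {list(command)} failed: exit code {result.returncode}")
    return result.stdout


def run_all(commands: Sequence[Sequence[str]] = CUSTOM_COMMANDS) -> List[str]:
    """Runs each command in order, like the run() loop in the diff context."""
    return [run_custom_command(command) for command in commands]


if __name__ == "__main__":
    run_all()
```

Raising on a non-zero exit code is a deliberate choice in this sketch. A broken install step then fails loudly when the workers install the package, rather than leaving a partially configured environment.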
@@ -0,0 +1,16 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
@@ -27,7 +27,7 @@
 
 import pytest
 
-from apache_beam.examples.complete.juliaset.juliaset import juliaset
+from apache_beam.examples.complete.juliaset.src.juliaset import juliaset
 from apache_beam.testing.util import open_shards
 
 
@@ -27,7 +27,7 @@
 import pytest
 from hamcrest.core.core.allof import all_of
 
-from apache_beam.examples.complete.juliaset.juliaset import juliaset
+from apache_beam.examples.complete.juliaset.src.juliaset import juliaset
 from apache_beam.io.filesystems import FileSystems
 from apache_beam.runners.runner import PipelineState
 from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher
@@ -95,43 +95,53 @@ If your pipeline uses packages that are not available publicly (e.g. packages th
 
 Often, your pipeline code spans multiple files. To run your project remotely, you must group these files as a Python package and specify the package when you run your pipeline. When the remote workers start, they will install your package. To group your files as a Python package and make it available remotely, perform the following steps:
 
-1. Create a [setup.py](https://pythonhosted.org/an_example_pypi_project/setuptools.html) file for your project. The following is a very basic `setup.py` file.
+1. Create a [pyproject.toml](https://packaging.python.org/en/latest/tutorials/packaging-projects/) file for your project. The following is a very basic `pyproject.toml` file.
 
-import setuptools
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "PACKAGE-NAME"
+version = "PACKAGE-VERSION"
+dependencies = [
+# List Python packages your pipeline depends on.
+]
 
-setuptools.setup(
-name='PACKAGE-NAME',
-version='PACKAGE-VERSION',
-install_requires=[
-# List Python packages your pipeline depends on.
-],
-packages=setuptools.find_packages(),
-)
+2. If your package requires some programmatic configuration, or you need to use the `--setup_file` pipeline option, create a `setup.py` file for your project.
 
-2. Structure your project so that the root directory contains the `setup.py` file, the main workflow file, and a directory with the rest of the files, for example:
+# Note that the package can be completely defined by pyproject.toml.
+# This file is optional.
+import setuptools
+setuptools.setup()
 
+3. Structure your project so that the root directory contains the `pyproject.toml` file, the `setup.py` file, and a `src/` directory with the rest of the files. For example:
+
 root_dir/
+pyproject.toml
 setup.py
-main.py
-my_package/
-my_pipeline_launcher.py
-my_custom_dofns_and_transforms.py
-other_utils_and_helpers.py
+src/
+main.py
+my_package/
+my_pipeline_launcher.py
+my_custom_dofns_and_transforms.py
+other_utils_and_helpers.py
 
 See [Juliaset](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset) for an example that follows this project structure.
 
-3. Install your package in the submission environment, for example by using the following command:
+4. Install your package in the submission environment, for example by using the following command:
 
 pip install -e .
 
-4. Run your pipeline with the following command-line option:
+5. If you use a [custom container](#custom-containers), copy and install the package in the container as well.
+
+6. Run your pipeline with the following command-line option:
 
 --setup_file /path/to/setup.py
 
-**Note:** It is not necessary to supply the `--requirements_file` [option](#pypi-dependencies) if the dependencies of your package are defined in the `install_requires` field of the `setup.py` file (see step 1).
-However unlike with the `--requirements_file` option, when you use the `--setup_file` option, Beam doesn't stage the dependent packages to the runner.
-Only the pipeline package is staged. If they aren't already provided in the runtime environment,
-the package dependencies are installed from PyPI at runtime.
+**Note:** It is not necessary to supply the `--requirements_file` [option](#pypi-dependencies) if the dependencies of your package are defined in the
+`dependencies` field of the `pyproject.toml` file (see step 1). However, unlike with the `--requirements_file` option, when you use the `--setup_file` option, Beam doesn't stage the dependent packages to the runner.
+Only the pipeline package is staged. If they aren't already provided in the runtime environment, the package dependencies are installed from PyPI at runtime.
 
 
 ## Non-Python Dependencies or PyPI Dependencies with Non-Python Dependencies {#nonpython}
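Steps 1 to 3 and 6 of the updated guide can be sketched as a small script. It lays out the `src/` project from the example tree and returns the `--setup_file` argument to pass when launching the pipeline. This is a hypothetical helper: `scaffold_project` and `PACKAGE_MODULES` are illustrative names, and the module files are empty placeholders. It only writes files based on the templates in the guide. It does not install the package (step 4) or run a pipeline.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Mirrors the minimal pyproject.toml from step 1.
PYPROJECT_TEMPLATE = """\
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "{name}"
version = "{version}"
dependencies = [
{deps}
]
"""

# Mirrors the minimal setup.py from step 2; --setup_file needs it to exist.
SETUP_PY = "import setuptools\nsetuptools.setup()\n"

# Hypothetical module names taken from the example tree in step 3.
PACKAGE_MODULES = [
    "my_pipeline_launcher.py",
    "my_custom_dofns_and_transforms.py",
    "other_utils_and_helpers.py",
]


def scaffold_project(root: Path, name: str, version: str,
                     dependencies: Iterable[str]) -> List[str]:
    """Creates the src/ layout and returns pipeline args for --setup_file."""
    deps = "\n".join(f'  "{dep}",' for dep in dependencies)
    (root / "pyproject.toml").write_text(
        PYPROJECT_TEMPLATE.format(name=name, version=version, deps=deps))
    (root / "setup.py").write_text(SETUP_PY)

    package_dir = root / "src" / "my_package"
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "src" / "main.py").write_text("# Pipeline entry point.\n")
    for module in PACKAGE_MODULES:
        (package_dir / module).write_text("")

    # Absolute path so the option works regardless of the working directory.
    return ["--setup_file", str((root / "setup.py").resolve())]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_project(Path(tmp), "my-pipeline", "0.0.1", ["numpy"]))
```

The returned list can be appended to the pipeline's command-line arguments. Per the note in the guide, only this package is staged with `--setup_file`; the entries in `dependencies` are installed from PyPI on the workers if the runtime environment doesn't already provide them.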
