diff --git a/docs/basics/core_concepts.md b/docs/basics/core_concepts.md index 6087e520..784a6658 100644 --- a/docs/basics/core_concepts.md +++ b/docs/basics/core_concepts.md @@ -8,232 +8,219 @@ sidebar_position: 2 # Core concepts -## `@fused.udf` +Client systems call Fused API endpoints to trigger and load data from serverless Python operations. +These operations can interact with data across any systems that can be interfaced with Python. For example, load a table from Salesforce, perform a transformation, then write it into another data store or even directly to a consuming application like Google Sheets. -User Defined Functions (UDFs) are building blocks of geospatial operations that integrate across the stack. They run Python functions over any size dataset and return the output. +What differentiates Fused from traditional ETL and orchestrators is that a Fused "job" is triggered by a simple parametrized HTTP call. This means they run and load data at any point of the stack. Any mechanism that makes HTTP requests can run UDFs and load their response data - this includes the terminal, ETL jobs, webhooks, Tile maps, frontend applications, and even browser requests. -To write a UDF, decorate a Python function with `@fused.udf` - this tells Fused to give the function special treatment. Encapsulate the business logic within the function and return the data object to visualize. +Furthermore, Fused doesn't try to boil the ocean: it loads only the fraction of data needed for an operation and strategically caches for efficiency. Like DuckDB - but with the entire flexibility of native Python at its disposal. -To illustrate, this UDF is a function called `udf` that returns a dataframe. Notice how its import statements are placed within the function declaration. The `bbox` argument gives the data spatial awareness, which you can read more about [here](/basics/core-concepts/#file--tile). 
+The following sections outline the core concepts you should grow to understand as you acquaint yourself with Fused. They begin with an overview of the User Defined Function's anatomy, navigate through helper functions and runtime state management, and conclude with an explanation of how external systems integrate with UDFs via the Fused Hosted API. +## The UDF -```python +User Defined Functions (UDFs) are Python operations that integrate across the stack via HTTP endpoints. When called, a UDF endpoint runs a serverless Python function over any size dataset and returns the function output. UDFs are building blocks that can be assembled into complex workflows in which UDFs call each other or run in parallel. -@fused.udf -def udf(bbox, table_path="s3://fused-asset/infra/building_msft_us"): - from utils import table_to_tile - df=table_to_tile(bbox, table=table_path) - return df -``` +A UDF is a Python function with the following components: -:::tip -To visualize the output of a UDF on Workbench, the function should return a Raster or Vector object. Workbench will render the UDF's returned data as a map layer. Read more about return types [here](/basics/core-concepts/#udf-execution-modes-file-tile). -::: -#### Syntax to keep in mind +- [a) `@fused.udf` decorator](/basics/core-concepts/#a-fusedudf-decorator) +- [b) Function declaration](/basics/core-concepts/#b-function-declaration) +- [c) Typed parameters](/basics/core-concepts/#c-typed-parameters) +- [d) Return object](/basics/core-concepts/#d-return-object) -The Fused compute engine recognizes the UDF as self-contained function. This means that developers should: +The structure of the API call determines a UDF's execution mode. -a) Decorate the UDF function with `@fused.udf`. +### Execution modes (File & Tile) -b) Declare imports within the function. +Fused automatically creates an endpoint for every saved Fused UDF. 
When a client application calls a UDF endpoint, Fused runs a lightweight serverless Python operation and returns the function output. Its engine leverages industry standard cloud optimized dataset formats to efficiently pull specific fragments of datasets - based on specified geographic or logical partitions. -c) Encapsulate helper functions as importable `util modules` of the UDF. +Fused can process datasets of any size and serve them as dynamic vector and raster tilesets. Instead of loading an entire dataset, which is an expensive operation, Fused tile layers load instantly because they operate on a fraction of the dataset - in parallel. Tile-level spatial filtering supercharges UDFs to process only specific spatial areas within a geopartitioned dataset. -d) Optionally, enable autodetection with [explicit typing](/workbench/udf-editor/#auto-tile-and-file). - -That’s all the new syntax you need to remember to get started! - -#### Saving UDFs - -UDFs are saved as a directory of associated files that furnish functionality to run anywhere. This makes them shareable. - -For example, the following snippet saves a UDF in a local directory, `Sample_UDF`. - -```python -import fused - -@fused.udf -def my_udf(): - return "Hello from Fused!" -# Save locally -my_udf.to_directory('Sample_UDF') -``` +This enables a UDF endpoint to act as a remote `File`, as a one-off task, or to become a dynamic `Tile` endpoint that interoperates with map tiling systems. -The directory contains the UDF's documentation, code, metadata, and utility function code. +#### File & Tile -``` -└── Sample_UDF - ├── README.MD - ├── Sample_UDF.py - ├── meta.json - └── utils.py -``` +Depending on how a client application calls an endpoint, the same endpoint can run a UDF as a one-off `File` or as a dynamic `Tile`. -Files relevant to each UDF are: +- When an endpoint is called as a `File`, the UDF runs only once and returns a single batch of output data. 
This behaves like the access pattern for a remote file URL. +- When an endpoint is called as a `Tile`, the endpoint becomes interoperable with map tiling clients. The endpoint is called and returns data in parallel for each tile - specified by templated `X`, `Y`, and `Z` indices. -- `README.md` Provides details of the UDF's purpose and how it works. -- `Sample_UDF.py` This eponymous Python file contains the UDF's business logic as a Python function decorated with `@fused.udf`. -- `meta.json` This file contains metadata needed to render the UDF in the Fused explorer and for the UDF to run correctly. -- `utils.py` This Python file contains helper functions the UDF (optionally) imports and references. +Read on to understand the nuances between the two ways UDF endpoints can be called. +#### Call a UDF endpoint as a File +By default, a UDF runs as `File` - it executes once and returns a single output that corresponds to the input parameters. The UDF endpoint behaves like a remote file in that calling it returns a single batch of data - but the endpoint also accepts parameters that dynamically influence the UDF's execution. -## Utility modules +This enables client applications to make an HTTP request and load the UDF's output data into the tool that makes the call. -Utility modules enhance the functionality and maintainability of UDFs. +Note that files are downloaded entirely, even if the data is requested as Parquet. -As UDFs grow in complexity, it's useful to modularize the code to make it reusable and composable. It's also a good practice to keep only the essential "business logic" in the decorated UDF function - this makes it easy to know what a UDF does at a glance. +#### Call a UDF endpoint as a Tile -With this in mind, a Fused UDF can optionally reference a module to import Python objects from it, with an import statement as if importing from a Python package. These modules are reusable Python functions that promote code reuse and speed up development time. 
UDFs can import from a variety of sources: from the local environment, from GitHub, and from other UDFs. This section shows how to import modules into UDFs form each of these sources. +The same UDF's API endpoint can be called to run like a Tile. This makes it possible for Fused to work like a Tile server that loads vector or raster data into industry standard tools that render [tiled web maps](https://en.wikipedia.org/wiki/Tiled_web_map) - think Leaflet, Mapbox, Foursquare Studio, Lonboard, and beyond. Tiling clients can make dozens of simultaneous calls to the Fused API endpoint - one for each tile - and seamlessly stitch the outputs to render a map. Instead of operating on an entire dataset, Fused only acts on the data that corresponds to the area visible in the current viewport. -### From local +:::tip +You can read more about the XYZ indexing system in the [Deck.gl](http://Deck.gl) [documentation](https://deck.gl/docs/api-reference/geo-layers/tile-layer#indexing-system). In fact, Fused Workbench runs UDFs on a serverless backend and renders output in Deck.gl. +::: -Local modules are Python files in the same environment as the UDF. +### a) `@fused.udf` decorator +To create a UDF, decorate a Python function with `@fused.udf`. This decorator automatically turns the function into a serverless endpoint that can be invoked via HTTP requests and gives it the ability to fractionally load data. -In the Workbench, the "module" code editor tab is the place for helper functions and other associated Python objects for the UDF to import. Keep in mind that the module's name is configurable in order to avoid naming collisions. In this example, UDF imports the function `arr_to_plasma` from its module, which is named `utils`. The function contains support logic the UDF uses it to transform an array. +This simplified example illustrates the concept. It's that simple. ```python @fused.udf -def udf(bbox): - from utils import arr_to_plasma +def my_udf(): ... 
- return arr_to_plasma(arr.values, min_max=(0, .8)) + return gdf ``` -![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/image-33.png) +### b) Function declaration -When importing a module from a Python environment other than Workbench, the module must be specified as the locally-scoped file name in the `headers` argument of the `@fused.udf` decorator. This lets Fused know how to complete the reference. +The next step is to structure the function's business logic to interact with upstream data sources and return an object which will be the UDF's output. + +To illustrate, this UDF is a function called `udf` that returns a dataframe. Notice how its import statements are placed within the function declaration. The `bbox` argument gives the data spatial awareness, which you can read more about [here](/basics/core-concepts/#file--tile). ```python -@fused.udf( - headers=['utils.py'] -) -def udf(bbox): - from utils import arr_to_plasma - ... - return arr_to_plasma(arr.values, min_max=(0, .8)) +@fused.udf +def udf(bbox, table_path="s3://fused-asset/infra/building_msft_us"): + from utils import table_to_tile + df=table_to_tile(bbox, table=table_path) + return df ``` +:::tip +To visualize the output of a UDF on Workbench, the function should return a Raster or Vector object. Workbench will render the UDF's returned data as a map layer. Read more about return types [here](/basics/core-concepts/#udf-execution-modes-file-tile). +::: -### From GitHub - -Fused can also import Python modules from a public GitHub URL. The URL must be of a directory that contains modules exported with Fused - that way they include the metadata needed to import them. This example shows how to import the `utils` module and call its `table_to_tile` function. +#### Syntax to keep in mind -```python -utils = fused.core.import_from_github('https://github.com/fusedio/udfs/tree/main/public/common/').utils -utils.table_to_tile(...) 
-``` +The Fused compute engine recognizes the UDF as a self-contained function. This means that developers should: +- Decorate the UDF function with `@fused.udf`. +- Declare imports within the function. +- Encapsulate helper functions as importable `util modules` of the UDF. +- Optionally, enable autodetection with [explicit typing](/workbench/udf-editor/#auto-tile-and-file). -## Cache +That’s all the new syntax you need to remember to get started! -Fused runs UDFs from top to bottom each time. This execution model makes development easy, but can be encumbered if long-running helper functions are called again and again. +### c) Typed parameters -Sometimes a UDF might take a while to download or process data. When this happens, developers can take advantage of Fused's built-in caching. Caching stores the results of slow function calls so they only need to run once. +When a UDF's signature has explicit types, Fused converts passed parameters to the specified types. -All a developer must do is place slow code inside a helper function, decorate the function with `@fused.cache`, and assign the returned data object to a variable. The object will persist across runs. This empowers users to quickly iterate on downstream code without having to wait for the slow code to run each time. +:::tip +UDF endpoints can be called via HTTP requests, so input parameters must be serializable. -Fused caches the function's output using a unique hash identifier generated based on the function's code, the value of its parameters, and the `_cache_id` argument. +As such, it's important to explicitly define the types of input parameters. That way, Fused knows to convert serialized parameters to the correct type. For example, if a parameter is declared as an `int`, a stringified `"42"` is converted to the integer `42`. +::: -#### Minimal example -To illustrate, this function accepts an argument and a keywork argument. 
When the function is called to set `output_1` and `output_2`, Fused caches the output of each call as separate objects. That way, the UDF only runs the function once for each set of passed arguments. -```python -@fused.cache -def sample_function(name, company="Fused"): - # Function logic - return f"{name}, at {company}, cached this function's output." -@fused.udf -def udf(bbox): - ... - output_1 = sample_function("Sina") - output_2 = sample_function("Plinio", company="Fused.io") - ... - -``` +#### The `bbox` object -#### Intermediate example +A UDF becomes spatially aware when it leverages the `bbox` parameter to spatially filter the datasets it operates on. A UDF can load from a cloud optimized dataset only the parts of the file that are actually required by the query. -At this point, ony might ask: if UDFs run for each tile in the viewport, how does Fused distinguish the cache for each tile? +:::tip +The growing popularity of cloud optimized data formats revolutionized data processing by eliminating the need for specialized hardware to handle large datasets. These datasets organize vector tables and raster arrays in such a way that Fused reads only specified portions of the file. -UDFs give spatial awareness to the cache decorator by setting `_cache_id` as string identifier unique to the tile's `bbox`. This can for example be a string such as `str(bbox.to_json())`, or something more complex that could include a date to distinguish cached outputs by. +When you design your UDFs to be spatially aware (with the `bbox` parameter), Fused distributes execution across multiple workers that scale and wind down as needed. -Note that a custom caching directory can be set with the optional `path` parameter. +💡 For further reading on data formats, refer to resources on: -```python -@fused.cache(path='optional_cache_dir') -def sample_function(name, company="Fused"): - # Function logic - return f"{name}, at {company}, cached this function's output." 
+- [Cloud Optimized GeoTiff](https://www.cogeo.org/) +- [Raster](https://rasterio.readthedocs.io/en/stable/api/rasterio.windows.html) +- [Geoparquet](https://geoparquet.org/) +- [GeoArrow](https://geoarrow.org/format.html) -@fused.udf -def udf(bbox): - ... - output = sample_function("Plinio", company="Fused.io", _cache_id=str(bbox.to_json())") - ... -``` +::: -## Download +When writing UDFs, it’s important to strategically use the `bbox` spatial filter to select which parts of a dataset to load. This section shows approaches for different dataset types. -Fused Workbench runs UDFs from top to bottom each time code changes. This means objects in the UDF are recreated each time, which can slow down a UDF that downloads files from a remote server. +Tile mapping tools call Fused endpoints and dynamically pass an XYZ index for each Tile to render. When a UDF endpoint is called this way - in Tile mode - Fused passes the UDF a `bbox` object as the first parameter. This object is a data structure with information that corresponds to the Tile's bounds and XYZ coordinates. The object is named `bbox` by convention, but it's possible to use a different name as long as it's the first parameter. -> 💡 Downloaded files are written to a mounted volume shared across all UDFs in an organization. This means that a file downloaded by one UDF can be read by other UDFs. +For convenience, users can decide the structure of the `bbox` object by setting explicit typing. The 3 available structures are: +- `fused.types.Bbox` is a `shapely.geometry.polygon.Polygon` corresponding to the Tile's bounds. +- `fused.types.TileXYZ` is a `mercantile.Tile` object with values for the `x`, `y`, and `z` Tile indices. +- `fused.types.TileGDF` is a `geopandas.geodataframe.GeoDataFrame` with `x`, `y`, `z`, and `geometry` columns. -Fused addresses the latency of downloading files with the `download` utility function. It stores files in the mounted filesystem so they only download the first time. 
+:::tip +Because a UDF can be called as either File or Tile, Workbench must explicitly know how to render its output. When a UDF is configured as "Auto", Workbench automatically handles the output as Tile if it statically checks that the types `fused.types.TileXYZ`, `fused.types.TileGDF`, or `fused.types.Bbox` are used in the UDF. Otherwise, it assumes File. -> 💡 Because a Tile UDF runs multiple chunks in parallel, the `download` function sets a signal lock during the first download attempt, to ensure the download happens only once. +Note that the "Auto" setting is specific and applicable only to the Workbench UI. UDFs called via fused-py or HTTP requests run as Tile only if a parameter specifies the Tile geometry. If the UDF is called as a File, Fused passes a `None` value to the first parameter. +::: -### Example: download `.zip` file +This snippet shows an instance of a `bbox` object, which is a [shapely.Polygon](https://shapely.readthedocs.io/en/stable/reference/shapely.Polygon.html) type with the Tile bounding box’s 4 vertices. -To download a file to disk, call `fused.core.download`. The function downloads the file only on the first execution, and returns the file path for downstream functions to reference. +```python +import shapely +bbox = shapely.Polygon([[-121.640625, 37.43997405227058], [-121.640625, 37.718590325588146], [-121.9921875, 37.718590325588146], [-121.9921875, 37.43997405227058], [-121.640625, 37.43997405227058]]) +>> POLYGON ((-121.640625 37.43997405227058, -121.640625 37.718590325588146, -121.9921875 37.718590325588146, -121.9921875 37.43997405227058, -121.640625 37.43997405227058)) +``` -This example downloads a `.zip` file then returs it as a GeoDataFrame. Note how GeoPandas reads the local file path returned by `download`. +When writing a UDF, Fused recommends setting the `bbox` as the first parameter, and typing it as a `fused.types.Bbox` with a default value of `None`. 
This will enable the UDF to run both as a `File` (when `bbox` isn’t necessarily passed) and as a `Tile`. For example: ```python @fused.udf -def udf(url='https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_bg.zip'): - import fused - import geopandas as gpd +def udf(bbox: fused.types.Bbox=None): + ... + return ... +``` - # Download zip file - out_path = fused.core.download(url=url, file_path='out.zip') +##### Filter raster files with the `bbox` object - # Show path to file - print(out_path) +The Fused function `utils.mosaic_tiff` and `pystac-client`'s `catalog.search` illustrate how to use `bbox` to spatially filter a dataset. - return gpd.read_file(out_path) +For example, the function `utils.mosaic_tiff` generates a mosaic image from a list of TIFF files. `bbox` defines the area of interest within the list of TIFF files set by `tiff_list`. + +```python +utils = fused.load("https://github.com/fusedio/udfs/tree/f928ee1/public/common/").utils +data = utils.mosaic_tiff( + bbox=bbox, + tiff_list=tiff_list, + output_shape=(256, 256), + ) ``` -## `fd://` File system +As an example, the [LULC_Tile UDF](https://github.com/fusedio/udfs/blob/b89a3aab05cb75dab25abb73e4c17490844ab764/public/LULC_Tile_Example/LULC_Tile_Example.py#L21-L27) uses `mosaic_tiff` to create a mosaic from a set of Land Cover tiffs. -The `fd://` file system serves as a namespace for an S3 bucket provisioned by Fused Cloud for your organization. It provides a unified interface for accessing files and directories stored within the bucket, abstracting away the complexities of direct interaction with S3. -Access it like you would an object on S3. +:::tip +When returning a raster object that doesn’t have spatial metadata, like a numpy array, a UDF must also return the `bbox` object to tell Fused where on the globe to place the output raster. 
For example: -For example, to fetch a file: ```python -fused.get("fd://bucket-name/file.parquet") +@fused.udf +def udf(bbox: fused.types.Bbox=None): + ... + return np.array([[…], […]]), bbox ``` +::: -Or, for example, to ingest a table: -```python -job = fused.ingest( - input="https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/06_CALIFORNIA/06/tl_rd22_06_bg.zip", - output="fd://census/ca_bg_2022/", -).execute() -``` +##### Filter STAC datasets with the `bbox` object + +STAC ([SpatioTemporal Asset Catalog](https://github.com/radiantearth/stac-spec)) datasets can be queried by passing the bounding box’s bounds (`bbox.bounds`) to the pystac client of the Python [pystac-client](https://pypi.org/project/pystac-client/) library. +```python +import pystac_client +from pystac.extensions.eo import EOExtension as eo +catalog = pystac_client.Client.open( + "https://planetarycomputer.microsoft.com/api/stac/v1", + modifier=planetary_computer.sign_inplace, +) +items = catalog.search( + collections=[collection], + bbox=bbox.total_bounds, +).item_collection() +``` -## Return types +### d) Return object Spatial data provides type information to render it on a map. The Fused Workbench displays data on the map as either vector or raster types. Vectors are polygons typically in the form of a GeoDataFrame, and rasters are pixel arrays typically in the form of numpy arrays, or xarray Datasets or DataArrays. -> 💡 Fused expect all return data in `EPSG:4326` - `WGS84` coordinates, using Latitude-Longitude units in decimal degrees. +> 💡 Fused expects all returned GeoDataFrame data in `EPSG:4326` - `WGS84` coordinates, using Latitude-Longitude units in decimal degrees. When writing a UDF, the data type and CRS of its returned object determines how it’ll appear on the map. @@ -309,9 +296,9 @@ Attributes: 1. In RGB images, the color black (0,0,0) is automatically set to full transparency. 2. If a 4 channel array is passed, i.e. RGBA, the value of the 4th channel is the transparency. 
-#### Bounds +##### Bounds -But, what if the raster object does not have an inherent spatial attribute? +At this point you might be wondering: what happens when the raster object does not have an inherent spatial attribute? Objects like numpy *arrays* don’t inherently have a geo attribute. In these cases, the UDF must also return the object's *bounds* to tell Fused where it goes on the map. @@ -359,167 +346,256 @@ TODO: reinstate once array_to_xarray relocation consistent > 💡 If you forget to pass the bounds, Fused will default its bounds to `(-180, -90, 180, 90)` and the output image will expand to the size of the globe. -## Environment variables - -Add constants and secrets to an `.env` file to make them available to your UDFs via environment variables. -First, run a UDF that sets variables in an `.env` file. -:::note -To be accessible to all UDF run events, the file must be placed on the runtime's mount path `/mnt/cache/`. -::: -```py -env_vars = """ -MY_ENV_VAR=123 -DB_USER=username -DB_PASS=****** -""" +## Saving UDFs -# Path to your .env file -env_file_path = '/mnt/cache/.env' +UDFs are saved as a directory of associated files that furnish functionality to run anywhere. This makes them shareable. -# Writing the environment variables to the .env file -with open(env_file_path, 'w') as file: - file.write(env_vars) -``` +For example, the following snippet saves a UDF in a local directory, `Sample_UDF`. -Then, in a different UDF, load the variables into the environment. +```python +import fused -```py -from dotenv import load_dotenv +@fused.udf +def my_udf(): + return "Hello from Fused!" -# Load environment variable -env_file_path = '/mnt/cache/.env' -load_dotenv(env_file_path, override=True) -print(f"Updated MY_ENV_VAR: {os.getenv('MY_ENV_VAR')}") +# Save locally +my_udf.to_directory('Sample_UDF') ``` +The directory contains the UDF's documentation, code, metadata, and utility function code. 
+ +``` +└── Sample_UDF + ├── README.MD + ├── Sample_UDF.py + ├── meta.json + └── utils.py +``` -## UDF execution modes (File, Tile) +Files relevant to each UDF are: -Fused can efficiently transform and load any size geospatial dataset into dynamic and performant maps into data analysis tools. +- `README.md` Provides details of the UDF's purpose and how it works. +- `Sample_UDF.py` This eponymous Python file contains the UDF's business logic as a Python function decorated with `@fused.udf`. +- `meta.json` This file contains metadata needed to render the UDF in the Fused explorer and for the UDF to run correctly. +- `utils.py` This Python file contains helper functions the UDF (optionally) imports and references. -The growing popularity of analysis-ready cloud optimized data formats has revolutionized data processing by eliminating the need for specialized hardware to handle large datasets. Fused leverages industry standard cloud optimized formats to efficiently pull specific portions of a dataset corresponding to specified spatial areas. -These datasets organize vectors and raster pixels in a manner that allows Fused to request specific portions of the file. Fused UDFs are spatially aware thanks to the bbox parameter, which specifies the portion of the dataset to query. -By designing your UDFs to operating on specified areas, Fused optimizes resource allocation across multiple workers to enhance processing efficiency. +## Utility modules -💡 For further reading on data formats, refer to resources on: +Utility modules enhance the functionality and maintainability of UDFs. -- [Cloud Optimized GeoTiff](https://www.cogeo.org/) -- [Raster](https://rasterio.readthedocs.io/en/stable/api/rasterio.windows.html) -- [Geoparquet](https://geoparquet.org/) -- [GeoArrow](https://geoarrow.org/format.html) +As UDFs grow in complexity, it's useful to modularize the code to make it reusable and composable. 
It's also a good practice to keep only the essential "business logic" in the decorated UDF function - this makes it easy to know what a UDF does at a glance. -### File & Tile +With this in mind, a Fused UDF can optionally reference a module to import Python objects from it, with an import statement as if importing from a Python package. These modules are reusable Python functions that promote code reuse and speed up development time. UDFs can import from a variety of sources: from the local environment, from GitHub, and from other UDFs. This section shows how to import modules into UDFs from each of these sources. -Fused is designed to process complex datasets of any size and serve them as custom vector and raster tilesets. Instead of querying a slow backend database, a map that uses Fused’s dynamic Tile layers is smooth and loads instantly. This is especially powerful to do custom transformations on datasets with millions of records. +### From local -A UDF is essentially a serverless python function that returns a result. Spatial filtering supercharges UDFs with the capability to process only specific spatial areas within a dataset of any size. +Local modules are Python files in the same environment as the UDF. -A UDF becomes spatially aware when it leverages the `bbox` parameter to filter the datasets it operates on. This enables a UDF to run the `File` way, as a one-off task, or to become a dynamic `Tile`. -- The `File` way, a UDF as a one-off task and returns results based on specified input parameters. It executes once for the specified inputs. -- The `Tile` way, similar to `File`, but Fused dynamically passes the `bbox` parameter. +In the Workbench, the "module" code editor tab is the place for helper functions and other associated Python objects for the UDF to import. Keep in mind that the module's name is configurable in order to avoid naming collisions. In this example, the UDF imports the function `arr_to_plasma` from its module, which is named `utils`. 
The function contains support logic the UDF uses to transform an array. -When web mapping tools call a UDF as a `Tile`, they make multiple calls in parallel for different areas then stitch the results together to create a map. This creates a responsive visualization experience. The best part is that Fused handles data partitions, caching, and parallelization behind the scenes so users can focus on analyzing data. +```python +@fused.udf +def udf(bbox): + from utils import arr_to_plasma + ... + return arr_to_plasma(arr.values, min_max=(0, .8)) +``` -Read-on to understand the nuances between the two way UDFs can run. +![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/image-33.png) -#### When would I want to run a UDF in Tile mode? +When importing a module from a Python environment other than Workbench, the module must be specified as the locally-scoped file name in the `headers` argument of the `@fused.udf` decorator. This lets Fused know how to complete the reference. -Running a UDF in Tile mode enables compatibility with industry standard tools that render [tiled web maps](https://en.wikipedia.org/wiki/Tiled_web_map), which consist of dozens of seamlessly joined individually-requested tiles of either image or vector format. Instead of fetching an entire dataset, Tile-based mapping tools only load and render what's visible in the current viewport. +```python +@fused.udf( + headers=['utils.py'] +) +def udf(bbox): + from utils import arr_to_plasma + ... + return arr_to_plasma(arr.values, min_max=(0, .8)) +``` -To use data in these tools, data must be sliced into "tiles" - each with a pre-defined bounding box and zoom level. But loading data from large files can be slow slow and generating precomputed tiles can be tedious. Instead, Fused UDFs can dynamically generate Tiles that load into map apps from a unique URL that Fused hosts for you. Use this to create responsive frontend applications. 
-In addition to integrating with other tools (such as geemap, leaflet, mapbox), running UDFs in Tile mode gives them other advantages like parallel execution and spatial caching. +### From GitHub -You can read more about the XYZ indexing system in the [Deck.gl](http://Deck.gl) [documentation](https://deck.gl/docs/api-reference/geo-layers/tile-layer#indexing-system). In fact, Fused Workbench runs UDFs on a serverless backend and renders output in Deck.gl. +Fused can also import Python modules from a public GitHub URL. The URL must be of a directory that contains modules exported with Fused - that way they include the metadata needed to import them. This example shows how to import the `utils` module and call its `table_to_tile` function. -#### How can I run a UDF as a Tile? +```python +utils = fused.core.import_from_github('https://github.com/fusedio/udfs/tree/main/public/common/').utils +utils.table_to_tile(...) +``` -By default, a UDF runs as `File` - it executes once and returns a single output that corresponds to the input parameters. The same UDF can be triggered to run like a Tile when its called using a `bbox` spatial argument. This makes it possible to plug in its HTTP endpoint into a frontend Tile mapping application - think Leaflet, Mapbox, Foursquare Studio, Lonboard, and others. +## Cache -### Writing UDFs +Fused runs UDFs from top to bottom each time. This execution model makes development easy, but can be encumbered if long-running helper functions are called again and again. -When writing UDFs, it’s important to understand how to strategically use the `bbox` spatial filter to select which parts of a dataset to load. This section shows approaches for different dataset types. +Sometimes a UDF might take a while to download or process data. When this happens, developers can take advantage of Fused's built-in caching. Caching stores the results of slow function calls so they only need to run once. 
-#### The `bbox` object +All a developer must do is place slow code inside a helper function, decorate the function with `@fused.cache`, and assign the returned data object to a variable. The object will persist across runs. This empowers users to quickly iterate on downstream code without having to wait for the slow code to run each time. -`bbox` is a UDF’s spatial filter. +Fused caches the function's output using a unique hash identifier generated based on the function's code, the value of its parameters, and the `_cache_id` argument. -Tile mapping tools call Fused endpoints and dynamically pass an XYZ index for each Tile to render. When a UDF endpoint is called this way - in Tile mode - Fused passes the UDF a `bbox` object as the first parameter, which is a polygon with coordinates for the particular Tile. +#### Minimal example -This snippet shows an instance of a box object, which is a [shapely.Polygon](https://shapely.readthedocs.io/en/stable/reference/shapely.Polygon.html) type with the Tile bounding box’s 4 vertices. +To illustrate, this function accepts an argument and a keyword argument. When the function is called to set `output_1` and `output_2`, Fused caches the output of each call as separate objects. That way, the UDF only runs the function once for each set of passed arguments. ```python -import shapely -bbox = shapely.Polygon([[-121.640625, 37.43997405227058], [-121.640625, 37.718590325588146], [-121.9921875, 37.718590325588146], [-121.9921875, 37.43997405227058], [-121.640625, 37.43997405227058]]) ->> POLYGON ((-121.640625 37.43997405227058, -121.640625 37.718590325588146, -121.9921875 37.718590325588146, -121.9921875 37.43997405227058, -121.640625 37.43997405227058)) +@fused.cache +def sample_function(name, company="Fused"): + # Function logic + return f"{name}, at {company}, cached this function's output." + +@fused.udf +def udf(bbox): + ... + output_1 = sample_function("Sina") + output_2 = sample_function("Plinio", company="Fused.io") + ... 
+ ``` -When writing a UDF, Fused recommends setting the `bbox` as the first parameter, and typing it as a `fused.types.Bbox` with a default value of None. This will enable the UDF to run in both as `File` (when `bbox` isn’t necessarily passed) and as a `Tile`. For example: +#### Intermediate example + +At this point, one might ask: if UDFs run for each tile in the viewport, how does Fused distinguish the cache for each tile? + +UDFs give spatial awareness to the cache decorator by setting `_cache_id` as a string identifier unique to the tile's `bbox`. For example, this can be a string such as `str(bbox.to_json())`, or something more complex that includes a date to distinguish cached outputs. + +Note that a custom caching directory can be set with the optional `path` parameter. ```python +@fused.cache(path='optional_cache_dir') +def sample_function(name, company="Fused"): + # Function logic + return f"{name}, at {company}, cached this function's output." + @fused.udf -def udf(bbox: fused.types.Bbox=None): +def udf(bbox): + ... + output = sample_function("Plinio", company="Fused.io", _cache_id=str(bbox.to_json())) ... - return ... ``` -#### Filter raster files +## Download -Fused exposes utility functions to spatially filter raster files. +Fused Workbench runs UDFs from top to bottom each time code changes. This means objects in the UDF are recreated each time, which can slow down a UDF that downloads files from a remote server. + +> 💡 Downloaded files are written to a mounted volume shared across all UDFs in an organization. This means that a file downloaded by one UDF can be read by other UDFs. + +Fused addresses the latency of downloading files with the `download` utility function. It stores files in the mounted filesystem so they only download the first time. + +> 💡 Because a Tile UDF runs multiple chunks in parallel, the `download` function sets a signal lock during the first download attempt, to ensure the download happens only once. 
+ +### Example: download `.zip` file -- `utils.read_tiff` +To download a file to disk, call `fused.core.download`. The function downloads the file only on the first execution, and returns the file path for downstream functions to reference. -The function`utils.mosaic_tiff` generates a mosaic image from a list of TIFF files. `bbox` defines the area of interest within the list of TIFF files set by `tiff_list`. +This example downloads a `.zip` file then returns it as a GeoDataFrame. Note how GeoPandas reads the local file path returned by `download`. ```python -utils = fused.load("https://github.com/fusedio/udfs/tree/f928ee1/public/common/").utils -data = utils.mosaic_tiff( - bbox=bbox, - tiff_list=tiff_list, - output_shape=(256, 256), - ) +@fused.udf +def udf(url='https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_bg.zip'): + import fused + import geopandas as gpd + + # Download zip file + out_path = fused.core.download(url=url, file_path='out.zip') + + # Show path to file + print(out_path) + + return gpd.read_file(out_path) ``` -As an example, the [LULC_Tile UDF](https://github.com/fusedio/udfs/blob/b89a3aab05cb75dab25abb73e4c17490844ab764/public/LULC_Tile_Example/LULC_Tile_Example.py#L21-L27) uses `mosaic_tiff` to create a mosaic from a set of Land Cover tiffs. +## File systems -:::tip -When returning a raster object that doesn’t have spatial metadata, like a numpy array, a UDF must also return the bbox object to tell Fused where in the globe to place the output raster. For example: +The Fused runtime has two file systems to persist and share artifacts across UDF runs: an S3 bucket and a disk file system. These are used to store downloaded or generated objects, environment variables, and auxiliary files. + +:::warning +Access to the file systems is tightly scoped at the organization level, so files stored in either system can only be accessed by accounts in the same organization. 
+ +Given the flexibility of Fused to run any Python code on files in the file system, users should take the standard precautions for working with sensitive files. +::: + +### `fd://` S3 bucket + +The `fd://` bucket file system serves as a namespace for an S3 bucket provisioned by Fused Cloud for your organization. It provides a unified interface for accessing files and directories stored within the bucket, abstracting away the complexities of direct interaction with S3. Fused helper functions access it like an object on S3. +For example, to fetch a file: ```python -@fused.udf -def udf(bbox: fused.types.Bbox=None): - ... - return np.array([[…], […]]), bbox +fused.get("fd://bucket-name/file.parquet") ``` + +Or, for example, to ingest a table: +```python +job = fused.ingest( + input="https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/06_CALIFORNIA/06/tl_rd22_06_bg.zip", + output="fd://census/ca_bg_2022/", +).execute() +``` + +### `/mnt/cache` disk + +The `/mnt/cache` disk file system is the UDF runtime's local directory that persists across UDF runs. Use it to store downloaded files, the output of cached functions, and access keys, and to set environment variables with `.env` files. + +To list files in the directory, run this in a UDF. + +```python +import os + +for each in os.listdir('/mnt/cache/'): + print(each) +``` + +## Environment variables + +Save constants and secrets to an `.env` file to make them available to your UDFs via environment variables. + +First, run a UDF that sets variables in an `.env` file. + +:::note +To be accessible to all UDF run events, the file must be placed on the runtime's mount path `/mnt/cache/`. 
::: -#### Filter STAC datasets +```py +@fused.udf +def udf(): + env_vars = """ + MY_ENV_VAR=123 + DB_USER=username + DB_PASS=****** + """ -STAC ([SpatioTemporal Asset Catalog](https://github.com/radiantearth/stac-spec)) datasets can be queried by passing the bounding box’s bounds (`bbox.bounds`) to the pystac client of the Python [pystac-client](https://pypi.org/project/pystac-client/) library. + # Path to .env file in disk file system + env_file_path = '/mnt/cache/.env' -```jsx -import pystac_client -from pystac.extensions.eo import EOExtension as eo + # Write the environment variables to the .env file + with open(env_file_path, 'w') as file: + file.write(env_vars) +``` -catalog = pystac_client.Client.open( - "https://planetarycomputer.microsoft.com/api/stac/v1", - modifier=planetary_computer.sign_inplace, -) -items = catalog.search( - collections=[collection], - bbox=bbox.total_bounds, -).item_collection() +Then, in a different UDF, load the variables into the environment. + +```py +@fused.udf +def udf(): + import os + from dotenv import load_dotenv + + # Load environment variable + env_file_path = '/mnt/cache/.env' + load_dotenv(env_file_path, override=True) + print(f"Updated MY_ENV_VAR: {os.getenv('MY_ENV_VAR')}") ``` ## Hosted API -UDFs saved on Fused cloud can be called as HTTP endpoints. +Fused automatically creates HTTP endpoints for every UDF saved in the Fused cloud. Using the Fused Hosted API supercharges your stack with the ability to trigger and load the output of any scale workflows. API calls automatically provision serverless compute resources to run workflows in parallel using advanced caching and geo partitioning - without your team needing to spend time on setup. @@ -538,7 +614,7 @@ The following sections describes how to create a UDF endpoint either in Workbenc ### Generate endpoints with Workbench -Once a UDF is saved in Workbench, the "Settings" tab of the editor will show code snippets that can be used to call the UDF from different environments. 
+Once a UDF is saved in Workbench, the "Settings" tab of the editor shows code snippets that can be used to call the UDF from different environments. #### Shareable public endpoints diff --git a/docs/basics/in/gee.ipynb b/docs/basics/in/gee.ipynb index 114ec4eb..c0d105ab 100644 --- a/docs/basics/in/gee.ipynb +++ b/docs/basics/in/gee.ipynb @@ -10,7 +10,7 @@ "\n", "Fused interfaces Google Earth Engine through the Python `earthengine-api` library. This example shows how to load data from [Google Earth Engine](https://developers.google.com/earth-engine/datasets) into your Fused UDFs.\n", "\n", - "The GEE library requires Service Accunt credentials to be set. Read how to create them in the Google Earth Engine documentation, and set `key_path` as the path of the key json." + "To use the GEE Python library with Fused, generate Google [Service Account credentials](https://developers.google.com/earth-engine/guides/service_account) and set them as a JSON key file in the [Fused disk filesystem](/basics/core-concepts/#mntcache-disk). In this example, the file path is specified as `key_path` in the `ee.ServiceAccountCredentials`." ] }, { diff --git a/docs/basics/in/gee/Gee.mdx b/docs/basics/in/gee/Gee.mdx index 20b83efa..aec1e6bb 100644 --- a/docs/basics/in/gee/Gee.mdx +++ b/docs/basics/in/gee/Gee.mdx @@ -26,9 +26,11 @@ Fused interfaces Google Earth Engine through the Pyth [Google Earth Engine](https://developers.google.com/earth-engine/datasets) into your Fused UDFs. -The GEE library requires Service Accunt credentials to be set. Read how to create them -in the Google Earth Engine documentation, and set `key_path` as the path of the key -json. +To use the GEE Python library with Fused, generate Google +[Service Account credentials](https://developers.google.com/earth-engine/guides/service_account) +and set them as a JSON key file in the +[Fused disk filesystem](/basics/core-concepts/#mntcache-disk). 
In this example, the file +path is specified as `key_path` in the `ee.ServiceAccountCredentials`. ```python # !pip install fused earthengine-api xarray xee -q diff --git a/docs/basics/index.mdx b/docs/basics/index.mdx index 96ec0208..1e7629c5 100644 --- a/docs/basics/index.mdx +++ b/docs/basics/index.mdx @@ -9,11 +9,10 @@ slug: / # Meet Fused -[Fused](https://www.fused.io/) is the glue layer to run any size workflows to load data across your most important tools +[Fused](https://www.fused.io/) is the glue layer to run workflows to load data across your most important tools. Use the Fused serverless API to build, scale, and ship geospatial workflows of any size. - ## Ecosystem Build any scale workflows with the [Fused Python SDK](/python-sdk) and [Workbench webapp](/workbench), and integrate them into your stack with the [Fused Hosted API](/basics/core-concepts/#hosted-api). diff --git a/docs/basics/tutorials/load/Load.mdx b/docs/basics/tutorials/load/Load.mdx index e530e310..55de1195 100644 --- a/docs/basics/tutorials/load/Load.mdx +++ b/docs/basics/tutorials/load/Load.mdx @@ -58,6 +58,8 @@ my_new_udf.to_fused() ## From Workbench to local +Load a UDF saved in Workbench and run it in a local Python environment. 
+ ```python import fused diff --git a/docs/python-sdk/dependencies.md b/docs/python-sdk/dependencies.md index 20d98cb8..e5807d71 100644 --- a/docs/python-sdk/dependencies.md +++ b/docs/python-sdk/dependencies.md @@ -10,6 +10,7 @@ Get in touch to have a package added to the list of dependencies or to learn abo ```python +absl-py==2.1.0 adlfs==2023.8.0 affine==2.4.0 aioboto3==7.0.0 @@ -20,26 +21,33 @@ aiohttp-client-cache==0.10.0 aioitertools==0.11.0 aiosignal==1.3.1 aiosqlite==0.19.0 -annotated-types==0.6.0 anyio==3.6.2 appdirs==1.4.4 appnope==0.1.3 +argon2-cffi==23.1.0 +argon2-cffi-bindings==21.2.0 +arraylake==0.7.8 +arrow==1.3.0 asciitree==0.3.3 asn1crypto==1.5.1 asttokens==2.2.1 +async-lru==2.0.4 +async-retriever==0.15.2 +async-timeout==4.0.2 atpublic==4.0 attrs==23.1.0 azure-core==1.29.1 azure-datalake-store==0.0.53 azure-identity==1.15.0 azure-storage-blob==12.19.0 +Babel==2.14.0 backcall==0.2.0 +beautifulsoup4==4.12.3 bidict==0.22.1 -black==23.3.0 +bleach==6.1.0 +blosc2==2.5.1 boto3==1.28.17 -boto3-stubs==1.26.135 botocore==1.31.17 -botocore-stubs==1.29.130 Bottleneck==1.3.7 branca==0.7.0 Brotli==1.1.0 @@ -56,161 +64,207 @@ cligj==0.7.2 cloudpickle==3.0.0 color-operations==0.1.1 colorcet==3.0.1 -constructs==10.2.70 +comm==0.2.2 contourpy==1.2.0 cryptography==40.0.2 cycler==0.12.1 cytoolz==0.12.2 dask==2023.12.0 datashader==0.16.0 +daymetpy==1.0.0 db-dtypes==1.2.0 +debugpy==1.8.1 decorator==5.1.1 defusedxml==0.7.1 -dill==0.3.7 -duckdb==0.9.0 +distro==1.9.0 +dnspython==2.6.1 +docopt==0.6.2 +donfig==0.8.1.post0 +duckdb==0.10.1 duckdb_engine==0.10.0 earthengine-api==0.1.384 +email_validator==2.1.1 entrypoints==0.4 +et-xmlfile==1.1.0 executing==1.2.0 fastapi==0.95.2 fasteners==0.19 +fastjsonschema==2.19.1 filelock==3.13.1 -fiona==1.9.5 +Fiona==1.9.4.post1 folium==0.15.1 fonttools==4.47.0 +fqdn==1.5.1 frozenlist==1.3.3 fsspec==2023.6.0 +fused-internal==0.1.2 future==0.18.3 gax-google-logging-v2==0.8.3 gax-google-pubsub-v1==0.8.3 gcloud==0.18.3 gcsfs==2023.6.0 
geopandas==0.12.2 -google-api-core==2.15.0 -google-api-python-client==2.112.0 -google-auth==2.26.1 +google-api-core==2.11.1 +google-api-python-client==2.111.0 +google-auth==2.23.0 google-auth-httplib2==0.2.0 -google-auth-oauthlib==1.2.0 +google-auth-oauthlib==1.1.0 google-cloud-bigquery==3.14.1 google-cloud-bigquery-storage==2.24.0 -google-cloud-core==2.4.1 -google-cloud-storage==2.14.0 +google-cloud-core==2.3.3 +google-cloud-storage==2.11.0 google-crc32c==1.5.0 google-gax==0.12.5 -google-resumable-media==2.7.0 -googleapis-common-protos==1.62.0 +google-resumable-media==2.6.0 +googleapis-common-protos==1.60.0 +gradio_client==0.15.1 grpc-google-logging-v2==0.8.1 grpc-google-pubsub-v1==0.8.1 grpcio==1.60.0 grpcio-status==1.60.0 +gtfs-realtime-bindings==1.0.0 gunicorn==20.1.0 h11==0.14.0 -h3==4.0.0b2 -h5netcdf==1.3.0 -h5py==3.10.0 -httpcore==1.0.2 +h3==4.0.0b3 +h5netcdf==1.2.0 +h5py==3.9.0 +highspy==1.5.3 +httpcore==0.18.0 httplib2==0.22.0 -httpx==0.26.0 +httpx==0.25.0 +huggingface-hub==0.20.1 humanize==4.9.0 hydrosignatures==0.15.2 -ibis-framework==7.2.0 -idna==3.6 +ibis-framework==8.0.0 +idna==3.4 imageio==2.33.1 imageio-ffmpeg==0.4.9 -importlib-metadata==7.0.1 -importlib-resources==6.1.1 +immutabledict==4.2.0 +importlib-metadata==7.0.0 +importlib-resources==5.12.0 iniconfig==2.0.0 -ipykernel==6.24.0 -ipython==8.14.0 +ipykernel==6.29.4 +ipython==8.12.0 +ipython-genutils==0.2.0 +ipywidgets==7.8.1 isodate==0.6.1 +isoduration==20.11.0 itsdangerous==2.1.2 jedi==0.18.2 jellyfish==0.11.2 Jinja2==3.1.2 jmespath==1.0.1 -joblib==1.3.2 -jsii==1.89.0 +jplephem==2.21 +json5==0.9.25 +jsonpointer==2.4 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 +jupyter-events==0.10.0 +jupyter-lsp==2.2.5 +jupyter_client==8.6.1 +jupyter_core==5.7.2 +jupyter_server==2.14.0 +jupyter_server_terminals==0.5.3 +jupyterlab==4.1.6 +jupyterlab-widgets==1.1.7 +jupyterlab_pygments==0.3.0 +jupyterlab_server==2.24.0 +keplergl==0.3.2 kiwisolver==1.4.5 lazy_loader==0.3 +linkify-it-py==2.0.3 
llvmlite==0.41.0 locket==1.0.0 loguru==0.6.0 lxml==4.9.4 lz4==4.3.2 mangum==0.17.0 +mapbox-vector-tile==2.0.1 markdown-it-py==3.0.0 MarkupSafe==2.1.3 matplotlib==3.8.2 matplotlib-inline==0.1.6 +mdit-py-plugins==0.4.0 mdurl==0.1.2 mercantile==1.2.1 +mistune==3.0.2 +more-itertools==10.2.0 morecantile==4.3.0 mpire==2.8.1 msal==1.26.0 msal-extensions==1.1.0 +msgpack==1.0.7 multidict==6.0.4 multipledispatch==1.0.0 -mypy-boto3-ec2==1.26.129 -mypy-boto3-logs==1.26.53 -mypy-boto3-s3==1.26.127 -mypy-extensions==1.0.0 +nbclient==0.10.0 +nbconvert==7.16.3 +nbformat==5.10.4 +ndindex==1.7 nest-asyncio==1.5.8 netCDF4==1.6.5 networkx==3.2.1 -numba==0.58.0 +notebook==7.1.3 +notebook_shim==0.2.4 +numba==0.58.1 numcodecs==0.11.0 numexpr==2.8.8 -numpy==1.25.2 -oauth2client==4.1.3 -oauthlib==3.2.2 +numpy==1.26.4 odc-geo==0.4.1 odc-stac==0.3.8 -opencv-python-headless==4.8.0.74 +openai==1.14.1 openpyxl==3.1.2 +ortools==9.9.3963 +osmnet==0.1.7 osmnx==1.7.1 +overrides==7.7.0 OWSLib==0.29.3 packaging==23.1 palettable==3.3.3 +pandana==0.7 pandas==2.1.1 +pandocfilters==1.5.1 param==2.0.1 parso==0.8.3 parsy==2.1 partd==1.4.1 -pathspec==0.11.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==10.1.0 pins==0.8.4 planetary-computer==1.0.0 platformdirs==3.5.1 -pluggy==1.0.0 +pluggy==1.4.0 ply==3.8 portalocker==2.8.2 +prometheus_client==0.20.0 prompt-toolkit==3.0.38 proto-plus==1.23.0 -protobuf==4.24.3 +protobuf==4.25.3 +psutil==5.9.7 psycopg2==2.9.9 ptyprocess==0.7.0 -publication==0.0.3 +pure-eval==0.2.2 +py-cpuinfo==9.0.0 py3dep==0.15.2 pyarrow==14.0.1 pyarrow-hotfix==0.6 pyasn1==0.5.0 pyasn1-modules==0.3.0 +pybdshadow==0.3.5 pycares==4.4.0 +pyclipper==1.3.0.post5 pycparser==2.21 pyct==0.5.0 pydantic==1.10.7 -pydantic_core==2.14.6 pydata-google-auth==1.8.2 pydaymet==0.15.2 pygeohydro==0.15.2 pygeoogc==0.15.2 pygeoutils==0.15.2 Pygments==2.15.1 +pygtfs==0.1.9 PyJWT==2.7.0 +pykalman==0.9.7 pynhd==0.15.2 pynldas2==0.15.2 pyogrio==0.6.0 @@ -218,36 +272,51 @@ pyOpenSSL==23.2.0 pyparsing==3.1.1 pyproj==3.6.1 
pyshp==2.3.1 +PySocks==1.7.1 pystac==1.9.0 pystac-client==0.7.5 -pytest==7.3.1 +pytest==8.0.0 +pytest-asyncio==0.23.5 python-dateutil==2.8.2 python-dotenv==1.0.0 +python-json-logger==2.0.7 pytz==2023.3.post1 PyYAML==6.0.1 +pyzmq==26.0.2 rasterio==1.3.7 referencing==0.32.0 requests==2.30.0 requests-cache==1.1.1 +requests-file==2.0.0 requests-oauthlib==1.3.1 +requests-toolbelt==1.0.0 +retrying==1.3.4 +rfc3339-validator==0.1.4 +rfc3986-validator==0.1.1 rich==13.7.0 rio-tiler==5.0.3 rioxarray==0.14.1 rpds-py==0.13.2 +rpy2==3.5.15 rsa==4.9 -rtoml==0.9.0 +ruamel.yaml==0.18.6 +ruamel.yaml.clib==0.2.8 s3fs==2023.6.0 s3transfer==0.6.2 scikit-image==0.22.0 scikit-learn==1.3.2 scipy==1.11.2 -shapely==2.0.2 +sgp4==2.23 +shapely==2.0.1 +simple-salesforce==1.12.6 six==1.16.0 +skyfield==1.48 sniffio==1.3.0 -snowflake-connector-python==3.0.4 +snowflake-connector-python==3.6.0 snowflake-sqlalchemy==1.5.1 snuggs==1.4.7 sortedcontainers==2.4.0 +soupsieve==2.5 SQLAlchemy==1.4.51 sqlalchemy-views==0.3.2 sqlglot==20.7.1 @@ -256,32 +325,51 @@ stack-data==0.6.2 stackstac==0.5.0 starlette==0.27.0 structlog==23.3.0 +suncalc==0.1.3 +tables==3.9.2 +terminado==0.18.1 threadpoolctl==3.2.0 tifffile==2023.12.9 +tiledb==0.27.1 +tinycss2==1.3.0 +tomlkit==0.12.3 toolz==0.12.0 topojson==1.7 -tornado==6.3.2 +tornado==6.4 tqdm==4.66.1 traitlets==5.9.0 +traittypes==0.2.1 +transbigdata==0.5.3 typer==0.9.0 -types-s3transfer==0.6.1 -typing_extensions==4.9.0 -tzdata==2023.4 +types-cachetools==5.3.0.7 +types-python-dateutil==2.9.0.20240316 +typing_extensions==4.10.0 +tzdata==2023.3 +tzlocal==5.2 +uc-micro-py==1.0.3 ujson==5.9.0 +uri-template==1.3.0 uritemplate==4.1.1 url-normalize==1.4.3 -urllib3==1.26.18 +urllib3==1.26.15 us==3.1.1 uvicorn==0.18.3 uvloop==0.19.0 +vt2geojson==0.2.1 wcwidth==0.2.6 -wrapt==1.16.0 -xarray==2023.12.0 +webcolors==1.13 +webencodings==0.5.1 +websocket-client==1.7.0 +websockets==11.0.3 +widgetsnbextension==3.6.6 +wrapt==1.15.0 +xarray==2023.8.0 xarray-spatial==0.3.5 xee==0.0.4 
xxhash==3.4.1 xyzservices==2023.10.1 yarl==1.9.4 zarr==2.16.0 +zeep==4.2.1 zipp==3.17.0 ``` diff --git a/docs/python-sdk/index.mdx b/docs/python-sdk/index.mdx index 19f525d1..b4750206 100644 --- a/docs/python-sdk/index.mdx +++ b/docs/python-sdk/index.mdx @@ -88,14 +88,16 @@ Loading UDFs from GitHub repositories or local files do not require authenticati ## Run a UDF -Once a UDF is loaded, running it executes the function code and retrieves the function output. Fused provides flexibility in how the execution and output retrieval. +☝️ Read more about File & Tile execution models in the [core concepts section](/basics/core-concepts/#udf-execution-modes-file-tile). -Fused offers the option to partition datasets and perform computation based on map tiles. UDFs by default run as a single operation, called `File` mode, and can run as spatially partitioned, called `Tile`. +Once a UDF is loaded, running it executes the parametrized function code and returns the function output. + +UDFs by default run as a single operation, called `File` mode, and can run as spatially partitioned, called `Tile`. - `File`. By default, UDFs run as a single operation and return all data in one call. This option is suitable for localized and smaller outputs where fetching the entire dataset at once is feasible. - `Tile`. In this mode, UDFs process data for specific geographic areas defined by predefined bounding boxes. These bounding boxes can be specified in various ways. This option is suitable for datasets that cover geographic extents and allow for spatial queries. Compute tasks are distributed among worker, with each worker processing only the fraction of data corresponding to a specific [tile](https://deck.gl/docs/api-reference/geo-layers/tile-layer). This enables parallel processing and efficient computation. -Deciding which to use is based on the underlying dataset and on how the outputs will be represented. 
This is specified by the parameters passed to the `fused.run` convenience function. +The choice between the two depends on the underlying dataset and on the client mechanism. It is specified by parameters of the `fused.run` convenience function. ### Run as File diff --git a/docs/workbench/udf-editor.mdx b/docs/workbench/udf-editor.mdx index 21a56d6f..75ae5757 100644 --- a/docs/workbench/udf-editor.mdx +++ b/docs/workbench/udf-editor.mdx @@ -11,13 +11,14 @@ The UDF Editor is where developers author UDFs. They can write Python to explore A UDF can be set to render its outputs on Workbench as a [Tile or File](/basics/core-concepts/#file--tile) - or autodetect between the two based on parameters. The choice depends on the nature of data the UDF will return. New UDFs are set to autodetect by default, and the kind can be changed in the top-right dropdown of the code editor. -You can read more about the difference between the two types of outputs in the [core concepts section](/basics/core-concepts/#udf-execution-modes-file-tile). +☝️ Read more about File & Tile in the [core concepts section](/basics/core-concepts/#udf-execution-modes-file-tile). + +![alt text](@site/static/img/autofiletile.png) :::note -Because a UDF can be called as either File or Tile, Workbench must explicitly know how to render their output. When a UDF is configured as "Auto", Workbench automatically handles the output as Tile if it statically checks that the types `fused.types.TileXYZ`, `fused.types.TileGDF`, or `fused.types.Bbox` are used in the UDF. Otherwise, it assumes File. +Because the same UDF endpoint can be called as either File or Tile, Workbench needs to know which mode to use. When a UDF is configured as "Auto", Workbench automatically handles the output as Tile if a static check finds that the types `fused.types.TileXYZ`, `fused.types.TileGDF`, or `fused.types.Bbox` are used in the UDF. Otherwise, it assumes File. 
Note that the "Auto" setting is specific and applicable only to the Workbench UI. UDFs called via `fused-py` or HTTP requests run as Tile only if a parameter specifies the Tile geometry. - ::: ### Toolbar @@ -100,21 +101,25 @@ UDFs run code remotely and return outputs to the browser over the network. Conse ![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/image-28.png) -### Snippets -Once a user creates a UDF in the Workbench, they can use snippets to call it and load its output it into other workflows. The "Snippets" section shows copyable commands to trigger the UDF from within a Python environment or bash. +### Share & snippets -By default, UDFs can only be called by user account that creates them. This can be done with the snippets below. +Once a user creates a UDF in the Workbench, they can use shared endpoints and snippets to call it and load its output into other workflows. The "Share" and "Snippets" sections show copyable code snippets to trigger the UDF from within different applications. -![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/snippets2.png) +#### Share -It's also possible to generate signed tokens that allow anyone with the token to call the UDF. These tokens can be revoked. +Generate signed tokens that allow anyone with the token to call the UDF. These tokens can be revoked. -![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/gifs/sign_url.gif) +![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/gifs/templated_share_snippets.gif) +#### Snippets +By default, UDFs can only be called by the user account that created them. This can be done with the snippets below. 
+ +![Alt text](https://fused-magic.s3.us-west-2.amazonaws.com/docs_assets/snippets2.png) + ### Default parameter values diff --git a/docusaurus.config.ts b/docusaurus.config.ts index 4bc755bf..940cfc4d 100644 --- a/docusaurus.config.ts +++ b/docusaurus.config.ts @@ -63,6 +63,10 @@ const config: Config = { themeConfig: { + tableOfContents: { + minHeadingLevel: 2, + maxHeadingLevel: 3, + }, typesense: { // Replace this with the name of your index/collection. diff --git a/static/img/autofiletile.png b/static/img/autofiletile.png new file mode 100644 index 00000000..2e90e6fb Binary files /dev/null and b/static/img/autofiletile.png differ