Adds future work section to pyspark udf readme
So that it's clear we have more work to do here.
skrawcz committed Feb 28, 2023
1 parent 8b89e55 commit 17ae5bd
Showing 1 changed file with 24 additions and 1 deletion: examples/spark/pyspark_udfs/README.md
@@ -71,7 +71,30 @@ passed in dataframe.
3. `@check_output` annotations are not currently supported for pyspark UDFs. But we're working on it - ping
us in Slack (or via issues) if you need this feature!

# Future work

## Auto vectorize UDFs to be pandas_udfs
We could, under the hood, translate vanilla Python UDF functions to use the pandas_udf route. This could be
enabled via a flag passed to the PySparkUDFGraphAdapter, via an annotation on the function, or both.
Let us know if this would be useful to you!
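To make the idea concrete, here is a rough sketch of what such a translation could look like. This is a hypothetical illustration, not current Hamilton behavior; the `pandas_udf` wrapping step is left as a comment so the sketch runs without a SparkSession:

```python
import pandas as pd

# A vanilla row-level UDF, as you might write it today:
def price_per_sqft(price: float, sqft: float) -> float:
    return price / sqft

# The vectorized form an auto-translation could emit. This body is what a
# pyspark pandas_udf would wrap, e.g.:
#   pyspark.sql.functions.pandas_udf(price_per_sqft_vectorized, "double")
def price_per_sqft_vectorized(price: pd.Series, sqft: pd.Series) -> pd.Series:
    return price / sqft
```

Because the row-level body is already expressed in terms of arithmetic that pandas Series support, the translation here is mechanical; the open question is how general such a rewrite can be made.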

## All the Pandas UDF signatures

1. Let us know what you need.
2. Implementation is a matter of (a) getting the API right, and (b) making sure it fits with the Hamilton way of thinking.
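For reference, pyspark's pandas UDFs come in several type-hint shapes. The two below are sketched as plain Python functions (no `pandas_udf` decorator, so they run without spark); supporting more of these shapes is what this section is about:

```python
from typing import Iterator

import pandas as pd

# Series -> Series: the basic scalar pandas UDF shape.
def add_one(s: pd.Series) -> pd.Series:
    return s + 1

# Iterator[Series] -> Iterator[Series]: pyspark also supports this batched
# shape, which is useful for amortizing expensive per-batch setup
# (e.g. loading a model once per partition).
def add_one_batched(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for batch in batches:
        yield batch + 1
```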

## Aggregation UDFs

We just need to determine what a good API for this would be. We're open to suggestions!
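One possible direction, sketched here as a hypothetical: pyspark's grouped-aggregate pandas UDFs take a `Series -> scalar` shape, so a Hamilton API for aggregations might accept plain functions like this (written without the `pandas_udf` decorator so it runs anywhere):

```python
import pandas as pd

# Hypothetical aggregation function: Series in, scalar out. This is the
# shape pyspark's grouped-aggregate pandas_udf expects, so a Hamilton
# adapter could plausibly map functions like this onto it.
def mean_spend(spend: pd.Series) -> float:
    return float(spend.mean())
```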

## Other dataframe operations

We could support other dataframe operations, like joins. We're open to suggestions! The main challenge is
designing a good API for this.
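As a purely hypothetical sketch of what a join-as-a-function API could look like, here is the shape using pandas so it runs without spark (the function name, columns, and signature are all illustrative, not a real Hamilton API):

```python
import pandas as pd

# Hypothetical: a join expressed as a plain Hamilton-style function, where
# the two upstream dataframes arrive as parameters. The open question is
# what the equivalent pyspark API and semantics should be.
def orders_with_users(orders: pd.DataFrame, users: pd.DataFrame) -> pd.DataFrame:
    return orders.merge(users, on="user_id", how="left")
```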

# Other questions

## Can't I just use pyspark dataframes directly with Hamilton functions?

Yes, with Hamilton you can write functions that define a named flow operating entirely over pyspark dataframes.
However, you lose a lot of Hamilton's flexibility doing things that way. We're open to suggestions,
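The "named flow over dataframes" style looks like this. pandas DataFrames stand in for pyspark DataFrames here so the sketch is runnable anywhere; with pyspark you would type the parameters as `pyspark.sql.DataFrame` and use spark operations instead (the function and column names are illustrative):

```python
import pandas as pd

# Each function names a step in the flow; a parameter name matching another
# function's name wires that function's output in as the input.
def filtered_events(raw_events: pd.DataFrame) -> pd.DataFrame:
    return raw_events[raw_events["amount"] > 0]

def daily_totals(filtered_events: pd.DataFrame) -> pd.DataFrame:
    return filtered_events.groupby("day", as_index=False)["amount"].sum()
```

Note the trade-off the text describes: each function consumes and produces a whole dataframe, so you give up the column-level lineage and reuse you get when functions operate on individual columns.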
