feat: Rename OnDemandTransformations to Transformations #4038

Merged
merged 27 commits into from
Mar 25, 2024
Changes from 19 commits
7383550
feat: updating protos to separate transformation
franciscojavierarceo Mar 17, 2024
6bcff8d
fixed stuff...i think
franciscojavierarceo Mar 17, 2024
ea58ace
updated tests and registry diff function
franciscojavierarceo Mar 18, 2024
1713313
updated base registry
franciscojavierarceo Mar 18, 2024
4a00c12
updated react component
franciscojavierarceo Mar 18, 2024
97a8bb6
formatted
franciscojavierarceo Mar 18, 2024
5190d6c
updated stream feature view proto
franciscojavierarceo Mar 18, 2024
23ae349
making the proto changes backwards compatable
franciscojavierarceo Mar 18, 2024
1d598d2
trying to make this backwards compatible
franciscojavierarceo Mar 20, 2024
81c6f82
caught a bug and fixed the linter
franciscojavierarceo Mar 20, 2024
7687e23
actually linted
franciscojavierarceo Mar 20, 2024
6507808
updated ui component
franciscojavierarceo Mar 20, 2024
f44c227
accidentally commented out fixtures
franciscojavierarceo Mar 20, 2024
dd2a5ca
Updated
franciscojavierarceo Mar 22, 2024
9ac6793
incrementing protos
franciscojavierarceo Mar 22, 2024
5a1db09
updated tests
franciscojavierarceo Mar 22, 2024
e6bf1e9
fixed linting issue and made backwards compatible
franciscojavierarceo Mar 22, 2024
0daf027
feat: Renaming OnDemandTransformations to Transformations
franciscojavierarceo Mar 24, 2024
9417006
updated proto name
franciscojavierarceo Mar 24, 2024
eff1497
renamed substrait proto
franciscojavierarceo Mar 24, 2024
ae19919
renamed substrait proto
franciscojavierarceo Mar 24, 2024
e34b604
updated
franciscojavierarceo Mar 24, 2024
9cd0ebe
updated
franciscojavierarceo Mar 25, 2024
7b9f180
updated integration test
franciscojavierarceo Mar 25, 2024
19544f4
missed one
franciscojavierarceo Mar 25, 2024
41524c9
updated to include Substrait type
franciscojavierarceo Mar 25, 2024
7de39ab
linter
franciscojavierarceo Mar 25, 2024
5 changes: 2 additions & 3 deletions protos/feast/core/Transformation.proto
Expand Up @@ -21,13 +21,12 @@ message UserDefinedFunctionV2 {

// A feature transformation executed as a user-defined function
message FeatureTransformationV2 {
// Note this Transformation starts at 5 for backwards compatibility
oneof transformation {
UserDefinedFunctionV2 user_defined_function = 1;
Collaborator

I know this isn't ready, but let me suggest some names. What if we call this PythonTransformation instead of UserDefinedFunctionV2? We could reuse that message type for both the pandas_transformation and the upcoming python_transformation fields, and the V2 in the name (I think) would no longer be necessary. wdyt?

Member Author

I tend to like stating V2 so that people understand it's a replacement for the deprecated proto. Are you thinking of making PythonTransformation an enum as well, with Pandas and Python as elements? Feel free to suggest what you're thinking to make it a little more concrete if you want.

My guess is something like

message FeatureTransformationV2 {
    oneof PythonTransformation {
        NativePython native_python = 1;
        Pandas pandas = 2;
    }
    SubstraitTransformationV2 substrait_transformation = 3;
}

Or something else?

Collaborator

No, that would leave the possibility of having both python and substrait fields set, so probably not the best approach. I was thinking more like this (I'll omit V2s here just for brevity).

message FeatureTransformation {
    oneof transformation {
        PythonTransformation pandas_transformation  = 1;
        SubstraitTransformation substrait_transformation = 2;
        PythonTransformation python_transformation  = 3;
    }
}

Note that the pandas_transformation and python_transformation fields share the message type, but that's just incidental: it just so happens that they need the same type of information. If in the future we see that that's no longer the case, we could introduce a PandasTransformation message as well, and the first field of transformation would become PandasTransformation pandas_transformation = 1;
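
To make the `oneof` argument concrete, here is a minimal pure-Python sketch of the behavior being relied on: at most one member of the group is set, and setting a second member clears the first. The classes and the `set`/`get`/`which_oneof` helpers below are hypothetical stand-ins written for illustration, not generated protobuf code (generated Python messages expose `WhichOneof` for the same purpose).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PythonTransformation:
    """Hypothetical stand-in for the shared message type (body + source text)."""
    body: bytes = b""
    body_text: str = ""


@dataclass
class SubstraitTransformation:
    """Hypothetical stand-in for the Substrait message type."""
    substrait_plan: bytes = b""


class FeatureTransformation:
    """Mimics a `oneof transformation` group: at most one member is set."""

    _MEMBERS = (
        "pandas_transformation",
        "substrait_transformation",
        "python_transformation",
    )

    def __init__(self) -> None:
        self._which: Optional[str] = None
        self._value: object = None

    def set(self, member: str, value: object) -> None:
        if member not in self._MEMBERS:
            raise ValueError(f"unknown oneof member: {member}")
        # Setting one member implicitly clears whichever member was set before.
        self._which, self._value = member, value

    def which_oneof(self) -> Optional[str]:
        return self._which

    def get(self, member: str) -> object:
        return self._value if self._which == member else None


ft = FeatureTransformation()
ft.set("pandas_transformation", PythonTransformation(body_text="def udf(df): ..."))
ft.set("substrait_transformation", SubstraitTransformation(substrait_plan=b"plan"))
```

After the second `set`, only `substrait_transformation` is populated, which is the property that a top-level field placed outside the `oneof` would not give.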

Member Author

I like the UDF structure as it's a common industry pattern/convention especially for Spark.

@HaoXuAI any thoughts?

Collaborator

I'm not specifically against UDFs, but the way I like to think about it, all these options are sort of UDFs anyway, so calling the message just UDF without any qualifier seems redundant. If it were called PythonUserDefinedFunction, then it would be okay. I guess what I'm saying is I'm equally okay with the trio of (PythonTransformation, SubstraitTransformation, PandasTransformation) and with that of (PythonUDF, SubstraitUDF, PandasUDF).

Member Author

@jeremyary @etirelli any opinions here? I am in favor of user_defined_function and the code for this PR is ready otherwise.

Member Author

Lazy consensus will win here. I'm going to merge as is since everything's covered now.

OnDemandSubstraitTransformationV2 on_demand_substrait_transformation = 2;
SubstraitTransformationV2 substrait_transformation = 2;
}
}

message OnDemandSubstraitTransformationV2 {
message SubstraitTransformationV2 {
bytes substrait_plan = 1;
}
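
The rename above keeps field number 2 for the Substrait member. In the protobuf binary encoding a field is identified by its number and wire type, not by its name, so a sketch like the following suggests why previously serialized bytes should still decode after the rename. The tiny encoder and the `old_schema`/`new_schema` dictionaries are hypothetical helpers for illustration only; JSON and text-format output do use field names and would not be covered by this argument.

```python
from typing import Dict


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_length_delimited(
    schema: Dict[str, int], field_name: str, payload: bytes
) -> bytes:
    """Encode a length-delimited field; only the field number reaches the wire."""
    tag = (schema[field_name] << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# SubstraitTransformationV2 { bytes substrait_plan = 1; }
plan = encode_length_delimited({"substrait_plan": 1}, "substrait_plan", b"plan")

# The oneof members before and after the rename (same numbers, new name).
old_schema = {"user_defined_function": 1, "on_demand_substrait_transformation": 2}
new_schema = {"user_defined_function": 1, "substrait_transformation": 2}

old_bytes = encode_length_delimited(
    old_schema, "on_demand_substrait_transformation", plan
)
new_bytes = encode_length_delimited(new_schema, "substrait_transformation", plan)
```

Both calls produce identical bytes because the field name never enters the encoding.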
28 changes: 14 additions & 14 deletions sdk/python/feast/on_demand_feature_view.py
Expand Up @@ -17,8 +17,6 @@
from feast.feature_view import FeatureView
from feast.feature_view_projection import FeatureViewProjection
from feast.field import Field, from_value_type
from feast.on_demand_pandas_transformation import OnDemandPandasTransformation
from feast.on_demand_substrait_transformation import OnDemandSubstraitTransformation
from feast.protos.feast.core.OnDemandFeatureView_pb2 import (
OnDemandFeatureView as OnDemandFeatureViewProto,
)
Expand All @@ -33,6 +31,8 @@
from feast.protos.feast.core.Transformation_pb2 import (
UserDefinedFunctionV2 as UserDefinedFunctionProto,
)
from feast.transformation.pandas_transformation import PandasTransformation
from feast.transformation.substrait_transformation import SubstraitTransformation
from feast.type_map import (
feast_value_type_to_pandas_type,
python_type_to_feast_value_type,
Expand Down Expand Up @@ -68,8 +68,8 @@ class OnDemandFeatureView(BaseFeatureView):
features: List[Field]
source_feature_view_projections: Dict[str, FeatureViewProjection]
source_request_sources: Dict[str, RequestSource]
transformation: Union[OnDemandPandasTransformation]
feature_transformation: Union[OnDemandPandasTransformation]
transformation: Union[PandasTransformation]
feature_transformation: Union[PandasTransformation]
description: str
tags: Dict[str, str]
owner: str
Expand All @@ -89,8 +89,8 @@ def __init__( # noqa: C901
],
udf: Optional[FunctionType] = None,
udf_string: str = "",
transformation: Optional[Union[OnDemandPandasTransformation]] = None,
feature_transformation: Optional[Union[OnDemandPandasTransformation]] = None,
transformation: Optional[Union[PandasTransformation]] = None,
feature_transformation: Optional[Union[PandasTransformation]] = None,
description: str = "",
tags: Optional[Dict[str, str]] = None,
owner: str = "",
Expand Down Expand Up @@ -129,7 +129,7 @@ def __init__( # noqa: C901
"udf and udf_string parameters are deprecated. Please use transformation=OnDemandPandasTransformation(udf, udf_string) instead.",
DeprecationWarning,
)
transformation = OnDemandPandasTransformation(udf, udf_string)
transformation = PandasTransformation(udf, udf_string)
else:
raise Exception(
"OnDemandFeatureView needs to be initialized with either transformation or udf arguments"
Expand Down Expand Up @@ -219,10 +219,10 @@ def to_proto(self) -> OnDemandFeatureViewProto:

feature_transformation = FeatureTransformationProto(
user_defined_function=self.transformation.to_proto()
if type(self.transformation) == OnDemandPandasTransformation
if type(self.transformation) == PandasTransformation
else None,
on_demand_substrait_transformation=self.transformation.to_proto()
if type(self.transformation) == OnDemandSubstraitTransformation
if type(self.transformation) == SubstraitTransformation
else None, # type: ignore
)
spec = OnDemandFeatureViewSpec(
Expand Down Expand Up @@ -276,7 +276,7 @@ def from_proto(cls, on_demand_feature_view_proto: OnDemandFeatureViewProto):
and on_demand_feature_view_proto.spec.feature_transformation.user_defined_function.body_text
!= ""
):
transformation = OnDemandPandasTransformation.from_proto(
transformation = PandasTransformation.from_proto(
on_demand_feature_view_proto.spec.feature_transformation.user_defined_function
)
elif (
Expand All @@ -285,7 +285,7 @@ def from_proto(cls, on_demand_feature_view_proto: OnDemandFeatureViewProto):
)
== "on_demand_substrait_transformation"
):
transformation = OnDemandSubstraitTransformation.from_proto(
transformation = SubstraitTransformation.from_proto(
on_demand_feature_view_proto.spec.feature_transformation.on_demand_substrait_transformation
)
elif (
Expand All @@ -298,7 +298,7 @@ def from_proto(cls, on_demand_feature_view_proto: OnDemandFeatureViewProto):
body=on_demand_feature_view_proto.spec.user_defined_function.body,
body_text=on_demand_feature_view_proto.spec.user_defined_function.body_text,
)
transformation = OnDemandPandasTransformation.from_proto(
transformation = PandasTransformation.from_proto(
user_defined_function_proto=backwards_compatible_udf,
)
else:
Expand Down Expand Up @@ -540,13 +540,13 @@ def decorator(user_function):

expr = user_function(ibis.table(input_fields, "t"))

transformation = OnDemandSubstraitTransformation(
transformation = SubstraitTransformation(
substrait_plan=compiler.compile(expr).SerializeToString()
)
else:
udf_string = dill.source.getsource(user_function)
mainify(user_function)
transformation = OnDemandPandasTransformation(user_function, udf_string)
transformation = PandasTransformation(user_function, udf_string)

on_demand_feature_view_obj = OnDemandFeatureView(
name=user_function.__name__,
Expand Down
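
The constructor changes in this file keep the deprecated `udf`/`udf_string` arguments working by wrapping them in a `PandasTransformation`. A minimal, self-contained sketch of that shim pattern follows. `resolve_transformation` is a hypothetical helper, the `PandasTransformation` class here is a stripped-down stand-in rather than the Feast class, and the order of the checks is an assumption since the diff elides the surrounding conditions. The warning text in the sketch uses the new class name, whereas the context line in the diff still mentions `OnDemandPandasTransformation`.

```python
import warnings
from types import FunctionType
from typing import Optional


class PandasTransformation:
    """Hypothetical minimal stand-in: holds a UDF and its source text."""

    def __init__(self, udf: FunctionType, udf_string: str = ""):
        self.udf = udf
        self.udf_string = udf_string


def resolve_transformation(
    transformation: Optional[PandasTransformation] = None,
    udf: Optional[FunctionType] = None,
    udf_string: str = "",
) -> PandasTransformation:
    """Prefer an explicit transformation; fall back to the deprecated udf args."""
    if transformation is not None:
        return transformation
    if udf is not None:
        warnings.warn(
            "udf and udf_string parameters are deprecated. "
            "Please use transformation=PandasTransformation(udf, udf_string) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return PandasTransformation(udf, udf_string)
    raise ValueError("needs either transformation or udf arguments")


def add_one(x):
    return x + 1


# Capture warnings so the deprecated path can be observed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_transformation(udf=add_one, udf_string="def add_one(x): ...")
```

Passing `transformation=` directly skips the warning entirely, which is the path the updated tests in this PR exercise.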
10 changes: 4 additions & 6 deletions sdk/python/feast/stream_feature_view.py
Expand Up @@ -15,7 +15,6 @@
from feast.entity import Entity
from feast.feature_view import FeatureView
from feast.field import Field
from feast.on_demand_pandas_transformation import OnDemandPandasTransformation
from feast.protos.feast.core.DataSource_pb2 import DataSource as DataSourceProto
from feast.protos.feast.core.OnDemandFeatureView_pb2 import (
UserDefinedFunction as UserDefinedFunctionProto,
Expand All @@ -32,6 +31,7 @@
from feast.protos.feast.core.Transformation_pb2 import (
UserDefinedFunctionV2 as UserDefinedFunctionProtoV2,
)
from feast.transformation.pandas_transformation import PandasTransformation

warnings.simplefilter("once", RuntimeWarning)

Expand Down Expand Up @@ -80,7 +80,7 @@ class StreamFeatureView(FeatureView):
materialization_intervals: List[Tuple[datetime, datetime]]
udf: Optional[FunctionType]
udf_string: Optional[str]
feature_transformation: Optional[OnDemandPandasTransformation]
feature_transformation: Optional[PandasTransformation]

def __init__(
self,
Expand All @@ -99,7 +99,7 @@ def __init__(
timestamp_field: Optional[str] = "",
udf: Optional[FunctionType] = None,
udf_string: Optional[str] = "",
feature_transformation: Optional[Union[OnDemandPandasTransformation]] = None,
feature_transformation: Optional[Union[PandasTransformation]] = None,
):
if not flags_helper.is_test():
warnings.warn(
Expand Down Expand Up @@ -371,9 +371,7 @@ def decorator(user_function):
schema=schema,
udf=user_function,
udf_string=udf_string,
feature_transformation=OnDemandPandasTransformation(
user_function, udf_string
),
feature_transformation=PandasTransformation(user_function, udf_string),
description=description,
tags=tags,
online=online,
Expand Down
Empty file.
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
)


class OnDemandPandasTransformation:
class PandasTransformation:
def __init__(self, udf: FunctionType, udf_string: str = ""):
"""
Creates an OnDemandPandasTransformation object.
Expand All @@ -25,7 +25,7 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
return self.udf.__call__(df)

def __eq__(self, other):
if not isinstance(other, OnDemandPandasTransformation):
if not isinstance(other, PandasTransformation):
raise TypeError(
"Comparisons should only involve OnDemandPandasTransformation class objects."
)
Expand All @@ -47,7 +47,7 @@ def to_proto(self) -> UserDefinedFunctionProto:

@classmethod
def from_proto(cls, user_defined_function_proto: UserDefinedFunctionProto):
return OnDemandPandasTransformation(
return PandasTransformation(
udf=dill.loads(user_defined_function_proto.body),
udf_string=user_defined_function_proto.body_text,
)
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
)


class OnDemandSubstraitTransformation:
class SubstraitTransformation:
def __init__(self, substrait_plan: bytes):
"""
Creates an OnDemandSubstraitTransformation object.
Expand All @@ -27,7 +27,7 @@ def table_provider(names, schema: pyarrow.Schema):
return table.to_pandas()

def __eq__(self, other):
if not isinstance(other, OnDemandSubstraitTransformation):
if not isinstance(other, SubstraitTransformation):
raise TypeError(
"Comparisons should only involve OnDemandSubstraitTransformation class objects."
)
Expand All @@ -45,6 +45,6 @@ def from_proto(
cls,
on_demand_substrait_transformation_proto: OnDemandSubstraitTransformationProto,
):
return OnDemandSubstraitTransformation(
return SubstraitTransformation(
substrait_plan=on_demand_substrait_transformation_proto.substrait_plan
)
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
)
from feast.data_source import DataSource, RequestSource
from feast.feature_view_projection import FeatureViewProjection
from feast.on_demand_feature_view import OnDemandPandasTransformation
from feast.on_demand_feature_view import PandasTransformation
from feast.types import Array, FeastType, Float32, Float64, Int32, Int64
from tests.integration.feature_repos.universal.entities import (
customer,
Expand Down Expand Up @@ -71,7 +71,7 @@ def conv_rate_plus_100_feature_view(
name=conv_rate_plus_100.__name__,
schema=[] if infer_features else _features,
sources=sources,
transformation=OnDemandPandasTransformation(
transformation=PandasTransformation(
udf=conv_rate_plus_100, udf_string="raw udf source"
),
)
Expand Down Expand Up @@ -110,7 +110,7 @@ def similarity_feature_view(
name=similarity.__name__,
sources=sources,
schema=[] if infer_features else _fields,
transformation=OnDemandPandasTransformation(
transformation=PandasTransformation(
udf=similarity, udf_string="similarity raw udf"
),
)
Expand Down
27 changes: 7 additions & 20 deletions sdk/python/tests/unit/test_on_demand_feature_view.py
Expand Up @@ -18,10 +18,7 @@
from feast.feature_view import FeatureView
from feast.field import Field
from feast.infra.offline_stores.file_source import FileSource
from feast.on_demand_feature_view import (
OnDemandFeatureView,
OnDemandPandasTransformation,
)
from feast.on_demand_feature_view import OnDemandFeatureView, PandasTransformation
from feast.types import Float32


Expand Down Expand Up @@ -59,9 +56,7 @@ def test_hash():
Field(name="output1", dtype=Float32),
Field(name="output2", dtype=Float32),
],
transformation=OnDemandPandasTransformation(
udf=udf1, udf_string="udf1 source code"
),
transformation=PandasTransformation(udf=udf1, udf_string="udf1 source code"),
)
on_demand_feature_view_2 = OnDemandFeatureView(
name="my-on-demand-feature-view",
Expand All @@ -70,9 +65,7 @@ def test_hash():
Field(name="output1", dtype=Float32),
Field(name="output2", dtype=Float32),
],
transformation=OnDemandPandasTransformation(
udf=udf1, udf_string="udf1 source code"
),
transformation=PandasTransformation(udf=udf1, udf_string="udf1 source code"),
)
on_demand_feature_view_3 = OnDemandFeatureView(
name="my-on-demand-feature-view",
Expand All @@ -81,9 +74,7 @@ def test_hash():
Field(name="output1", dtype=Float32),
Field(name="output2", dtype=Float32),
],
transformation=OnDemandPandasTransformation(
udf=udf2, udf_string="udf2 source code"
),
transformation=PandasTransformation(udf=udf2, udf_string="udf2 source code"),
)
on_demand_feature_view_4 = OnDemandFeatureView(
name="my-on-demand-feature-view",
Expand All @@ -92,9 +83,7 @@ def test_hash():
Field(name="output1", dtype=Float32),
Field(name="output2", dtype=Float32),
],
transformation=OnDemandPandasTransformation(
udf=udf2, udf_string="udf2 source code"
),
transformation=PandasTransformation(udf=udf2, udf_string="udf2 source code"),
description="test",
)
on_demand_feature_view_5 = OnDemandFeatureView(
Expand Down Expand Up @@ -126,7 +115,7 @@ def test_hash():
}
assert len(s4) == 3

assert on_demand_feature_view_5.transformation == OnDemandPandasTransformation(
assert on_demand_feature_view_5.transformation == PandasTransformation(
udf2, "udf2 source code"
)
assert (
Expand Down Expand Up @@ -155,9 +144,7 @@ def test_from_proto_backwards_compatable_udf():
Field(name="output1", dtype=Float32),
Field(name="output2", dtype=Float32),
],
transformation=OnDemandPandasTransformation(
udf=udf1, udf_string="udf1 source code"
),
transformation=PandasTransformation(udf=udf1, udf_string="udf1 source code"),
)

# We need a proto with the "udf1 source code" in the user_defined_function.body_text
Expand Down