Add Python Function-in-Formula Benchmarks #132

Open
stanbrub opened this issue Jun 7, 2023 · 0 comments
Labels: enhancement (New feature or request)

Calling Python functions from within query formulas (e.g. in update()) is extremely slow, whether the functions are DH-provided or user-provided. Add some benchmarks to capture this.

(This issue is becoming more relevant as customers are being pushed in the direction of Python more than Groovy.)

Notes On User-Defined Functions: (From JianFeng)

  • Has not been scheduled yet
  • Improving UDF performance could be a large and risky amount of work and has not really been scoped yet. However, the recent partial vectorization work achieved a 4x to 20x performance gain, depending on how much time a specific UDF takes
  • Translating a Python wrapper function into a Java static method call isn’t trivial either, but it essentially eliminates the boundary crossing between the JVM and the Python interpreter, so we can confidently say that performance will be much better
  • No specific ticket for the UDF improvement
  • Ticket for Python-to-Java method translation: "Python Wrapper Properties include Java static method" (deephaven-core#2306)
  • Look at using vectorize
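
The notes above attribute much of the cost to crossing the JVM/Python boundary once per row. As a rough, Deephaven-free illustration of why handing values over in chunks can help, the sketch below counts calls for a per-row function versus a chunked one. The names `round_row`, `round_chunk`, `count_calls`, the sample data, and the chunk size are all hypothetical; this plain-Python model only mimics the call pattern and does not measure real JVM-to-Python overhead or reflect how the engine's vectorization is implemented.

```python
def round_row(x):
    """Round a non-negative size to the nearest multiple of 100; a result of 0 becomes 100."""
    out = 100 * int(x / 100) + 100 * ((x % 100) >= 50)
    return 100 if out == 0 else out


def round_chunk(xs):
    """Apply round_row to a whole chunk of values in a single call."""
    return [round_row(x) for x in xs]


def count_calls(fn):
    """Wrap fn so each invocation (a stand-in for one boundary crossing) is counted."""
    def wrapper(*args):
        wrapper.calls += 1
        return fn(*args)
    wrapper.calls = 0
    return wrapper


sizes = list(range(0, 10_000, 7))  # made-up sample data
chunk_size = 1024                  # assumed chunk size, for illustration only

# One call per row.
per_row = count_calls(round_row)
row_result = [per_row(x) for x in sizes]

# One call per chunk of rows.
per_chunk = count_calls(round_chunk)
chunk_result = []
for start in range(0, len(sizes), chunk_size):
    chunk_result.extend(per_chunk(sizes[start:start + chunk_size]))

print("per-row calls:", per_row.calls, "chunked calls:", per_chunk.calls)
```

Both paths produce the same values; only the number of calls differs, which is the quantity a function-in-formula benchmark would be expected to be sensitive to.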

Note from Chip:

Note From Ryan:

  • Translations are being added for the Python equivalents of date-time utilities that are already benchmarked. There is no need to test all of them. Pick a heavily used one and write a test for it.

Paul provided some examples comparing Python and Groovy. Benchmark does not currently use Groovy, but that could be added in the future.

Python:

tb_out = db.historical_table("FeedOS", "EquityQuoteL1") \
    .where("Date=`2023-05-31`") \
    .update("Open_Size = (int)AskSize")

def Rounding(x):
    # Round x to the nearest multiple of 100; a result of 0 becomes 100
    out = 100 * int(x / 100) + 100 * ((x % 100) >= 50)
    return 100 if out == 0 else out

open_1 = tb_out.where("!isNull(Open_Size)").update(["Date", "LocalCodeStr", "Open_Size= (int)(Rounding.call(Open_Size))"])

Groovy:

tb_out=db.t("FeedOS", "EquityQuoteL1") \
    .where("Date=`2023-05-31`") \
    .update("Open_Size = (int)AskSize")
Rounding={ int x  ->
    out=100*(int)(x/100)+100*((x%100) >= 50 ? 1 : 0)
    return out==0 ? 100 : out
}
open_1=tb_out.where("!isNull(Open_Size)").update("Date","LocalCodeStr","Open_Size= Rounding.call(Open_Size)")

An improvement to the Python snippet:

import numpy as np

tb_out = db.historical_table("FeedOS", "EquityQuoteL1") \
    .where("Date=`2023-05-31`") \
    .update("Open_Size = (long)AskSize")

# The return type hint declares the result type, so the formula needs no cast or .call()
def Rounding(x) -> np.int32:
    # Round x to the nearest multiple of 100; a result of 0 becomes 100
    out = 100 * int(x / 100) + 100 * ((x % 100) >= 50)
    return 100 if out == 0 else out

open_1 = tb_out.where("!isNull(Open_Size)").update(["Date", "LocalCodeStr", "Open_Size=Rounding(Open_Size)"])
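
Before benchmarking variants of this function, it may help to pin down what it is expected to return, so that a faster version can be checked against the same values. The following is a minimal sketch in plain Python that needs no Deephaven session; the lowercase name `rounding` and the sample inputs are illustrative and do not come from the issue.

```python
def rounding(x):
    """Same arithmetic as the snippets above: nearest multiple of 100, with 0 replaced by 100."""
    out = 100 * int(x / 100) + 100 * ((x % 100) >= 50)
    return 100 if out == 0 else out


# Hand-computed expectations for a few non-negative sizes (illustrative inputs).
expected = {0: 100, 49: 100, 50: 100, 149: 100, 150: 200, 1234: 1200, 1250: 1300}

# Collect any input whose result differs from the expectation.
mismatches = {x: rounding(x) for x, want in expected.items() if rounding(x) != want}
print("mismatches:", mismatches)
```

An empty `mismatches` dictionary means the function agrees with every listed expectation; the same table could be reused to validate a Groovy or vectorized variant.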