A proposed select() that could achieve high-performance #504

Closed
johnmyleswhite opened this issue Jan 25, 2014 · 5 comments

@johnmyleswhite
Contributor

I've thought a lot lately about how to get delayed-evaluation-like semantics in a sane way, since I know that people really want them.

So I'd like to propose that we take SQL's lead when we return to those tricks and use the following ideas:

(1) Maintain a proper DSL that's passed in as strings. For precedent, see Pandas' experimental eval function.

(2) Interact with variables in the calling scope using placeholders: query("select * from df where colA > ?", a).
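For readers unfamiliar with SQL-style placeholders, here is a minimal sketch of the idea in (2), written in Python with the standard-library sqlite3 module rather than in Julia. The table named df and its contents are made up for illustration; nothing here is an existing DataFrames API.

```python
import sqlite3

# Hypothetical stand-in for a data frame `df` with a single column colA.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (colA INTEGER)")
conn.executemany("INSERT INTO df (colA) VALUES (?)", [(1,), (5,), (10,)])

a = 4  # a variable living in the calling scope

# The value of `a` is passed explicitly and bound to the `?` placeholder;
# the query string itself never refers to the name `a`.
rows = conn.execute("SELECT * FROM df WHERE colA > ?", (a,)).fetchall()
print(rows)
```

The query text stays constant while the value travels separately as an argument, which is the property the proposal relies on.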

We can make this sort of thing very fast if we adopt the prepare/exec distinction that most RDBMSs offer, where prepare compiles the clause once and exec substitutes values for the placeholders in tight loops.
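The prepare/exec split can be sketched roughly as follows. This is a toy in Python, not a proposed implementation: the function names prepare and exec_, the three supported operators, and the list-of-dicts stand-in for a data frame are all assumptions made for the example.

```python
import operator

# Comparison operators supported by this toy DSL (an assumption of the sketch).
_OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def prepare(clause):
    """Parse a clause such as 'colA > ?' once and return a reusable filter."""
    column, op_symbol, placeholder = clause.split()
    if placeholder != "?":
        raise ValueError("this sketch only supports a single '?' placeholder")
    compare = _OPS[op_symbol]

    def exec_(rows, value):
        # No parsing happens here: only the placeholder value changes per call.
        return [row for row in rows if compare(row[column], value)]

    return exec_


# Hypothetical data: a list of dicts standing in for a data frame.
df = [{"colA": 1}, {"colA": 5}, {"colA": 10}]

where_gt = prepare("colA > ?")   # parse once
for threshold in (0, 4, 9):      # execute many times
    print(threshold, where_gt(df, threshold))
```

All string handling happens in prepare, so the repeated calls pay only for the comparison loop. A real implementation would presumably compile to something much tighter than a Python closure.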

I don't intend to work on this for a long time, but I think this proposal is truly viable. It involves writing some proper parsing code, but removes all the crazy mixed scope issues that made expression indexing hard.

@nalimilan
Member

I'd much prefer something more integrated with the language, as discussed in #381. As I see it, Pandas implemented query() using a plain string simply because Python does not offer a native way of handling this as an expression in the first place (I may be wrong, as I don't know Python that well). Since Julia starts from scratch and offers many possibilities for metaprogramming, it should be possible to achieve a much better design.

The SQL syntax may be nice for people who are used to it, but the default way of selecting observations should be closer to the Julia syntax IMHO. For example, giving * a different meaning when it appears after select is crazy. Being forced to use placeholders for variables in the calling scope is also painful (after all, that's why Julia offers string interpolation using $: it's much clearer than listing arguments after the string).

@johnmyleswhite
Contributor Author

The beauty of placeholders is that they don't involve reaching outside of a function's scope. How are you going to avoid placeholders while not violating scope boundaries?
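To make the scope distinction concrete, here is a small Python sketch (not Julia, and not anything from DataFrames) contrasting the two styles. Both function names are hypothetical; the second one uses frame inspection purely to illustrate what "reaching outside of a function's scope" means.

```python
import inspect


def select_with_placeholder(rows, column, value):
    # The threshold arrives as an ordinary argument; nothing outside this
    # function's own scope is consulted.
    return [row for row in rows if row[column] > value]


def select_from_caller_scope(rows, column, name):
    # Looks `name` up in the caller's local variables, i.e. it reaches
    # outside its own scope to find the value.
    caller_locals = inspect.currentframe().f_back.f_locals
    threshold = caller_locals[name]
    return [row for row in rows if row[column] > threshold]


def caller():
    a = 4
    rows = [{"colA": 1}, {"colA": 5}]
    explicit = select_with_placeholder(rows, "colA", a)
    implicit = select_from_caller_scope(rows, "colA", "a")
    return explicit, implicit


print(caller())
```

Both calls return the same rows, but only the first can be understood from its arguments alone; the second depends on which variables happen to exist at the call site.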

@kmsquire
Contributor

I'll just add that I share Milan's sentiment, in that I would prefer a more LINQ-like interface.

Cheers!

@johnmyleswhite
Contributor Author

Ok. Then let's get to 1.0 before we consider LINQ-like behavior.

@nalimilan
Member

We have DataFramesMeta now.
