-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of prefixed column resolution #1115
Improve performance of prefixed column resolution #1115
Conversation
- Previously this allocated arrays every time it was accessed. It will now only allocate in one of the error cases. - It also keeps track of the columnName index for both key and value so it doesn't need to look it up twice.
Thanks for submitting a PR. Please use |
@jberkel fixed. |
Wouldn't it be simpler to just change line 1172 to let similar = columnNames.keys.filter { $0.hasSuffix(".\(column.template)") } This avoids the allocation of the array with all columns. In the most common case (empty array) no allocation will be made. |
I am less certain about this, but I believe the most common case is
Filter on dictionary keys creates an array. We could use the If the
With the proposed changes we only apply the predicate to the entire list once, and we don't have to lookup the entry in the dictionary afterwards. Now, this kind of analysis is a definitely a little overboard for most code, but I think this fix is pretty straightforward. |
Yes, it will create an array, but one with just one element in non-error cases. I'm not sure if your changes really have a noticeable benefit. The array is allocated on the stack, so this is a cheap operation, and I doubt that this is slower than two separate |
@jberkel If I demonstrate it is faster, will you include it? Or do you still have other concerns? |
If it's faster, yes. A straightforward change would be to avoid the allocation of an array containing all the column names. ( |
@jberkel Just did a test where I pull a single column from about 50,000 rows. I measure the length of time using sign posts in instruments:
The runtimes for a few trials were the following: Filter keys
Finding index
Explanation
|
nice, that looks like a substantial improvement. How many columns are in your test data? There's not just the allocation, but also the iteration over all elements. |
@jberkel that's a good question. I just had 4 columns in this query result. I expect both algorithms to scale the same in the number of columns. Both visit each column exactly once, in order. Both apply a predicate on each visit. |
You're right, that leaves the allocation overhead. Maybe the stack allocation isn't as fast they claim it is. Thanks for investigating! |
For issue #1109