Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace use of .size with .length for Arrays #5376

Closed
wants to merge 1 commit into from
Closed

Replace use of .size with .length for Arrays #5376

wants to merge 1 commit into from

Conversation

sksamuel
Copy link
Contributor

@sksamuel sksamuel commented Apr 6, 2015

Invoking .size on arrays is valid, but requires an implicit conversion to SeqLike. This incurs a compile time overhead and more importantly a runtime overhead, as the Array must be wrapped before the method can be invoked. For example, the difference in generated byte code is:

public int withSize();
Code:
0: getstatic #23 // Field scala/Predef$.MODULE$:Lscala/Predef$;
3: aload_0
4: invokevirtual #25 // Method array:()[I
7: invokevirtual #29 // Method scala/Predef$.intArrayOps:([I)Lscala/collection/mutable/ArrayOps;
10: invokeinterface #34, 1 // InterfaceMethod scala/collection/mutable/ArrayOps.size:()I
15: ireturn

public int withLength();
Code:
0: aload_0
1: invokevirtual #25 // Method array:()[I
4: arraylength
5: ireturn

Invoking .size on arrays is valid, but requires an implicit conversion to SeqLike. This incurs a compile time overhead and more importantly a runtime overhead, as the Array must be wrapped before the method can be invoked. For example, the difference in generated byte code is:

  public int withSize();
    Code:
       0: getstatic     #23                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
       3: aload_0
       4: invokevirtual #25                 // Method array:()[I
       7: invokevirtual #29                 // Method scala/Predef$.intArrayOps:([I)Lscala/collection/mutable/ArrayOps;
      10: invokeinterface #34,  1           // InterfaceMethod scala/collection/mutable/ArrayOps.size:()I
      15: ireturn

  public int withLength();
    Code:
       0: aload_0
       1: invokevirtual #25                 // Method array:()[I
       4: arraylength
       5: ireturn
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@sryza
Copy link
Contributor

sryza commented Apr 6, 2015

This seems reasonable and mostly straightforward to me. What was your method for finding all the instances of this?

@sksamuel
Copy link
Contributor Author

sksamuel commented Apr 6, 2015

I found the instances by grepping for .size and then checking if the receiver was declared as an Array.

@sksamuel
Copy link
Contributor Author

sksamuel commented Apr 6, 2015

In addition I may have missed some, but I think I've got the ones in the classes where performance is paramount.

@srowen
Copy link
Member

srowen commented Apr 7, 2015

So I proposed this and a few other similar changes a while ago, and the feedback was that it wasn't worth the merge conflicts to save a few opcodes. It is a little faster to call .length for an array even if .size is more consistent with the rest of the collections API, and I'd write .length in new code or change it if I were touching some nearby code anyway. I'd also think we should make this change in tight loops.

So on that basis, do you know of any case here where performance is non-trivially improved?
CC @JoshRosen for a gut check on whether it's worth changing it all, anyway.

@srowen
Copy link
Member

srowen commented Apr 7, 2015

(This is on the border of where I'd make a JIRA instead of just opening a PR)

@sksamuel
Copy link
Contributor Author

sksamuel commented Apr 7, 2015

Noted (re: Jira) for future. I think its worth doing though because its relatively a trivial change and there's no real drawbacks that I know. We can save the bytecode and also the object allocations.

@JoshRosen
Copy link
Contributor

The performance impact is probably negligible in most of these cases, since most of these methods are invoked only once or twice on the driver, but using length instead of size could have a large impact for code in hot loops that get called O(numRecords) times. Unless I'm overlooking something, though, it doesn't look like any of the instances here are occurring in performance-sensitive code.

Did you check to see whether there are any occurrences of .size outside of core, perhaps in SQL or MLlib, where the performance benefit might be greater?

@JoshRosen
Copy link
Contributor

This change doesn't really have any downside, though, so I suppose it doesn't hurt (besides merge conflicts, but they should be trivial to resolve in this case). Do you know if we can configure our linter to enforce the use of .length instead of .size for arrays? If so, it might be nice to add a rule so that we continue to use length in all new code.

@rxin
Copy link
Contributor

rxin commented Apr 7, 2015

scalastyle is syntactic I think -- it doesn't understand semantics, and doesn't have the type tree, so it can't really do that.

@rxin
Copy link
Contributor

rxin commented Apr 7, 2015

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Apr 7, 2015

Test build #29783 has started for PR 5376 at commit 77ec261.

@SparkQA
Copy link

SparkQA commented Apr 7, 2015

Test build #29783 has finished for PR 5376 at commit 77ec261.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29783/
Test PASSed.

@sksamuel
Copy link
Contributor Author

sksamuel commented Apr 7, 2015

Scalastyle is mainly syntactic as @rxin says. Have you guys considered https://github.com/sksamuel/scalac-scapegoat-plugin it's a linter in the proper sense. You would probably need to configure your own custom list of inspections rather than use the default.

@srowen
Copy link
Member

srowen commented Apr 7, 2015

(FWIW, IntelliJ's Scala inspections are pretty good and included with the free version. It definitely flags this one, so, would be happy in a very small way to see the yellow flags go away)

@vanzin
Copy link
Contributor

vanzin commented Apr 7, 2015

Just wanted to leave a comment that, while I think this change is good (and good practice in general), comparing the generated bytecode is a little misleading. In hot paths, hotspot would probably optimize the difference away, since the extra operations have no side-effects.

@sksamuel
Copy link
Contributor Author

sksamuel commented Apr 7, 2015

In general Hotspot is good at optimizing away unneeded allocations but in this case, since you're invoking a method on the allocated object, you're actually asking hotspot to realize that the method is a delegate and the delegation call can be flattened (I hestitate to use the word inlined so we don't go off topic on inlining of local functions which hotspot does). I don't think Hotspot can do this currently, but would be impressed to learn otherwise.

@rxin
Copy link
Contributor

rxin commented Apr 7, 2015

Merging in master. Thanks.

@asfgit asfgit closed this in 2c32bef Apr 7, 2015
@rxin
Copy link
Contributor

rxin commented Apr 7, 2015

@sksamuel how does scapegoat compare with the new abide project?

@sksamuel
Copy link
Contributor Author

sksamuel commented Apr 7, 2015

@rxin Both very similar. Abide is an official Typesafe project, but has less inspections. When I started scapegoat there was no Abide. I talked to Mr Dragos at ScalaExchange in December about merging but I've not made any progress on that due to working on a very exciting Spark project for my tier 1 bank which has taken all my time :)

@dragos
Copy link
Contributor

dragos commented Apr 10, 2015

@sksamuel is right. That being said, abide wasn't released yet. Once we have a first version out I can give it a try on the codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants