FunctionBodyLengthRule should not count comment lines #330

marcelofabri · 2016-01-08T23:16:56Z

Fixes #258.

jpsim · 2016-01-08T23:20:29Z

Source/SwiftLintFramework/Extensions/File+SwiftLint.swift

+            let kindsInRange = syntax.tokens.filter { token in
+                let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
+                return tokenLine?.line == line.index
+                }.map({ $0.type }).flatMap(SyntaxKind.init)


indentation is off here

jpsim · 2016-01-09T00:02:03Z

Thanks for doing this @marcelofabri, this makes the rule much more useful!

I was a bit worried about the impact this would have on performance, and turns out it's a fairly large one 😬.

As a reference point, the testSwiftLintLints integration test takes 2.817s to run on the master branch right now.

With this PR, it takes 46.092s 😱.

By changing the syntaxKindsByLine function to this, that goes down to 6.815s:

public func syntaxKindsByLine(startLine: Int? = nil,
                              endLine: Int? = nil) -> [(Int, [SyntaxKind])] {
    let contents = self.contents as NSString
    let linesAndKinds = syntaxMap.tokens.map { token -> (Int, SyntaxKind) in
        let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
        return (tokenLine!.line, SyntaxKind(rawValue: token.type)!)
    }.filter { line, _ in
        return line >= startLine && line <= endLine
    }
    var results = [Int: [SyntaxKind]]()
    for lineAndKind in linesAndKinds {
        results[lineAndKind.0] = (results[lineAndKind.0] ?? []) + [lineAndKind.1]
    }
    return Array(zip(results.keys, results.values))
}

However, I still think that's too much of a performance regression, especially for larger projects.

One thing we could do to minimize the performance impact here would be to only count the number of comment-only lines after confirming that the function exceeds the maximum line length when accounting for all lines, which is much faster. For example, in FunctionBodyLengthRule.validateFile(...):

where endLine - startLine > limit && lineCount(file, startLine: startLine, endLine: endLine) > limit {

Actually we could move that to its own function:

...
where exceedsLineCountExcludingComments(file, startLine, endLine, limit)
...
private func exceedsLineCountExcludingComments(file: File, _ start: Int, _ end: Int,
                                               _ limit: Int) -> Bool {
    return end - start > limit && lineCount(file, startLine: start, endLine: end) > limit
}

In this case, SwiftLint goes back to linting in 2.866s, although that's not exactly a great benchmark now because SwiftLint's functions never exceed 40 lines, so the benchmark doesn't hit the slow path.

Could you please apply these changes to help reduce the performance impact? If you find different ways to achieve similar performance gains, I'm certainly open to that too!

marcelofabri · 2016-01-09T00:52:04Z

@jpsim I think that if we called syntaxKindsByLine only once (with the full file) we'd have a performance gain. However, I don't see an easy way now to do it. We could hack it by adding an associated object on the File object, but we probably should do something better.

Anyway, I have committed the changes you proposed. I was worried about performance too 😬

I know that it deserves another issue, but we should think about adding performance tests to the project, even linting other projects.

marcelofabri · 2016-01-09T01:52:17Z

By using the associated objects approach, I was able to make it run in ~ 3.401s (without the exceedsLineCountExcludingComments shortcut), against 2.744s with the shortcut.

I believe it could be better if I could drop the conversion needed to store/read the associated object:

private var syntaxKindsByLines: [(Int, [SyntaxKind])] {
    if let lines = objc_getAssociatedObject(self, &keyLines) as? [Int],
        syntaxKinds = objc_getAssociatedObject(self, &keySyntaxKinds) as? [[String]] {
        return Array(zip(lines, syntaxKinds.map { $0.flatMap(SyntaxKind.init) }))
    }

    let newValue = syntaxKindsByLine()
    let lines = newValue.map { $0.0 }
    let kinds = newValue.map { $0.1.map { $0.rawValue } }
    objc_setAssociatedObject(self, &keyLines, lines,
        .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
    objc_setAssociatedObject(self, &keySyntaxKinds, kinds,
        .OBJC_ASSOCIATION_RETAIN_NONATOMIC)

    return newValue
}

However, this made the tests actually slower with the shortcut (because I called syntaxKindsByLine in validateFile). If I move it to an inner place (i.e. numberOfCommentOnlyLines), it goes back to ~2.8s, but the version without the shortcut is a bit slower (~3.6s).

norio-nomura · 2016-01-09T04:29:06Z

Calling counting line can be reduced by changing position of parameter enumeration.
from:

// enumerate parameters
for parameter in parameters.reverse() {
    // count lines
    let lineCountExcludingComments =…
    // check violation
    if lineCountExcludingComments > parameter.value {
        …
    }
}

to:

// count lines
let lineCountExcludingComments =…
// enumerate parameters
for parameter in parameters.reverse() {
    // check violation
    if LineCountExcludingComments > parameter.value {
        …
    }
}

norio-nomura · 2016-01-09T05:09:31Z

Source/SwiftLintFramework/Extensions/File+SwiftLint.swift

+        let contents = self.contents as NSString
+        let kindsWithLines = syntaxMap.tokens.map { token -> (Int, SyntaxKind) in
+            let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
+            return (tokenLine!.line, SyntaxKind(rawValue: token.type)!)


SyntaxKinds of whole the file are created here on every checking functions in the file.
.filter should be applied before creation for reducing them.

This is because currently .filter needs the line. If we filtered first, lineAndCharacterForByteOffset would be called twice.

@marcelofabri it might be better to avoid eagerly creating the SyntaxKinds by mapping twice:

let kindsWithLines = syntaxMap.tokens.map { token -> (Int, SyntaxToken) in let tokenLine = contents.lineAndCharacterForByteOffset(token.offset) return (tokenLine!.line, token) }.filter { line, token in return line >= (startLine ?? 0) && line <= (endLine ?? Int.max) }.map { (line, token) -> (Int, SyntaxKind) in return (line, SyntaxKind(rawValue: token.type)!) }

jpsim · 2016-01-09T06:13:23Z

I know that it deserves another issue, but we should think about adding performance tests to the project, even linting other projects.

How do you recommend we do that? XCTest supports performance tests, but those only really matter when running on a given device. What we really want is to compare with running on the previous version of the test running on the exact same device, although I don't think there's any elegant way to set that up.

marcelofabri · 2016-01-11T07:56:07Z

@jpsim I haven't thought about it a lot, but I was thinking about XCTest performance tests. I'm still not sure why it wouldn't work. Is it because Travis won't have a baseline to compare to?

jpsim · 2016-01-11T15:55:23Z

@jpsim I haven't thought about it a lot, but I was thinking about XCTest performance tests. I'm still not sure why it wouldn't work. Is it because Travis won't have a baseline to compare to?

XCTest performance tests compare the test output with a previously recorded baseline that's associated with a unique device identifier (in this case, the Mac on which these tests are running). This wouldn't work on Travis because they use VMs to run tests, and they're not guaranteed to have the same unique device ID. Even if/when they did, the fact that these are running on shared hardware means that we can't really trust a single run of performance tests.

I'm thinking a different way to measure performance in PRs would be to install the latest version of SwiftLint through homebrew and compare the total time to lint SwiftLint from the latest release and the current PR. Ideally Travis would be able to post back a comment to the PR with the before & after lint times for us to review. Even better would be to have a way to run SwiftLint in a benchmarking mode that logs out the time spent in each rule.

jpsim · 2016-01-11T16:00:58Z

@marcelofabri this is looking good. What are the next steps (if any) you want to take with this PR?

norio-nomura · 2016-01-11T16:16:39Z

Sorry, my saying too much about performance. 🙇
I also think this is looking good.

jpsim · 2016-01-11T18:12:44Z

Source/SwiftLintFramework/Extensions/File+SwiftLint.swift

@@ -90,6 +90,22 @@ extension File {
        }
    }

+    internal func syntaxKindsByLine(startLine: Int? = nil,
+                                  endLine: Int? = nil) -> [(Int, [SyntaxKind])] {


indentation is off here

marcelofabri · 2016-01-12T06:15:05Z

@jpsim @norio-nomura I've added a new property on File+Cache to cache this and I'm pretty happy with the performance now.

Could you guys please take another look and share your thoughts?

norio-nomura · 2016-01-12T08:13:28Z

Source/SwiftLintFrameworkTests/ASTRuleTests.swift

+        let longerFunctionBodyWithComments = "func abc() {" +
+            Repeat(count: 40, repeatedValue: " // this is a comment\n").joinWithSeparator("") +
+            "}\n"
+        XCTAssertEqual(violations(longerFunctionBodyWithComments), [])


Sorry, I had overlooked that the tests is not enough.
Maybe it should pass following tests:

let longFunctionBodyWithComments = "func abc() {" + Repeat(count: 40, repeatedValue: "\n").joinWithSeparator("") + "// comment only line should be ignored.\n" + "}\n" XCTAssertEqual(violations(longFunctionBodyWithComments), []) let longerFunctionBodyWithComments = "func abc() {" + Repeat(count: 41, repeatedValue: "\n").joinWithSeparator("") + "// comment only line should be ignored.\n" + "}\n" XCTAssertEqual(violations(longerFunctionBodyWithComments), [StyleViolation( ruleDescription: FunctionBodyLengthRule.description, location: Location(file: nil, line: 1, character: 1), reason: "Function body should span 40 lines or less: currently spans 41 lines")]) let longFunctionBodyWithMultilineComments = "func abc() {" + Repeat(count: 40, repeatedValue: "\n").joinWithSeparator("") + "/* multi line comment only line should be ignored.\n*/\n" + "}\n" XCTAssertEqual(violations(longFunctionBodyWithMultilineComments), []) let longerFunctionBodyWithMultilineComments = "func abc() {" + Repeat(count: 41, repeatedValue: "\n").joinWithSeparator("") + "/* multi line comment only line should be ignored.\n*/\n" + "}\n" XCTAssertEqual(violations(longerFunctionBodyWithMultilineComments), [StyleViolation( ruleDescription: FunctionBodyLengthRule.description, location: Location(file: nil, line: 1, character: 1), reason: "Function body should span 40 lines or less: currently spans 41 lines")])

Shouldn't longerFunctionBodyWithCommentsand longerFunctionBodyWithMultilineComments not trigger any violations?

Forget that, I didn't realize that the comments were outside Repeat. My bad!

That tests actually caught a bug: the number of lines that the function spans in the reason. Should we update it to subtract the comment only lines? This might be confusing for someone who isn't very familiar with the rule.

Also, if we follow that idea, any ideas on how to return the "fixed" line count on exceedsLineCountExcludingComments? I was thinking about returning a tuple, but I'm not sure if that's the best way.

This might be confusing for someone who isn't very familiar with the rule.

I think it would be better reason explaining that comment only lines are ignored.

That tests actually caught a bug

As I tested, SourceKitten returns one token for one multiline comment block. But your code expects that every tokens are in one line. So, counting ignored comment only lines is not correct on multiline comment.

We should fill in missing lines in syntaxKindsByLines, which will also help avoid counting whitespace-only lines in the function body line count.

I've committed ec2be78 which should address these issues. As @jpsim (implicitly) suggested, the rule now ignores whitespace-only lines.

Also, I had to separate the tests to another file (FunctionBodyLengthRuleTests), because the integration tests were failing.

jpsim · 2016-01-15T00:52:20Z

@marcelofabri @norio-nomura this PR seems fine to me now, other than a rebase, what else needs to be done?

norio-nomura · 2016-01-15T01:58:58Z

Maybe making the reason message clearer by adding explains that comment and whitespace only lines are excluded.
Everyone does not know how the rules counting the lines.

…ason

codecov-io · 2016-01-15T04:38:52Z

Current coverage is `89.58%`

Merging #330 into master will increase coverage by +0.32% as of a9fe99a

@@            master    #330   diff @@
======================================
  Files           57      57       
  Stmts         1481    1527    +46
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           1322    1368    +46
  Partial          0       0       
  Missed         159     159

Review entire Coverage Diff as of a9fe99a

Uncovered Suggestions

+0.85% via ...eMinLengthRule.swift#16...28
+0.79% via ...Yaml+SwiftLint.swift#49...60
+0.53% via .../Configuration.swift#28...35
See 7 more...

Powered by Codecov. Updated on successful CI builds.

marcelofabri · 2016-01-15T04:47:50Z

I've updated the reason message to

Function body should span 40 lines or less: currently spans 41 lines (already ignoring comment and whitespace only ones)

What do you guys think?

jpsim · 2016-01-15T05:06:30Z

Maybe this instead?

Function body should span 40 lines or less excluding comments and whitespace: currently spans 41 lines.

jpsim · 2016-01-15T05:08:20Z

@scottrhoyt I think we should disable codecov's automated comments and just have it as a GitHub status check instead. The automated comments are a bit too noisy for my taste.

marcelofabri · 2016-01-15T05:25:02Z

@jpsim Much better! Already updated on 75a4f2f.

jpsim · 2016-01-15T05:55:38Z

👏 thanks!!!

…ents FunctionBodyLengthRule should not count comment lines

norio-nomura · 2016-01-15T05:57:38Z

🎉

marcelofabri · 2016-01-15T06:33:45Z

💯

scottrhoyt · 2016-01-15T15:45:24Z

@jpsim, will do. The options for what show up are configurable here. For instance, I just changed the behavior to only post the header:

If that still seems to noisy we can disable it altogether and just stick with the commit status and PR checks.

jpsim reviewed Jan 8, 2016
View reviewed changes

norio-nomura reviewed Jan 9, 2016
View reviewed changes

jpsim reviewed Jan 11, 2016
View reviewed changes

norio-nomura reviewed Jan 12, 2016
View reviewed changes

marcelofabri and others added 10 commits January 14, 2016 20:24

FunctionBodyLengthRule should not count comment lines

04b2510

Adding entry to CHANGELOG.md

b930857

Fix identation

57cf38a

Fix identation

483fbc0

Address performance issues

07c0766

Fix comparision when startLine or endLine is nil

08ad3d3

Refactoring

72f4014

Caching

ea33640

Clearing syntaxKindsByLinesCache

811cfd1

Handling multiline comments and correcting line count on violation re…

579872c

…ason

Updating reason message

12eb3fe

Changing reason message

75a4f2f

marcelofabri mentioned this pull request Jan 15, 2016

TypeBodyLengthRule should not count comment and whitespace lines #369

Closed

jpsim added a commit that referenced this pull request Jan 15, 2016

Merge pull request #330 from marcelofabri/function-length-ignore-comm…

5103ebd

…ents FunctionBodyLengthRule should not count comment lines

jpsim merged commit 5103ebd into realm:master Jan 15, 2016

marcelofabri mentioned this pull request Feb 12, 2016

[01/03/2016] O mundo é mais que seu umbigo CocoaHeadsBrasil/equinociOS#13

Closed

marcelofabri mentioned this pull request Apr 5, 2017

Ignore comment lines in file_length rule #1409

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FunctionBodyLengthRule should not count comment lines #330

FunctionBodyLengthRule should not count comment lines #330

marcelofabri commented Jan 8, 2016

jpsim Jan 8, 2016

jpsim commented Jan 9, 2016

marcelofabri commented Jan 9, 2016

marcelofabri commented Jan 9, 2016

norio-nomura commented Jan 9, 2016

norio-nomura Jan 9, 2016

marcelofabri Jan 11, 2016

jpsim Jan 11, 2016

jpsim commented Jan 9, 2016

marcelofabri commented Jan 11, 2016

jpsim commented Jan 11, 2016

jpsim commented Jan 11, 2016

norio-nomura commented Jan 11, 2016

jpsim Jan 11, 2016

marcelofabri commented Jan 12, 2016

norio-nomura Jan 12, 2016

marcelofabri Jan 12, 2016

marcelofabri Jan 12, 2016

marcelofabri Jan 12, 2016

norio-nomura Jan 13, 2016

jpsim Jan 13, 2016

marcelofabri Jan 13, 2016

jpsim commented Jan 15, 2016

norio-nomura commented Jan 15, 2016

codecov-io commented Jan 15, 2016

marcelofabri commented Jan 15, 2016

jpsim commented Jan 15, 2016

jpsim commented Jan 15, 2016

marcelofabri commented Jan 15, 2016

jpsim commented Jan 15, 2016

norio-nomura commented Jan 15, 2016

marcelofabri commented Jan 15, 2016

scottrhoyt commented Jan 15, 2016

FunctionBodyLengthRule should not count comment lines #330

FunctionBodyLengthRule should not count comment lines #330

Conversation

marcelofabri commented Jan 8, 2016

Choose a reason for hiding this comment

jpsim commented Jan 9, 2016

marcelofabri commented Jan 9, 2016

marcelofabri commented Jan 9, 2016

norio-nomura commented Jan 9, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpsim commented Jan 9, 2016

marcelofabri commented Jan 11, 2016

jpsim commented Jan 11, 2016

jpsim commented Jan 11, 2016

norio-nomura commented Jan 11, 2016

Choose a reason for hiding this comment

marcelofabri commented Jan 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpsim commented Jan 15, 2016

norio-nomura commented Jan 15, 2016

codecov-io commented Jan 15, 2016

Current coverage is 89.58%

marcelofabri commented Jan 15, 2016

jpsim commented Jan 15, 2016

jpsim commented Jan 15, 2016

marcelofabri commented Jan 15, 2016

jpsim commented Jan 15, 2016

norio-nomura commented Jan 15, 2016

marcelofabri commented Jan 15, 2016

scottrhoyt commented Jan 15, 2016

Current coverage is `89.58%`