Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FunctionBodyLengthRule should not count comment lines #330

Merged
merged 12 commits into from
Jan 15, 2016
Merged

FunctionBodyLengthRule should not count comment lines #330

merged 12 commits into from
Jan 15, 2016

Conversation

marcelofabri
Copy link
Collaborator

Fixes #258.

let kindsInRange = syntax.tokens.filter { token in
let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
return tokenLine?.line == line.index
}.map({ $0.type }).flatMap(SyntaxKind.init)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is off here

@jpsim
Copy link
Collaborator

jpsim commented Jan 9, 2016

Thanks for doing this @marcelofabri, this makes the rule much more useful!

I was a bit worried about the impact this would have on performance, and turns out it's a fairly large one 😬.

As a reference point, the testSwiftLintLints integration test takes 2.817s to run on the master branch right now.

With this PR, it takes 46.092s 😱.

By changing the syntaxKindsByLine function to this, that goes down to 6.815s:

public func syntaxKindsByLine(startLine: Int? = nil,
                              endLine: Int? = nil) -> [(Int, [SyntaxKind])] {
    let contents = self.contents as NSString
    let linesAndKinds = syntaxMap.tokens.map { token -> (Int, SyntaxKind) in
        let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
        return (tokenLine!.line, SyntaxKind(rawValue: token.type)!)
    }.filter { line, _ in
        return line >= startLine && line <= endLine
    }
    var results = [Int: [SyntaxKind]]()
    for lineAndKind in linesAndKinds {
        results[lineAndKind.0] = (results[lineAndKind.0] ?? []) + [lineAndKind.1]
    }
    return Array(zip(results.keys, results.values))
}

However, I still think that's too much of a performance regression, especially for larger projects.

One thing we could do to minimize the performance impact here would be to only count the number of comment-only lines after confirming that the function exceeds the maximum line length when accounting for all lines, which is much faster. For example, in FunctionBodyLengthRule.validateFile(...):

where endLine - startLine > limit && lineCount(file, startLine: startLine, endLine: endLine) > limit {

Actually we could move that to its own function:

...
where exceedsLineCountExcludingComments(file, startLine, endLine, limit)
...
private func exceedsLineCountExcludingComments(file: File, _ start: Int, _ end: Int,
                                               _ limit: Int) -> Bool {
    return end - start > limit && lineCount(file, startLine: start, endLine: end) > limit
}

In this case, SwiftLint goes back to linting in 2.866s, although that's not exactly a great benchmark now because SwiftLint's functions never exceed 40 lines, so the benchmark doesn't hit the slow path.

Could you please apply these changes to help reduce the performance impact? If you find different ways to achieve similar performance gains, I'm certainly open to that too!

@marcelofabri
Copy link
Collaborator Author

@jpsim I think that if we called syntaxKindsByLine only once (with the full file) we'd have a performance gain. However, I don't see an easy way now to do it. We could hack it by adding an associated object on the File object, but we probably should do something better.

Anyway, I have committed the changes you proposed. I was worried about performance too 😬

I know that it deserves another issue, but we should think about adding performance tests to the project, even linting other projects.

@marcelofabri
Copy link
Collaborator Author

By using the associated objects approach, I was able to make it run in ~ 3.401s (without the exceedsLineCountExcludingComments shortcut), against 2.744s with the shortcut.

I believe it could be better if I could drop the conversion needed to store/read the associated object:

private var syntaxKindsByLines: [(Int, [SyntaxKind])] {
    if let lines = objc_getAssociatedObject(self, &keyLines) as? [Int],
        syntaxKinds = objc_getAssociatedObject(self, &keySyntaxKinds) as? [[String]] {
        return Array(zip(lines, syntaxKinds.map { $0.flatMap(SyntaxKind.init) }))
    }

    let newValue = syntaxKindsByLine()
    let lines = newValue.map { $0.0 }
    let kinds = newValue.map { $0.1.map { $0.rawValue } }
    objc_setAssociatedObject(self, &keyLines, lines,
        .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
    objc_setAssociatedObject(self, &keySyntaxKinds, kinds,
        .OBJC_ASSOCIATION_RETAIN_NONATOMIC)

    return newValue
}

However, this made the tests actually slower with the shortcut (because I called syntaxKindsByLine in validateFile). If I move it to an inner place (i.e. numberOfCommentOnlyLines), it goes back to ~2.8s, but the version without the shortcut is a bit slower (~3.6s).

@norio-nomura
Copy link
Collaborator

Calling counting line can be reduced by changing position of parameter enumeration.
from:

// enumerate parameters
for parameter in parameters.reverse() {
    // count lines
    let lineCountExcludingComments =
    // check violation
    if lineCountExcludingComments > parameter.value {
        
    }
}

to:

// count lines
let lineCountExcludingComments =
// enumerate parameters
for parameter in parameters.reverse() {
    // check violation
    if LineCountExcludingComments > parameter.value {
        
    }
}

let contents = self.contents as NSString
let kindsWithLines = syntaxMap.tokens.map { token -> (Int, SyntaxKind) in
let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
return (tokenLine!.line, SyntaxKind(rawValue: token.type)!)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SyntaxKinds of whole the file are created here on every checking functions in the file.
.filter should be applied before creation for reducing them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because currently .filter needs the line. If we filtered first, lineAndCharacterForByteOffset would be called twice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcelofabri it might be better to avoid eagerly creating the SyntaxKinds by mapping twice:

let kindsWithLines = syntaxMap.tokens.map { token -> (Int, SyntaxToken) in
    let tokenLine = contents.lineAndCharacterForByteOffset(token.offset)
    return (tokenLine!.line, token)
}.filter { line, token in
    return line >= (startLine ?? 0) && line <= (endLine ?? Int.max)
}.map { (line, token) -> (Int, SyntaxKind) in
    return (line, SyntaxKind(rawValue: token.type)!)
}

@jpsim
Copy link
Collaborator

jpsim commented Jan 9, 2016

I know that it deserves another issue, but we should think about adding performance tests to the project, even linting other projects.

How do you recommend we do that? XCTest supports performance tests, but those only really matter when running on a given device. What we really want is to compare with running on the previous version of the test running on the exact same device, although I don't think there's any elegant way to set that up.

@marcelofabri
Copy link
Collaborator Author

@jpsim I haven't thought about it a lot, but I was thinking about XCTest performance tests. I'm still not sure why it wouldn't work. Is it because Travis won't have a baseline to compare to?

@jpsim
Copy link
Collaborator

jpsim commented Jan 11, 2016

@jpsim I haven't thought about it a lot, but I was thinking about XCTest performance tests. I'm still not sure why it wouldn't work. Is it because Travis won't have a baseline to compare to?

XCTest performance tests compare the test output with a previously recorded baseline that's associated with a unique device identifier (in this case, the Mac on which these tests are running). This wouldn't work on Travis because they use VMs to run tests, and they're not guaranteed to have the same unique device ID. Even if/when they did, the fact that these are running on shared hardware means that we can't really trust a single run of performance tests.

I'm thinking a different way to measure performance in PRs would be to install the latest version of SwiftLint through homebrew and compare the total time to lint SwiftLint from the latest release and the current PR. Ideally Travis would be able to post back a comment to the PR with the before & after lint times for us to review. Even better would be to have a way to run SwiftLint in a benchmarking mode that logs out the time spent in each rule.

@jpsim
Copy link
Collaborator

jpsim commented Jan 11, 2016

@marcelofabri this is looking good. What are the next steps (if any) you want to take with this PR?

@norio-nomura
Copy link
Collaborator

Sorry, my saying too much about performance. 🙇
I also think this is looking good.

@@ -90,6 +90,22 @@ extension File {
}
}

internal func syntaxKindsByLine(startLine: Int? = nil,
endLine: Int? = nil) -> [(Int, [SyntaxKind])] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is off here

@marcelofabri
Copy link
Collaborator Author

@jpsim @norio-nomura I've added a new property on File+Cache to cache this and I'm pretty happy with the performance now.

Could you guys please take another look and share your thoughts?

let longerFunctionBodyWithComments = "func abc() {" +
Repeat(count: 40, repeatedValue: " // this is a comment\n").joinWithSeparator("") +
"}\n"
XCTAssertEqual(violations(longerFunctionBodyWithComments), [])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I had overlooked that the tests is not enough.
Maybe it should pass following tests:

        let longFunctionBodyWithComments = "func abc() {" +
            Repeat(count: 40, repeatedValue: "\n").joinWithSeparator("") +
            "// comment only line should be ignored.\n" +
            "}\n"
        XCTAssertEqual(violations(longFunctionBodyWithComments), [])

        let longerFunctionBodyWithComments = "func abc() {" +
            Repeat(count: 41, repeatedValue: "\n").joinWithSeparator("") +
            "// comment only line should be ignored.\n" +
            "}\n"
        XCTAssertEqual(violations(longerFunctionBodyWithComments), [StyleViolation(
            ruleDescription: FunctionBodyLengthRule.description,
            location: Location(file: nil, line: 1, character: 1),
            reason: "Function body should span 40 lines or less: currently spans 41 lines")])

        let longFunctionBodyWithMultilineComments = "func abc() {" +
            Repeat(count: 40, repeatedValue: "\n").joinWithSeparator("") +
            "/* multi line comment only line should be ignored.\n*/\n" +
            "}\n"
        XCTAssertEqual(violations(longFunctionBodyWithMultilineComments), [])

        let longerFunctionBodyWithMultilineComments = "func abc() {" +
            Repeat(count: 41, repeatedValue: "\n").joinWithSeparator("") +
            "/* multi line comment only line should be ignored.\n*/\n" +
            "}\n"
        XCTAssertEqual(violations(longerFunctionBodyWithMultilineComments), [StyleViolation(
            ruleDescription: FunctionBodyLengthRule.description,
            location: Location(file: nil, line: 1, character: 1),
            reason: "Function body should span 40 lines or less: currently spans 41 lines")])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't longerFunctionBodyWithCommentsand longerFunctionBodyWithMultilineComments not trigger any violations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget that, I didn't realize that the comments were outside Repeat. My bad!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That tests actually caught a bug: the number of lines that the function spans in the reason. Should we update it to subtract the comment only lines? This might be confusing for someone who isn't very familiar with the rule.

Also, if we follow that idea, any ideas on how to return the "fixed" line count on exceedsLineCountExcludingComments? I was thinking about returning a tuple, but I'm not sure if that's the best way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be confusing for someone who isn't very familiar with the rule.

I think it would be better reason explaining that comment only lines are ignored.

That tests actually caught a bug

As I tested, SourceKitten returns one token for one multiline comment block. But your code expects that every tokens are in one line. So, counting ignored comment only lines is not correct on multiline comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fill in missing lines in syntaxKindsByLines, which will also help avoid counting whitespace-only lines in the function body line count.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've committed ec2be78 which should address these issues. As @jpsim (implicitly) suggested, the rule now ignores whitespace-only lines.

Also, I had to separate the tests to another file (FunctionBodyLengthRuleTests), because the integration tests were failing.

@jpsim
Copy link
Collaborator

jpsim commented Jan 15, 2016

@marcelofabri @norio-nomura this PR seems fine to me now, other than a rebase, what else needs to be done?

@norio-nomura
Copy link
Collaborator

Maybe making the reason message clearer by adding explains that comment and whitespace only lines are excluded.
Everyone does not know how the rules counting the lines.

@codecov-io
Copy link

Current coverage is 89.58%

Merging #330 into master will increase coverage by +0.32% as of a9fe99a

@@            master    #330   diff @@
======================================
  Files           57      57       
  Stmts         1481    1527    +46
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           1322    1368    +46
  Partial          0       0       
  Missed         159     159       

Review entire Coverage Diff as of a9fe99a


Uncovered Suggestions

  1. +0.85% via ...eMinLengthRule.swift#16...28
  2. +0.79% via ...Yaml+SwiftLint.swift#49...60
  3. +0.53% via .../Configuration.swift#28...35
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

@marcelofabri
Copy link
Collaborator Author

I've updated the reason message to

Function body should span 40 lines or less: currently spans 41 lines (already ignoring comment and whitespace only ones)

What do you guys think?

@jpsim
Copy link
Collaborator

jpsim commented Jan 15, 2016

Maybe this instead?

Function body should span 40 lines or less excluding comments and whitespace: currently spans 41 lines.

@jpsim
Copy link
Collaborator

jpsim commented Jan 15, 2016

@scottrhoyt I think we should disable codecov's automated comments and just have it as a GitHub status check instead. The automated comments are a bit too noisy for my taste.

@marcelofabri
Copy link
Collaborator Author

@jpsim Much better! Already updated on 75a4f2f.

@jpsim
Copy link
Collaborator

jpsim commented Jan 15, 2016

👏 thanks!!!

jpsim added a commit that referenced this pull request Jan 15, 2016
…ents

FunctionBodyLengthRule should not count comment lines
@jpsim jpsim merged commit 5103ebd into realm:master Jan 15, 2016
@norio-nomura
Copy link
Collaborator

🎉

@marcelofabri
Copy link
Collaborator Author

💯

@scottrhoyt
Copy link
Contributor

@jpsim, will do. The options for what show up are configurable here. For instance, I just changed the behavior to only post the header:
screen shot 2016-01-15 at 7 52 28 am
If that still seems to noisy we can disable it altogether and just stick with the commit status and PR checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants