Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Metals support - Automatic SemanticDB enablement (RFC) #1546

Closed
lefou opened this issue Nov 5, 2021 · 16 comments · Fixed by #1977
Closed

Better Metals support - Automatic SemanticDB enablement (RFC) #1546

lefou opened this issue Nov 5, 2021 · 16 comments · Fixed by #1977
Labels
feedback wanted Additional feedback or testing is apreciated
Milestone

Comments

@lefou
Copy link
Member

lefou commented Nov 5, 2021

For a nice user experience Metals (a Scala Language Server which connects to Mill via BSP) needs also SemanticDB information, which can be generated by enabling the semanticdb-scalac-plugin for Scala modules.

We need to find a good integration and implement it.

Summary from PR #1536

Originally posted by @lefou in #1536 (comment)

Mill has various IDE support:

  • generates IDEA projects directly, supports all Languages and features, IDEA has
  • provides a BSP server, supports all BSP clients
    ** IDEA, which supports Languages Java and Scala
    ** Metals as Language Server used by many different editors, supports only Language Scala; requires SemanticDB generator

As SemanticDB generation isn't cheap in terms of processing time and may have unforseen side-effects, we only want to enable it if it is in the interest of the user, which means: only enable it when the user is using Metals.

Three options to handle SemanticDB:

a) leaf it to the user to enable it in their build, e.g. use ScalaMetalsSupport trait.

pros:

  • simple

cons:

  • too simple, bad user experience

b) have a dedicated compile target for Metals (we could detect it in the BSP initialize request or by env var)

pros:

  • only used when needed (by Metals)

cons:

  • compilation in IDE differs from commandline
  • customized toolchains need extra care

c) automatically enable SemanticDB in case the user uses Metals

pros:

  • code compilation is the same for IDE and commandline (although changes once after Metals was started the first time)

cons:

  • we need to be clever to not repeatedly flip the enablement when on commandline, so we need to make the enablement persistent
  • we need to be clever to not enable it when inappropriate, e.g. in CI
  • we need provide a way to disable it explicitly
  • we need to handle Scala versions unsupported by the SemanticDB version
  • invasive into existing targets: scalacOptions, scalacPluginIvyDeps, ...

How could we implement c):

i) depend on external info, e.g. existence of .metals directory or some file in it.

pros:

  • easy to implement

cons:

  • unexpected for Mill users, which assume the build.sc (and the actual mill args and cache) as single source of truth
  • hard to disable, once the file exists

ii) persist the fact that we were once executed from Metals

pros:

  • easy to reset with mill clean or mill clean <persistent-target>

cons:

  • more complicated to implement
  • probably hard to understand by user
  • we will need some persistent targets to keep the state (combined with some input task to capture the env)
@lefou lefou added the feedback wanted Additional feedback or testing is apreciated label Nov 5, 2021
@lefou lefou changed the title Better Metals support - Automatic SemanticDB enablement Better Metals support - Automatic SemanticDB enablement (RFC) Nov 9, 2021
@ckipp01
Copy link
Contributor

ckipp01 commented Nov 22, 2021

a) leaf it to the user to enable it in their build, e.g. use ScalaMetalsSupport trait.

I think an extra con to this one is that if you have a team where some users are using IntelliJ and some are using Metals we don't want to penalize IntelliJ users with Metals specific stuff. So right off the batt ScalaMetalsSupport doesn't work great in those scenarios.

Between b and c, I'm not fully sure what fits Mill better, so I'll only speak from the Metals perspective, but I'm always in favor of there being as little as friction as possible for the user. Ideally, they shouldn't even notice that anything different is happening. I've been playing around a bit with Metals just being able to generate the bsp config for Mill automatically in scalameta/metals@cb3e60b and it's not a ton of work, and while doing it, it got me thinking that on the Metals side we do have a lot of control over what happens when a user chooses to use a specific BSP server. For example in both sbt and Bloop we are adding plugins to the build to either ensure we can export to Bloop or ensure that stuff is set correctly in sbt for semanticdb production. Just throwing out ideas here, but what if we ran mill.bsp.BSP/install in a specific way that caused a specific flag to be set in the .bsp/mill.json that ensured that semanticdb was produced when we connected to mill via BSP? The con to this would be that then it wouldn't b set if a user manually did mill mill.bsp.BSP/install instead of letting Metals do it.

I think on the Metals side we are totally open to whatever fits Mill best and whatever causes the least friction for users and allows for the actual build to not need to change for teams that are mixed editor teams.

@lefou
Copy link
Member Author

lefou commented Nov 23, 2021

The proper (read Mill-idiomatic) way is to just have dedicated targets/tasks to compile for Metals. That would be the cleanest and most robust solution. It may include proxying the whole modules or writing to a custom out directory, I can't tell right now. I think we should start a POC in that direction and refine later, when we got more insights and feedback.

@lolgab
Copy link
Member

lolgab commented Apr 22, 2022

The proper (read Mill-idiomatic) way is to just have dedicated targets/tasks to compile for Metals.

This would mean compiling the code twice which is a waste of resources and against one of the benefits of using BSP.

@ckipp01
Copy link
Contributor

ckipp01 commented Jul 28, 2022

So thinking about this some more. When a BSP initialize request comes in, could we just detect the displayName and if it's Metals, then enable semanticdb?

@lefou
Copy link
Member Author

lefou commented Jul 28, 2022

So thinking about this some more. When a BSP initialize request comes in, could we just detect the displayName and if it's Metals, then enable semanticdb?

It's not about detection of Metals. This one is actually easy, and we do it e.g. when we calculate the sources of the mill-build module. (IJ does only support directories).

It's about tracking changes and keeping caches up-to-date and reproducible. Whenever we detect, that we want to enable SemanticDB and some extra scalac options, we need to also rebuild. Same, when we decide to disable it. That's why I proposed cii), in which we detect Metals and persist that fact (e.g. in a dot-file or a persitent target). As long as we find this fact true, we enable SemanticDB.

@lefou
Copy link
Member Author

lefou commented Jul 28, 2022

It would be very nice, if we could produce semanticDB data in a separate process, which is less resource intensive as full compilation. From a tooling perspective this is the most wanted setup. Maybe it just works and we can stop after a specific compiler phase and get the semanticDB data at a significant shorter compile time. If such a setup is possible, it would be my go-to implementation.

@lefou
Copy link
Member Author

lefou commented Jul 29, 2022

I made some experiments. Turns out, starting the scala compiler with semanticDB enabled but stopping it after the semanticdb-typer phase is actually rather fast, compared to a full compile. I don't have numbers, but Mill returned much quicker. So, my idea is to generate SemanticDB data in a dedicated target and only invoke it when BSP client requests semanticDB data.

@lefou
Copy link
Member Author

lefou commented Jul 29, 2022

I pushed my current experiment to PR #1977. I won't continue it for at least a week...

@ckipp01
Copy link
Contributor

ckipp01 commented Jul 29, 2022

It would be very nice, if we could produce semanticDB data in a separate process, which is less resource intensive as full compilation.

If I'm understanding the code right:

 case m: SemanticDbScalaModule => T.task {
          m.compile()
          // we also generate semantic db data
          m.semanticDbData()
          ()
        }

I know caching is a concern here, and so is speed, but doing it this way will make it even slower won't it? We're doing a full compile here and then also doing a semanticDbData after that, which is a partial compile. This will end up being slower than just doing a normal compile with semantidb enabled.

I need to dig into the Metals side a bit more, but testing out the pr, it still doesn't recognize that semanticDB is enabled and you can see this for example via BSP buildTarget/scalacOptions. In a minimal project I tested on this is what is returned:

   {
      "target": {
        "uri": "file:///Users/ckipp/Documents/scala-workspace/minimal/minimal?id\u003dminimal"
      },
      "options": [
        "-Xplugin:/Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/com/lihaoyi/acyclic_2.13.8/0.3.3/acyclic_2.13.8-0.3.3.jar"
      ],
      "classpath": [
        "file:///Users/ckipp/Documents/scala-workspace/minimal/minimal/resources",
        "file:///Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/com/lihaoyi/acyclic_2.13.8/0.3.3/acyclic_2.13.8-0.3.3.jar",
        "file:///Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar"
      ],
      "classDirectory": "file:///Users/ckipp/Documents/scala-workspace/minimal/out/minimal/compile.dest/classes"
    }

Compare this to the same thing with Bloop:

{
      "target": {
        "uri": "file:/Users/ckipp/Documents/scala-workspace/minimal/minimal/?id\u003dminimal"
      },
      "options": [
        "-P:semanticdb:sourceroot:/Users/ckipp/Documents/scala-workspace/minimal",
        "-Xplugin:/Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/com/lihaoyi/acyclic_2.13.8/0.3.3/acyclic_2.13.8-0.3.3.jar",
        "-Yrangepos",
        "-P:semanticdb:failures:warning",
        "-P:semanticdb:synthetics:on",
        "-Xplugin-require:semanticdb",
        "-Xplugin:/Users/ckipp/Library/Caches/bloop/semanticdb/org.scalameta.semanticdb-scalac_2.13.8.4.5.9/semanticdb-scalac_2.13.8-4.5.9.jar"
      ],
      "classpath": [
        "file:///Users/ckipp/Documents/scala-workspace/minimal/.bloop/out/minimal/bloop-bsp-clients-classes/classes-Metals-69AMmR6cQMakFe1OX8uwKg\u003d\u003d/",
        "file:///Users/ckipp/Library/Caches/bloop/semanticdb/com.sourcegraph.semanticdb-javac.0.7.4/semanticdb-javac-0.7.4.jar",
        "file:///Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/com/lihaoyi/acyclic_2.13.8/0.3.3/acyclic_2.13.8-0.3.3.jar",
        "file:///Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar"
      ],
      "classDirectory": "file:///Users/ckipp/Documents/scala-workspace/minimal/.bloop/out/minimal/bloop-bsp-clients-classes/classes-Metals-69AMmR6cQMakFe1OX8uwKg\u003d\u003d/"
    }

I do see the new task being called...

[Trace - 00:28:00 pm] Received notification 'build/taskProgress'
Params: {
  "taskId": {
    "id": "-793638209"
  },
  "eventTime": 1659090480520,
  "message": "[5/18] minimal.scalaCompilerClasspath \u003e [47/48] mill.bsp.MillBuildServer.mill-build.semanticDbData ",
  "total": 18,
  "progress": 5,
  "unit": "minimal.scalaCompilerClasspath"
}


[Trace - 00:28:00 pm] Received notification 'build/showMessage'
Params: {
  "type": 4,
  "message": "effective scalac options: List(-Xplugin:/Users/ckipp/.ivy2/local/com.lihaoyi/mill-main-moduledefs_2.13/0.10.5-56-eca162/jars/mill-main-moduledefs_2.13.jar, -deprecation, -Xplugin:/Users/ckipp/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/scalameta/semanticdb-scalac_2.13.8/4.5.11/semanticdb-scalac_2.13.8-4.5.11.jar, -Yrangepos, -P:semanticdb:sourceroot:/Users/ckipp/Documents/scala-workspace/minimal, -Ystop-after:semanticdb-typer)"
}

Although there is also the message being given back to the user which I'm unsure about.

When this task runs, where does it spit out the semanticdb? I don't see a META-INF next to the example dir like I'd expect to and that's where tooling is going to look for it. I believe it will just look in the classDirectory and for mill-bsp this ends up being file:///Users/ckipp/Documents/scala-workspace/minimal/out/minimal/compile.dest/classes which doesn't seem to have semanticdb when being used with Metals.

@lefou
Copy link
Member Author

lefou commented Jul 29, 2022

@ckipp01 Thanks for this early feedback. As described in the PR, this is a not yet finished experiment whether this works at all. The task item "Use SemanticDB in BSP module" isn't checked yet, it's still an open TODO.

But regarding your first questions of "understanding". Compiling with SemanticDB is slower than without (around 30% for Scala 2 as states here: #1536 (comment)), so m.compile without any SemanticDB enablement is the fastest way to produce class files and compiler messages, it's also incremental if possible. The additional m.semanticDbData step is only producing semanticdb files and a lot faster than m.compile (See my timed mill output in the PR, #1977 (comment)). There is probably a small tradeoff, as the two steps together are probably slower than one single compile-run with the semanticdb plugin, but on the other side, we have a nice setup, which avoid re-compilation in mixed IDE/CLI usage and which don't affect the intended build process at all.

Ideally, Metals is only invoking the semanticDbData target when the data is needed, but I think there is no easy to use option in BSP to produce it on demand, so we have to do this at the same time when we compile the project, and combine both outputs (class files and semanticdb files) in the BSP compile request response.

There are for sure some subtle details we need to work out, e.g. the right place to apply additional compiler flags required to produce better message in the IDE.

@lefou
Copy link
Member Author

lefou commented Jul 29, 2022

I don't know zinc internals enough, but if zinc supports incremental compilation with the -Ystop-after option (and it look to me that way), then my newly invented target semanticDbData is incremental too. And if my first timed test is in any way representative, then we are within the 30% overhead window we should expect for compilations with the semanticDB plugin.

@ckipp01
Copy link
Contributor

ckipp01 commented Jul 29, 2022

As described in the PR, this is a not yet finished experiment whether this works at all.

Ha, sorry some of my comments were probably premature 😆 I just saw the pr and got real excited to try it.

as the two steps together are probably slower than one single compile-run with the semanticdb plugin

Yea, this is what I was referring to. I was just curious of the cost compared to just one compile with the plugin enabled. However, thinking more about it, you're totally right. The fact that this still enables the separation of producing semanticdb and also compilation while preserving cache for the cli is probably a win and pretty clever.

Ideally, Metals is only invoking the semanticDbData target when the data is needed, but I think there is no easy to use option in BSP to produce it on demand

Yea, I don't think there is even a good way to determine when it would be needed unless we do some sort of comparison on class files or something, but then it's just faster to probably reproduce. In theory every compilation will trigger it.

As you continue to iterate don't hesitate to ping me to test stuff with Metals or look into things. If this is the direction we go, we may need to change our doctor report a bit for Mill BSP since the response of scalacOptions won't have semanticdb stuff it in even though there is semanticdb getting produced. We'll have to detect that maybe another way so users don't get warned that it's not working.

@lolgab
Copy link
Member

lolgab commented Jul 29, 2022

Correct me if I'm wrong but I think that if you use -j 0 then m.semanticDbData() and m.compile() can run in parallel in

T.task {
  m.compile()
  // we also generate semantic db data
  m.semanticDbData()
  ()
}

@lefou
Copy link
Member Author

lefou commented Jul 29, 2022

Correct me if I'm wrong but I think that if you use -j 0 then m.semanticDbData() and m.compile() can run in parallel in

T.task {
  m.compile()
  // we also generate semantic db data
  m.semanticDbData()
  ()
}

Exactly. Yet, I think this concrete piece of code is suspected to change.

All named targets can be scheduled on different threads. At least, in the current parallel implementation.

@lefou
Copy link
Member Author

lefou commented Aug 10, 2022

One difficulty is, that BSP only want a single classes directory. And Metals is searching for semanticDB files in that directory. As we have two different targets to produce class files and semanticDB files, we result with two distinct output directories. As a consequence, we need to copy all files to a third location and use this as the output directory.

@lefou
Copy link
Member Author

lefou commented Aug 11, 2022

As you continue to iterate don't hesitate to ping me to test stuff with Metals or look into things. If this is the direction we go, we may need to change our doctor report a bit for Mill BSP since the response of scalacOptions won't have semanticdb stuff it in even though there is semanticdb getting produced. We'll have to detect that maybe another way so users don't get warned that it's not working.

@ckipp01 I updated my PR #1977 (comment) and would appreciate your feedback. I think we hit that predicted issue, as Metals "thinks" semanticDB isn't enabled.

@lefou lefou linked a pull request Aug 22, 2022 that will close this issue
5 tasks
@lefou lefou added this to the 0.10.6 milestone Aug 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feedback wanted Additional feedback or testing is apreciated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants