
Get Microsoft.Spark and Microsoft.Spark.Worker assembly version information #715

Merged: 23 commits merged into dotnet:master on Oct 8, 2020

Conversation

suhsteve (Member Author) commented Oct 6, 2020

This PR adds a method that returns a DataFrame containing the current Microsoft.Spark assembly version running on the driver, and that makes a best-effort attempt to determine the assembly version of the Microsoft.Spark.Worker on each node.

There is no guarantee that a Spark executor will run on every node in a cluster. To increase the likelihood, the Spark conf spark.executor.instances and the numPartitions parameter to GetAssemblyInfo(...) should be adjusted to a reasonable number relative to the number of nodes in the Spark cluster.

```csharp
spark.GetAssemblyInfo().Show(20, 0);
```

```
+----------------------+---------------+--------+
|AssemblyName          |AssemblyVersion|HostName|
+----------------------+---------------+--------+
|Microsoft.Spark       |0.12.1.0       |hostname|
|Microsoft.Spark.Worker|0.12.1.0       |hostname|
+----------------------+---------------+--------+
```

Building Microsoft.Spark.Worker with an updated version and rerunning the above yields:

```
+----------------------+---------------+--------+
|AssemblyName          |AssemblyVersion|HostName|
+----------------------+---------------+--------+
|Microsoft.Spark       |0.12.1.0       |hostname|
|Microsoft.Spark.Worker|0.13.1.0       |hostname|
+----------------------+---------------+--------+
```

Fixes #713
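For illustration, a hedged tuning sketch along the lines of the advice above. The cluster size, executor count, and partition count are assumptions, not recommendations, and `Microsoft.Spark.Experimental.Sql` is the namespace the review settles on further below:

```csharp
using Microsoft.Spark.Sql;
using Microsoft.Spark.Experimental.Sql; // extension namespace chosen later in this PR

// Illustrative only: on a hypothetical 4-node cluster, request one executor per
// node so that tasks can be scheduled everywhere.
SparkSession spark = SparkSession
    .Builder()
    .Config("spark.executor.instances", "4")
    .GetOrCreate();

// Oversubscribing numPartitions relative to the node count (here 4x) raises the
// chance that every node runs at least one partition and thus gets reported.
spark.GetAssemblyInfo(numPartitions: 16).Show(20, 0);
```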


```csharp
string tempColName = "WorkerVersionInfo";
DataFrame workerInfoTempDf = df
    .Repartition(1000)
```
suhsteve (Member Author): The 1000 is too big for local settings. Is there some Spark setting we can leverage to set this value? Or what do you think are safe defaults? We could also pass this in as a parameter with a safe default of 10 or so.
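To make the mechanism concrete, here is a minimal sketch, not this PR's actual code, of how such a best-effort probe can work: repartition a trivial DataFrame and run a UDF in each task, so only nodes that happen to execute a task get reported. `ProbeWorkers` is a hypothetical helper, and the assumption that the worker process's entry assembly reflects the Microsoft.Spark.Worker version is mine, not the PR's:

```csharp
using System;
using System.Net;
using System.Reflection;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

static DataFrame ProbeWorkers(SparkSession spark, int numPartitions)
{
    // The UDF body runs inside a Microsoft.Spark.Worker process on an executor,
    // so (assumption) its entry assembly reflects the worker's version.
    Func<Column, Column> workerInfo = Udf<long, string>(
        _ => $"{Assembly.GetEntryAssembly()?.GetName().Version} @ {Dns.GetHostName()}");

    // One row per partition; each partition becomes a task that may land on a
    // different executor, which is why a larger numPartitions covers more nodes.
    DataFrame df = spark.Range(0, numPartitions).Repartition(numPartitions);
    return df.Select(workerInfo(df["id"]).Alias("WorkerVersionInfo")).Distinct();
}
```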

@suhsteve self-assigned this Oct 6, 2020
@suhsteve added the enhancement (New feature or request) label Oct 6, 2020
@suhsteve added this to the 1.0.0 milestone Oct 6, 2020
```csharp
/// <returns>
/// A <see cref="DataFrame"/> containing the <see cref="VersionSensor.VersionInfo"/>
/// </returns>
public DataFrame Version(int numPartitions = 10)
```
Collaborator: Just curious, why have we chosen numPartitions as 10 here?

suhsteve (Member Author): Any number is fine, just some safe default for local / small clusters. Do you have a recommendation on what value we can use?

Collaborator: No, 10 is fine; I was just wondering if there was a specific reason.

Collaborator: Do we want to add a test covering this, since it is a public API?

Collaborator: Also, I am wondering whether asking for numPartitions is really necessary for finding the version, from the user's perspective. Would having it as a constant inside the function pose a problem?

suhsteve (Member Author): What would be the pros and cons of hardcoding the number instead?

Collaborator: The pro, I think, would be usability, since it might be hard to intuitively understand what number of partitions a user should enter and what it has to do with getting version information. The cons include not being able to trigger every worker node, so if some nodes have a different worker version, that would go uncaught. But that could happen even when taking numPartitions as an argument, so I am not sure what the benefit is there, unless we document it in more depth to advise the user of the importance of estimating a 'correct' numPartitions value, and maybe explain in detail what best effort means. Thoughts @imback82?

```csharp
/// <returns>
/// A <see cref="DataFrame"/> containing the <see cref="VersionSensor.VersionInfo"/>
/// </returns>
public DataFrame Version(int numPartitions = 10)
```
suhsteve (Member Author) commented Oct 7, 2020: Maybe we should name this AssemblyVersion, AssemblyVersionInfo, VersionInfo, or something unique to .NET, just in case Spark ever adds a method called Version?

```csharp
/// <returns>
/// A <see cref="DataFrame"/> containing the <see cref="VersionSensor.VersionInfo"/>
/// </returns>
public DataFrame Version(int numPartitions = 10)
```
Contributor: How about making this an extension method under an Experimental namespace?

suhsteve (Member Author): Keep it in the same Microsoft.Spark project, just a different namespace, right?

suhsteve (Member Author): Moved to Microsoft.Spark.Experimental.Sql.
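For readers following along, a minimal sketch of the extension-method shape the PR eventually converges on. The namespace and class name come from this thread and the rename discussed below; the body is elided, so this is a stub, not the PR's implementation:

```csharp
using System;
using Microsoft.Spark.Sql;

namespace Microsoft.Spark.Experimental.Sql
{
    public static class SparkSessionExtensions
    {
        /// <summary>
        /// Best-effort assembly version info for the driver and worker nodes.
        /// </summary>
        public static DataFrame GetAssemblyInfo(
            this SparkSession session, int numPartitions = 10)
        {
            // Implementation elided; see the PR diff.
            throw new NotImplementedException();
        }
    }
}
```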

```csharp
/// <returns>
/// A <see cref="DataFrame"/> containing the <see cref="VersionSensor.VersionInfo"/>
/// </returns>
public DataFrame Version(int numPartitions = 10)
```
Contributor: How about the assembly version on the driver?

suhsteve (Member Author): Which assembly version on the driver? We are getting the Microsoft.Spark assembly version on the driver, and the Microsoft.Spark.Worker version on whichever machines the Spark executors spin up.


```csharp
internal string AssemblyName { get; set; }
internal string AssemblyVersion { get; set; }
internal string HostName { get; set; }
```
suhsteve (Member Author) commented Oct 7, 2020: We can add BuildDate (by getting the assembly file's creation date) if we think it would be useful / nice to have.
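A hypothetical sketch of that idea; note that a file's creation time is only a rough proxy for build time (it can change when the assembly is copied, and Assembly.Location can be empty for single-file publishes):

```csharp
using System;
using System.IO;
using System.Reflection;
using Microsoft.Spark.Sql;

// Assumption: the assembly file's creation timestamp approximates its build date.
Assembly assembly = typeof(SparkSession).Assembly;
DateTime buildDate = File.GetCreationTimeUtc(assembly.Location);
Console.WriteLine($"BuildDate (UTC): {buildDate:O}");
```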

Contributor: We can iterate on this as we get feedback.

Additional review threads on src/csharp/Microsoft.Spark/Utils/VersionSensor.cs were marked outdated and resolved.

imback82 previously approved these changes Oct 8, 2020.

imback82 (Contributor) left a comment: LGTM, thanks @suhsteve!

```csharp
/// <returns>
/// A <see cref="DataFrame"/> containing the <see cref="AssemblyInfoProvider.AssemblyInfo"/>
/// </returns>
public static DataFrame Version(this SparkSession session, int numPartitions = 10)
```
Contributor: Oh, I missed this. Should we also rename this?

Contributor: Maybe GetAssemblyInfo?

suhsteve (Member Author): Updated.


```csharp
internal class AssemblyInfo
{
    internal static readonly StructType s_schema = new StructType(
```
suhsteve (Member Author): Should we make this lazy?
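(For context, a minimal sketch of the Lazy&lt;T&gt; form being asked about, assuming the three string columns shown in this PR's sample output; the thread settles on a different approach below.)

```csharp
using System;
using Microsoft.Spark.Sql.Types;

internal class AssemblyInfo
{
    // Hypothetical Lazy<T> variant: defer schema construction until first access.
    internal static readonly Lazy<StructType> s_schema = new Lazy<StructType>(
        () => new StructType(new[]
        {
            new StructField("AssemblyName", new StringType()),
            new StructField("AssemblyVersion", new StringType()),
            new StructField("HostName", new StringType())
        }));
}
```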

Contributor: Good point. Can you just move this and ToGenericRow to GetAssemblyInfo? I will probably move this to non-experimental in my next PR, since this class will be more general after moving it out.

suhsteve (Member Author): Moved to SparkSessionExtensions.

@imback82 merged commit e384bf0 into dotnet:master on Oct 8, 2020.
Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues: [FEATURE REQUEST]: .NET for Apache Spark Driver/Worker Version method (#713)

3 participants