Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use standard algebra types #523

Merged
merged 15 commits into from
Dec 2, 2016
Merged

Use standard algebra types #523

merged 15 commits into from
Dec 2, 2016

Conversation

johnynek
Copy link
Collaborator

@johnynek johnynek commented May 15, 2016

This is a first step to using standard scala algebra classes.

This depends on non/algebra (the base types Semigroup, Monoid, etc.. are now in typelevel/cats). There may be some changes, but the goal is to get a shared common set of basic mathematical typeclasses, and algebird will then migrate to just being about the implementations we have here (specifically the big-data/sketch algorithm examples).

Since Field is so seldom used, I propose we remove it here to smooth interop, and if someone needs it direct them to non/algebra (which will soon be typelevel/algebra). The problem is algebra Group.inverse is what we called negate so Field.inverse cannot be 1/x. The type signature is the same, so it is dangerous to leave it around.

/cc @non @avibryant

@jnievelt
Copy link
Contributor

I like the idea of extending where we can, but this naturally raises the question of what further steps are we planning? Should we just add some implicit bijections to map between the two?

@avibryant
Copy link
Contributor

I suspect the practical impact of changing Field would be very low, and this seems like a big enough change overall that it seems reasonable to make it breaking in minor ways. I'd vote for changing Field.

@johnynek
Copy link
Collaborator Author

I'm almost thinking we should drop Field and redirect users to use the algebra one if they want, but it basically never comes up. Scalding uses it in exactly 1 place (to divide matrices by scalars), but for that we could explicitly depend on algebra or scala.math.Numeric.

I think algebird should focus on data-structures as algebraic objects, mostly for big-data.

@jnievelt
Copy link
Contributor

Re: removing Field, I think it's a bit much to say it's replaceable with Numeric; I think similar arguments extend to everything else (e.g., Numeric defines plus, so maybe we can just use that instead of Semigroup).

I'm also a little concerned about dropping too much of the base framework from algebird and taking such a strong a dependency on a personal repo (no offense to Erik). I guess that's also an argument for using converters everywhere rather than inheritance.

@johnynek
Copy link
Collaborator Author

johnynek commented Jun 4, 2016

@jnievelt here is a branch that removes field

There are only 3 fields defined: Float, Double, Boolean. I'm fairly sure no one uses Boolean fields with division (you can just divide things by true to get the input again), and Float and Double are not even lawful.

In scalding there are two places we use field (I think) in the (almost unused?) matrix API to do division (which again, can be fixed by requiring Float or Double there).

About algebra: that is a project @avibryant @non @tixxit @larsrh and myself started some time ago. It will soon be moved under the typelevel org and the work there became cats-kernel: https://github.com/typelevel/cats/tree/master/kernel/src/main/scala/cats/kernel

In fact, as I look at this now, I realize that we can just depend on cats-kernel.

But the goal is that cats, spire and algebird can all share core types so you don't have all these walls between projects.

As for field, it just does not come up very much. I would be very surprised if there are more than 5 uses at Twitter (and I couldn't guess how many semigroups are there).

Since the name on Group inverse in cats-kernel (nee algebra) collides with what we were using in Field, to get simple subclass interop with cats-kernel (and hence spire, algebra and cats) we should either:

  1. make Field not extend Group
  2. remove Field
  3. make algebird Group not extend cats-kernel Group (this is a shame since groups are fairly common and we could not as cleanly share them).

I have pressed for, and gotten claims that cats-kernel (and algebra) will be very conservative on these core interfaces, and they do not export any instances by default (the instances are opt in).

I think the benefit of not siloing all these efforts is worth making some small changes to the project, so I strongly support us moving in this direction.

@@ -52,11 +52,9 @@ object UtilAlgebras {
implicit def futureMonoid[T: Monoid]: Monoid[Future[T]] = new ApplicativeMonoid[T, Future]
implicit def futureGroup[T: Group]: Group[Future[T]] = new ApplicativeGroup[T, Future]
implicit def futureRing[T: Ring]: Ring[Future[T]] = new ApplicativeRing[T, Future]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact future is not even a group because Throw/Failure has no inverse, same with Try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I assume that these are here for backwards compatibility?

(We do have algebra.ring.Semiring if you wanted to be more precise, although you might still not want to define exception * 0 as 0.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we should remove them. Added #548

@johnynek johnynek force-pushed the oscar/non-algebra branch from 910393f to 053fd81 Compare June 6, 2016 21:13
*/
override def additive: AMonoid[T] = this
override def empty: T = zero
override def combineAll(t: TraversableOnce[T]): T = sum(t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is defining zero now ? AdditiveMonoid?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Actually AdditiveMonoid is basically identical to our Monoid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between non-additive Monoid and additive Monoid?

Copy link
Collaborator Author

@johnynek johnynek Jun 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiplication as "plus" and 1 as "zero" is a valid Monoid. Algebird has
just settled this by saying we always prefer additive so that Monoid can be
a subclass of group.

Not everyone agrees with that design. So in algebra, there is a Monoid that
has a combine method, and in the ring package, there are Additive and
Multiplicative monoids. The difference is only used in spire where the
syntax extensions are enabled. + needs an additive Semigroup. * needs a
multiplicative Semigroup.

To keep comparability with our source code, and interop with
monoids/semigroups from other packages, I think we need to do what I did
(extend Semigroup and AdditiveSemigroup).

On Thursday, June 9, 2016, Alex Levenson notifications@github.com wrote:

In algebird-core/src/main/scala/com/twitter/algebird/Monoid.scala
#523 (comment):

@@ -47,8 +48,14 @@ trait Monoid[@specialized(Int, Long, Float, Double) T] extends Semigroup[T] {
None
}
}

  • // Override this if there is a more efficient means to implement this
  • def sum(vs: TraversableOnce[T]): T = sumOption(vs).getOrElse(zero)
  • override def sum(vs: TraversableOnce[T]): T = sumOption(vs).getOrElse(zero)
  • /**
  • * These are from algebra.Monoid
  • */
  • override def additive: AMonoid[T] = this
  • override def empty: T = zero
  • override def combineAll(t: TraversableOnce[T]): T = sum(t)

What's the difference between non-additive Monoid and additive Monoid?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/twitter/algebird/pull/523/files/053fd81dafbd88982cad3b9b5c8839ab16ceb399#r66546774,
or mute the thread
https://github.com/notifications/unsubscribe/AAEJdkKu96bh2NZ25MRTFe_VCf1H-veFks5qKLRrgaJpZM4Ieuhu
.

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, still don't understand. Is this just about naming?
I thought a monoid was:

  1. A set of elements, E
  2. A (closed) binary operator over the elements in E
  3. An identity element present in E

You can call the operator "plus" and the identity "zero" or you can call it "times" and "one" but that doesn't really mean anything in terms of how it works right? Or am I missing something and there's a functional difference between these two?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't missing anything. It is totally a naming thing.

On Thursday, June 9, 2016, Alex Levenson notifications@github.com wrote:

In algebird-core/src/main/scala/com/twitter/algebird/Monoid.scala
#523 (comment):

@@ -47,8 +48,14 @@ trait Monoid[@specialized(Int, Long, Float, Double) T] extends Semigroup[T] {
None
}
}

  • // Override this if there is a more efficient means to implement this
  • def sum(vs: TraversableOnce[T]): T = sumOption(vs).getOrElse(zero)
  • override def sum(vs: TraversableOnce[T]): T = sumOption(vs).getOrElse(zero)
  • /**
  • * These are from algebra.Monoid
  • */
  • override def additive: AMonoid[T] = this
  • override def empty: T = zero
  • override def combineAll(t: TraversableOnce[T]): T = sum(t)

sorry, still don't understand. Is this just about naming?
I thought a monoid was:

  1. A set of elements, E
  2. A (closed) binary operator over the elements in E
  3. An identity element present in E

You can call the operator "plus" and the identity "zero" or you can call
it "times" and "one" but that doesn't really mean anything in terms of how
it works right? Or am I missing something and there's a functional
difference between these two?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/twitter/algebird/pull/523/files/053fd81dafbd88982cad3b9b5c8839ab16ceb399#r66557027,
or mute the thread
https://github.com/notifications/unsubscribe/AAEJdhyBqldBPjEvwCIrlhP-IHQI5yJXks5qKNaxgaJpZM4Ieuhu
.

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is a distinction that was important to us over in Spire-land, and which @johnynek agreed wouldn't be too difficult to encode here. I think the strategy of having Algebird's instances extend both the generic and additive traits is an elegant solution that preserves the previous semantics while aiding interop.

@ianoc
Copy link
Collaborator

ianoc commented Jun 8, 2016

Super bullish on this, small dependency. Looks like a small code change too and should be able to easily consume downstream.

@@ -82,7 +87,16 @@ class ArrayGroup[T: ClassTag](implicit grp: Group[T])
}.toArray
}

object Group extends GeneratedGroupImplicits with ProductGroups {
class FromAlgebraGroup[T](m: AGroup[T]) extends FromAlgebraMonoid(m) with Group[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just make this an implicit class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have to be an inner class of a trait to get the priority right. I think it is just clearer to manually handle it personally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Also, I think having one of the implicit classes extend the other would get ugly when both were defined in traits. This encoding seems pretty straightforward to me.

@non
Copy link
Contributor

non commented Sep 9, 2016

This looks good to me. 👍

I feel bad you had to remove Field but at least it seems like it wasn't widely-used.

class FromAlgebraSemigroup[T](sg: ASemigroup[T]) extends Semigroup[T] {
override def plus(l: T, r: T): T = sg.combine(l, r)
override def sumOption(ts: TraversableOnce[T]): Option[T] = sg.combineAllOption(ts)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly annoying to have to do this, but if you could provide the same thing from AdditiveSemigroup as well it would be really helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can do that. I can just call the implicit on .additive, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that should do it.

@johnynek
Copy link
Collaborator Author

johnynek commented Sep 19, 2016

I compiled scalding against this with the diff below. This looks really minimal to me, and I think is well worth the ability for spire, cats, and algebird to all interoperate.

What do you think @jnievelt of the solution I have for field? I used a normal object not a package object because scala recommends not nesting classes in package objects.

diff --git a/build.sbt b/build.sbt
index 846af97..f573fb0 100644
--- a/build.sbt
+++ b/build.sbt
@@ -19,7 +19,7 @@ def scalaBinaryVersion(scalaVersion: String) = scalaVersion match {
 }
 def isScala210x(scalaVersion: String) = scalaBinaryVersion(scalaVersion) == "2.10"

-val algebirdVersion = "0.12.1"
+val algebirdVersion = "0.12.2-SNAPSHOT"
 val apacheCommonsVersion = "2.2"
 val avroVersion = "1.7.4"
 val bijectionVersion = "0.9.1"
diff --git a/scalding-core/src/main/scala/com/twitter/scalding/mathematics/Matrix.scala b/scalding-core/src/main/scala/com/twitter/scalding/mathematics/Matrix.scala
index a053efd..f94a886 100644
--- a/scalding-core/src/main/scala/com/twitter/scalding/mathematics/Matrix.scala
+++ b/scalding-core/src/main/scala/com/twitter/scalding/mathematics/Matrix.scala
@@ -16,6 +16,7 @@ limitations under the License.
 package com.twitter.scalding.mathematics

 import com.twitter.algebird.{ Monoid, Group, Ring, Field }
+import com.twitter.algebird.field._ // backwards compatiblity support
 import com.twitter.scalding._

 import cascading.pipe.assembly._
@@ -461,7 +462,7 @@ class Matrix[RowT, ColT, ValT](val rowSym: Symbol, val colSym: Symbol, val valSy

   def /(that: LiteralScalar[ValT])(implicit field: Field[ValT]) = {
     field.assertNotZero(that.value)
-    mapValues(elem => field.div(elem, that.value))(field)
+    mapValues(elem => field.div(elem, that.value))
   }

   def /(that: Scalar[ValT])(implicit field: Field[ValT]) = {
@@ -469,7 +470,7 @@ class Matrix[RowT, ColT, ValT](val rowSym: Symbol, val colSym: Symbol, val valSy
       .mapValues({ leftRight: (ValT, ValT) =>
         val (left, right) = leftRight
         field.div(left, right)
-      })(field)
+      })
   }

   // Between Matrix value reduction - Generalizes matrix addition with an arbitrary value aggregation function
diff --git a/scalding-core/src/test/scala/com/twitter/scalding/mathematics/Matrix2Test.scala b/scalding-core/src/test/scala/com/twitter/scalding/mathematics/Matrix2Test.scala
index bd66a47..f4c19c4 100644
--- a/scalding-core/src/test/scala/com/twitter/scalding/mathematics/Matrix2Test.scala
+++ b/scalding-core/src/test/scala/com/twitter/scalding/mathematics/Matrix2Test.scala
@@ -15,6 +15,7 @@ limitations under the License.
 */
 package com.twitter.scalding.mathematics

+import com.twitter.algebird.field._
 import com.twitter.scalding._
 import com.twitter.scalding.serialization._
 import com.twitter.scalding.source.TypedText
diff --git a/scalding-core/src/test/scala/com/twitter/scalding/mathematics/MatrixTest.scala b/scalding-core/src/test/scala/com/twitter/scalding/mathematics/MatrixTest.scala
index 8dfc963..33a2621 100644
--- a/scalding-core/src/test/scala/com/twitter/scalding/mathematics/MatrixTest.scala
+++ b/scalding-core/src/test/scala/com/twitter/scalding/mathematics/MatrixTest.scala
@@ -15,6 +15,7 @@ limitations under the License.
 */
 package com.twitter.scalding.mathematics

+import com.twitter.algebird.field._
 import com.twitter.scalding._
 import cascading.pipe.joiner._
 import org.scalatest.{ Matchers, WordSpec }

Copy link
Collaborator Author

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some notes.

@@ -160,13 +163,14 @@ def module(name: String) = {
}

lazy val algebirdCore = module("core").settings(
test := { }, // All tests reside in algebirdTest
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put some basic tests of the algebra interop here (they don't use any of the test properties we export, so I think it is the right place).

@codecov-io
Copy link

codecov-io commented Sep 19, 2016

Current coverage is 62.56% (diff: 34.78%)

Merging #523 into develop will decrease coverage by 0.99%

@@            develop       #523   diff @@
==========================================
  Files           110        111     +1   
  Lines          4435       4493    +58   
  Methods        4041       4079    +38   
  Messages          0          0          
  Branches        355        375    +20   
==========================================
- Hits           2819       2811     -8   
- Misses         1616       1682    +66   
  Partials          0          0          

Powered by Codecov. Last update 2c5aa7a...9ee238f

@piyushnarang
Copy link
Collaborator

👍 not super familiar with the code so I'd recommend waiting for @isnotinvain / @jnievelt to take a look before merging.

Copy link
Contributor

@jnievelt jnievelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Is the plan to eventually deprecate the algebird types?

def product(iter: TraversableOnce[T]): T = Ring.product(iter)(this)
trait Ring[@specialized(Int, Long, Float, Double) T] extends Group[T] with ARing[T] {
def one: T
def times(a: T, b: T): T
Copy link
Contributor

@jnievelt jnievelt Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we already use a, b somewhere? I do like l, r a little better, to emphasize that these are not commutative.

@@ -112,38 +114,84 @@ object LongRing extends Ring[Long] {
else Some(sum(t))
}

object FloatRing extends Ring[Float] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe def product = iter.product ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be no faster than the default product. That said, adding a custom sumOption and product there is a chance for improvement because we can remove half of the boxing that has to go on with TraversableOnce.

@johnynek
Copy link
Collaborator Author

johnynek commented Oct 3, 2016

@jnievelt I don't think we will deprecate algebird types in the short term, only because the resolution for built-in types is done without imports in algebird. Twitter (and others) would need to add an algebra import to get the old behavior.

Also, some behavior would be different (for instance the very questionable zero deletion from Map[K, V] when used with a Monoid[V] was convincingly argued to be a bad idea, so algebra does not do anything like that).

@jnievelt
Copy link
Contributor

jnievelt commented Oct 3, 2016

Makes sense 👍

Copy link
Contributor

@jnievelt jnievelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper 👍

@sritchie sritchie modified the milestone: 0.13.0 Nov 29, 2016
@sritchie sritchie changed the title First work towards using standard algebra types Use standard algebra types Dec 2, 2016
@johnynek
Copy link
Collaborator Author

johnynek commented Dec 2, 2016

👍 for Sam's changes.

@sritchie
Copy link
Collaborator

sritchie commented Dec 2, 2016

👍 thanks oscar!

@sritchie sritchie merged commit c1ede45 into develop Dec 2, 2016
@sritchie sritchie deleted the oscar/non-algebra branch December 2, 2016 23:23
@koertkuipers
Copy link
Contributor

since Field got dropped here algebird 0.12.4 breaks compatibility with scalding 0.16.0
i see this in our scalding unit tests (and we are not using Field explicitly to my knowledge, so i am not entirely sure where this is coming from)

scalding should probably update its algebird dependency now.

Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: com.twitter.algebird.Field
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.getDeclaredMethod(Class.java:2128)
	at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
	at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
	at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at com.twitter.chill.Externalizer.probeJavaWorks(Externalizer.scala:111)
	at com.twitter.chill.Externalizer.javaWorks(Externalizer.scala:100)
	at com.twitter.chill.Externalizer.writeJava(Externalizer.scala:163)
	at com.twitter.chill.Externalizer.maybeWriteJavaKryo(Externalizer.scala:180)
	at com.twitter.chill.Externalizer.writeExternal(Externalizer.scala:187)
	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:145

@johnynek
Copy link
Collaborator Author

johnynek commented Feb 7, 2017

@koertkuipers yeah. https://github.com/twitter/algebird/releases/tag/0.12.4 that should not have been published as 0.12.4 since it is not backwards compatible. That was a goof up. But the horse is out of the barn.

0.13.0 will be released soon, and then we will release a scalding version shortly there-after.

@koertkuipers
Copy link
Contributor

great thanks

@piyushnarang
Copy link
Collaborator

@koertkuipers yeah sorry about the 0.12.4 push. We'll have a 0.13.0 version out soon and I'll be updating Scalding to pick it up as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.