Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GAWB-2056] The status of the last run is showing green when there are errors in them #709

Merged
merged 18 commits into from
Jun 16, 2017
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions core/src/main/resources/swagger/rawls.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5051,11 +5051,11 @@ definitions:
lastSuccessDate:
type: string
format: date-time
description: The date of the last successful workflow
description: The date of the last successful submission
lastFailureDate:
type: string
format: date-time
description: The date of the last failed workflow
description: The date of the last failed submission
runningSubmissionsCount:
type: integer
description: Count of all the running submissions
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,15 @@ import java.nio.ByteOrder
import java.sql.Timestamp
import java.util.UUID

import akka.util.{ByteString, ByteStringBuilder}
import akka.util.ByteString
import cats._
import cats.implicits._
import org.apache.commons.codec.binary.Base64
import org.broadinstitute.dsde.rawls.model._
import org.broadinstitute.dsde.rawls.{RawlsException, RawlsExceptionWithErrorReport}
import org.joda.time.DateTime
import slick.driver.JdbcDriver
import slick.jdbc.{GetResult, PositionedParameters, SQLActionBuilder, SetParameter}
import spray.http.StatusCodes
import org.apache.commons.codec.binary.Base64

import scala.concurrent.ExecutionContext

Expand Down Expand Up @@ -119,6 +120,48 @@ trait DriverComponent {
def renameForHiding(recordCount: Long, name: String): String = {
name + "_" + getSufficientlyRandomSuffix(recordCount)
}

/**
* Converts a `Seq[(A, B)]` into a `Map[A, B]`, combining the values with a `Monoid[B]` in case of key conflicts.
*
* For example:
* {{{
* scala> groupPairs(Seq(("a", 1), ("b", 2), ("a", 3)))
* res0: Map[String,Int] = Map(b -> 2, a -> 4)
* }}}
* */
def groupPairs[A, B: Monoid](pairs: Seq[(A, B)]): Map[A, B] =
pairs.toList.foldMap { case (a, b) => Map(a -> b) }

/**
* Converts a `Seq[(A, F[B])]` into a `Map[A, F[B]]`, combining the values with the _universal_ monoid for type F.
* This can be useful for when using groupBy with slick aggregate functions such as max, min, avg, etc which return
* an Option.
*
* For example:
* {{{
* scala> case class Foo(i: Int)
* defined class Foo
*
* scala> groupPairs(Seq(("a", Foo(1).some), ("b", Foo(2).some), ("c", Foo(3).some)))
* << does not compile as there is no Monoid instance for Foo >>
*
* scala> groupPairsK(Seq(("a", Foo(1).some), ("b", Foo(2).some), ("c", Foo(3).some)))
* res9: Map[String,Option[Foo]] = Map(b -> Some(Foo(2)), a -> Some(Foo(1)), c -> Some(Foo(3)))
* }}}
*/
def groupPairsK[F[_], A, B](pairs: Seq[(A, F[B])])(implicit M: MonoidK[F]): Map[A, F[B]] =
groupPairs(pairs)(M.algebra[B])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something tells me it'll be a while before this PR gets merged! (i have to wrap my head around this)

Copy link
Contributor Author

@rtitle rtitle Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry if this got a little fancy.. it's not really too bad. Check out:
https://github.com/typelevel/cats/blob/master/docs/src/main/tut/typeclasses/semigroupk.md

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of feedback here.

  1. Check out CollectionUtils - seems like a good place for these to go.
  2. I don't understand this.
  3. Link to the HTML version of the documentation: it contains hugely important comment lines that aren't rendered inside GitHub!
  4. sbt console gives me this when I try to define groupPairsK:
<pastie>:18: warning: higher-kinded type should be enabled
by making the implicit value scala.language.higherKinds visible.
This can be achieved by adding the import clause 'import scala.language.higherKinds'
or by setting the compiler option -language:higherKinds.
See the Scaladoc for value scala.language.higherKinds for a discussion
why the feature should be explicitly enabled.
  1. groupPairs returns a map. You're then dereferencing the key defined by M.algebra[B]? I don't know what that does and I'd have thought that this would give you a result type of F[B].
  2. Your groupPairsK example is equivalent to toMap, which doesn't help me understand. Perhaps
scala> groupPairsK(Seq(("a", Foo(1).some), ("b", Foo(2).some), ("b", Foo(3).some)))
res0: Map[String,Option[Foo]] = Map(b -> Some(Foo(2)), a -> Some(Foo(1)))

would be more helpful, assuming I've read the Cats documentation (and therefore know that in the absence of a quantified F (= Option) means it'll just do orElse).

TLDR: I am 👍 on groupPairs and groupTriples, though they should go into CollectionUtils. I am 👎 on the K-versions; they rely on non-obvious behaviour (e.g. orElse for Option) that make them hard to understand.

Copy link
Contributor Author

@rtitle rtitle Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback here. After reflecting a bit, maybe it would have been better to leave groupByWorkspaceIdThenStatus and groupByWorkspaceId as is -- I didn't really need to change them for this bug. I'm still learning how to scala in a large team, and readability is important.

Digging in to your points:

  1. +1
  2. see below
  3. ok
  4. Huh, I haven't seen that warning before. I had to enable the -feature compiler flag to see it. I guess it's complaining about the F[_] type parameter. The scaladoc is kind of funny:

Why control it? Higher kinded types in Scala lead to a Turing-complete type system, where compiler termination is no longer guaranteed. They tend to be useful mostly for type-level computation and for highly generic design patterns. The level of abstraction implied by these design patterns is often a barrier to understanding for newcomers to a Scala codebase. Some syntactic aspects of higher-kinded types are hard to understand for the uninitiated and type inference is less effective for them than for normal types. Because we are not completely happy with them yet, it is possible that some aspects of higher-kinded types will change in future versions of Scala. So an explicit enabling also serves as a warning that code involving higher-kinded types might have to be slightly revised in the future.

4,5. So here's why I added the K version. In this case I wanted to group triples where the map value is an Option[java.sql.Timestamp]. Therefore I need a Monoid instance for java.sql.Timestamp, which doesn't exist. It wouldn't make much sense to define a Monoid for timestamps -- what would you do, add them together? However, in this case I'm guaranteed to have no key conflicts because of the SQL structure (group by, etc). So I can just use a Monoid for Option which just takes one or the other, regardless of the value inside the "box". That's exactly what MonoidK[Option] does.

Furthermore, I'm not sure how I would implement this in terms of groupTriples without using MonoidK. Perhaps this would be more clear?

private def groupByWorkspaceIdThenStatus(workflowDates: Seq[(UUID, String, Option[Timestamp])]): Map[UUID, Map[String, Option[Timestamp]]] = {
  // bla bla comment about bringing a monoid into scope with Option.orElse behavior
  implicit val optionUniversalMonoid: Monoid[Option[Timestamp]] = MonoidK[Option].algebra[Timestamp]
  CollectionUtils.groupTriples(workflowDates)
}

Then at least the MonoidK stuff is localized in a private method with some explanation specific to the use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and pushed a commit with ^ those changes, please take a look.


// Same as above but with triples.
// Note: if we used shapeless we could generalize these functions for any arity.
// But I'm not going to add shapeless just for this. :)
def groupTriples[A, B, C: Monoid](trips: Seq[(A, B, C)]): Map[A, Map[B, C]] =
trips.toList.foldMap { case (a, b, c) => Map(a -> Map(b -> c)) }

def groupTriplesK[F[_], A, B, C](trips: Seq[(A, B, F[C])])(implicit M: MonoidK[F]): Map[A, Map[B, F[C]]] = {
groupTriples(trips)(M.algebra[C])
}
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,14 @@ package org.broadinstitute.dsde.rawls.dataaccess.slick
import java.sql.Timestamp
import java.util.{Date, UUID}

import cats.instances.int._
import cats.instances.option._
import org.broadinstitute.dsde.rawls.RawlsException
import org.broadinstitute.dsde.rawls.dataaccess.SlickWorkspaceContext
import org.broadinstitute.dsde.rawls.model.Attributable.AttributeMap
import org.broadinstitute.dsde.rawls.model.WorkspaceAccessLevels.WorkspaceAccessLevel
import org.broadinstitute.dsde.rawls.model._
import org.joda.time.DateTime
import org.broadinstitute.dsde.rawls.dataaccess.SlickWorkspaceContext
import org.broadinstitute.dsde.rawls.model.Attributable.AttributeMap

/**
* Created by dvoet on 2/4/16.
*/
Expand Down Expand Up @@ -499,36 +500,63 @@ trait WorkspaceComponent {
}

/**
* gets the submission stats (last workflow failed date, last workflow success date, running submission count)
* gets the submission stats (last submission failed date, last submission success date, running submission count)
* for each workspace
*
* @param workspaceIds the workspace ids to query for
* @return WorkspaceSubmissionStats keyed by workspace id
*/
def listSubmissionSummaryStats(workspaceIds: Seq[UUID]): ReadAction[Map[UUID, WorkspaceSubmissionStats]] = {
// workflow date query: select workspaceId, workflow.status, max(workflow.statusLastChangedDate) ... group by workspaceId, workflow.status
val workflowDatesQuery = for {

// submission date query:
//
// select workspaceId, status, max(submissionDate)
// from (
// select submission.workspaceId, workflow.status, submission.submissionDate, count(1)
// from submission
// join workflow on workflow.submissionId = submission.id
// where submission.workspaceId in (:workspaceIds)
// group by 1, 2, 3) v
// where (status = 'Failure' or (status = 'Succeeded' and count = 1)
// group by 1, 2

val workflowStatusQuery = for {
submissions <- submissionQuery if submissions.workspaceId.inSetBind(workspaceIds)
workflows <- workflowQuery if submissions.id === workflows.submissionId
} yield (submissions.workspaceId, workflows.status, workflows.statusLastChangedDate)
} yield (submissions.workspaceId, workflows.status, submissions.submissionDate)

val workflowDatesGroupedQuery = workflowDatesQuery.groupBy { case (wsId, status, _) => (wsId, status) }.
map { case ((wsId, wfStatus), records) => (wsId, wfStatus, records.map { case (_, _, lastChanged) => lastChanged }.max) }
val groupedWorkflowStatusQuery = workflowStatusQuery.groupBy { case (wsId, status, submissionDate) =>
(wsId, status, submissionDate)
}.map { case ((wsId, status, submissionDate), records) =>
(wsId, status, submissionDate, records.map(_._3).length)
}

// Note: a submission is successful if it contains _only_ successful workflows.
// A submission is a failure if it contains _any_ failed workflows.
val filteredWorkflowStatusQuery = groupedWorkflowStatusQuery.filter { case (_, status, _, count) =>
status === WorkflowStatuses.Failed.toString || (status === WorkflowStatuses.Succeeded.toString && count === 1)
}

val submissionDateQuery = filteredWorkflowStatusQuery.groupBy { case (workspaceId, status, submissionDate, _) =>
(workspaceId, status)
}.map { case ((workspaceId, status), recs) =>
(workspaceId, status, recs.map(_._3).max)
}

// running submission query: select workspaceId, count(1) ... where submissions.status === Submitted group by workspaceId
val runningSubmissionsQuery = (for {
submissions <- submissionQuery if submissions.workspaceId.inSetBind(workspaceIds) && submissions.status.inSetBind(SubmissionStatuses.activeStatuses.map(_.toString))
} yield submissions).groupBy(_.workspaceId).map { case (wfId, submissions) => (wfId, submissions.length)}

for {
workflowDates <- workflowDatesGroupedQuery.result
submissionDates <- submissionDateQuery.result
runningSubmissions <- runningSubmissionsQuery.result
} yield {
val workflowDatesByWorkspaceByStatus: Map[UUID, Map[String, Option[Timestamp]]] = groupByWorkspaceIdThenStatus(workflowDates)
val runningSubmissionCountByWorkspace: Map[UUID, Int] = groupByWorkspaceId(runningSubmissions)
val submissionDatesByWorkspaceByStatus = groupTriplesK(submissionDates)
val runningSubmissionCountByWorkspace = groupPairs(runningSubmissions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rename from groupByWorkspaceIdThenStatus -> groupTriplesK and groupByWorkspaceId -> groupPairs loses a lot of readability here. Can you keep the function definitions for readability, even if they just point to other ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure


workspaceIds.map { wsId =>
val (lastFailedDate, lastSuccessDate) = workflowDatesByWorkspaceByStatus.get(wsId) match {
val (lastFailedDate, lastSuccessDate) = submissionDatesByWorkspaceByStatus.get(wsId) match {
case None => (None, None)
case Some(datesByStatus) =>
(datesByStatus.getOrElse(WorkflowStatuses.Failed.toString, None), datesByStatus.getOrElse(WorkflowStatuses.Succeeded.toString, None))
Expand Down Expand Up @@ -731,14 +759,6 @@ trait WorkspaceComponent {
WorkspaceGroups(toGroupMap(realmAclRecs), toGroupMap(accessGroupRecs))
}
}

private def groupByWorkspaceId(runningSubmissions: Seq[(UUID, Int)]): Map[UUID, Int] = {
Copy link
Contributor Author

@rtitle rtitle Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods seemed useful so I cats-ified and generalized them and moved to DriverComponent.

runningSubmissions.groupBy{ case (wsId, count) => wsId }.mapValues { case Seq((_, count)) => count }
}

private def groupByWorkspaceIdThenStatus(workflowDates: Seq[(UUID, String, Option[Timestamp])]): Map[UUID, Map[String, Option[Timestamp]]] = {
workflowDates.groupBy { case (wsId, _, _) => wsId }.mapValues(_.groupBy { case (_, status, _) => status }.mapValues { case Seq((_, _, timestamp)) => timestamp })
}
}

private case class WorkspaceGroups(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -114,4 +114,39 @@ class WorkspaceComponentSpec extends TestDriverComponentWithFlatSpecAndMatchers
runAndWait(workspaceQuery.delete(workspace.toWorkspaceName))
}
}

it should "list submission summary stats" in withDefaultTestDatabase {
implicit def toWorkspaceId(ws: Workspace): UUID = UUID.fromString(ws.workspaceId)

val wsIdNoSubmissions: UUID = testData.workspaceNoSubmissions
assertResult(Map(wsIdNoSubmissions -> WorkspaceSubmissionStats(None, None, 0))) {
runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdNoSubmissions)))
}

val wsIdSuccessfulSubmission: UUID = testData.workspaceSuccessfulSubmission
assertResult(Map(wsIdSuccessfulSubmission -> WorkspaceSubmissionStats(Some(testDate), None, 0))) {
runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdSuccessfulSubmission)))
}

val wsIdFailedSubmission: UUID = testData.workspaceFailedSubmission
assertResult(Map(wsIdFailedSubmission -> WorkspaceSubmissionStats(None, Some(testDate), 0))) {
runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdFailedSubmission)))
}

val wsIdSubmittedSubmission: UUID = testData.workspaceSubmittedSubmission
assertResult(Map(wsIdSubmittedSubmission -> WorkspaceSubmissionStats(None, None, 1))) {
runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdSubmittedSubmission)))
}

// Note: a submission with both a successful and failed workflow is a failure
val wsIdMixedSubmission: UUID = testData.workspaceMixedSubmissions
assertResult(Map(wsIdMixedSubmission -> WorkspaceSubmissionStats(None, Some(testDate), 1))) {
runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdMixedSubmission)))
}

val wsIdTerminatedSubmission: UUID = testData.workspaceTerminatedSubmissions
assertResult(Map(wsIdTerminatedSubmission -> WorkspaceSubmissionStats(Some(testDate), Some(testDate), 0))) {
runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdTerminatedSubmission)))
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -384,7 +384,7 @@ class WorkspaceApiServiceSpec extends ApiServiceSpec {
}
val dateTime = currentTime()
assertResult(
WorkspaceListResponse(WorkspaceAccessLevels.Owner, testWorkspaces.workspace.copy(lastModified = dateTime), WorkspaceSubmissionStats(Option(testDate), Option(testDate), 2), Seq(testData.userOwner.userEmail.value))
WorkspaceListResponse(WorkspaceAccessLevels.Owner, testWorkspaces.workspace.copy(lastModified = dateTime), WorkspaceSubmissionStats(None, Option(testDate), 2), Seq(testData.userOwner.userEmail.value))
){
val response = responseAs[WorkspaceListResponse]
WorkspaceListResponse(response.accessLevel, response.workspace.copy(lastModified = dateTime), response.workspaceSubmissionStats, response.owners)
Expand Down Expand Up @@ -520,7 +520,7 @@ class WorkspaceApiServiceSpec extends ApiServiceSpec {

val dateTime = currentTime()
assertResult(Set(
WorkspaceListResponse(WorkspaceAccessLevels.Owner, testWorkspaces.workspace.copy(lastModified = dateTime), WorkspaceSubmissionStats(Option(testDate), Option(testDate), 2), Seq(testData.userOwner.userEmail.value)),
WorkspaceListResponse(WorkspaceAccessLevels.Owner, testWorkspaces.workspace.copy(lastModified = dateTime), WorkspaceSubmissionStats(None, Option(testDate), 2), Seq(testData.userOwner.userEmail.value)),
WorkspaceListResponse(WorkspaceAccessLevels.Write, testWorkspaces.workspace2.copy(lastModified = dateTime), WorkspaceSubmissionStats(None, None, 0), Seq.empty),
WorkspaceListResponse(WorkspaceAccessLevels.NoAccess, testWorkspaces.workspace3.copy(lastModified = dateTime), WorkspaceSubmissionStats(None, None, 0), Seq.empty)
)) {
Expand Down