[GAWB-2056] The status of the last run is showing green when there are errors in them #709

Merged: 18 commits, Jun 16, 2017
Changes from 8 commits
4 changes: 2 additions & 2 deletions core/src/main/resources/swagger/rawls.yaml
@@ -5051,11 +5051,11 @@ definitions:
lastSuccessDate:
type: string
format: date-time
- description: The date of the last successful workflow
+ description: The date of the last successful submission
lastFailureDate:
type: string
format: date-time
- description: The date of the last failed workflow
+ description: The date of the last failed submission
runningSubmissionsCount:
type: integer
description: Count of all the running submissions
@@ -4,14 +4,13 @@ import java.nio.ByteOrder
import java.sql.Timestamp
import java.util.UUID

import akka.util.{ByteString, ByteStringBuilder}
import akka.util.ByteString
import org.apache.commons.codec.binary.Base64
import org.broadinstitute.dsde.rawls.model._
import org.broadinstitute.dsde.rawls.{RawlsException, RawlsExceptionWithErrorReport}
import org.joda.time.DateTime
import slick.driver.JdbcDriver
import slick.jdbc.{GetResult, PositionedParameters, SQLActionBuilder, SetParameter}
import spray.http.StatusCodes
import org.apache.commons.codec.binary.Base64

import scala.concurrent.ExecutionContext

@@ -3,13 +3,16 @@ package org.broadinstitute.dsde.rawls.dataaccess.slick
import java.sql.Timestamp
import java.util.{Date, UUID}

import cats.{Monoid, MonoidK}
import cats.instances.int._
import cats.instances.option._
import org.broadinstitute.dsde.rawls.RawlsException
import org.broadinstitute.dsde.rawls.dataaccess.SlickWorkspaceContext
import org.broadinstitute.dsde.rawls.model.Attributable.AttributeMap
import org.broadinstitute.dsde.rawls.model.WorkspaceAccessLevels.WorkspaceAccessLevel
import org.broadinstitute.dsde.rawls.model._
import org.broadinstitute.dsde.rawls.util.CollectionUtils
import org.joda.time.DateTime
import org.broadinstitute.dsde.rawls.dataaccess.SlickWorkspaceContext
import org.broadinstitute.dsde.rawls.model.Attributable.AttributeMap

/**
* Created by dvoet on 2/4/16.
*/
@@ -499,36 +502,56 @@ trait WorkspaceComponent {
}

/**
- * gets the submission stats (last workflow failed date, last workflow success date, running submission count)
+ * gets the submission stats (last submission failed date, last submission success date, running submission count)
* for each workspace
*
* @param workspaceIds the workspace ids to query for
* @return WorkspaceSubmissionStats keyed by workspace id
*/
def listSubmissionSummaryStats(workspaceIds: Seq[UUID]): ReadAction[Map[UUID, WorkspaceSubmissionStats]] = {
- // workflow date query: select workspaceId, workflow.status, max(workflow.statusLastChangedDate) ... group by workspaceId, workflow.status
- val workflowDatesQuery = for {

+ // submission date query:
+ //
+ // select workspaceId, status, max(submissionDate)
+ // from (
+ //   select distinct submission.workspaceId, workflow.status, submission.submissionDate
+ //   from submission
+ //   join workflow on workflow.submissionId = submission.id
+ //   where submission.workspaceId in (:workspaceIds)) v
+ // group by 1, 2
+ // having (status = 'Failure' or (status = 'Succeeded' and count(v.*) = 1))
Contributor Author

@rtitle, Jun 13, 2017

@helgridly, based on our conversation, how does this revised query look to you?

      select workspaceId, status, max(subEndDate)
      from (
        select submission.id, submission.workspaceId, workflow.status, max(workflow.statusLastChangedDate) as subEndDate
        from submission
        join workflow on workflow.submissionId = submission.id
        where submission.workspaceId in (:workspaceIds)
        group by 1, 2, 3) v
      group by 1, 2
      having (status = 'Failure' or (status = 'Succeeded' and count(v.id) = 1))

Explanation:

  • inner query returns the most recent workflow status change date, per workflow status and submission
  • outer query returns the most recent workflow status change date where:
    • the status is Failure; or
    • the status is Succeeded and that is the only status in the submission

I can code this up and try to break it with tests too, just thought I'd post the SQL first.

Contributor

I think that's right, though SQL isn't my native language!

Contributor Author

@rtitle, Jun 13, 2017

Actually after some testing I think it's not right. SQL is not my native language either (or maybe it's my slick mapping). :( Still working through it, expect another iteration...
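
For reference, the rule this thread is trying to express (a submission counts as failed if any of its workflows failed, and as succeeded only if every workflow succeeded) can be sketched over in-memory rows in plain Scala. This is an illustration under stated assumptions, not the Slick query from the PR: `WorkflowRow`, the `Long` timestamps, the status strings, and the choice to date a submission by its latest workflow status change are all made up for the example.

```scala
// Hypothetical in-memory stand-in for the submission/workflow join (not the Rawls schema).
final case class WorkflowRow(workspaceId: String, submissionId: String, status: String, lastChanged: Long)

object SubmissionStatsSketch {

  /** Returns (lastFailureDate, lastSuccessDate) for each workspace that has a finished submission. */
  def lastFailureAndSuccess(rows: Seq[WorkflowRow]): Map[String, (Option[Long], Option[Long])] = {
    // Step 1: classify each submission and record its latest workflow status change.
    val perSubmission: Seq[(String, String, Long)] =
      rows.groupBy(r => (r.workspaceId, r.submissionId)).toSeq.flatMap { case ((wsId, _), wfs) =>
        val statuses = wfs.map(_.status).toSet
        val endDate  = wfs.map(_.lastChanged).max
        if (statuses.contains("Failed")) Some((wsId, "Failed", endDate))          // any failure => failed
        else if (statuses == Set("Succeeded")) Some((wsId, "Succeeded", endDate)) // only successes => succeeded
        else None                                                                 // e.g. still running: neither
      }

    // Step 2: keep the most recent date per workspace and outcome.
    perSubmission.groupBy(_._1).map { case (wsId, subs) =>
      def latest(outcome: String): Option[Long] =
        subs.collect { case (_, `outcome`, date) => date }.reduceOption(_ max _)
      wsId -> ((latest("Failed"), latest("Succeeded")))
    }
  }
}
```

With one all-succeeded submission (latest change at 12) and one mixed submission (latest change at 25) in the same workspace, the sketch reports the mixed one as a failure: `Map("ws1" -> (Some(25), Some(12)))`. A workspace whose only submission is still running does not appear in the result.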


+ val workflowStatusQuery = (for {
    submissions <- submissionQuery if submissions.workspaceId.inSetBind(workspaceIds)
    workflows <- workflowQuery if submissions.id === workflows.submissionId
- } yield (submissions.workspaceId, workflows.status, workflows.statusLastChangedDate)
+ } yield (submissions.workspaceId, workflows.status, submissions.submissionDate)).distinct

+ val submissionMaxDateQuery = workflowStatusQuery.groupBy { case (workspaceId, status, submissionDate) =>
+   (workspaceId, status)
+ }.map { case ((workspaceId, status), recs) =>
+   (workspaceId, status, recs.map(_._3).max, recs.length)
+ }

- val workflowDatesGroupedQuery = workflowDatesQuery.groupBy { case (wsId, status, _) => (wsId, status) }.
-   map { case ((wsId, wfStatus), records) => (wsId, wfStatus, records.map { case (_, _, lastChanged) => lastChanged }.max) }
+ // Note: a submission is successful if it contains _only_ successful workflows.
+ // A submission is a failure if it contains _any_ failed workflows.
+ val filteredSubmissionMaxDateQuery = submissionMaxDateQuery.filter { case (_, status, _, count) =>
+   status === WorkflowStatuses.Failed.toString || (status === WorkflowStatuses.Succeeded.toString && count === 1)
+ }.map { case (workspaceId, status, max, _) => (workspaceId, status, max)}

// running submission query: select workspaceId, count(1) ... where submissions.status === Submitted group by workspaceId
val runningSubmissionsQuery = (for {
submissions <- submissionQuery if submissions.workspaceId.inSetBind(workspaceIds) && submissions.status.inSetBind(SubmissionStatuses.activeStatuses.map(_.toString))
} yield submissions).groupBy(_.workspaceId).map { case (wfId, submissions) => (wfId, submissions.length)}

for {
- workflowDates <- workflowDatesGroupedQuery.result
+ submissionDates <- filteredSubmissionMaxDateQuery.result
  runningSubmissions <- runningSubmissionsQuery.result
  } yield {
- val workflowDatesByWorkspaceByStatus: Map[UUID, Map[String, Option[Timestamp]]] = groupByWorkspaceIdThenStatus(workflowDates)
+ val submissionDatesByWorkspaceByStatus: Map[UUID, Map[String, Option[Timestamp]]] = groupByWorkspaceIdThenStatus(submissionDates)
  val runningSubmissionCountByWorkspace: Map[UUID, Int] = groupByWorkspaceId(runningSubmissions)

  workspaceIds.map { wsId =>
- val (lastFailedDate, lastSuccessDate) = workflowDatesByWorkspaceByStatus.get(wsId) match {
+ val (lastFailedDate, lastSuccessDate) = submissionDatesByWorkspaceByStatus.get(wsId) match {
case None => (None, None)
case Some(datesByStatus) =>
(datesByStatus.getOrElse(WorkflowStatuses.Failed.toString, None), datesByStatus.getOrElse(WorkflowStatuses.Succeeded.toString, None))
@@ -733,11 +756,22 @@ trait WorkspaceComponent {
}

private def groupByWorkspaceId(runningSubmissions: Seq[(UUID, Int)]): Map[UUID, Int] = {
Contributor Author

@rtitle, Jun 9, 2017

These methods seemed useful so I cats-ified and generalized them and moved to DriverComponent.

- runningSubmissions.groupBy{ case (wsId, count) => wsId }.mapValues { case Seq((_, count)) => count }
+ CollectionUtils.groupPairs(runningSubmissions.toList)
}

private def groupByWorkspaceIdThenStatus(workflowDates: Seq[(UUID, String, Option[Timestamp])]): Map[UUID, Map[String, Option[Timestamp]]] = {
- workflowDates.groupBy { case (wsId, _, _) => wsId }.mapValues(_.groupBy { case (_, status, _) => status }.mapValues { case Seq((_, _, timestamp)) => timestamp })
+ // There is no Monoid instance for Option[Timestamp] so we need to bring one into scope.
+ // However a Monoid for Timestamp doesn't really make sense -- what would it do, add them together?
+ // We can take advantage of the _universal_ monoid for Option which combines Option values using
+ // Option.orElse. It's called universal because it works no matter the type inside the Option.
+ // This is fine in this case because there are guaranteed no key conflicts due to the SQL query
+ // structure (group by, etc).
+ //
+ // TL/DR: The following line brings into scope a Monoid[Option[Timestamp]] which combines values
+ // using Option.orElse.
Contributor

I think part of my confusion here is that you're introducing Monoid, which is used to combine things, using a non-combining operation (orElse), in a situation where you never need to combine two of them anyway because the UUID/String combo is guaranteed to be unique. This is a pretty head-bendy way to achieve the desired result, even if it does work.

Contributor Author

@rtitle, Jun 13, 2017

At least the orElse is explicit now. Before, in this code:

...mapValues { case Seq((_, _, timestamp)) => timestamp })

it's still expecting a unique UUID/String combo, and if that were not the case, there would be a runtime MatchError.
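
To make the point about the runtime `MatchError` concrete, here is a minimal sketch with the same shape as the old helper, using made-up `String` keys and `Int` values. The name `strictGroup` is hypothetical. It uses `map` rather than `mapValues` so the failure is immediate; on pre-2.13 collections `mapValues` returns a lazy view, so the error would only surface when a value is read.

```scala
import scala.util.Try

object UniqueKeySketch {
  // Same shape as the original helper: assumes exactly one row per key.
  def strictGroup(rows: Seq[(String, Int)]): Map[String, Int] =
    rows.groupBy { case (key, _) => key }.map { case (key, group) =>
      // The single-element pattern throws scala.MatchError when a key has more than one row.
      key -> (group match { case Seq((_, count)) => count })
    }

  // Convenience wrapper for the demonstration only.
  def attempt(rows: Seq[(String, Int)]): Try[Map[String, Int]] = Try(strictGroup(rows))
}
```

`UniqueKeySketch.attempt(Seq("a" -> 1, "b" -> 2))` succeeds, while `UniqueKeySketch.attempt(Seq("a" -> 1, "a" -> 2))` is a `Failure` wrapping a `MatchError`.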

Contributor

I still think this is less readable than doing it explicitly but I'm going on vacation tomorrow and thus can't afford to spend more time arguing about it :) Instead, if you replace your comment with the following one, I'll shut up and thumb this. In the long-term, we need a centralised place for these explanations, because throwing them in comments on their first use won't help new hires understand them if they happen upon the uncommented second or third use first. But we can worry about that (a little) later.


The function groupTriples, called below, transforms a Seq((T1, T2, T3)) to a Map(T1 -> Map(T2 -> T3)). It does this by calling foldMap, which in turn requires a monoid for T3. In our case, T3 is an Option[Timestamp], so we need to provide an implicit monoid for Option[Timestamp].

There isn't really a sane monoid implementation for Timestamp (what would you do, add them?). Thankfully it turns out that the UUID/String pairs in workflowDates are always unique, so it doesn't matter what the monoid does because it'll never be used to combine two Option[Timestamp]s. It just needs to be provided in order to make the compiler happy.

To do this, we use the universal monoid for Option, MonoidK[Option]. Note that the inner Option takes no type parameter: MonoidK doesn't care about the type inside Option, it just calls orElse on the Option for its "combine" operator. Finally, the call to algebra[Timestamp] turns a MonoidK[Option] into a Monoid[Option[Timestamp]] by leaving the monoid implementation alone (so it still calls orElse) and poking the Timestamp type into the Option.
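
A small sketch of the difference described above, assuming cats is on the classpath and using `Int` in place of `Timestamp` so that both monoids exist and can be compared. The object and value names are illustrative only.

```scala
import cats.{Monoid, MonoidK}
import cats.instances.int._
import cats.instances.option._

object OptionMonoidSketch {
  // Monoid derived from the element type: combines the Ints inside the Options.
  val adding: Monoid[Option[Int]] = Monoid[Option[Int]]

  // "Universal" monoid: ignores the element type and keeps the first defined value (orElse).
  // Deliberately not implicit here, to avoid competing with the derived instance above.
  val firstDefined: Monoid[Option[Int]] = MonoidK[Option].algebra[Int]
}
```

Here `adding.combine(Some(1), Some(2))` gives `Some(3)`, whereas `firstDefined.combine(Some(1), Some(2))` gives `Some(1)` and `firstDefined.combine(None, Some(2))` gives `Some(2)`. For `Timestamp` only the second kind is available, which is why the PR declares it explicitly.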

Contributor Author

ok :)


+ implicit val optionTimestampMonoid: Monoid[Option[Timestamp]] = MonoidK[Option].algebra[Timestamp]
+ CollectionUtils.groupTriples(workflowDates.toList)
Contributor

You don't need the toList here any more, now that you've swapped them to use Seq.

Contributor Author

thanks

}
}

@@ -1,5 +1,10 @@
package org.broadinstitute.dsde.rawls.util

+ import cats.Monoid
+ import cats.instances.list._
+ import cats.instances.map._
+ import cats.syntax.foldable._

object CollectionUtils {

//A saner group by than Scala's.
@@ -10,4 +15,20 @@ object CollectionUtils {
def groupByTuplesFlatten[A, B]( tupleSeq: Seq[(A, Seq[B])] ): Map[A, Seq[B]] = {
tupleSeq groupBy { case (a,b) => a } map { case (k, v) => k -> v.flatMap(_._2) }
}

+  /**
+   * Converts a `Seq[(A, B)]` into a `Map[A, B]`, combining the values with a `Monoid[B]` in case of key conflicts.
+   *
+   * For example:
+   * {{{
+   * scala> groupPairs(Seq(("a", 1), ("b", 2), ("a", 3)))
+   * res0: Map[String,Int] = Map(b -> 2, a -> 4)
+   * }}}
+   * */
+  def groupPairs[A, B: Monoid](pairs: List[(A, B)]): Map[A, B] =
+    pairs.foldMap { case (a, b) => Map(a -> b) }
Contributor

groupByTuplesFlatten above could call this, right?

Contributor

also these should probably prefer Seq to List

Contributor Author

Re Seq: sure

I guess this would work:

  //A saner group by than Scala's.
  def groupByTuples[A, B]( tupleSeq: Seq[(A,B)] ): Map[A, Seq[B]] = {
    tupleSeq.toList.foldMap { case (a, b) => Map(a -> Seq(b)) }
  }

  def groupByTuplesFlatten[A, B]( tupleSeq: Seq[(A, Seq[B])] ): Map[A, Seq[B]] = {
    groupPairs(tupleSeq)
  }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it prefer Seq? Do you expect it to be taking arbitrary Seqs?

Contributor

Seq is less specific and we use it all over the place. We'd have to start jamming .toList everywhere were we to use List.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe people shouldn't have overused Seq and then that wouldn't be an issue

We have the same problem too and it's annoying. We occasionally run into issues where the thing really only works properly on List but things compile due to Seq and someone jammed something bad in there.

Contributor

I meant just groupByTuplesFlatten anyway, but oh well

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtitle Cuz the Cats folk understand the value in saying what you mean ;)

Contributor

Thanks for your input, Jeff! When we wholesale switch from Seq to List I'll let you know so you can say I told you so.

Contributor Author

I'll change these to take Seq (and just call toList on them).

I won't touch groupByTuples/groupByTuplesFlatten because that would require introducing monoids and I don't want to break any code.


+  // Same as above but with triples
+  def groupTriples[A, B, C: Monoid](trips: List[(A, B, C)]): Map[A, Map[B, C]] =
+    trips.foldMap { case (a, b, c) => Map(a -> Map(b -> c)) }
}
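
As a usage illustration of the two helpers, here is a self-contained sketch of the `Seq`-accepting variants discussed in the review (take a `Seq`, call `toList` before `foldMap`). It assumes cats is on the classpath. `GroupingSketch` is a hypothetical name, and this is not necessarily the code as finally merged.

```scala
import cats.Monoid
import cats.instances.int._
import cats.instances.list._
import cats.instances.map._
import cats.syntax.foldable._

object GroupingSketch {
  // Values that share a key are combined with the Monoid[B] in scope.
  def groupPairs[A, B: Monoid](pairs: Seq[(A, B)]): Map[A, B] =
    pairs.toList.foldMap { case (a, b) => Map(a -> b) }

  // Same idea, two levels deep: the Monoid[C] is only used when both keys collide.
  def groupTriples[A, B, C: Monoid](trips: Seq[(A, B, C)]): Map[A, Map[B, C]] =
    trips.toList.foldMap { case (a, b, c) => Map(a -> Map(b -> c)) }
}
```

With the `Int` addition monoid, `groupPairs(Seq(("a", 1), ("b", 2), ("a", 3)))` yields `Map("a" -> 4, "b" -> 2)`, and `groupTriples(Seq(("ws1", "Failed", 1), ("ws1", "Succeeded", 2), ("ws1", "Failed", 3)))` yields `Map("ws1" -> Map("Failed" -> 4, "Succeeded" -> 2))`. When every key combination is unique, as in the PR's query results, the monoid's combine is never called.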
@@ -114,4 +114,39 @@ class WorkspaceComponentSpec extends TestDriverComponentWithFlatSpecAndMatchers
runAndWait(workspaceQuery.delete(workspace.toWorkspaceName))
}
}

+  it should "list submission summary stats" in withDefaultTestDatabase {
+    implicit def toWorkspaceId(ws: Workspace): UUID = UUID.fromString(ws.workspaceId)

+    val wsIdNoSubmissions: UUID = testData.workspaceNoSubmissions
+    assertResult(Map(wsIdNoSubmissions -> WorkspaceSubmissionStats(None, None, 0))) {
+      runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdNoSubmissions)))
+    }

+    val wsIdSuccessfulSubmission: UUID = testData.workspaceSuccessfulSubmission
+    assertResult(Map(wsIdSuccessfulSubmission -> WorkspaceSubmissionStats(Some(testDate), None, 0))) {
+      runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdSuccessfulSubmission)))
+    }

+    val wsIdFailedSubmission: UUID = testData.workspaceFailedSubmission
+    assertResult(Map(wsIdFailedSubmission -> WorkspaceSubmissionStats(None, Some(testDate), 0))) {
+      runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdFailedSubmission)))
+    }

+    val wsIdSubmittedSubmission: UUID = testData.workspaceSubmittedSubmission
+    assertResult(Map(wsIdSubmittedSubmission -> WorkspaceSubmissionStats(None, None, 1))) {
+      runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdSubmittedSubmission)))
+    }

+    // Note: a submission with both a successful and failed workflow is a failure
+    val wsIdMixedSubmission: UUID = testData.workspaceMixedSubmissions
+    assertResult(Map(wsIdMixedSubmission -> WorkspaceSubmissionStats(Some(testDate), Some(testDate), 1))) {
+      runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdMixedSubmission)))
+    }

+    val wsIdTerminatedSubmission: UUID = testData.workspaceTerminatedSubmissions
+    assertResult(Map(wsIdTerminatedSubmission -> WorkspaceSubmissionStats(Some(testDate), Some(testDate), 0))) {
+      runAndWait(workspaceQuery.listSubmissionSummaryStats(Seq(wsIdTerminatedSubmission)))
+    }
+  }
}