Area-Proportional Venn Diagram generator (WIP)
- Demo: gradient descent toward target region sizes
- Examples
- Prior art
- Status
- Background
- Methods
- Future directions
- Other notes/references
Live app: runsascoded.com/apvd
Here's a sample run with targets reflecting the proportion of natural numbers that are divisible by 3, 5, 7, or any combination thereof:
fizz.buzz.bazz.example.mp4
It's fairly "alpha" (see /issues), but I believe it's "state of the art" at creating area-proportional Venn diagrams using up to 4 ellipses.
Here's Fig. 3, along with the best layout I've found using ∧p∨d:
The error is just under 0.2%: about 1/500th of the overall area is in the wrong region.
Here's a closer look at the center, with region-size labels:
The "1.86" (Red ∩ Blue ∩ Yellow) should be 0, and there a missing { Red ∩ Green ∩ Yellow } of size 1. I use a rudimentary penalty that moves shapes closer together when a region is missing (the converse happens naturally via gradient descent), but it could definitely be made smarter.
Definitely lots of low-hanging UX improvements as well!
Fig. 7, paper and ∧p∨d versions:
- Also discussed by Lior Pachter
- Live links to best solutions: D, E, F
From the supplement:
∧p∨d struggles a bit with this one:
Error is 22.9 (4.64%); half of that is { Red ∩ Yellow ∩ Blue }, which is 1.56 instead of 12. Incorporating each region's relative error would likely produce more intuitive results (see shapes#9).
(Also, the "182" for "TP53 only" is on the { TP53 ∩ STK11 } region; see #7)
TODO: show ∧p∨d solutions to these…
Java applet, up to 3 ellipses, area-proportional:
Here is its solution to a simple 3-set example featured below:
Their paper is a great reference, and includes many examples from published papers.
Web app, "Euler Diagrams Drawn with Ellipses Area-Proportionally (Edeap)":
The area-proportionality seems pretty imprecise, in this example and others. It seems to use a force-directed algorithm that doesn't actually seek a perfect/converged solutions.
Several tools generate area-proportional Venn diagrams using circles, but this doesn't allow for modeling most inputs.
For example, here's ∧p∨d trying to fit the first example below using three circles:
benfred.circles_1M.mp4
(initial layout, best approximation; the reported "7.36%" error is "earth-mover distance", as a percentage of the overall diagram size)
Allowing just one set to be an ellipse (even "aligned" to the axes, with no rotation) is enough for this diagram to converge:
Web app, 3 circles:
Note that the solution above is imprecise:
- Web app, Python package, R package
- Up to 3 circles
Web app by the BioVenn author, up to 10 sets, circles only:
Windows executable, 2-3 circles:
Not area-proportional, but will draw up to 6 sets:
Web app, up to 4 ellipses, not area-proportional:
Web app, up to 4 rectangles, not area-proportional:
R package, 2-7 sets, not area-proportional:
Web app, up to 5 sets, not area-proportional:
The demo at runsascoded.com/apvd supports up to 4 ellipses (including allowing them to rotate relative to the axes), and arbitrary initial layouts and target region-sizes. It can gradient-descend for 100,000's of steps and converge, or reach negligible error levels, on most examples I've tested it on.
Future work could involve:
- command-line / server-based version (evolving models from multiple initial layouts in parallel)
- more shapes (rectangles, polygons, splines)
- more ability to configure examples, and save and share state
See /issues for more info.
When I first saw diagrams like this in the wild, I hadn't realized that 4 ellipses can intersect to form all 15 (
The wildly disproportionate areas bugged me, though. I read up on it a bit, and found some relatively accessible open problems related to generating "area-proportional Venn diagrams".
I believe this library pushes the state of the art forward a bit (e.g. by supporting 4 ellipses), and hopefully provides a platform for further progress.
This repo is primarily the "frontend" / web app; the interesting computation occurs in runsascoded/shapes.
The basic approach is to:
- model a "scene" with various intersecting shapes
- compute intersection points
- infer regions and sizes for each subset of the set of shapes
- compute error (
$|desired size - actual size|$ , for all regions) - gradient-descend the shapes' coordinates (e.g. center x/y and radius x/y) in the direction of decreasing error
Several of these steps turn out to be nontrivial, especially:
- computing intersection points between 2 ellipses (involves solving a quartic equation, as ellipses can intersect in up to 4 points)
- propagating useful gradients through all calculations (requires an autodiff abstraction, and some care to avoid numeric instability)
Computing ellipse intersections can be represented as solving a system of equations that looks like:
With some algebraic manipulation, this can be represented as a quartic equation in one variable (reflectin the fact that two ellipses can intersect at up to 4 points). I initially used the Rust roots crate, but hit issues (#29, #30), and wrote quartic.rs
instead.
The nb/ folder contains notebooks with relevant math and derivations, especially:
Most of the computations related to shapes' intersections and corresponding region areas are differentiable. I use a thin wrapper (dual.rs
) around num_dual
's DualDVec64
for this.
shapes#1 describes moving to statically-sized Duals, which would likely be faster, and make the code much cleaner.
The "missing region penalty" should be proportional to the distance each shape is from the required region existing. This would allow shapes to rotate to get closer, whereas currently I just move their centers closer together, which often ends up thrashing against added error in other regions.
See shapes#10.
Currently, absolute difference between target and actual region sizes is all that is considered. Incorporating relative error is appealing, but it raises questions about what to do with missing or erroneous (shouldn't exist but do) regions.
A mix of absolute and relative may be most intuitive/aesthetic; see shapes#9.
See also: shapes#11.
My approach should be extensible to arbitrary polygons (including non-convex), which would allow for intersections of 5 (or more?) sets.
I suspect the usefulness of even a perfect layout at that level to be limited, but it would be nice to have the option, and 5-set diagrams exist in the wild:
Polygons with rounded corners should also be doable, and might be more aesthetically pleasing.
Cubic-bezier splines would be ideal, but the math is much harder; I'm not sure how to compute it exactly (using my autodiff-based approach).
This SO answer quotes a lost forum post outlining how to use the divergence theorem to compute the area of a poly-bezier loop. It's possible that (or some approximation that is nevertheless autodiff-able) would work…
- List of Venn diagram tools for bioinformaticians
- RectEuler: many links to other tools
- Vsauce twitter thread
- O'Rawe et al 2013: 3- and 5-set Venn diagrams
I previously implemented an interface for computing ellipse intersections, which included draggable and resizable shapes:
drag-ellipses.mp4
(live demo: runsascoded.com/apvd/ellipses)
It's implemented in JS, pre-shapes, and computes intersections and region sizes, but isn't differentiable / can't gradient-descend.
There's also a partial Scala.js implementation in this repo's @scala branch, including cubic and quartic equation solvers.
https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennEJC.html
https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennSymmExamples.html
https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennTriangleEJC.html
https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennPoly67EJC.html
Shown below is a 6-Venn diagram formed entirely from curves drawn from axis-aligned edges. It is a minimum-area diagram; that is, each region is composed of a single square of unit area. Note that many edges overlap, so the diagram is infinitely intersecting. As with many other diagrams in these pages, regions are coloured by weight. The diagrams on this page are from [CR05].
The six component curves of the diagram, overlaid on a grayed-out version of the entire diagram:
This is a 7-Venn diagram formed entirely from curves drawn from axis-aligned edges. Like the above it is minimum-area and infinitely intersecting.
The seven component curves:
These libraries aim to convey set relationships:
Some other relevant libraries I've used or studied:
Dual / Autodiff libraries:
- https://github.com/itt-ustutt/num-dual (r/rust)
- https://crates.io/crates/hyperdual/
- https://github.com/elrnv/autodiff
- https://github.com/djmaxus/autodj
- https://github.com/raskr/rust-autograd includes reverse-mode
https://docs.rs/dual_num/latest/dual_num/(archived)- https://crates.io/crates/fwd_ad 3yrs stale
- https://gist.github.com/emilk/c027311e5d0e8b69953c83a3ec283b74
- https://docs.rs/roots/latest/roots/ real roots only
quartic.js (last release 2008) Core code is from a web solver written by David Binner