Kyrix-S API Reference
Kyrix-S is an extension to the core Kyrix system, providing a simple declarative grammar for authoring large-scale zooming-based scatterplots, which we call Scalable Scatterplot Visualizations (or SSV). Kyrix-S's declarative grammar is a high-level concise grammar built on top of the lower-level Kyrix grammar, which enables authoring of a complex SSV in tens of lines of JSON.
The above GIF shows an SSV of NBA basketball games in the 2017-2018 season. The horizontal and vertical axes show the scores of the home and away teams, respectively. Each circle represents a cluster of games, with the number inside it being the cluster size. As you zoom in, each circle splits into several smaller circles. When you hover over a circle, you see three games between the highest-ranked teams in that cluster, as well as a polygon indicating the boundary of the cluster.
To author this SSV using Kyrix-S, you only need to write the following JSON specification:
{
data: {
db: "nba",
query: "SELECT * FROM games"
},
layout: {
x: {
field: "home_score",
extent: [69, 149]
},
y: {
field: "away_score",
extent: [69, 148]
},
z: {
field: "agg_rank",
order: "asc"
}
},
marks: {
cluster: {
mode: "circle"
},
hover: {
rankList: {
mode: "tabular",
fields: ["home_team", "away_team", "home_score", "away_score"],
topk: 3
},
boundary: "convexhull"
}
},
config: {
axis: true
}
};
Run the following commands in the root folder to bring up this application after the docker containers are started:
> cd compiler/examples/template-api-examples
> cp ../../../docker-scripts/compile.sh compile.sh
> chmod +x compile.sh
> sudo ./compile.sh SSV_circle.js
More examples can be found here.
As an extension, Kyrix-S interoperates with Kyrix through the Project.addSSV call. By passing a JSON specification of an SSV into Project.addSSV, you can add one SSV into an encompassing Kyrix application, either as a new set of canvases or as a set of new layers on existing canvases.
Project.addSSV creates two new layers on an existing canvas, or a new canvas. These two layers are respectively:
- Layer 0. The scatterplot layer, which consists of objects. Each object is bound with columns specified in query.
- Layer 1. A static legend layer with no data bindings.
We document how to specify an SSV in JSON below. The complete JSON schema is here.
- data: defines the data being visualized, required.
  - query: a SQL query to fetch data for the SSV, required. Each record in the query result should correspond to one base object in the scatterplot, regardless of whether the SSV shows aggregated information. For example, if the SSV shows aggregated circles of NBA games, the query should select NBA games. Do not worry about the aggregation, which is handled by Kyrix-S. The query should not contain LIMIT.
  - db: the database in which data.query should be run, required.
  - columnNames: an array specifying the field names for the query results. This is used in specifying layout-related information, aggregation or tooltips. This field is optional. If not specified, Kyrix-S will use the column names returned by the database.
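As an illustration, a data block for the NBA example with explicit column names might look like the sketch below. The explicit column list is an assumption based on the fields used in the specification above, the one-name-per-column reading of columnNames is also assumed, and hasLimitClause is a hypothetical helper (not part of Kyrix-S) that roughly checks the no-LIMIT rule.

```javascript
// A data block for the NBA example. The explicit column list is an
// assumption based on the fields used in the specification above.
const data = {
    db: "nba",
    query:
        "SELECT home_team, away_team, home_score, away_score, agg_rank FROM games",
    // Optional. Assumed here: one name per column returned by the query, in order.
    columnNames: ["home_team", "away_team", "home_score", "away_score", "agg_rank"]
};

// Hypothetical helper (not part of Kyrix-S): a rough check for a LIMIT
// keyword in the query text. It is a plain keyword match, so it would also
// flag the word LIMIT inside a string literal.
function hasLimitClause(query) {
    return /\blimit\b/i.test(query);
}

console.log(hasLimitClause(data.query)); // false
```

Running a check like this before compiling is only a convenience. Kyrix-S itself may or may not validate the query text.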
- layout: controls the placement of the marks in the multi-scale zooming space, required.
  - x: defines the horizontal axis of the SSV, required.
    - field: a quantitative field in the query result that maps to the horizontal axis of the SSV, required. This should be one of data.columnNames (if specified), or one of the column names returned in the query results by the database.
    - extent: an optional two-number array [a, b] indicating the visible range of the field. a can be larger than b. If not specified, the min/max value of layout.x.field will be used as the visible range. This field is required when config.axis is true. This field should not be present when layout.geo is present.
  - y: defines the vertical axis of the SSV.
    - field: a quantitative field in the query result that maps to the vertical axis of the SSV. This should be one of data.columnNames (if specified), or one of the column names returned in the query results by the database.
    - extent: an optional two-number array [a, b] indicating the visible range of the field. a can be larger than b. If not specified, the min/max value of layout.y.field will be used as the visible range. This field is required when config.axis is true. This field should not be present when layout.geo is present.
  - z: defines how objects are distributed across zoom levels, required.
    - field: a quantitative field in the query result whose ranking determines how objects are distributed across zoom levels, required.
    - order: either asc or desc, indicating that the smaller (asc) or larger (desc) the value of field is, the more likely an object is to appear on top zoom levels, required.
  - overlap: a number between 0 and 1 indicating how much overlap between objects is desired, with 0 meaning arbitrary overlap is allowed and 1 meaning no overlap is allowed. Note that this only sets the upper bound on the amount of overlap. Kyrix-S will space the objects further apart if visual density becomes too high in some regions. This field is optional, defaults to 0 when marks.cluster.mode is heatmap, and defaults to 1 otherwise.
  - geo: defines the initial viewport location for geographic dimensions. If this field is present, Kyrix-S assumes that x is longitude and y is latitude, and adds to the database table specified in data.query two columns, kyrix_geo_x and kyrix_geo_y, representing the screen coordinates on the top zoom level. Therefore, if you'd like Kyrix-S to help you transform lat/lon into screen coordinates, make sure the query has only one table in the FROM clause. Lastly, for Kyrix-S to transform the coordinates successfully, either data.db must equal kyrix, or you must run the following command to install d3 in your database:
    > sudo docker exec -it kyrix_db_1 su - postgres -c "./install-d3.sh [dbname]"
    - center: a two-number array specifying the center of the initial geo viewport, required when layout.geo is present. The first number is latitude. The second number is longitude.
    - level: an integer between 0 and 19 specifying the zoom level of the initial geo viewport, required when layout.geo is present. On zoom level 0, the entire world is projected onto a 256*256 tile. To specify a US-centric viewport, use layout.geo.center=[39.5, -98.5] and layout.geo.level=5.
  Note that null values in layout.x.field, layout.y.field and layout.z.field will be regarded as 0, so make sure the missing values in the data are properly imputed.
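A geographic layout following the rules above could be sketched as follows. The field names (lng, lat, population) are hypothetical, and worldSizeAtLevel assumes that each geo level doubles the side length of the world, as in common web-map tile pyramids. Only the 256*256 size at level 0 is stated in this reference.

```javascript
// Layout for a geographic SSV. The field names (lng, lat, population) are
// hypothetical. extent is omitted because layout.geo is present.
const layout = {
    x: {field: "lng"}, // treated as longitude
    y: {field: "lat"}, // treated as latitude
    z: {field: "population", order: "desc"}, // larger values more likely on top levels
    geo: {
        center: [39.5, -98.5], // [latitude, longitude], the US-centric viewport
        level: 5
    }
};

// Assumption: the world's side length doubles with each geo level, starting
// from the documented 256 pixels at level 0.
function worldSizeAtLevel(level) {
    return 256 * Math.pow(2, level);
}

console.log(worldSizeAtLevel(0)); // 256
console.log(worldSizeAtLevel(layout.geo.level)); // 8192
```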
- marks: defines the visual representation of one or more objects, and consists of two components, cluster and hover, required.
  - cluster: cluster marks are static marks rendering one object or a cluster of objects, required.
    - mode: defines the type of visual marks, required. There are six modes in total: circle, heatmap, radar, pie, dot and custom. The last mode, custom, requires a custom renderer (see marks.cluster.custom) and the maximum width/height of an object (see marks.cluster.config.bboxW and marks.cluster.config.bboxH).
    - aggregate: defines the aggregation information needed to render a cluster of objects, and consists of measures and dimensions, which together form a SQL aggregation query.
      - measures: defines which aggregation statistics are calculated and on which fields, and is optional. If not specified, Kyrix-S computes count(*) for each cluster of objects by default. If specified, it should be an array with each element being an object with the following fields:
        - field: name of the field on which this aggregation statistic is calculated, which should be either * when specifying count, or a quantitative field from the query results, required.
        - function: the aggregation statistic to be calculated, one of count, sum, avg, min, max and sqrsum, required.
        - extent: an optional two-number array specifying the range of the calculated aggregation statistic, required for radar.
        If you want to specify the same function for many fields, you can instead specify this component as an object, with fields being an array of field names (required), function being the aggregation statistic (required), and extent being the range for all measures. See here for an example. For modes circle and heatmap, at most one measure can be specified. For mode pie, exactly one measure must be specified. For mode dot, no measure is allowed.
      - dimensions: defines how objects are grouped when calculating aggregation statistics, and is optional. If not specified, no grouping is performed. If specified, it should be an array with each element being an object with the following fields:
        - field: name of the field of a grouping column, which should be a categorical field from the query results, required.
        - domain: an array of strings indicating all possible values of field, required.
        For modes circle, heatmap, radar and dot, grouping is not supported, so you do not need to specify dimensions for those modes.
    - custom: a rendering function f(svg, data, args) for the custom mode, which converts a set of data items data to visual marks and attaches them to svg, required when marks.cluster.mode is custom. Each data item in data is the representative object of a cluster of objects, with an additional field clusterAgg containing aggregation statistics of this cluster. To access the size of the cluster, you can write d.clusterAgg["count(*)"], where d is the data item. If there is grouping, you can write d.clusterAgg["medical_male_avg(salary)"], which is the average salary of male employees in the medical department in this cluster. args is a dictionary containing useful information about the encompassing Kyrix application, similar to the input of a Kyrix layer renderer. An example.
    - config: a set of optional parameters for customizing the looks of the cluster marks.
      - bboxW: the width of the bounding box of all cluster marks. You need to specify this if and only if you are using the custom mode.
      - bboxH: the height of the bounding box of all cluster marks. You need to specify this if and only if you are using the custom mode.
      - circleMinSize: the minimum size of the circles in the circle mode. Defaults to 30.
      - circleMaxSize: the maximum size of the circles in the circle mode. Defaults to 70.
      - heatmapRadius: the radius of an object in the heatmap mode. Defaults to 80.
      - heatmapOpacity: the opacity of heatmaps in the heatmap mode, between 0 and 1. Defaults to 1.
      - radarRadius: the radius of a radar in the radar mode. Defaults to 80.
      - radarTicks: the number of ticks on an axis of a radar in the radar mode. Defaults to 5.
      - pieInnerRadius: the inner radius of a pie in the pie mode. Defaults to 1 (pixel).
      - pieOuterRadius: the outer radius of a pie in the pie mode. Defaults to 80.
      - pieCornerRadius: the corner radius of a pie in the pie mode. Defaults to 5.
      - pieLegendTitle: title of the legend for pie-chart based SSVs. Defaults to "legend".
      - pieLegendDomain: domain of the legends for pie-chart based SSVs. Should be specified as an array of strings. If not specified, Kyrix-S will use all distinct combinations of domains as the domain for the legends.
      - padAngle: the amount of padding between pies in the pie mode. Defaults to 0.05 (radians).
      - dotSizeColumn: a string which is the name of the field in the data that maps to the size of the dots in the dot mode. If not specified, all dots have the same size (dotMaxSize).
      - dotSizeDomain: a two-number array which indicates the range of dotSizeColumn. Must be present when dotSizeColumn is present.
      - dotSizeLegendTitle: the title of the size legend for the dot mode. Defaults to Point Size.
      - dotMaxSize: the maximum radius of the dots in the dot mode. Defaults to 15 pixels.
      - dotColorColumn: a string which is the name of the field in the data that maps to the color of the dots in the dot mode. If not specified, all dots have the same color (#38c2e0).
      - dotColorDomain: an array of values which indicates the distinct values of dotColorColumn. Must be present when dotColorColumn is present.
      - dotColorLegendTitle: the title of the color legend for the dot mode. Defaults to Point Color.
  - hover: hover marks are shown when the user mouses over a cluster mark. This component is optional.
    - rankList: hover marks that show representative objects from a cluster. The ranking of objects is defined in layout.z. Cannot be specified together with marks.hover.tooltip.
      - mode: either tabular, which displays representative objects in a table, or custom, which is used to customize how objects are rendered, required. For custom, bboxW and bboxH must be specified in marks.hover.rankList.config, indicating the size of the bounding box of an object.
      - topk: an integer greater than 0, indicating how many representative objects are displayed upon hovering. This field is optional and defaults to 1.
      - fields: an array of fields that will be displayed in the tabular mode, required when marks.hover.rankList.mode is tabular.
      - custom: the custom renderer for the custom mode, required when marks.hover.rankList.mode is custom.
      - orientation: the direction in which representative objects are positioned, either horizontal or vertical. This field is optional and defaults to vertical.
      - config: a set of optional parameters for customizing the looks of the hover marks.
    - tooltip: shows simple tooltips about a cluster, instead of a ranked list of objects. Cannot be specified together with marks.hover.rankList.
      - columns: an array of fields of the representative object to be displayed, required. The fields should be present in data.columnNames if it is specified, or in the result returned by data.query.
      - aliases: an array of aliases for the fields specified in columns. Should have the same number of elements as columns. This field is optional and defaults to marks.hover.tooltip.columns.
    - boundary: hover marks that show the boundary of clusters. Can be either bbox, which shows the boundary as the bounding box, or convexhull, which shows a polygonal enclosure of the cluster.
    - selector: a D3/CSS selector string which helps identify which elements rendered by the custom cluster renderer are hoverable. Note that to enable hovering for the custom cluster mode, your custom cluster renderer should add a g variable which is an SVG <g> element and render everything as its direct child. What Kyrix-S does behind the scenes involves calling something like g.selectAll("YOUR_SELECTOR").on("mouseover" ...). This field is optional and defaults to *.
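To make the marks options more concrete, the sketch below shows one pie-mode block with grouping and tooltips, and one radar-mode block that uses the object shorthand for measures. All field names and domain values are hypothetical. clusterAggKey is also a hypothetical helper: it assumes clusterAgg keys are the dimension values joined by "_" followed by function(field), which is inferred only from the two keys quoted above.

```javascript
// Pie mode: exactly one measure, grouped by a categorical field.
// Field names and domain values are hypothetical.
const pieMarks = {
    cluster: {
        mode: "pie",
        aggregate: {
            measures: [{field: "*", function: "count"}],
            dimensions: [{field: "department", domain: ["medical", "sales"]}]
        },
        config: {pieLegendTitle: "Department"}
    },
    hover: {
        // tooltip and rankList cannot be specified together
        tooltip: {
            columns: ["name", "salary"],
            aliases: ["Name", "Salary"]
        },
        boundary: "bbox"
    }
};

// Radar mode using the shorthand: one function applied to several fields,
// with a shared extent (extent is required for radar).
const radarMarks = {
    cluster: {
        mode: "radar",
        aggregate: {
            measures: {
                fields: ["math_score", "reading_score"],
                function: "avg",
                extent: [0, 100]
            }
        }
    }
};

// Hypothetical helper: builds a clusterAgg key, assuming keys are the
// dimension values joined by "_" followed by "function(field)".
function clusterAggKey(dimensionValues, fn, field) {
    return [...dimensionValues, `${fn}(${field})`].join("_");
}

console.log(clusterAggKey([], "count", "*")); // count(*)
console.log(clusterAggKey(["medical", "male"], "avg", "salary")); // medical_male_avg(salary)
```

Inside a custom renderer, a key built this way would be used as d.clusterAgg[key], provided the assumed key format holds for your specification.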
- config: a set of optional global parameters for customizing the SSV.
  - axis: a boolean representing whether axes are displayed. Defaults to false.
  - xAxisTitle: the title of the x axis. Defaults to layout.x.field.
  - yAxisTitle: the title of the y axis. Defaults to layout.y.field.
  - numLevels: number of zoom levels in the SSV. Defaults to 10.
  - topLevelWidth: width of the top level. Defaults to 1000.
  - topLevelHeight: height of the top level. Defaults to 1000.
  - zoomFactor: zoom factor between adjacent levels. Defaults to 2.
  - map: a boolean indicating whether an OpenStreetMap background is rendered. Defaults to true if layout.geo is present and false otherwise.
  - numberFormat: a D3 format specifier controlling how numbers are displayed in the SSV. Defaults to ~s (decimal notation with an SI prefix, rounded to significant digits, trimming trailing zeros).
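The zoom-related defaults can be combined to estimate how large each level is. The sketch below assumes that level 0 is the top level and that each deeper level is zoomFactor times wider and taller than the one above it. levelSizes is a hypothetical helper, not a Kyrix-S function.

```javascript
// The documented defaults for the zoom-related config fields.
const config = {
    numLevels: 10,
    topLevelWidth: 1000,
    topLevelHeight: 1000,
    zoomFactor: 2
};

// Assumption: level 0 is the top level, and each deeper level is zoomFactor
// times wider and taller than the level above it.
function levelSizes({numLevels, topLevelWidth, topLevelHeight, zoomFactor}) {
    const sizes = [];
    for (let i = 0; i < numLevels; i++) {
        const scale = Math.pow(zoomFactor, i);
        sizes.push({
            level: i,
            width: topLevelWidth * scale,
            height: topLevelHeight * scale
        });
    }
    return sizes;
}

const sizes = levelSizes(config);
console.log(sizes[0].width); // 1000
console.log(sizes[sizes.length - 1].width); // 512000
```

Under this assumption, the default settings give a bottom level 512 times wider than the top level, which is useful to keep in mind when choosing numLevels for a dataset.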
In the current release, Kyrix-S only works on a single node with sufficient main memory that can hold all data. To allocate memory to the kyrix container, run the following:
> sudo ./run-kyrix.sh --mavenopts -Xmx700m # allocate 700MB memory to the kyrix container
If not specified, the default memory allocated is 512MB. As a rule of thumb, if the size of the raw data is X, you'll need to allocate 10X memory to the kyrix container.
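The rule of thumb above can be turned into a small calculation. mavenOptsFor is a hypothetical helper, and never going below the 512MB default is a design choice of this sketch rather than documented behavior.

```javascript
// Documented default heap size for the kyrix container, in MB.
const DEFAULT_HEAP_MB = 512;

// Hypothetical helper applying the 10X rule of thumb to a raw data size in MB.
// Choosing never to go below the default is an assumption of this sketch.
function mavenOptsFor(rawDataMb) {
    const heapMb = Math.max(DEFAULT_HEAP_MB, Math.ceil(rawDataMb * 10));
    return `-Xmx${heapMb}m`;
}

console.log(`sudo ./run-kyrix.sh --mavenopts ${mavenOptsFor(70)}`);
// sudo ./run-kyrix.sh --mavenopts -Xmx700m
console.log(mavenOptsFor(20)); // -Xmx512m
```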
We do have a multi-node Kyrix-S that can scale to billions of objects. We are testing it more thoroughly and plan to include it in a future release. Stay tuned!