Filter Expressions

DataGrid filter expressions use Python syntax for selecting matching rows. To use the value of a column in the expression, enclosing the column name like {"Column Name"}. Some examples:

To select all of the rows that have a fitness score less than 0.1, given that you have a column named "Fitness":

{"fitness"} < 0.1

Note that the column name is case-insensitive (i.e., you can use any case of letters inside quotes). Column values can be used any where in the filter expression. For example, if you wanted to select all of the rows where column "Score 1" was greater than or equal to "Score 2", you would write:

{"score 1"} >= {"score 2"}

You can use any of Python's operators, including:

< - less than
<= - less than or equal
> - greater than
>= - greater than or equal
== - equal to
!= - not equal to
is - is the same (e.g., is None)
is not - is not the same (e.g. is not None)
+ - addition
- - subtraction
* - multiplication
/ - division
// - integer division
** - raise to a power
not - flip the boolean value
in - is value in a list of values

Also you can use Python's comparison operator chaining. That is, any of the above operators can be used in a shorthand way as follows:

{"column a"} < {"column b"} < {"column c"}

which is shorthand for:

{"column a"} < {"column b"} and {"column b"} < {"column c"}

Use is None and is not None for selecting rows that have null values. Otherwise, null values are ignored in most other uses.

You can combine comparisons using Python's and and or (use parentheses to force evaluation order different from Python's defaults):

(({"loss"} - {"base line"}) < 0.5) or ({"loss"} > 0.95)

JSON attribute access

For JSON and metdata columns, you can use the dot operator to access values:

{"Image"}.extension == "jpg"

or nested values:

{"Image"}.labels.dog > 2

The dotted-name path following a column of JSON data deviates from Python semantics. In addition, the path can only be use for nested dictionaries. If the JSON has a list, then you will not be able to use the dotted-name syntax. However, you can use Python's list comprehension, like these examples:

any([x["label"] == 'dog' for x in {"Image"}.overlays]) - images with dogs
all([x["label"] == 'person' for x in {"Image"}.overlays]) - images with only people (no other labels)
any([x["label"] in ["car", "bicycle"] for x in {"Image"}.overlays]) - images with cars or bicycles
any([x["label"] == "cat" or x["label"] == "dog" for x in {"Image"}.overlays]) - images with dogs or cats
any([x["label"] == "cat" for x in {"Image"}.overlays]) and any([x["label"] == "dog" for x in {"Image"}.overlays]) - images with dogs and cats
any([x["score"] > 0.999 for x in {"Image"}.overlays]) - images with an annotation score greater than 0.999

List comprehension also deviates slightly from standard Python semantics:

[item for item in LIST] - same as Python (item is each element in the list)
[item for item in DICT] - item is the DICT

Note that you will need to wrap the list comprehension in either any() or all(). You can also use flatten() around nested list comprehensions.

See below for more information on contains() and other string and JSON methods.

Note that any mention of a column that contains an asset type (e.g., an image) will automatically reference its metadata.

Python String and JSON methods

These are mostly used with column values:

STRING in {"Column Name"}
{"Column Name"}.endswith(STRING)
{"Column Name"}.startswith(STRING)
{"Column Name"}.strip()
{"Column Name"}.lstrip()
{"Column Name"}.rstrip()
{"Column Name"}.upper()
{"Column Name"}.lower()
{"Column Name"}.split(DELIM[, MAXSPLITS])

Python if/else expression

You can use Python's if/else expression to return one value or another.

("low" if {"loss"} < 0.5 else "high") == "low"

Python Builtin Functions

You can use any of these Python builtin functions:

abs() - absolute value
round() - rounds to int
max(v1, v2, ...) or max([v1, v2, ...]) - maximum of list of values
min(v1, v2, ...) or min([v1, v2, ...]) - minimum of list of values
len() - length of item

Aggregate Functions

You can use the following aggregate functions. Note that these are more expensive to compute, as they require computing on all rows.

AVG({"Column Name"})
MAX({"Column Name"})
MIN({"Column Name"})
SUM({"Column Name"})
TOTAL({"Column Name"})
COUNT({"Column Name"})
STDEV({"Column Name"})

Examples:

Find all rows that have a loss value less than the average:

{"loss"} < AVG({"loss"})

Python Library Functions and Values

Functions from Python's random library:

random.random()
random.randint()

Note that random values are regenerated on each call, and thus only have limited use. That means that every time the DataGrid is accessed, the values will change. This would result in bizarre results if you grouped or sorted by a random value.

Functions and values from Python's math library:

math.pi
math.sqrt()
math.acos()
math.acosh()
math.asin()
math.asinh()
math.atan()
math.atan2()
math.atanh()
math.ceil()
math.cos()
math.cosh()
math.degrees()
math.exp()
math.floor()
math.log()
math.log10()
math.log2()
math.radians()
math.sin()
math.sinh()
math.tan()
math.tanh()
math.trunc()

Functions and values from Python's datetime library:

datetime.date(YEAR, MONTH, DAY)
datetime.datetime(YEAR, MONTH, DAY[, HOUR, MINUTE, SECOND])

Kangas DataGrid is completely open source; sponsored by Comet ML

Home
- User Guides
  - Installation - installing kangas
  - Reading data - importing data
  - Constructing DataGrids - building from scratch
  - Exploring data - exploration and analysis
  - Examples - scripts and notebooks
- Kangas Command-Line Interface
- Kangas Python API
  - kangas - top-level functions
  - DataGrid - DataGrid object and methods
  - Image - Image object and methods
  - Embedding - Embedding object and methods
  - Tensor - Tensor object and methods
- Integrations - with Hugging Face and Comet
- User Interface
  - Filter expressions - filter syntax
  - Cell Types
    - Boolean
    - Datetime
    - Embedding
    - Float
    - Image
    - Integer
    - JSON
    - Tensor
    - Text
    - Vector
- FAQ - Frequently Asked Questions
- Under the Hood
  - Security - issues related to security
  - Development - setting up a development environment
  - Roadmap - plans and known issues

Provide feedback

Saved searches

Use saved searches to filter your results more quickly