David Megginson edited this page Jun 14, 2020 · 13 revisions

The most common way to work with HXL-tagged datasets in libhxl is through filters. A filter is a mini-program that performs a single operation on incoming HXL data, then passes the result on, possibly to other filters. libhxl supports the following filters:
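To illustrate the idea, a filter can be modelled as a Python generator that consumes rows from an upstream source and yields rows downstream. This is a minimal sketch, not libhxl's implementation: `select_rows` and `rename_column` are hypothetical helpers, and each row is assumed to be a plain dict keyed by HXL hashtag.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]  # assumption: one HXL row as {hashtag: value}

def select_rows(rows: Iterable[Row], tag: str, value: str) -> Iterator[Row]:
    """Hypothetical filter: pass through only rows where tag == value."""
    for row in rows:
        if row.get(tag) == value:
            yield row

def rename_column(rows: Iterable[Row], old: str, new: str) -> Iterator[Row]:
    """Hypothetical filter: rename one hashtag, leaving the values unchanged."""
    for row in rows:
        yield {(new if key == old else key): val for key, val in row.items()}

source = [
    {"#org": "Red Cross", "#adm1": "Coast"},
    {"#org": "UNICEF", "#adm1": "Coast"},
    {"#org": "Red Cross", "#adm1": "Lakes"},
]

# Each filter wraps the one before it; nothing runs until the chain is iterated.
chain = rename_column(select_rows(source, "#org", "Red Cross"), "#adm1", "#adm1+name")
result = list(chain)
print(result)
```

In this sketch the generators are lazy, so each row passes through the whole chain before the next one is read. A filter that aggregates, such as a count, would instead have to consume all of its input before emitting anything.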

Filter chains

Filters often work in chains. For example, the following command-line sequence selects rows where #org is "Red Cross", counts the number of rows for each #adm1, then renames the generic #meta+count column to #output+activities (assuming that each row represents an activity):

hxlselect -q 'org=Red Cross' \
  | hxlcount -t adm1 \
  | hxlrename -r 'meta+count:output+activities'

Here is the same sequence inside a Python program:

import hxl

# url: the location of any HXL-tagged dataset
source = hxl.data(url) \
    .with_rows('org=Red Cross') \
    .count('adm1') \
    .rename_columns('meta+count:output+activities')
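To make the result of the chain concrete, the following plain-Python sketch reproduces the same three steps on a small, made-up dataset without using libhxl. The row format (dicts keyed by hashtag) and the function name `count_activities` are assumptions for illustration only; libhxl itself produces an HXL dataset, not a list of dicts, and its query matching is more flexible than the exact string match used here.

```python
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]  # assumption: one row as {hashtag: value}

def count_activities(rows: List[Row], org: str) -> List[Row]:
    # Step 1, like hxlselect -q 'org=...': keep one organisation's rows
    # (exact string match, a simplification of libhxl's queries).
    selected = [row for row in rows if row.get("#org") == org]
    # Step 2, like hxlcount -t adm1: one output row per distinct #adm1,
    # with the number of matching rows in a generic #meta+count column.
    # Sorted by #adm1 here only to give a stable order.
    counts = Counter(row["#adm1"] for row in selected)
    counted = [{"#adm1": adm1, "#meta+count": n} for adm1, n in sorted(counts.items())]
    # Step 3, like hxlrename -r 'meta+count:output+activities'.
    return [{"#adm1": r["#adm1"], "#output+activities": r["#meta+count"]} for r in counted]

rows = [
    {"#org": "Red Cross", "#adm1": "Coast"},
    {"#org": "Red Cross", "#adm1": "Coast"},
    {"#org": "UNICEF", "#adm1": "Coast"},
    {"#org": "Red Cross", "#adm1": "Lakes"},
]
result = count_activities(rows, "Red Cross")
print(result)
```

Each input row is treated as one activity, so the output has one row per #adm1 with the number of Red Cross activities there.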

Online filters

The HXL Proxy is a web application that lets you define these filter chains in your browser, then apply them to any online HXL dataset.