Filters
The most common way to work with HXL-tagged datasets in libhxl is through filters. A filter is a mini-program that performs a single operation on incoming HXL data, then passes the result on, possibly to other filters. libhxl supports the following filters:
- Add columns filter
- Append datasets filter
- Clean data filter
- Count rows filter
- Cut columns filter
- Deduplicate rows filter
- Expand lists filter
- Merge columns filter
- Rename columns filter
- Replace data filter
- Select rows filter
- Sort rows filter
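As a rough illustration of what "performs a single operation, then passes the result on" means, the sketch below models two of the filters listed above as plain Python generators. It does not use libhxl and is not how libhxl implements them; the function names, the dict-per-row representation, and the sample data are all assumptions made for this example.

```python
# Conceptual sketch only: plain-Python stand-ins, not libhxl's classes.

def cut_columns(rows, keep):
    """Filter: pass each row on with only the columns named in `keep`."""
    for row in rows:
        yield {tag: value for tag, value in row.items() if tag in keep}

def deduplicate(rows):
    """Filter: pass on each distinct row once, in first-seen order."""
    seen = set()
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            yield row

# Hypothetical sample data, one dict per row, keyed by HXL hashtag.
data = [
    {'#org': 'Red Cross', '#adm1': 'Coast', '#sector': 'Health'},
    {'#org': 'Red Cross', '#adm1': 'Coast', '#sector': 'WASH'},
    {'#org': 'UNICEF', '#adm1': 'Plains', '#sector': 'Education'},
]

# Each filter wraps the previous one, so rows are pulled through one at a time.
result = list(deduplicate(cut_columns(data, keep={'#org', '#adm1'})))
```

Because each function only consumes rows and yields rows, any filter can feed any other, which is the property that makes chaining possible.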
Filters often work in chains. For example, the following command-line sequence selects rows where #org is "Red Cross", counts the number of rows for each #adm1, then renames the generic #meta+count column to #output+activities (assuming that each row represents an activity):
hxlselect -q 'org=Red Cross' \
| hxlcount -t adm1 \
| hxlrename -r 'meta+count:output+activities'
Here is the same sequence inside a Python program:
import hxl
# url is the address of an online HXL-tagged dataset
source = hxl.data(url) \
    .with_rows('org=Red Cross') \
    .count('adm1') \
    .rename_columns('meta+count:output+activities')
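To show what the select, count, and rename steps described above do to the data, here is a small worked example in plain Python. It only emulates the three steps on hypothetical in-memory rows; it does not call libhxl, and the sample values and variable names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical activity rows, one dict per row, keyed by HXL hashtag.
activities = [
    {'#org': 'Red Cross', '#adm1': 'Coast'},
    {'#org': 'Red Cross', '#adm1': 'Coast'},
    {'#org': 'UNICEF', '#adm1': 'Coast'},
    {'#org': 'Red Cross', '#adm1': 'Plains'},
]

# Step 1 (select): keep only rows where #org is "Red Cross".
selected = [row for row in activities if row['#org'] == 'Red Cross']

# Step 2 (count): one output row per #adm1 value, with a #meta+count column.
# Keys are sorted here only to give this example a stable order.
counts = Counter(row['#adm1'] for row in selected)
counted = [{'#adm1': adm1, '#meta+count': n} for adm1, n in sorted(counts.items())]

# Step 3 (rename): relabel #meta+count as #output+activities.
renamed = [
    {('#output+activities' if tag == '#meta+count' else tag): value
     for tag, value in row.items()}
    for row in counted
]

for row in renamed:
    print(row)
```

The UNICEF row is dropped at the select step, so the final table reports two Red Cross activities in Coast and one in Plains.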
The HXL Proxy is a web application that lets you define these filter chains in your browser, then apply them to any online HXL dataset.
Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com