The experiment is based on the Yahoo! Learning to Rank Challenge data, version 2.0. Go to the Yahoo! Webscope website and download the C14B - Yahoo! Learning to Rank Challenge version 2.0 data.
By default, the data should be fully decompressed and placed under `data/Webscope_C14B` (including the file `ltrc_yahoo.tgz`). To build the data, simply do:
```
make -C data
```
or, when an alternative data path is used:
```
make SRC=/path/to/Webscope_C14B -C data
```
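To confirm the build succeeded, here is a quick sanity check (a minimal sketch; the `set1.*.npz` split names are hypothetical, inferred from the file list below):

```
# Verify that the expected outputs exist under data/
# (the exact set1.*.npz filenames may differ).
for f in data/set1.train.npz data/set1.valid.npz data/set1.test.npz \
         data/qrels.set1 data/costs.txt data/importance.txt; do
    [ -e "$f" ] && echo "OK      $f" || echo "missing $f"
done
```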
A number of files under `data` will be used in the experiment, including the training/validation/test data (`set1.*.npz`), qrels (`qrels.set1`), and the costs/importance files (`costs.txt` and `importance.txt`).
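To peek inside one of the `.npz` archives, here is a sketch assuming they are standard NumPy archives of plain arrays (the filename is illustrative):

```
# Print the array names and shapes stored in one archive.
python -c "
import numpy as np
d = np.load('data/set1.train.npz')
for k in d.files:
    print(k, d[k].shape)
"
```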
To run the baseline tree models:
```
bash GBDT-BL.sh
bash GBRT-BL.sh
bash LambdaMART-BL.sh
```
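The three baselines are independent and can be run back to back; a small convenience wrapper (just a sketch) that also keeps a log of each run:

```
# Run each baseline in sequence, saving its output to a matching .log file.
for s in GBDT-BL.sh GBRT-BL.sh LambdaMART-BL.sh; do
    bash "$s" 2>&1 | tee "${s%.sh}.log"
done
```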
The proposed linear cascade models use randomized search to obtain a good cascade configuration. GNU Parallel is used throughout these experiments to speed up the search. To run the linear cascades using 12 parallel processes:
```
bash LM_C.sh ./LM_C 12
bash LM_E.sh ./LM_E 12
bash LM_F.sh ./LM_F 12
```
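In the invocations above, the first argument is the output directory and the second is the number of parallel processes; to size the worker pool to the machine instead of a fixed 12 (an illustrative variation, assuming GNU Parallel and `nproc` are available):

```
# Abort early if GNU Parallel is missing, then use one worker per CPU core.
command -v parallel >/dev/null || { echo "GNU Parallel is required" >&2; exit 1; }
bash LM_C.sh ./LM_C "$(nproc)"
```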
Results and models will be generated in separate directories (e.g., `LM_C`) to prevent clutter.
To train tree-based cascades based on the features allocated in some linear cascade model (replace `model.foo` and the like with actual filenames):
```
bash TreeCascade.sh ./TreeCascade ./LM_C/models/model.foo
bash TreeCascade.sh ./TreeCascade ./LM_E/models/model.bar
bash TreeCascade.sh ./TreeCascade ./LM_F/models/model.baz
```
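The actual filenames to substitute for `model.foo` and friends can be found by listing each linear cascade's `models` directory (exact names depend on the randomized search):

```
# Locate the trained linear cascade models.
ls ./LM_C/models ./LM_E/models ./LM_F/models
```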
This script will train all three tree cascade models using the best tree parameters found in the respective baselines, and store them under `TreeCascade`.