#!/usr/bin/env python3
"""
Script to produce a TSV file for a release of CILI.

The mappings to the Princeton WordNet generally don't need to be
released regularly as they are unlikely to change and are already
included in WN-LMF releases of the PWN, so this script reduces the
ili.ttl file to a two-column tab-separated-value file containing only
the ILI inventory and their definitions. This assumes that every ILI
has a definition, which is true by design. The resulting .tsv file is
less than half the size of the .ttl file when uncompressed, but
roughly the same size when compressed. TSV is generally much faster to
parse, however, and doesn't require an RDF library, so it is more
appealing for downstream applications.

Requirements:

- Python 3.6+
- rdflib

Usage:

    python3 make-tsv.py > cili.tsv

"""
import sys
from rdflib import Graph
from rdflib.namespace import SKOS
g = Graph()
g.parse("ili.ttl", format='ttl')
# pair each ILI (ignoring the URL part) with its definition
data = [(subj.rpartition('/')[2], obj)
        for subj, obj
        in g.subject_objects(predicate=SKOS.definition)]
# sort by ILI number
data.sort(key=lambda pair: int(pair[0].lstrip('i')))
print('ILI\tDefinition')
for ili, definition in data:
    print(f'{ili}\t{definition}')
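

# --- Illustrative sketch (not part of the original script) ---
# The docstring notes that the TSV output can be parsed without an RDF
# library. As a rough illustration, a downstream application might load
# the file with only the standard library, along the lines below. The
# function name `read_cili_tsv` and the default path 'cili.tsv' are
# hypothetical. The sketch assumes the two-column layout with a single
# header row, and that no definition contains a tab or newline.
def read_cili_tsv(path='cili.tsv'):
    """Return a dict mapping each ILI identifier to its definition."""
    ilis = {}
    with open(path, encoding='utf-8') as f:
        next(f, None)  # skip the header row
        for line in f:
            # split on the first tab only; the rest is the definition
            ili, _, definition = line.rstrip('\n').partition('\t')
            if ili:  # ignore blank lines
                ilis[ili] = definition
    return ilis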