This fork contains the original back end for semantic labelling, but the API has been changed. The changes allow a common evaluation benchmark for various semantic typing approaches via serene-benchmark.
Automatically assign semantics to large data sets from heterogeneous sources, based on their features, using several statistical and machine learning techniques.
- Java JRE: if not installed, download and install as described, e.g., here
- Elasticsearch: Download here.
- Pyspark:
Download Spark.
Extract the archive, navigate to the python directory, and run:
pip install -e .
- scikit-learn
- pandas
ElasticSearch
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.tar.gz
tar -xvf elasticsearch-5.1.1.tar.gz
cd elasticsearch-5.1.1/bin
./elasticsearch
You might need to increase the virtual memory limit. On Linux this can be done with:
sudo sysctl -w vm.max_map_count=262144
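Before changing the limit, it can help to check the current value. The following is a minimal sketch, assuming a Linux host that exposes the setting at `/proc/sys/vm/max_map_count`; the helper names are illustrative and not part of this repository.

```python
from pathlib import Path

# Value used in the sysctl command above.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the current vm.max_map_count, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def needs_increase(current, required=REQUIRED_MAX_MAP_COUNT):
    """True when the limit is known and lower than the required value."""
    return current is not None and current < required


if __name__ == "__main__":
    current = read_max_map_count()
    if current is None:
        print("vm.max_map_count is not available on this system")
    elif needs_increase(current):
        print(f"vm.max_map_count is {current}; raise it to {REQUIRED_MAX_MAP_COUNT}")
    else:
        print(f"vm.max_map_count is {current}; no change needed")
```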
Spark
curl -L -O http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar -xvf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7/python
pip install -e .
pip install py4j
Run
python server.py
By default the server will be started on port 8000.
The following are original API endpoints...
This has to be called the very first time the service is set up.
URL /ftu
Method: GET
Parameters: None
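As an illustration, the first-time setup call could be issued from Python with the standard library. This sketch assumes the server runs locally on the default port 8000; `first_time_setup_request` and `send` are hypothetical helper names, not part of this repository.

```python
import urllib.request

# Assumption: the server runs locally on the default port 8000.
BASE_URL = "http://localhost:8000"


def first_time_setup_request(base_url=BASE_URL):
    """Build (but do not send) the GET request for the /ftu endpoint."""
    return urllib.request.Request(base_url + "/ftu", method="GET")


def send(request, timeout=30):
    """Send a request and return (status code, raw body bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


# Usage (requires a running server):
#   status, body = send(first_time_setup_request())
```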
This adds a new semantic type and the corresponding column.
URL /semantic_type
Method: POST
Parameter | Description | Required |
---|---|---|
column | column data | Yes |
semantic_type | semantic type including domain and type | Yes |
Sample Payload:
{
  "semantic_type": {
    "domain": {
      "uri": "http://erlangen-crm.org/current/E21_Person"
    },
    "type": {
      "uri": "http://isi.edu/integration/karma/dev#classLink"
    }
  },
  "column": {
    "header": [...Rows in the column...]
  }
}
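The payload above could be assembled and wrapped in a request as follows. This is a sketch under two assumptions that the text does not confirm: the payload is sent as a JSON request body with a `Content-Type: application/json` header, and the server listens on `localhost:8000`. The function name is illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default local server


def add_semantic_type_request(domain_uri, type_uri, header, rows, base_url=BASE_URL):
    """Build (but do not send) a POST /semantic_type request for one column."""
    payload = {
        "semantic_type": {
            "domain": {"uri": domain_uri},
            "type": {"uri": type_uri},
        },
        # The column is keyed by its header; the value is the list of rows.
        "column": {header: list(rows)},
    }
    return urllib.request.Request(
        base_url + "/semantic_type",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


request = add_semantic_type_request(
    "http://erlangen-crm.org/current/E21_Person",
    "http://isi.edu/integration/karma/dev#classLink",
    "name",
    ["Ada Lovelace", "Alan Turing"],
)
# To send it against a running server: urllib.request.urlopen(request)
```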
URL /semantic_type/bulk
Method: POST
Parameter | Description | Required |
---|---|---|
columns | column data | Yes |
semantic_type | semantic type including domain and type | Yes |
Sample Payload:
{
  "semantic_type": {
    "domain": {
      "uri": "http://erlangen-crm.org/current/E21_Person"
    },
    "type": {
      "uri": "http://isi.edu/integration/karma/dev#classLink"
    }
  },
  "columns": {
    "header1": [...Rows in the column...],
    "header2": [...Rows in the column...],
    "header3": [...Rows in the column...],
    "header4": [...Rows in the column...]
  }
}
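When the columns come from a tabular file, the `columns` object can be produced by transposing rows into per-header lists. The sketch below does this for CSV text using only the standard library; `columns_from_csv` and `bulk_payload` are hypothetical helpers, and the result would still need to be sent as in the single-column case.

```python
import csv
import io
import json


def columns_from_csv(text):
    """Transpose CSV text with a header row into {header: [values, ...]}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


def bulk_payload(domain_uri, type_uri, columns):
    """Build the body for POST /semantic_type/bulk from a dict of columns."""
    return {
        "semantic_type": {
            "domain": {"uri": domain_uri},
            "type": {"uri": type_uri},
        },
        "columns": columns,
    }


sample = "header1,header2\nAda,Grace\nAlan,Edsger\n"
payload = bulk_payload(
    "http://erlangen-crm.org/current/E21_Person",
    "http://isi.edu/integration/karma/dev#classLink",
    columns_from_csv(sample),
)
body = json.dumps(payload)  # JSON text ready to be used as a request body
```

Since one payload carries a single semantic type, only columns that share that type should be passed together.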
Deletes a semantic type and all of its corresponding columns.
URL /semantic_type
Method: DELETE
Parameter | Description | Required |
---|---|---|
semantic_type | semantic type including domain and type | Yes |
Sample Payload:
{
  "semantic_type": {
    "domain": {
      "uri": "http://erlangen-crm.org/current/E21_Person"
    },
    "type": {
      "uri": "http://isi.edu/integration/karma/dev#classLink"
    }
  }
}
Delete a column from a semantic type
URL /column
Method: DELETE
Parameter | Description | Required |
---|---|---|
column_name | Name of the column that has to be deleted | Yes |
semantic_type | semantic type including domain and type | Yes |
Sample Payload:
{
  "semantic_type": {
    "domain": {
      "uri": "http://erlangen-crm.org/current/E21_Person"
    },
    "type": {
      "uri": "http://isi.edu/integration/karma/dev#classLink"
    }
  },
  "column_name": "header1"
}
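Both DELETE endpoints take the same `semantic_type` object, so one helper can cover them. This sketch assumes the server accepts a JSON body on DELETE requests and runs on `localhost:8000`; neither is confirmed by the text above, and `delete_request` is an illustrative name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default local server


def delete_request(semantic_type, column_name=None, base_url=BASE_URL):
    """Build a DELETE request for a whole semantic type, or for one of its columns."""
    payload = {"semantic_type": semantic_type}
    path = "/semantic_type"
    if column_name is not None:
        # Deleting a single column uses the /column endpoint instead.
        payload["column_name"] = column_name
        path = "/column"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="DELETE",
    )


PERSON = {
    "domain": {"uri": "http://erlangen-crm.org/current/E21_Person"},
    "type": {"uri": "http://isi.edu/integration/karma/dev#classLink"},
}
drop_column = delete_request(PERSON, column_name="header1")
drop_type = delete_request(PERSON)
# To send either against a running server: urllib.request.urlopen(drop_column)
```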
Determine the semantic type of a column
URL /column
Method: POST
Parameter | Description | Required |
---|---|---|
column | column data | Yes |
Sample Payload:
{
  "column": {
    "header": [...Rows in the column...]
  }
}
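A prediction request could be built and sent as sketched below. The response format is not described here, so the helper returns the raw response text without parsing it. The JSON body, the `Content-Type` header, and the `localhost:8000` address are assumptions, and the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default local server


def predict_request(header, rows, base_url=BASE_URL):
    """Build (but do not send) a POST /column request for one unlabelled column."""
    payload = {"column": {header: list(rows)}}
    return urllib.request.Request(
        base_url + "/column",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


def predict(header, rows, base_url=BASE_URL, timeout=60):
    """Send the request and return the raw response text (format not assumed)."""
    request = predict_request(header, rows, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires a running server with at least one semantic type added):
#   print(predict("name", ["Ada Lovelace", "Alan Turing"]))
```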