DS4DM/DS4DM_Backend at master · AnLiGentile/DS4DM

History

Name		Name	Last commit message	Last commit date
parent directory ..
Units/Convertible		Units/Convertible
resources		resources
src/de/mannheim/uni		src/de/mannheim/uni
README.md		README.md
indexing.conf		indexing.conf
pom.xml		pom.xml

README.md

#Backend core

The Backend core encompasses two macro functionalities:

data indexing (TableIndex)
data searching (TableSearch)

##TableIndex We currently support the indexing of:

our normalized table format (as resulting from the preprocessing step)
csv tables
Dresden University proprietary format - for the sake of enabling result comparison with a similar research project Other formats will be supported adding conversion functionalities to the preprocessing component. The indexing process can be run offline as a batch process. There are currently three separate entry points for the supported formats (respectively for our standardised table format, for csv and for Dresden University format).

They all produce two index structures for the collection of tables:

an IndexOfHeaders, storing every item of the table headers
an IndexOfValues, stroring every cell of the tables

In the indexing step we perform data type detection for each column, using manually defined regular expressions to detect string, numeric value, date and link.

To generate an index starting from json tables (the output of the DS4DM_Preprocessing component) you can run:

java -jar ds4dm_backend-0.0.1-SNAPSHOT-jar-with-dependencies.jar indexing.conf test_indexing resources/json/articles web

Where:

indexing.conf is a configuration file where you can define the location of the generated indexes

[test_indexing] is a label for the process

resources/json/articles is the folder containing the json tables to index

##TableSearch The TableSearch component incorporates searching and integration facilities. TableSearch receives as input a query table, where the user indicates:

the subject column - the column in the table that contains entity names
a set of extraction attributes - that she wants to extend her table with

The TableSearch process:

retrieves a set of relevant tables from the connected indexes of tables. In the current prototype the search is a Keyword Based Search performed on the IndexOfHeaders.
identifies correspondences between relevant tables and the query table (currently implemented as a String matching process) at:
schema level (which columns in each retrieved table correspond to which columns in the query table)
instance level (which rows in each retrieved table correspond to which rows the query table)

As output TableSearch returns all the identified correspondences and a confidence score for each matching.

TableSearch fuctionalities are currently exposed as DS4DM_webservice.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DS4DM_Backend

DS4DM_Backend

README.md

Files

DS4DM_Backend

Directory actions

More options

Directory actions

More options

Latest commit

History

DS4DM_Backend

Folders and files

parent directory

README.md