
End-to-End MultiModal Machine Learning on Knowledge Graphs (MMLKG)

This package provides multimodal node classification and link prediction for RDF knowledge graphs. Literal nodes are fed to modality-specific neural encoders, and the resulting embeddings serve as input to either a neural network (node classification) or a translation model (link prediction). By default, the network is a simple two-layer MLP, and the translation model consists of DistMult combined with LiteralE.

The purpose of this package is to provide baselines for the MR-GCN.
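To illustrate the idea behind the node classification head, a minimal sketch is shown below. This is not the package's actual implementation; the dimensions and names are made up. The modality-specific embeddings of a node are (conceptually) concatenated and passed through a small two-layer MLP that predicts its class.

import torch
import torch.nn as nn

class NodeClassifierMLP(nn.Module):
    """Two-layer MLP over precomputed multimodal node embeddings (illustrative only)."""
    def __init__(self, embedding_dim, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, node_embeddings):
        # node_embeddings: concatenated outputs of the modality-specific encoders
        return self.net(node_embeddings)

# Example: 64-dimensional node embeddings, 10 target classes, batch of 8 nodes
model = NodeClassifierMLP(embedding_dim=64, hidden_dim=32, num_classes=10)
logits = model(torch.randn(8, 64))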

Getting Started

  1. To install, clone the repository and run:
pip install . 
  2. Once installed, we must first prepare a dataset by calling generateInput, which expects graphs in HDT format. Use rdf2hdt if your graphs are in another serialization format.

For node classification, we need the context as HDT, and the splits as CSVs with the entity IRIs and corresponding classes in the first and second column, respectively:

python generateInput.py -d ./mydata/ -c context.hdt -ts train.csv -ws test.csv -vs valid.csv 
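For illustration, a node classification split might look as follows (the IRIs and class labels below are made up):

http://example.org/entity/42,http://example.org/class/Person
http://example.org/entity/43,http://example.org/class/Organization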

For link prediction, we need the three splits as HDT files:

python generateInput.py -d ./mydata/ -ts train_lp.hdt -ws test_lp.hdt -vs valid_lp.hdt 

Running the above will generate our dataset as easy-to-use CSV files, following the format proposed by KGbench. The output will be stored in ./mydata/. See the example dataset in ./test/.

  3. Run a task on the prepared dataset with:

    python node_classification.py -i ./mydata/ -c config.json --num_epoch 50 --lr 0.001

or

python link_prediction.py -i ./mydata/ -c config.json --num_epoch 50 --lr 0.001

The above calls preprocess the input data on every new run. Alternatively, the mkdataset helper script can be used to create an HDF5 file of the preprocessed data, which can then be used as input instead.

To generate the HDF5 dataset file, run:

python mkdataset.py -i ./mydata/ -c config.json -o ./mydata/
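To verify the contents of the generated file, h5py can list its groups and datasets. The path below is a placeholder; use whichever file mkdataset wrote to the output directory.

import h5py

# Placeholder path: substitute the file produced by mkdataset
with h5py.File('./mydata/dataset.h5', 'r') as f:
    f.visit(print)  # print the path of every group and dataset in the file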

Please see the help output (--help) of these scripts for more information and additional options.

Note that you can set encoder-specific options in config.json. To do so, provide the file using the -c flag.

Supported data types

The following data types are supported and automatically encoded if they come with a well-defined data type declaration:

Booleans:

- xsd:boolean

Numbers:

- xsd:decimal
- xsd:double
- xsd:float
- xsd:integer
- xsd:long
- xsd:int
- xsd:short
- xsd:byte

- xsd:nonNegativeInteger
- xsd:nonPositiveInteger
- xsd:negativeInteger
- xsd:positiveInteger

- xsd:unsignedLong
- xsd:unsignedInt
- xsd:unsignedShort
- xsd:unsignedByte

Strings:

- xsd:string
- xsd:normalizedString
- xsd:token
- xsd:language
- xsd:Name
- xsd:NCName
- xsd:ENTITY
- xsd:ID
- xsd:IDREF
- xsd:NMTOKEN
- xsd:anyURI

Time/date:

- xsd:date
- xsd:dateTime
- xsd:gYear

Spatial:

- ogc:wktLiteral

Images:

- kgbench:base64Image (http://kgbench.info/dt)

Note that images are expected to be base64-encoded strings included directly in the graph.
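As an illustration of what such datatype declarations look like, the following rdflib snippet (not part of this package; the IRIs are made up) constructs literals with explicit datatypes that the encoders can pick up:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import XSD

g = Graph()
subject = URIRef('http://example.org/measurement/1')

# A numeric literal with an explicit xsd:integer datatype
g.add((subject, URIRef('http://example.org/hasValue'),
       Literal(42, datatype=XSD.integer)))

# A spatial literal declared as ogc:wktLiteral
WKT = URIRef('http://www.opengis.net/ont/geosparql#wktLiteral')
g.add((subject, URIRef('http://example.org/hasLocation'),
       Literal('POINT(4.89 52.37)', datatype=WKT)))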
