erum-data-idt/pd4ml

Physics Data for Machine Learning (pd4ml)

This repository contains the datasets and models for machine learning from the publication "Shared Data and Algorithms for Deep Learning in Fundamental Physics" (arXiv:2107.00656).

You can install this package as a Python module with pip:

pip install git+https://github.com/erum-data-idt/pd4ml

# or just git clone & 'pip install .' in this folder

The essential function is load, which loads the training and testing datasets. The features "X" of a dataset are returned as a list of numpy arrays. The labels are returned directly as a numpy array.

from pd4ml import Spinodal   # or any other dataset (see below)

# loading training data into RAM (downloads dataset first time)
X_train, y_train = Spinodal.load('train', path='./datasets')

# loading test data into RAM (downloads dataset first time)
X_test, y_test = Spinodal.load('test', path='./datasets')
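Because the features come back as a list of arrays rather than a single array, any row selection (shuffling, a validation split) has to be applied to every array in the list and to the labels with the same indices. The following is a minimal sketch of that idea, not part of pd4ml: the helper name split_features is hypothetical, the data is a synthetic stand-in, and it assumes every feature array and the label array share the same first axis (one entry per sample).

```python
import numpy as np


def split_features(X, y, val_fraction=0.2, seed=0):
    """Split a list of feature arrays and a label array into two parts.

    Hypothetical helper: assumes every array in X and the label array y
    have the same length along axis 0 (one entry per sample).
    """
    n_samples = len(y)
    for arr in X:
        if len(arr) != n_samples:
            raise ValueError("all feature arrays must match the label length")

    # One shared permutation keeps features and labels aligned.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]

    X_train = [arr[train_idx] for arr in X]
    X_val = [arr[val_idx] for arr in X]
    return (X_train, y[train_idx]), (X_val, y[val_idx])


# Synthetic stand-in for the output of a load call: two feature arrays, 10 samples.
X = [np.arange(20).reshape(10, 2), np.arange(10)]
y = np.arange(10) % 2

(X_tr, y_tr), (X_val, y_val) = split_features(X, y, val_fraction=0.2)
print([a.shape for a in X_tr], y_tr.shape)
```

With real data, the X and y returned by load would take the place of the synthetic arrays, provided the shared-first-axis assumption holds for the chosen dataset.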

Calling load with path='./datasets' creates the subfolder ./datasets. Together, the datasets take up about 2.4 GB of disk space. Loading a training dataset requires at least 5 GB of free RAM (depending on the dataset).
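Before triggering the first download, it can be useful to check that the target drive has room for the roughly 2.4 GB of data. The sketch below uses only the Python standard library and is not part of pd4ml: the helper names free_disk_gb and has_room_for_datasets are hypothetical, and the check covers disk space only, not RAM.

```python
import shutil
from pathlib import Path


def free_disk_gb(path="."):
    """Return the free disk space (in GB) on the drive that holds `path`."""
    p = Path(path).resolve()
    # The target folder may not exist yet, so walk up to an existing parent.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 1e9


def has_room_for_datasets(path=".", required_gb=2.4):
    """Check whether at least `required_gb` of disk space is free at `path`."""
    return free_disk_gb(path) >= required_gb


print(f"free disk space: {free_disk_gb('./datasets'):.1f} GB")
print("enough room for all datasets:", has_room_for_datasets('./datasets'))
```

The default of 2.4 GB is the total quoted above for all datasets; a single dataset needs less.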

Currently included datasets, with their tags:

1: TopTagging, 2: Spinodal, 3: EOSL, 4: Airshower, 5: Belle

A description of a dataset can be printed with the function:

Spinodal.print_description()

Show all available datasets:

import pd4ml

for dataset in pd4ml.Dataset.datasets_register:
    print(dataset.name)

An additional load_data function performs some basic preprocessing steps and can also return an adjacency matrix:

from pd4ml import Spinodal   # or any other dataset
x_train, y_train = Spinodal.load_data('train', path='./datasets', graph=True)

x_train is a dictionary with the entries features and adj_matrix. If no adjacency matrix is required, set graph=False.
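A quick way to get oriented in such a dictionary is to print the shape of each entry. The following is a minimal sketch, not part of pd4ml: the helper summarize_inputs is hypothetical, the arrays are synthetic stand-ins, and their shapes are chosen for illustration only. Only the keys features and adj_matrix come from the description above.

```python
import numpy as np


def summarize_inputs(x):
    """Map each key of a dict of array-likes to its shape.

    Hypothetical helper: entries that are lists or tuples of arrays
    are summarized as a list of shapes, one per array.
    """
    summary = {}
    for key, value in x.items():
        if isinstance(value, (list, tuple)):
            summary[key] = [np.shape(v) for v in value]
        else:
            summary[key] = np.shape(value)
    return summary


# Synthetic stand-in: 4 samples, 3 nodes each, 2 features per node (assumed shapes).
x_demo = {
    "features": np.zeros((4, 3, 2)),
    "adj_matrix": np.zeros((4, 3, 3)),
}

summary = summarize_inputs(x_demo)
for key, shape in summary.items():
    print(key, shape)
```

The actual shapes depend on the dataset and are not specified here, so the output of this helper on real data is the place to check them.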

Some example plots can be found in the notebooks in the example folder.


Creating a model:

The models folder contains several model implementations. Each can be imported in the main.py script and run on the specified datasets. To contribute a model, implement it using template.py as a starting point.

