-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase Database performance #260
Comments
|
Restarting the application was helpful. At least the experiment is shown, now. However, this is not a long term solution 😉 |
I have an idea in mind how the new DB could look like: -- table for experiments remains as it is
CREATE TABLE IF NOT EXISTS Experiments (
id VARCHAR(300) NOT NULL,
taskId int NOT NULL FOREIGN KEY REFERENCES ExperimentTasks(id),
PRIMARY KEY (id, taskId)
);
-- table for experiment tasks will be shortened. annotatorName has been renamed to systemName
CREATE TABLE IF NOT EXISTS ExperimentTasks (
id int GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1) PRIMARY KEY,
experimentType VARCHAR(20),
matching VARCHAR(50),
systemName VARCHAR(100),
datasetName VARCHAR(100),
state int,
lastChanged TIMESTAMP,
version VARCHAR(20)
);
-- index on experiment tasks won't be changed (unless somebody has an idea how to improve it ;) )
DROP INDEX IF EXISTS ExperimentTaskConfig;
CREATE INDEX ExperimentTaskConfig ON ExperimentTasks (matching,experimentType,annotatorName,datasetName);
-- ExperimentTasks_Version table will be dropped
-- ExperimentTasks_AdditionalResults will be renamed to ExperimentTasks_DoubleResults
CREATE TABLE IF NOT EXISTS ExperimentTasks_DoubleResults (
resultId int NOT NULL FOREIGN KEY REFERENCES ResultNames(id),
taskId int NOT NULL FOREIGN KEY REFERENCES ExperimentTasks(id),
value double,
PRIMARY KEY (resultId, taskId)
);
-- New table added for int results (e.g., number of errors)
CREATE TABLE IF NOT EXISTS ExperimentTasks_IntResults (
resultId int NOT NULL FOREIGN KEY REFERENCES ResultNames(id),
taskId int NOT NULL FOREIGN KEY REFERENCES ExperimentTasks(id),
value int,
PRIMARY KEY (resultId, taskId)
);
-- New table added for blob results (e.g., ROC)
CREATE TABLE IF NOT EXISTS ExperimentTasks_BlobResults (
resultId int NOT NULL FOREIGN KEY REFERENCES ResultNames(id),
taskId int NOT NULL FOREIGN KEY REFERENCES ExperimentTasks(id),
value BLOB,
PRIMARY KEY (resultId, taskId)
);
-- New table added for mapping from resultId to resultName (optional but would make the solution cleaner)
CREATE TABLE IF NOT EXISTS ResultNames (
id int GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1) PRIMARY KEY,
name VARCHAR(50)
);
-- SubTask table remains the same
CREATE TABLE IF NOT EXISTS ExperimentTasks_SubTasks (
taskId int NOT NULL FOREIGN KEY REFERENCES ExperimentTasks(id),
subTaskId int NOT NULL FOREIGN KEY REFERENCES ExperimentTasks(id),
PRIMARY KEY (taskId, subTaskId)
); With the solution above we have a clear separation between the data every experiment task has and the results (which can vary between the different types of experiments). Additionally, we could query all the results for a given experiment (e.g., SELECT type, name, value
FROM (
SELECT 'double' AS type, name, value FROM ExperimentTasks_DoubleResults WHERE taskId=123
UNION
SELECT 'int' AS type, name, value FROM ExperimentTasks_IntResults WHERE taskId=123
UNION
SELECT 'blob' AS type, name, value FROM ExperimentTasks_BlobResults WHERE taskId=123
) JOIN ResultNames ON resultId and the program code use the Opinions? Ideas? @RicardoUsbeck ? |
I very much like that solution.
|
True. Following the HSQLDB documentation the database would create indexes for foreign keys. I updated the schema accordingly. I think the indexes are simply missing at the moment which causes these huge delays... 🤔 |
Can this be closed? |
Documentation: |
Description
The current structure of the database leads to a very slow performance, e.g., some experiments can not even be loaded anymore (http://gerbil.aksw.org/gerbil/experiment?id=201603140002). We need to increase the performance by restructuring the database and make it more flexible for future additions of data.
This should take #79 into account.
Solution
The text was updated successfully, but these errors were encountered: