Check if the scripts table still suits our needs #190

atruskie · 2015-05-07T13:50:29Z

The scripts table represents a high level command and control of the custom job scripts that can be executed.

It should:

Have admin access only
Each row represents a type of analysis
For each row, the ability to store a, templateable command, that will directly translate to an executed command run by a job runner. Note: there will often be duplication of commands (since AP.exe uses very similar CLI args for most analyses) but this is not really an issue.
For each row, store a settings file. May be templateable? Can't see any use case for templating config at the moment
User friendly identifier, other relevant metadata
Discuss: whether versioning the scripts (with cyclic parent linking) is still a feature that is relevant to us
- Originally proposed when users could execute their own scripts. However, secure execution environments are not part of the custom jobs spec at this stage.

cofiem · 2015-05-10T12:08:37Z

current db structure:

create_table "scripts", force: :cascade do |t|
    t.string   "name",                       limit: 255,                                         null: false
    t.string   "description",                limit: 255
    t.text     "notes"
    t.string   "settings_file_file_name",    limit: 255
    t.string   "settings_file_content_type", limit: 255
    t.integer  "settings_file_file_size"
    t.datetime "settings_file_updated_at"
    t.string   "data_file_file_name",        limit: 255
    t.string   "data_file_content_type",     limit: 255
    t.integer  "data_file_file_size"
    t.datetime "data_file_updated_at"
    t.string   "analysis_identifier",        limit: 255,                                         null: false
    t.decimal  "version",                                precision: 4, scale: 2, default: 0.1,   null: false
    t.boolean  "verified",                                                       default: false
    t.integer  "updated_by_script_id"
    t.integer  "creator_id",                                                                     null: false
    t.datetime "created_at",                                                                     null: false
  end

atruskie · 2015-05-10T13:05:03Z

Questions:

What are the data_file fields for?

What is updated_by_script_id intended for?

What is the difference between analysis_identifier and name? I'm guessing machine vs human readable.

This table does not have the concept of a command to execute. That seems to be the only thing missing from the spec (not inc. all the extra stuff that is there).

cofiem · 2015-05-12T12:45:07Z

settings_file_* and data_file_* are currently fields for file uploads using the paperclip gem.

updated_by_script_id is the self reference. Scripts created from an existing script get the previous script's id set here.

yes, analysis_identifier is machine readable, name is human readable.

I think I'll leave verified and the script versioning there for now. I think it's best to remove possibility to upload files, unless we think user scripts are at all a possibility?

Todo:

add a command_template field
add a settings_template field
remove the settings_file_* and data_file_*?

atruskie · 2015-06-04T14:09:13Z

I was thinking about this earlier today.

The hierarchal relationship in this table was originally designed so that the scripts could be immutable.

It wasn't clear if this property was still useful - but now I think it is. Particularly so we can always see what script was executed - from a historical perspective - because that information is not saved in the jobs table at the moment. See #191

However, I think the hierarchical relationship is too complex - we've seen in the past that hierarchial relationships are hard to deal with in SQL.

So instead of hierarchical:

id:     1    ->  2 -> 3
parent: null ->  1 -> 2

We instead 'group' the versions together:

id:        1 ->  2 -> 3
script_id: 1 ->  1 -> 1

And another script (independently created):

id:        1 ->  2 -> 3
script_id: 2 ->  2 -> 2

Much simpler structure (where the hierarchical structure is cyclic, the grouped structure is flat) means it's much easier to query for the latest version of a script.

I'd also suggest making the script_id a uuid (Guid)... that we don't have to count for highest unique script_id to generate a new script tuple.

cofiem · 2015-06-04T23:42:19Z

Wouldn't this be better done with two tables?

create_table "scripts", force: :cascade do |t|
    t.string   "name",                       limit: 255,                                         null: false
    t.string   "machine_name",               limit: 255,                                         null: false
    t.text     "description",                limit: 255
    t.decimal  "current_version",                        precision: 4, scale: 2, default: 0.1,   null: false
    t.integer  "creator_id",                                                                     null: false
    t.datetime "created_at",                                                                     null: false
    t.integer  "updater_id",                                                                     null: false
    t.datetime "updated_at",                                                                     null: false
  end

An additional script_versions table could store the versions of the script. Entries in this table are not mutable - any changes result in a new entry and the version number incremented (the version could be set automatically). The current version is stored in the scripts table. This allows for using old versions of the script as the default. It could also be possible to specify a script version when creating a job (defaulting to the current_version). The version would be unique per script_id.

create_table "script_versions", force: :cascade do |t|
    t.integer  "script_id",                                                                      null: false
    t.text     "command_template"
    t.text     "settings_template"
    t.decimal  "version",                                precision: 4, scale: 2, default: 0.1,   null: false
    t.integer  "creator_id",                                                                     null: false
    t.datetime "created_at",                                                                     null: false
  end

atruskie · 2015-06-05T01:58:40Z

Perhaps... It was one table already so my solution kept it that way.

Architecturally I'd prefer one table, it seems simpler conceptually, and the sql seems simple.

If it were split into two tables, we certainly wouldn't need two API endpoints.

I'm not up to date on my normal form, but from the gut I think either form is acceptable.
There is some duplication in some of the scripts fields for the single form, but we know modifications to this table will be in the low hundreds... Not in the hundreds of thousands for sure

atruskie added enhancement architectural discuss labels May 7, 2015

atruskie added this to the Custom Jobs milestone May 7, 2015

atruskie mentioned this issue May 7, 2015

Custom job infrastructure checklist #188

Closed

5 tasks

cofiem mentioned this issue May 10, 2015

Tracking issue for implementing custom jobs #200

Closed

14 tasks

cofiem added the triage:medium label Jun 5, 2015

cofiem closed this as completed in c3513ac Sep 4, 2015

cofiem added a commit that referenced this issue Sep 5, 2015

scripts now versioned in groups, see #190

3d3e359

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check if the scripts table still suits our needs #190

Check if the scripts table still suits our needs #190

atruskie commented May 7, 2015

cofiem commented May 10, 2015

atruskie commented May 10, 2015

cofiem commented May 12, 2015

atruskie commented Jun 4, 2015

cofiem commented Jun 4, 2015

atruskie commented Jun 5, 2015

Check if the scripts table still suits our needs #190

Check if the scripts table still suits our needs #190

Comments

atruskie commented May 7, 2015

cofiem commented May 10, 2015

atruskie commented May 10, 2015

cofiem commented May 12, 2015

atruskie commented Jun 4, 2015

cofiem commented Jun 4, 2015

atruskie commented Jun 5, 2015