Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if the scripts table still suits our needs #190

Closed
5 of 6 tasks
atruskie opened this issue May 7, 2015 · 6 comments
Closed
5 of 6 tasks

Check if the scripts table still suits our needs #190

atruskie opened this issue May 7, 2015 · 6 comments

Comments

@atruskie
Copy link
Member

atruskie commented May 7, 2015

The scripts table represents a high level command and control of the custom job scripts that can be executed.

It should:

  • Have admin access only
  • Each row represents a type of analysis
  • For each row, the ability to store a, templateable command, that will directly translate to an executed command run by a job runner. Note: there will often be duplication of commands (since AP.exe uses very similar CLI args for most analyses) but this is not really an issue.
  • For each row, store a settings file. May be templateable? Can't see any use case for templating config at the moment
  • User friendly identifier, other relevant metadata
  • Discuss: whether versioning the scripts (with cyclic parent linking) is still a feature that is relevant to us
    • Originally proposed when users could execute their own scripts. However, secure execution environments are not part of the custom jobs spec at this stage.
@cofiem
Copy link
Contributor

cofiem commented May 10, 2015

current db structure:

create_table "scripts", force: :cascade do |t|
    t.string   "name",                       limit: 255,                                         null: false
    t.string   "description",                limit: 255
    t.text     "notes"
    t.string   "settings_file_file_name",    limit: 255
    t.string   "settings_file_content_type", limit: 255
    t.integer  "settings_file_file_size"
    t.datetime "settings_file_updated_at"
    t.string   "data_file_file_name",        limit: 255
    t.string   "data_file_content_type",     limit: 255
    t.integer  "data_file_file_size"
    t.datetime "data_file_updated_at"
    t.string   "analysis_identifier",        limit: 255,                                         null: false
    t.decimal  "version",                                precision: 4, scale: 2, default: 0.1,   null: false
    t.boolean  "verified",                                                       default: false
    t.integer  "updated_by_script_id"
    t.integer  "creator_id",                                                                     null: false
    t.datetime "created_at",                                                                     null: false
  end

@atruskie
Copy link
Member Author

Questions:

What are the data_file fields for?

What is updated_by_script_id intended for?

What is the difference between analysis_identifier and name? I'm guessing machine vs human readable.

This table does not have the concept of a command to execute. That seems to be the only thing missing from the spec (not inc. all the extra stuff that is there).

@cofiem
Copy link
Contributor

cofiem commented May 12, 2015

settings_file_* and data_file_* are currently fields for file uploads using the paperclip gem.

updated_by_script_id is the self reference. Scripts created from an existing script get the previous script's id set here.

yes, analysis_identifier is machine readable, name is human readable.

I think I'll leave verified and the script versioning there for now. I think it's best to remove possibility to upload files, unless we think user scripts are at all a possibility?

Todo:

  • add a command_template field
  • add a settings_template field
  • remove the settings_file_* and data_file_*?

@atruskie
Copy link
Member Author

atruskie commented Jun 4, 2015

I was thinking about this earlier today.

The hierarchal relationship in this table was originally designed so that the scripts could be immutable.

It wasn't clear if this property was still useful - but now I think it is. Particularly so we can always see what script was executed - from a historical perspective - because that information is not saved in the jobs table at the moment. See #191

However, I think the hierarchical relationship is too complex - we've seen in the past that hierarchial relationships are hard to deal with in SQL.

So instead of hierarchical:

id:     1    ->  2 -> 3
parent: null ->  1 -> 2

We instead 'group' the versions together:

id:        1 ->  2 -> 3
script_id: 1 ->  1 -> 1

And another script (independently created):

id:        1 ->  2 -> 3
script_id: 2 ->  2 -> 2

Much simpler structure (where the hierarchical structure is cyclic, the grouped structure is flat) means it's much easier to query for the latest version of a script.

I'd also suggest making the script_id a uuid (Guid)... that we don't have to count for highest unique script_id to generate a new script tuple.

@cofiem
Copy link
Contributor

cofiem commented Jun 4, 2015

Wouldn't this be better done with two tables?

create_table "scripts", force: :cascade do |t|
    t.string   "name",                       limit: 255,                                         null: false
    t.string   "machine_name",               limit: 255,                                         null: false
    t.text     "description",                limit: 255
    t.decimal  "current_version",                        precision: 4, scale: 2, default: 0.1,   null: false
    t.integer  "creator_id",                                                                     null: false
    t.datetime "created_at",                                                                     null: false
    t.integer  "updater_id",                                                                     null: false
    t.datetime "updated_at",                                                                     null: false
  end

An additional script_versions table could store the versions of the script. Entries in this table are not mutable - any changes result in a new entry and the version number incremented (the version could be set automatically). The current version is stored in the scripts table. This allows for using old versions of the script as the default. It could also be possible to specify a script version when creating a job (defaulting to the current_version). The version would be unique per script_id.

create_table "script_versions", force: :cascade do |t|
    t.integer  "script_id",                                                                      null: false
    t.text     "command_template"
    t.text     "settings_template"
    t.decimal  "version",                                precision: 4, scale: 2, default: 0.1,   null: false
    t.integer  "creator_id",                                                                     null: false
    t.datetime "created_at",                                                                     null: false
  end

@atruskie
Copy link
Member Author

atruskie commented Jun 5, 2015

Perhaps... It was one table already so my solution kept it that way.

Architecturally I'd prefer one table, it seems simpler conceptually, and the sql seems simple.

If it were split into two tables, we certainly wouldn't need two API endpoints.

I'm not up to date on my normal form, but from the gut I think either form is acceptable.
There is some duplication in some of the scripts fields for the single form, but we know modifications to this table will be in the low hundreds... Not in the hundreds of thousands for sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants