adding table_prefix option to prevent database table collisions for auto-generated tables #495

Merged
merged 10 commits into from
May 20, 2021
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
- Separate config settings for mturk qualifications for `live` and `sandbox` modes (#505)
- Added configuration option for changing autogenerated dashboard-related tables to be unique per experiment -- for use when a single database is shared across multiple experiments (#495)

## [3.2.0]
### Added
@@ -35,7 +36,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Migrate from Travis CI to Github Actions (#500)
- API now uses a custom error handler to pass sometimes-gory exception messages back to the api user (#502)
- Change the Participant.datastring column to be lazy-loaded, causing the datastring to not be loaded by the
sqlalchemy model until explicilty requested. Leads to massive speed increases for any query involving the
sqlalchemy model until explicitly requested. Leads to massive speed increases for any query involving the
Participant table for cases where the datastring is large. (Thanks @evankirkiles!)(#504)

## [3.1.0]
11 changes: 8 additions & 3 deletions doc/databases_overview.rst
@@ -80,7 +80,7 @@ See :ref:`deploy-on-heroku`.
Using a SQL database server
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A more robust solution is to set up a `MySQL <http://www.mysql.com/>`__ database.
Similar to PostgreSQL on Heroku, another robust solution is to set up a `MySQL <http://www.mysql.com/>`__ database.
psiTurk relies on `SQLAlchemy`_ to interface with the database,
which means it is easy to switch between MySQL, PostgreSQL, or SQLite.

@@ -123,9 +123,14 @@ where `your_username`, `your_password` and `your_database` match the `USERNAME`,

The table specified in config.txt::

table_name = turkdemo
assignments_table_name = turkdemo

...will be created automatically when running the psiturk shell. In addition, several
other tables are created automatically to manage task scheduling
and HIT management (see the `hits_table_name`, `campaigns_table_name`, and
`jobs_table_name` options in config.txt).


...will be created automatically when running the psiturk shell.
MySQL is (fairly) easy to install and free. However, a variety of web hosting
services offer managed MySQL databases. Some are even
`free <https://www.google.com/search?q=free+mysql+hosting>`__.
67 changes: 62 additions & 5 deletions doc/settings.rst
@@ -487,23 +487,80 @@ that you can connect to this url with a MySQL client prior to
launching.


table_name
~~~~~~~~~~

Specifies the table of the database you would like to write to.
assignments_table_name
~~~~~~~~~~~~~~~~~~~~~~

Specifies the table of the database you would like to write participant data to.

:Type: ``string``
:Default: assignments

**IMPORTANT**: psiTurk prevents the same worker
from performing a task more than once by checking whether the worker
already appears in the current database table. Thus, for a
single experiment (or sequence of related experiments) you want
to keep the `table_name` value the same. If you start a new
to keep the `assignments_table_name` value the same. If you start a new
design where it no longer matters that someone has done a
previous version of the task, you can change the `table_name`
previous version of the task, you can change the `assignments_table_name`
value and begin sorting the data into a new table.

If multiple experiments or users share the same database then this
table needs to be unique for each experiment/user.


table_name
~~~~~~~~~~

`Deprecated`

Specifies the table of the database you would like to write participant data to.

:Type: ``string``

For backwards compatibility, `assignments_table_name` is synonymous with `table_name`. If
both are set, `assignments_table_name` will be preferred over `table_name`.
Please use `assignments_table_name` going forward but either will work for now.
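
The precedence rule described above can be sketched in a few lines of Python. This is only an illustration, not psiTurk's actual config-loading code: the helper name `resolve_assignments_table` is hypothetical, and treating an empty `assignments_table_name` as "not set" is an assumption based on the shipped defaults, where that key is left blank.

```python
import configparser

SECTION = "Database Parameters"


def resolve_assignments_table(config: configparser.ConfigParser) -> str:
    """Return the participant table name, preferring the new key.

    Hypothetical helper: an empty `assignments_table_name` is assumed
    to mean "not set", so the deprecated `table_name` is used instead.
    """
    new_name = config.get(SECTION, "assignments_table_name", fallback="").strip()
    if new_name:
        return new_name
    # Fall back to the deprecated key, then to the documented default.
    return config.get(SECTION, "table_name", fallback="assignments").strip()


def parse(text: str) -> configparser.ConfigParser:
    """Parse config text into a ConfigParser (stands in for reading config.txt)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


both = parse(
    "[Database Parameters]\n"
    "assignments_table_name = exp1_assignments\n"
    "table_name = turkdemo\n"
)
legacy = parse(
    "[Database Parameters]\n"
    "assignments_table_name =\n"
    "table_name = turkdemo\n"
)

print(resolve_assignments_table(both))    # new key wins when both are set
print(resolve_assignments_table(legacy))  # deprecated key is still honored
```

With both keys set, the sketch returns the value of `assignments_table_name`; with only the deprecated key set, it returns the value of `table_name`.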



hits_table_name
~~~~~~~~~~~~~~~

Specifies the table of the database for storing information about current HITs.

:Type: ``string``
:Default: amt_hit

psiTurk creates several tables for bookkeeping. This one helps the dashboard
and command line manage HITs. If multiple experiments or users share the same database then this
table needs to be unique for each experiment/user.


campaigns_table_name
~~~~~~~~~~~~~~~~~~~~

Specifies the table of the database for storing campaigns.

:Type: ``string``
:Default: campaign

psiTurk creates several tables for bookkeeping. This one helps the dashboard
and command line manage campaigns. If multiple experiments or users share the same database then this
table needs to be unique for each experiment/user.


jobs_table_name
~~~~~~~~~~~~~~~

Specifies the table of the database for storing job information.

:Type: ``string``
:Default: apscheduler_jobs

psiTurk creates several tables for bookkeeping. This one helps the task
scheduler manage automated jobs. If multiple experiments or users share the same database then this
table needs to be unique for each experiment/user.
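
When several experiments share one database, one way to keep all four tables unique is to derive each name from a per-experiment prefix. The sketch below is an assumption-laden illustration rather than part of psiTurk: `prefixed_table_names` and `find_collisions` are hypothetical helpers, and the prefix scheme is just one possible naming convention. Only the four config keys and their default values come from the settings documented above.

```python
# Default table names as documented for each config key.
DEFAULT_TABLES = {
    "assignments_table_name": "assignments",
    "hits_table_name": "amt_hit",
    "campaigns_table_name": "campaign",
    "jobs_table_name": "apscheduler_jobs",
}


def prefixed_table_names(prefix: str) -> dict:
    """Build per-experiment table names (hypothetical naming scheme)."""
    return {key: f"{prefix}_{name}" for key, name in DEFAULT_TABLES.items()}


def find_collisions(*experiments: dict) -> set:
    """Return table names that appear in more than one experiment's settings."""
    seen, clashes = set(), set()
    for tables in experiments:
        for name in tables.values():
            if name in seen:
                clashes.add(name)
            seen.add(name)
    return clashes


stroop = prefixed_table_names("stroop")
nback = prefixed_table_names("nback")

# Emit the lines one experiment would place in its config.txt.
for key, name in stroop.items():
    print(f"{key} = {name}")

# Prefixed experiments do not clash; two experiments left on the defaults do.
print(find_collisions(stroop, nback))
print(sorted(find_collisions(DEFAULT_TABLES, DEFAULT_TABLES)))
```

Checking for collisions before launching is cheap, and it catches the case where two experiments were both left on the default table names.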


Server Parameters
20 changes: 20 additions & 0 deletions psiturk/default_configs/local_config_defaults.txt
@@ -149,9 +149,29 @@ ad_url_route = pub
# If ON_CLOUD=1, then this defaults to env var $DATABASE_URL, if set.
database_url = sqlite:///participants.db

# psiTurk creates several database tables to store data and to manage aspects of
# automated tasks, campaigns, etc...
# If multiple users share the same database, these values need to be changed to be
# unique for each experiment.

# Name of the database table where participant data will be stored
assignments_table_name =

# Deprecated. Use `assignments_table_name` instead.
#
# For backwards compatibility, `assignments_table_name` is synonymous with `table_name`. If
# both are set, `assignments_table_name` will be preferred over `table_name`.
table_name = assignments

# Name of the database table where hit information is stored
hits_table_name = amt_hit

# Name of the database table where campaign information is stored
campaigns_table_name = campaign

# Name of the database table where the task scheduler (apscheduler) places jobs
jobs_table_name = apscheduler_jobs

############################# Server Parameters ################################
[Server Parameters]
# Host on which the psiturk gunicorn server will listen when `psiturk server on`
20 changes: 20 additions & 0 deletions psiturk/example/config.txt.sample
@@ -150,9 +150,29 @@
# If ON_CLOUD=1, then this defaults to env var $DATABASE_URL, if set.
;database_url = sqlite:///participants.db

# psiTurk creates several database tables to store data and to manage aspects of
# automated tasks, campaigns, etc...
# If multiple users share the same database, these values need to be changed to be
# unique for each experiment.

# Name of the database table where participant data will be stored
;assignments_table_name =

# Deprecated. Use `assignments_table_name` instead.
#
# For backwards compatibility, `assignments_table_name` is synonymous with `table_name`. If
# both are set, `assignments_table_name` will be preferred over `table_name`.
;table_name = assignments

# Name of the database table where hit information is stored
;hits_table_name = amt_hit

# Name of the database table where campaign information is stored
;campaigns_table_name = campaign

# Name of the database table where the task scheduler (apscheduler) places jobs
;jobs_table_name = apscheduler_jobs

############################# Server Parameters ################################
[Server Parameters]
# Host on which the psiturk gunicorn server will listen when `psiturk server on`
5 changes: 4 additions & 1 deletion psiturk/experiment.py
@@ -98,9 +98,12 @@ def check_templates_exist():
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from pytz import utc



from .db import engine
JOBS_TABLENAME = CONFIG.get('Database Parameters', 'jobs_table_name')
jobstores = {
'default': SQLAlchemyJobStore(engine=engine)
'default': SQLAlchemyJobStore(engine=engine, tablename=JOBS_TABLENAME)
}
if 'gunicorn' in os.environ.get('SERVER_SOFTWARE', ''):
from apscheduler.schedulers.gevent import GeventScheduler as Scheduler
18 changes: 12 additions & 6 deletions psiturk/models.py
@@ -17,7 +17,12 @@
config.load_config()


TABLENAME = config.get('Database Parameters', 'table_name')
# The "table_name" config key is deprecated -- it will be replaced by
# `assignments_table_name` in a future release.
ASSIGNMENTS_TABLENAME = config.get('Database Parameters', 'table_name')
HITS_TABLENAME = config.get('Database Parameters', 'hits_table_name')
CAMPAIGNS_TABLENAME = config.get('Database Parameters', 'campaigns_table_name')

CODE_VERSION = config.get('Task Parameters', 'experiment_code_version')

# Base class
@@ -40,7 +45,7 @@ class Participant(Base):
"""
Object representation of a participant in the database.
"""
__tablename__ = TABLENAME
__tablename__ = ASSIGNMENTS_TABLENAME

uniqueid = Column(String(128), primary_key=True)
assignmentid = Column(String(128), nullable=False)
@@ -154,7 +159,7 @@ def get_question_data(self):
print(("Error reading record:", self))

return ""

@classmethod
def count_completed(cls, codeversion, mode):
completed_statuses = [3, 4, 5, 7]
@@ -205,14 +210,15 @@ def all_but_datastring(cls):
class Hit(Base):
"""
"""
__tablename__ = 'amt_hit'
__tablename__ = HITS_TABLENAME
hitid = Column(String(128), primary_key=True)


class Campaign(Base):
"""
"""
__tablename__ = 'campaign'

__tablename__ = CAMPAIGNS_TABLENAME
id = Column(Integer, primary_key=True)
codeversion = Column(String(128), nullable=False)
mode = Column(String(128), nullable=False)
@@ -245,7 +251,7 @@ def _validate_greater_than_already_completed(self, goal):
assert goal > count_completed, \
f'Goal ({goal}) must be greater than the count of '\
f'already-completed {count_completed}).'

@validates('mode')
def validate_mode(self, key, mode):
assert mode in ['sandbox', 'live'], 'Mode {} not recognized.'.format(mode)