
Unique together #288

Closed
colegleason opened this issue Aug 8, 2013 · 20 comments

@colegleason

It would be nice to specify that two or more columns have to be unique together, but not unique themselves.

@notheotherben
Collaborator

You can do this by specifying multiple primary keys, which will cause the database engine to treat their combination as a unique primary key.

db.define('model', {
    field1: String,
    field2: String
}, {
    id: ['field1', 'field2']
});

@dxg
Collaborator

dxg commented Aug 11, 2013

I've also just improved the unique validator to support scope. See the wiki.

We should also at some point add the following syntax:

db.define('item', {
  name:     { type: 'string', unique: 'name_cat' },
  category: { type: 'string', unique: 'name_cat' }
});

// CREATE UNIQUE INDEX item_name_cat ON item (name, category);

@dxg dxg closed this as completed Aug 11, 2013
@dresende
Owner

Thank you @dxg , that idea is exactly what I was thinking when I read this issue.

notheotherben added a commit to notheotherben/node-orm2 that referenced this issue Aug 13, 2013
Shouldn't check undefined and null values, since ORM uses these as
defaults and it can lead to failed validations regardless of passed
values.
@notheotherben
Collaborator

Surely it should be possible to use the suggested new syntax to generate something along the lines of

UNIQUE INDEX IDX_scope (`name`, `category`)

Thus ensuring the uniqueness on the database itself...

Disclaimer: I haven't actually checked the syntax there, just guessing at it from memory.

@dxg
Collaborator

dxg commented Aug 13, 2013

Question is though, should adding those unique: 'name_cat' options cause a unique validator to be automatically added to the model? Adding required: true automatically adds a required validator.

@notheotherben
Collaborator

I think, if possible, we should avoid client-side validators and instead rely on the database engine (since it will be considerably faster). When we add required: true we already include that information in the database definition by appending NOT NULL to the column.

I'd recommend that if the database is able to, we avoid adding the validator and instead work with appending unique index statements to the database instantiation SQL. (Obviously we'd need to mention this, since some people with pre-written databases without these indexes would not experience any difference between unique: false and unique: true or unique: 'name').
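The grouping rule being suggested here (properties sharing the same unique string end up in the same index) could be sketched roughly like this. This is a hypothetical helper, not node-orm2's actual API:

```javascript
// Hypothetical sketch: group properties by their `unique` value and emit one
// UNIQUE INDEX statement per group when generating the schema.
function buildUniqueIndexSQL(table, properties) {
  var groups = {};
  Object.keys(properties).forEach(function (name) {
    var u = properties[name].unique;
    if (!u) return;                       // unique: false / undefined -> no index
    var key = (u === true) ? name : u;    // unique: true -> per-field index name
    (groups[key] = groups[key] || []).push(name);
  });
  return Object.keys(groups).map(function (key) {
    return 'CREATE UNIQUE INDEX ' + table + '_' + key +
           ' ON ' + table + ' (' + groups[key].join(', ') + ');';
  });
}
```

With the item model from earlier in the thread, this would produce a single `CREATE UNIQUE INDEX item_name_cat ON item (name, category);` statement.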

@dxg
Collaborator

dxg commented Aug 14, 2013

The question is though, do the supported database servers return parseable enough errors for that to be possible?

There's another problem; afterValidate and beforeSave will both fire before the unique validation error comes back. This is not what the user will expect.

@notheotherben
Collaborator

I'm not sure about how parsable the errors would be, but I don't think we necessarily have to wrap them in anything special. Ideally the user shouldn't be running queries which violate the DB schema in the first place.

Maybe we can have both the validator and the unique property and allow them to fulfil different tasks. The validator can act as a stopgap between bad input into the user's application and the database queries, while the unique property can act as a database schema restriction. I think given the way the two would be used that it makes sense to do it that way, and the user would expect schema errors to be returned by the database engine post query.

Essentially we could have the following

db.define('model', {
    id: { type: 'text', required: true, unique: true },
    first_name: { type: 'text', required: true, unique: 'name' },
    last_name: { type: 'text', required: true, unique: 'name' },
    email: { type: 'text', unique: true }
}, {
    id: 'id',
    validations: {
        first_name: [orm.enforce.required(), orm.enforce.unique('name')],
        last_name: [orm.enforce.required(), orm.enforce.unique('name')],
        email: [orm.enforce.patterns.email(), orm.enforce.unique().orUndefined()]
    }    
});

It should then generate the following MySQL table-creation query, using the model name plus the unique key name (if present), or the field name when unique: true, to generate the index name.

CREATE TABLE model (
    id VARCHAR(255) NOT NULL, 
    first_name VARCHAR(255) NOT NULL, 
    last_name VARCHAR(255) NOT NULL,
    email VARCHAR(255),
    PRIMARY KEY(id),
    UNIQUE INDEX IX_model_name (first_name, last_name),
    UNIQUE INDEX IX_model_email (email)
    );

We can then keep the enforce validator for unique and allow that to do pre-query validation on the fields if the user wishes.

How does that sound to you guys?

@dxg
Collaborator

dxg commented Aug 14, 2013

So the idea is to stick with what we have, and add the { type: 'text', required: true, unique: 'name' } syntax.
I'm happy with that.

@dresende
Owner

I think that although the required option of a property is validated in the database, the value never actually gets there, because there's code that checks it first. This is good if we can avoid the database (the connection can be a lot slower than an if statement). For consistency, we should remove that if and either:

  1. Add an automatic required validation to the property (the user will then see an error he might expect, with detailed information);
  2. Or rely on the database (although not bad, I don't think this is the best option).

The unique option should then follow the same pattern.
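Option 1 above could be sketched roughly like this (hypothetical helper, not node-orm2's actual internals): when a property declares required: true, a required validator is prepended automatically so the user sees a detailed validation error instead of relying on a hidden internal check.

```javascript
// Sketch: auto-add a required validator for every property with required: true.
function addAutomaticValidators(fields, validations) {
  Object.keys(fields).forEach(function (name) {
    if (fields[name].required) {
      var list = [].concat(validations[name] || []);
      list.unshift({
        name: 'required',
        validate: function (v) {
          return v !== null && v !== undefined; // same semantics as NOT NULL
        }
      });
      validations[name] = list;
    }
  });
  return validations;
}
```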

@notheotherben
Collaborator

I think the difference with the unique option is that it requires a query to the database to find duplicate keys, which negates the efficiency bonus we get with required. With required we can avoid a query entirely by implementing the logic in ORM, whereas with unique we would be moving from a single insert query (with an error response) to two queries: one to check for duplicate keys in the database, and one to perform the insert/update.

It also introduces the issue of handling NoSQL datastores, or databases which don't support whatever query we use to check for uniqueness - which would then require each database driver to define its own implementation of an isUnique function.

My vote would be to split database logic and application logic up into the properties and opts parameters for db.define. Thus, anything defined within properties will only affect how the database is generated and queried while opts.validations would allow application side validation of data. It makes it easy to comprehend where the validations will take place, and what error messages (and when) you can expect.

@dresende
Owner

I agree with you that the unique validator could be avoided by just trying the insert. Perhaps we could check what kind of error each driver returns and give the instance a way to know that. If we can do this, we might deprecate the unique validator, since it can be inefficient on large databases. I have to check how to do this in NoSQL datastores.

@notheotherben
Collaborator

It certainly is an option; maybe have a central error-type wrapper within node-sql-query or ORM which translates, on a per-driver basis, error codes into a number of standardized forms. That would allow us to easily interoperate for other features in the future, and let the database engines handle what they're best at while ORM acts as an interface layer (which is what it's best at).

As for NoSQL: at the moment I've added a bit of code which disables the unique validator for NoSQL database engines, but from what I've gathered so far it would be extremely difficult to implement such a thing given the emphasis on horizontal scalability in most of those engines (which attempts to do away with keys and relationships). I'm currently taking a course on MongoDB in Node.js and I'll see what (if any) of that I can apply to this project.

@dresende
Owner

Ok, thank you for your effort on this :)

@notheotherben
Collaborator

No worries, actually using this module in a project for one of my other courses - most of the functionality I end up adding is stuff I need there as well, so not really any extra effort :)

@dxg
Collaborator

dxg commented Aug 14, 2013

That all sounds wonderful, however we need a reality check:

var should   = require('should');
var helper   = require('../support/spec_helper');
var ORM      = require('../../');

describe("unique messages", function() {
  var db = null;
  var Person = null;

  var setup = function () {
    return function (done) {
      Person = db.define("person", {
        name:   { type: 'text' },
        email:  { type: 'text', unique: true },
        someId: { type: 'text', unique: true }
      });

      return helper.dropSync(Person, done);
    };
  };

  before(function (done) {
    helper.connect(function (connection) {
      db = connection;
      done();
    });
  });

  after(function () {
    db.close();
  });

  before(setup());

  it("should be enumerated", function(done) {
    Person.create({ email: 'e@e.com', someId: 4 }, function (err, person) {
      should.not.exist(err);

      Person.create({ email: 'e@e.com', someId: 4 }, function (err, person) {
        should.exist(err);
        console.log('\n'+JSON.stringify(err, null, 2));

        return done();
      });
    });
  });
});

Now let's see if we have enough data to construct meaningful validation messages:

Postgres

{
  "length": 161,
  "name": "error",
  "severity": "ERROR",
  "code": "23505",
  "detail": "Key (email)=(e@e.com) already exists.",
  "file": "nbtinsert.c",
  "line": "396",
  "routine": "_bt_check_unique",
  "index": 0,
  "instance": {
    "email": "e@e.com",
    "someId": 4,
    "name": null
  }
}

Raw SQL response:

ERROR:  duplicate key value violates unique constraint "person_email_key"
DETAIL:  Key (email)=(e@e.com) already exists.

Mysql

{
  "code": "ER_DUP_ENTRY",
  "index": 0,
  "instance": {
    "email": "e@e.com",
    "someId": 4,
    "name": null
  }
}

Raw SQL response:

ERROR 1062 (23000): Duplicate entry 'e@e.com' for key 'email'

SQLite

{
  "errno": 19,
  "code": "SQLITE_CONSTRAINT",
  "index": 0,
  "instance": {
    "email": "e@e.com",
    "someId": 4,
    "name": null
  }
}

I believe the answer is no. Responses from SQL servers don't enumerate all errors for the insert query; they return only the first one, whereas there should be two. In some cases the error messages are very cryptic, and as in the case of the mysql driver, incomplete.

@notheotherben
Collaborator

Wow, thanks for looking into that. I agree that the errors returned by the database certainly are lacking the details required to truly use them to generate validation messages for unique keys, but that takes us back to the issue of how to handle the validation in a performance friendly way - since in your demo case there would be 2 additional queries for each attempt to insert data (one for email and one for someId).

It does look like it may be possible to at least extract the first error from SQLite (assuming index refers to the index of the value in the query which caused the issue). Since we know the order in which fields are used in ORM queries, it should be possible to determine the field that caused the error and give some extra information.

var field = Object.keys(Model.fields)[error.index];
var value = Instance[field];

return new Error("Duplicate entry for unique field '" + field + "': '" + String(value) + "'");

An alternative is to eagerly assume that the inserted data will pass validation and, if we do get a validation error from the database, run the queries required to determine all the validation errors. That way we don't have the additional overhead for successful queries but can still get the information we need.
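That eager approach could be sketched like this (all helper names here are hypothetical, not part of node-orm2): try the write first, and only on failure run the extra lookups needed to explain which unique fields were violated.

```javascript
// Sketch of the "optimistic insert" idea: fast path issues no extra queries;
// the slow path only runs after the database has rejected the write.
function saveOptimistically(insert, findDuplicates, done) {
  insert(function (err) {
    if (!err) return done(null);                // fast path: write succeeded
    findDuplicates(function (dupErr, fields) {  // slow path: explain the failure
      if (dupErr) return done(dupErr);
      var e = new Error('unique violation on: ' + fields.join(', '));
      e.fields = fields;
      done(e);
    });
  });
}
```

Successful writes cost exactly one query; only failed writes pay for the diagnostic lookups.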

There is also the option of having validators which are executed only after a query has completed: unique: '...' would set the constraint on the database, and the unique validator would (if unique: '...' was set on the property's column) only execute after the query, checking whether a constraint error occurred and generating meaningful information about it if one did.

Person = db.define("person", {
  name:   { type: 'text' },
  email:  { type: 'text' },
  someId: { type: 'text', unique: true }
}, {
  validations: {
    // Executes pre-query, making a request to the DB to check for uniqueness (current behaviour)
    email: orm.unique(),

    // Executes post-query to check for a validation error, and if one is present - queries the DB for better info
    someId: orm.unique()
  }
});

We should be able to check whether unique is set on a property by checking Model.fields[property].unique; from that we can define a property on the validator (say, this.postQuery = true) and handle validations accordingly.
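That flagging step could be sketched as follows (assumed object shapes, not node-orm2's actual internals): any unique validator attached to a property that also declares `unique` in its definition gets marked post-query, so the database constraint does the real work and the validator only interprets the result.

```javascript
// Sketch: mark unique validators as post-query when the field definition
// also carries a `unique` option.
function configureUniqueValidators(fields, validations) {
  Object.keys(validations).forEach(function (prop) {
    [].concat(validations[prop]).forEach(function (validator) {
      if (validator.name === 'unique' && fields[prop] && fields[prop].unique) {
        validator.postQuery = true; // run after the INSERT/UPDATE, not before
      }
    });
  });
}
```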

Obviously I'm rather opposed to the idea of running additional queries against the database for every insert or update operation - for performance reasons - and in my case I'd be more than happy to settle for less information from a unique constraint error in exchange for better performance. That being said, there's definitely a place for a validator which gets that information, but I think it should be an opt-in alternative rather than the default (with warnings about the performance impact, especially with multiple unique keys).

@dxg
Collaborator

dxg commented Aug 15, 2013

Also keep in mind that mysql returns errors differently in different versions.
E.g.: duplicate entry '0' for key 1

And if we change the table to id: ['id1', 'id2'] and try to insert a duplicate primary key:
ERROR 1062 (23000): Duplicate entry '5-6' for key 'PRIMARY'
And whilst postgres gives a relatively nice error, sqlite doesn't give one at all - which means you're going to need some sort of lookup table, a whole bunch of regexes, and lots of custom logic to deal with all the different versions of all the various database servers.

I'm also unsure about the order in which hooks would be executed.
Wouldn't we have to trigger beforeSave & afterSave after the actual save operation has completed, if it was successful (but only for supported datastores)? What if the user wants to run a different query in a before filter, before the data is saved? Do we revert to traditional validation behaviour if a before hook is detected?

Whilst I completely understand wanting to reduce the number of SQL queries, I'm not sure it's worth paying the price: significantly more complicated code.

Generally apps have frequent reads and infrequent writes. And if I had a write-heavy table, I could manually optimize - by not adding the traditional 'slow' unique validator - and just handle the errors straight from the database.

I'm not sure if this is a problem that needs solving, especially given the increased complexity.

@dresende
Owner

I was thinking more about looking into error codes, not descriptions. Error 1062 (in error.code) always means DUP-ENTRY.

http://dev.mysql.com/doc/refman/5.5/en/error-messages-server.html#error_er_dup_entry
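A per-driver translation along these lines could be sketched as follows. This is a hypothetical function, not part of node-orm2; the codes it matches are the real ones from the error dumps earlier in the thread (MySQL ER_DUP_ENTRY / errno 1062, Postgres SQLSTATE 23505, SQLite SQLITE_CONSTRAINT):

```javascript
// Sketch: translate each driver's duplicate-key error into one standardized
// error shape, passing any other error through untouched.
function normalizeUniqueError(driver, err) {
  var isDup = false;
  switch (driver) {
    case 'mysql':    isDup = err.code === 'ER_DUP_ENTRY' || err.errno === 1062; break;
    case 'postgres': isDup = err.code === '23505'; break;
    case 'sqlite':   isDup = err.code === 'SQLITE_CONSTRAINT'; break;
  }
  if (!isDup) return err; // not a duplicate-key error: pass it through

  var e = new Error('unique constraint violated');
  e.type = 'unique_violation';
  e.original = err;       // keep the driver error for anyone who needs details
  return e;
}
```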

@anlebcoder

anlebcoder commented Nov 7, 2018

(quoting @notheotherben's earlier proposal above in full)

Good job.

index together: set index to the same value on each property, like this: 'id_name_index';

unique together: set unique to the same value on each property, like this: 'id_name_unique';
