AR: Why isn't Primary Key Given a Unique Validator?

I have observed that when I insert a record with a duplicate PK, I get a constraint violation. I understand that from the database's point of view this is correct. But since a PK is supposed to be unique, I cannot think of a single use case where one would want to insert a duplicate PK. So my question is: why doesn't Yii ActiveRecord apply the unique validator automagically?

Is there a way to make all my ActiveRecord PKs automatically validate as unique, without manually adding a rule for each?

Hi,

If I remember correctly:

  1. Usually you only want to validate data which receives input from users.

  2. Usually the PK is not set by a user.

That should be the reason there is no automatic unique validation. ;)

If I'm wrong:

Anybody feel free to correct / enlighten me.

Best Regards

I agree with MetaCrawler. :)

Yes, you are right about that.

But I've never been worried about it, because almost all the PKs in my tables are AUTO_INCREMENT and I don't have to set a unique value for them manually.

So I’m curious about a situation where you would want to set a PK manually for inserting a record.

Hi,

In many cases AI is sufficient, but there are other cases where natural keys make sense and hence the user must set them:

  1. User registration, where you can use a unique mobile number or SSN.

  2. Streets, where each street has a unique number.

  3. Cities, where each city has a code (the same applies to countries or states).

I can go on and on, but there are use cases for both camps. So I believe it is reasonable to have support for validating the PK automatically.

Since AR knows about the schema, it can skip such validation for AI surrogate keys.
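As a rough sketch of what I mean (`getTableSchema()` is the real Yii 2 ActiveRecord API; `City` is just a placeholder model name):

```php
// yii\db\TableSchema for a hypothetical City model
$schema = app\models\City::getTableSchema();

foreach ($schema->primaryKey as $pkColumn) {
    if ($schema->getColumn($pkColumn)->autoIncrement) {
        continue; // AI surrogate key: the db generates it, skip the check
    }
    // natural key set by the user: this is where a 'unique' rule belongs
}
```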

Cheers!

I have tried to put together a few real-world use cases. I'm interested to hear what you guys have to say about such situations.

Cheers!

I see. Thank you for the response. :)

A meaningful PK instead of an AI PK would be OK in some situations, but then what you have to do is just manually add the ‘unique’ validation rule for it.
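For example, assuming a `City` model whose PK is a natural `code` column, a minimal version would be:

```php
class City extends \yii\db\ActiveRecord
{
    public function rules()
    {
        return [
            [['code'], 'required'],
            // runs one SELECT before save, so a duplicate code is rejected
            // with a validation error instead of a db constraint exception
            [['code'], 'unique'],
        ];
    }
}
```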

BTW, please take a look at the following wiki.

Guidelines for good schema design (http://www.yiiframework.com/wiki/227/guidelines-for-good-schema-design)

It was written for Yii 1.1, but it is still valid for Yii 2.0. In fact, I would say it's valid for all the web application frameworks out there today.

One of the guidelines reads like this:

I follow this guideline. What do you think about it?

Right!

I would disagree with that. Suppose you have a system that requires a mobile number. You have two options: one is to create a surrogate id and add another field for the mobile number, making it unique. I don't see why I should have two columns when I could just have a single PK that is the mobile number.

It is an age-old debate, Natural vs. Surrogate keys, and as with all matters of preference there is no winning side. So catering to both camps when it does not hurt the framework's design and purpose is the best way, IMHO!

Cheers!

I see.

I agree that adding the ‘unique’ validation rule for a PK that is not AUTO_INCREMENT would be a good idea, while I strongly believe that we should not add the ‘unique’ validation rule for ALL PKs, including AI ones, because that would execute an unnecessary db access.
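A minimal sketch of what the generated code (or a shared base class) might look like, assuming a single-column PK; the class name is made up:

```php
abstract class BaseRecord extends \yii\db\ActiveRecord
{
    public function rules()
    {
        $rules = parent::rules();
        $schema = static::getTableSchema();
        $pk = $schema->primaryKey;

        // pay the extra db round-trip only for single-column, non-AI PKs
        if (count($pk) === 1 && !$schema->getColumn($pk[0])->autoIncrement) {
            $rules[] = [$pk, 'unique'];
        }

        return $rules;
    }
}
```

Subclasses would then need to merge their own rules with `parent::rules()` instead of replacing them.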

Why don’t you make a PR for yii2-gii for it?

I agree with your comment completely. For now I will not create a pull request, as I am unsure whether it would break some design within the Yii framework. So I would first like to hear from a core developer whether it merits a PR. Then I will definitely do it!

Two simple reasons:

  1. A user's cellphone, email, and other data used as a primary key can change over time, and updating all the references across the db would be a nightmare.

  2. A string field (email, cellphone, username…) occupies many more bytes than an integer field, and this would increase db size (columns, indexes), with consequent db performance issues.

Of course it would be a nightmare if you badly designed your FK constraints. If your app allows changing the number later, then a simple CASCADE in the FK definition will do it.
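In Yii 2 migration terms that would look something like this (table and column names are invented for the example):

```php
use yii\db\Migration;

class m000000_000000_link_order_to_user extends Migration
{
    public function safeUp()
    {
        // ON UPDATE CASCADE: changing user.mobile rewrites order.user_mobile
        // in the same statement, so references never go stale
        $this->addForeignKey(
            'fk-order-user_mobile',
            '{{%order}}', 'user_mobile',
            '{{%user}}', 'mobile',
            'RESTRICT',  // ON DELETE
            'CASCADE'    // ON UPDATE
        );
    }
}
```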

We use INT for MySQL and BIGINT for PGS. But still, I don't think it is that important. With current storage capacities and speeds, the extra data size of, for example, ‘AL’ for Alabama compared to, let's say, 25 is negligible.

I believe your points are genuine, but I think they have little effect on a modern database unless poor choices (such as using DARESSALAAM as a city code) are made.

In my opinion it is a bad idea to use cascading updates, since they would do a lot of unnecessary work on the db side in case of a PK change.

This highly depends on the number of users your application has; think of an application where you have 300,000 users.

When I worry about size, it is not about disk occupation; that is not a matter of discussion nowadays.

I always say that if your business needs to buy new storage, it means you can afford it.

Think about indexes…

This is an optimistic case where you have only 2 chars, and 2 chars only cover a handful of records: 676 (26²) at most, if I'm not wrong.

In MySQL, if you use utf8, it will occupy:

(2 chars × 2 bytes) + 1 length byte = 5 bytes

With a 4-byte unsigned integer (I always use unsigned for PKs, since auto-increment does not generate negative values) I can store 4,294,967,295 rows.

So to hold 676 rows, a 2-byte SMALLINT is sufficient, if not even a 1-byte TINYINT UNSIGNED when your cities number fewer than 256.

When you spread this across 300,000 records, it starts to be something.
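To put rough numbers on it, here is a back-of-the-envelope sketch (key column data only, ignoring the engine's page and pointer overhead):

```php
$rows = 300000;

$charKeyBytes = 5; // CHAR(2) in utf8: (2 chars × 2 bytes) + 1 length byte
$intKeyBytes  = 2; // SMALLINT UNSIGNED covers 0-65535, plenty for 676 codes

printf("char key: ~%d KB\n", $rows * $charKeyBytes / 1024); // ~1464 KB
printf("int key:  ~%d KB\n", $rows * $intKeyBytes / 1024);  // ~585 KB
```

And that difference is paid again in every index and every FK column that carries the key.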

My concern is about indexes. Index performance is very dependent on index size: the smaller it is, the faster it is.

Indexes are used in every operation: insert, update, delete, and select.

The db keeps indexes in memory (if it can) for faster access; the smaller the indexes, the more of them can be in memory.

Selects are where you see the difference between well-designed indexes and bad ones, especially when you join multiple tables and do filtering on data.

In one project my first task was to speed up the application. I didn't touch a line of code apart from a couple of badly designed SQL joins: I just added appropriate indexes, and the load time dropped from more than 30 seconds to less than 2 seconds, simply by using combined indexes. But for this to be efficient, the indexed fields must be as small as possible in order to shrink the index size.
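In Yii 2 migration terms the fix was essentially just this (index, table, and column names are invented for the example):

```php
use yii\db\Migration;

class m000000_000001_add_combined_index extends Migration
{
    public function safeUp()
    {
        // one combined index covering the JOIN column and the filter column,
        // so the db can resolve the lookup from the index alone
        $this->createIndex(
            'idx-order-customer-status',
            '{{%order}}',
            ['customer_id', 'status']
        );
    }
}
```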

If I'm not wrong, index lookup cost grows logarithmically with the number of records, which means performance keeps degrading as records are added, even though each additional record hurts a little less than the previous one.

Changing a PK from char to integer in production… good luck.