-
-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split name column in customers table #8763
Split name column in customers table #8763
Conversation
I'm moving this into the In Dev column because it's a draft PR. Let us know once it's ready for review. |
Customer.where("backup_name LIKE '% %'").find_each do |customer| | ||
first_name, last_name = customer.backup_name.split(' ', 2) | ||
customer.first_name = first_name | ||
customer.last_name = last_name | ||
customer.save |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to break down the available data into sets and handle each set as optimally as we can, splitting the names in multiple passes. One thing we can do: a large percentage of customers records have a bill_address.first_name
and bill_address.last_name
that match the customer.name
but are already correctly split into two columns. We should favour that option where it's available, as it'll give the highest fidelity. This is important where we have names that currently contain three or more parts like Shia Le Bouf
or Mary Jane Watson
where it's not possible to determine whether or we need to split on the first or second space.
Another thing we could do just before dealing with the data is normalize it a bit. There are a lot of entries in the customer.name
column which have superfluous spaces in them like "Alice Jones". If we strip them out in a first pass before handling them, it'll be easier to deal with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it, I will use the bill_address.first_name
and bill_address.last_name
in priority, and if not present apply the current process with customer.name
value. Any new value will be stripped before being saved.
ebe48aa
to
ca448df
Compare
f375f42
to
34971c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic should be updated now.
Let me know if the migration script should be improved again.
There are still a Rubocop warning about a cyclomatic complexity on a method but I am not sure it is really realted to this PR, so I don't if it is fine or if we should fix it, let me know.
db/schema.rb
Outdated
@@ -51,9 +51,11 @@ | |||
t.datetime "updated_at", null: false | |||
t.integer "bill_address_id" | |||
t.integer "ship_address_id" | |||
t.string "name", limit: 255 | |||
t.string "backup_name", limit: 255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the data for now, with a new column name to be sure to handle the plugged logic. We will delete the data later, once the migration is validated on each instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. We should create an issue for this so that we don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see which migration this schema change is associated with..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking forward to the finished PR. Are you currently waiting on feedback? When are you able to progress?
d17e3e2
to
99846c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkllnk Yes, actually I was waiting a Code Review, this is why I put the PR into Code Review pipeline. But maybe I missed something with the process, should I also communicate about it on Slack?
I have added your suggestions for now but still struggling with GoodMigration
issue, if you have any ìdeas about it, don't hesitate, I am a bit lost right now.
db/schema.rb
Outdated
@@ -51,9 +51,11 @@ | |||
t.datetime "updated_at", null: false | |||
t.integer "bill_address_id" | |||
t.integer "ship_address_id" | |||
t.string "name", limit: 255 | |||
t.string "backup_name", limit: 255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. We should create an issue for this so that we don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your feedback @mkllnk! I have updated the PR, and tested locally with multiple cases, I hope I have not missed anything this time 🤞
Of course I forgot the specs... It was failing due to the new configuration for the new fields. The default value is not set by default actually, I needed to force it on I still have a rubocop flag, about Cyclomatic Complexity of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, very close.
app/models/spree/order.rb
Outdated
def ensure_customer | ||
return if associate_customer | ||
|
||
self.customer = Customer.new( | ||
self.customer = Customer.create( | ||
enterprise: distributor, | ||
email: email_for_customer, | ||
user: user, | ||
name: bill_address&.full_name, | ||
first_name: bill_address&.first_name.to_s, | ||
last_name: bill_address&.last_name.to_s, | ||
bill_address: bill_address&.clone, | ||
ship_address: ship_address&.clone | ||
) | ||
customer.save | ||
|
||
Bugsnag.notify(customer.errors.full_messages.join(", ")) unless customer.persisted? | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that you increased the complexity of this method much but if we want to satisfy Rubocop we can write this method as associate_customer || create_customer
. Remove the Bugsnag notification and move the customer creation into its own method. That's much easier to read, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Campsite rule: Leave the campsite better than you found it.
config/locales/en_GB.yml
Outdated
first_name: First name | ||
last_name: Last name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't change other locales than en.yml
because it would lead to conflicts with our Transifex process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ak ok, I just followed what was there in first commits, I will update it in Transifex then.
first_name: bill_address.firstname.strip, | ||
last_name: bill_address.lastname.strip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the database schema, these values could be NULL. The model has a conditional on the validation. 🤔
French data doesn't have any NULL records but Australia has:
Spree::Address.where(firstname: nil).count
=> 66
first_name: bill_address.firstname.strip, | |
last_name: bill_address.lastname.strip | |
first_name: bill_address.firstname.to_s.strip, | |
last_name: bill_address.lastname.to_s.strip |
9872afa
to
aea932b
Compare
def associate_customer | ||
return customer if customer.present? | ||
|
||
self.customer = Customer.of(distributor).find_by(email: email_for_customer) | ||
Customer.of(distributor).find_by(email: email_for_customer) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
associate_customer
is a before_validation action. After your change, it doesn't have any effect any more. But if ensure_customer
is called then we don't need to call associate_customer
beforehand anyway. So maybe we can remove that action.
openfoodnetwork/app/models/spree/order.rb
Lines 87 to 88 in aea932b
before_validation :associate_customer, unless: :customer_id? | |
before_validation :ensure_customer, unless: :customer_is_valid? |
I'm just wondering about edge cases now. So if ensure_customer is executed, it includes associate_customer and everything is good. The logic would only change if ensure_customer is not executed but associate_customer would. The conditions for that is:
- no customer id present
- customer is valid:
- the order is a new record or in cart state and therefore we don't need a customer record yet
- customer is present (impossible due to first condition above) and other conditions that don't matter
So the only case in which we would have tried to associate a customer without creating a new record is when the order is new or in cart state. I could imagine that this is needed to associate tags during shopping to that customer specific products, discounts or fees can be applied.
I'm not sure if this is handled somewhere else but you might be introducing a bug here. I do think that this part of the code needs refactoring but maybe the full refactor is out of scope. I would leave the self.customer =
in the code to keep the old behaviour here and then we clean it up another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think that I have an idea how to resolve this. I'll add a commit to your branch and we need to extend the testing notes with the case above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the associate_user
method is used in order_cart_reset.rb
so I guess I am introducing a bug here. I was too optimistic with the private
section. I should have checked more carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw about the mix between associate_customer
and the ensure_customer
methods but I was not sure how to refactor it quickly maintaining the existing specs. I should have refactor the specs first as you did, looks great.
By the way, I ran the migration on a local copy of the AU production database and it took 30 seconds. There are 5412 customer records with an empty I guess that the customer's billing address is just a stored default, a preference. And some people never save their address for future use in which case it's only stored on each order. Or the customer record was created before the billing address was set. I'm not sure about the sequence in the current checkout process. Anyway, I reckon that we can find more data for the name fields. The question is though: Will new customer records get the name saved? Is it bad if we don't have a name? I will ask in Slack what other people think. Looking at production data, there's only a small percentage of customers with a bill address and without a name. And slightly more, but still small amount of records with a name but no bill address. ~73% have a bill address and a name |
I have a new proposal for the migration but I'm waiting for feedback on Slack: We only look for billing address name parts if a customer record has a name. And then we choose a billing address which resembles the full name. That can either come from the customer record itself or from one of the orders. The point is that the stored billing address helps us to correctly split the full name and that we don't do a mistake by just splitting at the first whitespace. We don't have to populate customer records with names if they don't have one already. We can probably do some trimming and concatenating in the database to do this efficiently and only get matching billing addresses. |
Lynne reckons we should do this migration as easy as possible:
What do you think? Shall we simplify or use my suggestion above? It's not that hard. We have a solution already. |
@mkllnk I will follow your decision, you have better understanding of the whole logic behind than me. I found it nice to try to complete the missing data, but if it is not that important finally, we can leave it. Here are some numbers from our production data: |
The newly added specs were tested on the master branch and passed. But the previous commit broke one test case which I marked as pending here. The removed specs are completely replaced by the new ones. Their main downside was to test only small bits of internal behaviour without considering the whole callback chain of the order. The new specs are more realistic and catch bugs like mentioned above.
And simplify the before_validation actions.
Our app code will try to access the old attribute while the migration is running. That would lead to errors during checkout.
5a0e22c
to
a6dee77
Compare
Hey @pacodelaluna 👋 Thanks for this! 🚀 Deployment went on with no issues and the UI changes can be seen indeed:
Previous customer names are now split into two columns. As a quick sanity check I've placed a Stripe order for a new customer, to verify the name got correctly split: I'd say this is good to go 👍 |
These were added a couple of years ago in openfoodfoundation#8763 But I guess we never noticed the names weren't getting anonymised. The old 'name' field is still in the DB. It was kept for compatibility during migraiton but never cleaned up. I've added the tech debt task to the welcome new devs board now: openfoodfoundation#8835
These were added a couple of years ago in openfoodfoundation#8763 But I guess we never noticed the names weren't getting anonymised. The old 'name' field is still in the DB. It was kept for compatibility during migraiton but never cleaned up. I've added the tech debt task to the welcome new devs board now: openfoodfoundation#8835
What? Why?
Closes #8577.
This code is adding the two new columns (first_name and last_name) into customers table, and updating the values based on the existing column (backup_name).
We update then the logic where it was using the former field in order to use the two new ones. And maintain the specs accordingly.
What should we test?
Unit tests will cover most cases but you can see the two new fields on the Admin Customer page.
Release notes
Changelog Category: Technical changes
The title of the pull request will be included in the release notes.