Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split name column in customers table #8763

Merged
merged 27 commits into from
Feb 17, 2022

Conversation

pacodelaluna
Copy link
Contributor

What? Why?

Closes #8577.

This code is adding the two new columns (first_name and last_name) into customers table, and updating the values based on the existing column (backup_name).
We update then the logic where it was using the former field in order to use the two new ones. And maintain the specs accordingly.

What should we test?

Unit tests will cover most cases but you can see the two new fields on the Admin Customer page.

Release notes

Changelog Category: Technical changes

The title of the pull request will be included in the release notes.

@pacodelaluna pacodelaluna marked this pull request as draft January 19, 2022 21:43
@pacodelaluna pacodelaluna changed the title Split name column in customers table WIP: Split name column in customers table Jan 19, 2022
@mkllnk
Copy link
Member

mkllnk commented Jan 19, 2022

I'm moving this into the In Dev column because it's a draft PR. Let us know once it's ready for review.

Comment on lines 15 to 19
Customer.where("backup_name LIKE '% %'").find_each do |customer|
first_name, last_name = customer.backup_name.split(' ', 2)
customer.first_name = first_name
customer.last_name = last_name
customer.save
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to break down the available data into sets and handle each set as optimally as we can, splitting the names in multiple passes. One thing we can do: a large percentage of customers records have a bill_address.first_name and bill_address.last_name that match the customer.name but are already correctly split into two columns. We should favour that option where it's available, as it'll give the highest fidelity. This is important where we have names that currently contain three or more parts like Shia Le Bouf or Mary Jane Watson where it's not possible to determine whether or we need to split on the first or second space.

Another thing we could do just before dealing with the data is normalize it a bit. There are a lot of entries in the customer.name column which have superfluous spaces in them like "Alice     Jones". If we strip them out in a first pass before handling them, it'll be easier to deal with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it, I will use the bill_address.first_name and bill_address.last_name in priority, and if not present apply the current process with customer.name value. Any new value will be stripped before being saved.

@pacodelaluna pacodelaluna force-pushed the 8577-split-name-column-in-customers-table branch 5 times, most recently from ebe48aa to ca448df Compare January 21, 2022 13:06
@pacodelaluna pacodelaluna force-pushed the 8577-split-name-column-in-customers-table branch from f375f42 to 34971c9 Compare January 23, 2022 15:09
Copy link
Contributor Author

@pacodelaluna pacodelaluna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic should be updated now.
Let me know if the migration script should be improved again.
There are still a Rubocop warning about a cyclomatic complexity on a method but I am not sure it is really realted to this PR, so I don't if it is fine or if we should fix it, let me know.

db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
spec/factories.rb Show resolved Hide resolved
spec/system/admin/subscriptions_spec.rb Outdated Show resolved Hide resolved
db/schema.rb Outdated
@@ -51,9 +51,11 @@
t.datetime "updated_at", null: false
t.integer "bill_address_id"
t.integer "ship_address_id"
t.string "name", limit: 255
t.string "backup_name", limit: 255
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the data for now, with a new column name to be sure to handle the plugged logic. We will delete the data later, once the migration is validated on each instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. We should create an issue for this so that we don't forget.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see which migration this schema change is associated with..?

@pacodelaluna pacodelaluna self-assigned this Jan 23, 2022
@RachL RachL changed the title WIP: Split name column in customers table Split name column in customers table Jan 25, 2022
@pacodelaluna pacodelaluna marked this pull request as ready for review January 25, 2022 08:35
Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking forward to the finished PR. Are you currently waiting on feedback? When are you able to progress?

db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
@pacodelaluna pacodelaluna force-pushed the 8577-split-name-column-in-customers-table branch from d17e3e2 to 99846c3 Compare February 1, 2022 22:52
Copy link
Contributor Author

@pacodelaluna pacodelaluna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkllnk Yes, actually I was waiting a Code Review, this is why I put the PR into Code Review pipeline. But maybe I missed something with the process, should I also communicate about it on Slack?
I have added your suggestions for now but still struggling with GoodMigration issue, if you have any ìdeas about it, don't hesitate, I am a bit lost right now.

db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/migrate/20220105085729_split_customers_name.rb Outdated Show resolved Hide resolved
db/schema.rb Outdated
@@ -51,9 +51,11 @@
t.datetime "updated_at", null: false
t.integer "bill_address_id"
t.integer "ship_address_id"
t.string "name", limit: 255
t.string "backup_name", limit: 255
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. We should create an issue for this so that we don't forget.

Copy link
Contributor Author

@pacodelaluna pacodelaluna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your feedback @mkllnk! I have updated the PR, and tested locally with multiple cases, I hope I have not missed anything this time 🤞

@pacodelaluna
Copy link
Contributor Author

Of course I forgot the specs... It was failing due to the new configuration for the new fields. The default value is not set by default actually, I needed to force it on ensure_customer method.

I still have a rubocop flag, about Cyclomatic Complexity of the ensure_customer method. I was thinking of removing the BugSnag notification in case of the customer is not saved, I am not seeing such calls on the Bugsnag account of FR instance, it looks like something that was supposed to be setup temporary to control data flow, isn't it?

Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, very close.

Comment on lines 713 to 732
def ensure_customer
return if associate_customer

self.customer = Customer.new(
self.customer = Customer.create(
enterprise: distributor,
email: email_for_customer,
user: user,
name: bill_address&.full_name,
first_name: bill_address&.first_name.to_s,
last_name: bill_address&.last_name.to_s,
bill_address: bill_address&.clone,
ship_address: ship_address&.clone
)
customer.save

Bugsnag.notify(customer.errors.full_messages.join(", ")) unless customer.persisted?
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that you increased the complexity of this method much but if we want to satisfy Rubocop we can write this method as associate_customer || create_customer. Remove the Bugsnag notification and move the customer creation into its own method. That's much easier to read, actually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Campsite rule: Leave the campsite better than you found it.

Comment on lines 374 to 375
first_name: First name
last_name: Last name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't change other locales than en.yml because it would lead to conflicts with our Transifex process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ak ok, I just followed what was there in first commits, I will update it in Transifex then.

Comment on lines 27 to 28
first_name: bill_address.firstname.strip,
last_name: bill_address.lastname.strip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the database schema, these values could be NULL. The model has a conditional on the validation. 🤔

French data doesn't have any NULL records but Australia has:

Spree::Address.where(firstname: nil).count
=> 66
Suggested change
first_name: bill_address.firstname.strip,
last_name: bill_address.lastname.strip
first_name: bill_address.firstname.to_s.strip,
last_name: bill_address.lastname.to_s.strip

@pacodelaluna pacodelaluna force-pushed the 8577-split-name-column-in-customers-table branch from 9872afa to aea932b Compare February 8, 2022 18:07
@pacodelaluna pacodelaluna requested a review from mkllnk February 8, 2022 18:38
Comment on lines 707 to 714
def associate_customer
return customer if customer.present?

self.customer = Customer.of(distributor).find_by(email: email_for_customer)
Customer.of(distributor).find_by(email: email_for_customer)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

associate_customer is a before_validation action. After your change, it doesn't have any effect any more. But if ensure_customer is called then we don't need to call associate_customer beforehand anyway. So maybe we can remove that action.

before_validation :associate_customer, unless: :customer_id?
before_validation :ensure_customer, unless: :customer_is_valid?

I'm just wondering about edge cases now. So if ensure_customer is executed, it includes associate_customer and everything is good. The logic would only change if ensure_customer is not executed but associate_customer would. The conditions for that is:

  • no customer id present
  • customer is valid:
    • the order is a new record or in cart state and therefore we don't need a customer record yet
    • customer is present (impossible due to first condition above) and other conditions that don't matter

So the only case in which we would have tried to associate a customer without creating a new record is when the order is new or in cart state. I could imagine that this is needed to associate tags during shopping to that customer specific products, discounts or fees can be applied.

I'm not sure if this is handled somewhere else but you might be introducing a bug here. I do think that this part of the code needs refactoring but maybe the full refactor is out of scope. I would leave the self.customer = in the code to keep the old behaviour here and then we clean it up another time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think that I have an idea how to resolve this. I'll add a commit to your branch and we need to extend the testing notes with the case above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the associate_user method is used in order_cart_reset.rb so I guess I am introducing a bug here. I was too optimistic with the private section. I should have checked more carefully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw about the mix between associate_customer and the ensure_customer methods but I was not sure how to refactor it quickly maintaining the existing specs. I should have refactor the specs first as you did, looks great.

@mkllnk
Copy link
Member

mkllnk commented Feb 9, 2022

By the way, I ran the migration on a local copy of the AU production database and it took 30 seconds. There are 5412 customer records with an empty backup_name. And 5180 records don't have a first name. That's roughly a quarter of the 21869 customer records. I looked further into the most recent case and while the customer record doesn't have a billing address, there are complete orders with billing address.

I guess that the customer's billing address is just a stored default, a preference. And some people never save their address for future use in which case it's only stored on each order. Or the customer record was created before the billing address was set. I'm not sure about the sequence in the current checkout process.

Anyway, I reckon that we can find more data for the name fields. The question is though: Will new customer records get the name saved? Is it bad if we don't have a name?

I will ask in Slack what other people think.

Looking at production data, there's only a small percentage of customers with a bill address and without a name. And slightly more, but still small amount of records with a name but no bill address.

~73% have a bill address and a name
~24% have no bill address and no name
~3% have either a bill address or a name, not both

@mkllnk
Copy link
Member

mkllnk commented Feb 9, 2022

I have a new proposal for the migration but I'm waiting for feedback on Slack:

We only look for billing address name parts if a customer record has a name. And then we choose a billing address which resembles the full name. That can either come from the customer record itself or from one of the orders. The point is that the stored billing address helps us to correctly split the full name and that we don't do a mistake by just splitting at the first whitespace. We don't have to populate customer records with names if they don't have one already.

We can probably do some trimming and concatenating in the database to do this efficiently and only get matching billing addresses.

@mkllnk
Copy link
Member

mkllnk commented Feb 9, 2022

@pacodelaluna

Lynne reckons we should do this migration as easy as possible:

The consequences of getting this wrong are minimal. If we split first name and last name in the wrong place it can:

easily be fixed
afaik wont create additional adverse consequences on shop workflow than are already present

So, personally I'm happy with whatever you do and suggest the simplest way possible. I'd be fine with:

if customer.name.exists -> split at first white space
if doesn't exist -> leave null

What do you think? Shall we simplify or use my suggestion above? It's not that hard. We have a solution already.

@pacodelaluna
Copy link
Contributor Author

@mkllnk I will follow your decision, you have better understanding of the whole logic behind than me. I found it nice to try to complete the missing data, but if it is not that important finally, we can leave it.

Here are some numbers from our production data:
~88% of customers with a bill address (all of them with firstname and lastname)
~91% of customers with a name
~8% of customers without a name or a bill address

François Turbelin and others added 20 commits February 16, 2022 10:23
The newly added specs were tested on the master branch and passed. But
the previous commit broke one test case which I marked as pending here.

The removed specs are completely replaced by the new ones. Their main
downside was to test only small bits of internal behaviour without
considering the whole callback chain of the order. The new specs are
more realistic and catch bugs like mentioned above.
And simplify the before_validation actions.
Our app code will try to access the old attribute while the migration is
running. That would lead to errors during checkout.
@mkllnk mkllnk force-pushed the 8577-split-name-column-in-customers-table branch from 5a0e22c to a6dee77 Compare February 15, 2022 23:33
@filipefurtad0 filipefurtad0 added the pr-staged-au staging.openfoodnetwork.org.au label Feb 17, 2022
@filipefurtad0 filipefurtad0 self-assigned this Feb 17, 2022
@filipefurtad0
Copy link
Contributor

Hey @pacodelaluna 👋

Thanks for this! 🚀

Deployment went on with no issues and the UI changes can be seen indeed:

  • /customers -> before this PR

image

  • /customers -> after this PR

image

Previous customer names are now split into two columns.

As a quick sanity check I've placed a Stripe order for a new customer, to verify the name got correctly split:

image

I'd say this is good to go 👍

@filipefurtad0 filipefurtad0 removed the pr-staged-au staging.openfoodnetwork.org.au label Feb 17, 2022
@Matt-Yorkley Matt-Yorkley merged commit 9a27add into master Feb 17, 2022
@Matt-Yorkley Matt-Yorkley deleted the 8577-split-name-column-in-customers-table branch February 17, 2022 11:57
dacook added a commit to dacook/openfoodnetwork that referenced this pull request Sep 16, 2024
These were added a couple of years ago in openfoodfoundation#8763
But I guess we never noticed the names weren't getting anonymised.

The old 'name' field is still in the DB. It was kept for compatibility during migraiton but never cleaned up. I've added the tech debt task to the welcome new devs board now: openfoodfoundation#8835
rioug pushed a commit to rioug/openfoodnetwork that referenced this pull request Dec 3, 2024
These were added a couple of years ago in openfoodfoundation#8763
But I guess we never noticed the names weren't getting anonymised.

The old 'name' field is still in the DB. It was kept for compatibility during migraiton but never cleaned up. I've added the tech debt task to the welcome new devs board now: openfoodfoundation#8835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split name column in customers table
6 participants