Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release/v0.14.0 #122

Merged
merged 24 commits into from
Nov 15, 2023
Merged

Release/v0.14.0 #122

merged 24 commits into from
Nov 15, 2023

Conversation

fivetran-catfritz
Copy link
Contributor

@fivetran-catfritz fivetran-catfritz commented Nov 9, 2023

PR Overview

This PR will address the following Issue/Feature:

This PR will result in the following new package version:

  • v0.14.0
  • I selected breaking since the addition of the property labels macro was a bigger code change

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

🚨 Breaking Change 🚨

  • When including a passthrough property_hs_* column, you now have the option to include the corresponding, human-readable label in the staging models.
    • See the Adding property label section of the README for instructions on how to enable this feature!
    • This update applies to models:
      • stg_hubspot__company
      • stg_hubspot__contact
      • stg_hubspot__deal
      • stg_hubspot__ticket

Features

  • Added the following staging models, along with documentation and tests:
    • stg_hubspot__property
    • stg_hubspot__property_option

Bug fixes

  • Updated macro remove_duplicate_and_prefix_from_columns to accommodate incoming custom column names containing characters such as - or $ that are not permitted. The resulting column name will have these characters removed or replaced in its stg_* model.
  • Removed extra comma from stg_hubspot__ticket, which was causing compilation issues when passing through all columns.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run –full-refresh && dbt test
  • dbt run (if incremental models are present)

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

  • See ticket for validation link.

Additional Comments

  • I feel confident that my solution works, however I do realize adding it to the source is open for discussion. I chose this path because these 4 target tables (companies, contacts, deals, tickets) all diverge in logic in the transform with no clear pattern, so I'd have to be very mindful of when to bring in these fields and wonder how maintenance would be in the long run.
  • While this does require joins in the source, I felt it would be acceptable since it would only be enabled if the user enables it, and this way it's easy to see where these labels are being brought in alongside the other passthrough columns.
  • Again open for discussion, but just wanted to give a little insight into what I was thinking.

updates
updates
Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-catfritz thanks for working though all of these updates! I pretty much only have comments around the new feature of the property labels. All the other components of the PR look great and work as expected when testing on my end.

Please review my comments and let me know if you have any questions. Thanks!

@@ -0,0 +1,40 @@
{{ config(enabled=var('hubspot_sales_enabled', False)) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how did we align on this being the variable that is needed to enable these models?

Also, is there a reason the default is false? The variable in all other models is true by default. This results in a strange behavior that is in conflict with the rest of the models in the package leveraging this variable.

For example, what if you are only using the contact model and you want to leverage the property labels? Given this logic it will not work because the sales is disabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for property_option and tmp models

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to test and verify the issue.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The property tables are part of the sales schema, but thinking about it, it should have its own variable. Not everyone has this table, but maybe should not be tied to sales as a whole. I have updated vars, readme and changelog with this. I had initially agreed with the scoping:

Source package

  1. Add property_option table in (make staging models, seed files, etc)
  2. This table is are part of the Sales schema, so add {{ config(enabled=var('hubspot_sales_enabled', True)) }} to the staging models(tmp and non-_tmp) and the source config.enabled. We may want to also add a hubspot_property_option_enabled variable (2/3 of customers have the property tables, but something tells me the sales_enabled flag may cover this)

Comment on lines 3 to 15
select {{ cte_name }}.*

{%- if var(passthrough_var_name, []) != [] -%}
{%- set col_list = var(passthrough_var_name) -%}

{%- for col in col_list -%} -- Create label cols
{%- if col.add_property_label or var('hubspot__enable_all_property_labels', false) -%}
{%- set col_alias = (col.alias | default(col.name)) %}
, {{ col.name }}_option.property_option_label as {{ col_alias }}_label
{% endif -%}
{%- endfor %}

from {{ cte_name }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the property label feature is not enabled it results in a weird spacing within the cte. Can we fix the spacing to make sure it isn't all bunched together. I worry we could run into some unforeseen errors with it spaced like this.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the whitespace control!

Comment on lines +67 to +68
), joined as (
{{ add_property_labels('hubspot__company_pass_through_columns', cte_name='fields') }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for fields that are explicitly declared in the passthrough variable, right? What if a customer is using the hubspot__pass_through_all_columns: true config and having all the property_hs columns included without needing to explicitly update the hubspot__company_pass_through_columns variable? Or is that not really a need here?

I could also see that causing the model to be VERY computation expensive since joins and subqueries would be created for every single property_hs field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also talked about this in response to your comment on macros/add_property_labels. But I purposefully opted not to allow this. Also see scoping comments in the ticket, but I'll paste the section that also influenced me:

Other Considerations
This becomes a lot more complicated for people who have hubspot__pass_through_all_columns set to true. The customer who opened the ticket specifically asked for this to be incorporated into the individual hubspot__[table_name]_pass_through_columns vars, so perhaps we should just address this first and then address the other scenario if a customer opens a ticket asking for it. Open to opinions!

Comment on lines 21 to 33
left join -- create subset of property and property_options for property in question
(select
property_option.property_option_value,
property_option.property_option_label
from {{ ref('stg_hubspot__property_option') }} as property_option
join {{ ref('stg_hubspot__property') }} as property
on property_option.property_id = property._fivetran_id
and property_option.property_option_label = property.property_label
where property.property_name = '{{ col.name.replace('property_', '') }}'
) as {{ col.name }}_option

on cast({{ cte_name }}.{{ col_alias }} as {{ dbt.type_string() }})
= cast({{ col.name }}_option.property_option_value as {{ dbt.type_string() }})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing a lot of join logic (along with a subquery) here. I could see this becoming very computationally expensive if a customer utilizes this for all their fields.

I feel it is worth calling out somewhere that this may increase the runtime of the models.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design wise I cannot think of a more performant approach. For the customers that want this sort of information, this is the best way to do it in my opinion. Although I do appreciate this being disabled by default and an opt in feature for those that truly want it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was one of the reasons why I opted to have it work only on defined fields. I will add a note to the readme. I had a prior version where I did allow both passthrough all columns and passthrough all labels on, but removed it. There were a few reasons:

  1. This took a very, very long time to run on I think, for example company, and the end results weren't great either. Not all hs_property fields have useful labels, so the result was most columns had null labels. I think if a user needs to bring in labels, it will only be useful for certain columsn.
  2. It was a bit of a nightmare for the staging models that use some of the hs_cols as defacto fields, tho this is a lesser issue than above.

README.md Outdated
Comment on lines 149 to 168
For `property_hs_*` columns, you can enable the corresponding, human-readable `property_option`.`label` to be included in the staging models.
- **Note** you cannot bring in labels if using `hubspot__pass_through_all_columns: true`.`
- To enable labels for a given property, set the property attribute `use_property_label: true`, using the below format.

```yml
vars:
hubspot__ticket_pass_through_columns:
- name: "property_hs_fieldname"
alias: "fieldname"
use_property_label: true
```
Alternatively, you can enable labels for all passthrough properties by using variable `hubspot__enable_all_property_labels: true`, formatted like the below example.

```yml
vars:
hubspot__enable_all_property_labels: true
hubspot__ticket_pass_through_columns:
- name: "property_hs_fieldname1"
- name: "property_hs_fieldname2"
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not allowing this to be used when hubspot__pass_through_all_columns: true is used then we should call that out here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

README.md Outdated
hubspot__ticket_pass_through_columns:
- name: "property_hs_fieldname"
alias: "fieldname"
use_property_label: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with the code. In the code you have this as the add_property_label argument. Which should it be? I imagine it would be easiest to update the documentation to match the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the appropriate updates throughout the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gosh. Yes, I will update the docs to say add_property_label.

macros/add_property_labels.sql Show resolved Hide resolved
CHANGELOG.md Outdated
Comment on lines 5 to 11
- When including a passthrough `property_hs_*` column, you now have the option to include the corresponding, human-readable label in the staging models.
- See the [Adding property label section](https://github.com/fivetran/dbt_hubspot_source#adding-property-label) of the README for instructions on how to enable this feature!
- This update applies to models:
- `stg_hubspot__company`
- `stg_hubspot__contact`
- `stg_hubspot__deal`
- `stg_hubspot__ticket`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a breaking change? This seems more like a feature. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this. I decided to make it breaking since we are adding property and property_option, which everyone may not have. I was also worried that since I was adding this macro with joins to the source that it might cause unforeseen issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay I see what you mean. I would actually request this be folded into the feature updates and then provide a blurb that leveraging this new feature you will require the property tables and if you do not have them then you will need to disable the tables and you cannot leverage this feature.

When I see breaking change I think something from the previous build is now different and can change my downstream transformations. Reading through this I don't get the impression this is a 🚨 Breaking 🚨 change, but rather a feature that will result in a 0.X.0 version bump.

models/stg_hubspot__ticket.sql Show resolved Hide resolved
Copy link
Contributor Author

@fivetran-catfritz fivetran-catfritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-joemarkiewicz Thanks for the all the feedback! Definitely got me thinking. I have responded and pushed new changes. Let me know what you think!

CHANGELOG.md Outdated
Comment on lines 5 to 11
- When including a passthrough `property_hs_*` column, you now have the option to include the corresponding, human-readable label in the staging models.
- See the [Adding property label section](https://github.com/fivetran/dbt_hubspot_source#adding-property-label) of the README for instructions on how to enable this feature!
- This update applies to models:
- `stg_hubspot__company`
- `stg_hubspot__contact`
- `stg_hubspot__deal`
- `stg_hubspot__ticket`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this. I decided to make it breaking since we are adding property and property_option, which everyone may not have. I was also worried that since I was adding this macro with joins to the source that it might cause unforeseen issues.

README.md Outdated
hubspot__ticket_pass_through_columns:
- name: "property_hs_fieldname"
alias: "fieldname"
use_property_label: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gosh. Yes, I will update the docs to say add_property_label.

README.md Outdated
Comment on lines 149 to 168
For `property_hs_*` columns, you can enable the corresponding, human-readable `property_option`.`label` to be included in the staging models.
- **Note** you cannot bring in labels if using `hubspot__pass_through_all_columns: true`.`
- To enable labels for a given property, set the property attribute `use_property_label: true`, using the below format.

```yml
vars:
hubspot__ticket_pass_through_columns:
- name: "property_hs_fieldname"
alias: "fieldname"
use_property_label: true
```
Alternatively, you can enable labels for all passthrough properties by using variable `hubspot__enable_all_property_labels: true`, formatted like the below example.

```yml
vars:
hubspot__enable_all_property_labels: true
hubspot__ticket_pass_through_columns:
- name: "property_hs_fieldname1"
- name: "property_hs_fieldname2"
```
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

macros/add_property_labels.sql Show resolved Hide resolved
Comment on lines 3 to 15
select {{ cte_name }}.*

{%- if var(passthrough_var_name, []) != [] -%}
{%- set col_list = var(passthrough_var_name) -%}

{%- for col in col_list -%} -- Create label cols
{%- if col.add_property_label or var('hubspot__enable_all_property_labels', false) -%}
{%- set col_alias = (col.alias | default(col.name)) %}
, {{ col.name }}_option.property_option_label as {{ col_alias }}_label
{% endif -%}
{%- endfor %}

from {{ cte_name }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the whitespace control!

Comment on lines 21 to 33
left join -- create subset of property and property_options for property in question
(select
property_option.property_option_value,
property_option.property_option_label
from {{ ref('stg_hubspot__property_option') }} as property_option
join {{ ref('stg_hubspot__property') }} as property
on property_option.property_id = property._fivetran_id
and property_option.property_option_label = property.property_label
where property.property_name = '{{ col.name.replace('property_', '') }}'
) as {{ col.name }}_option

on cast({{ cte_name }}.{{ col_alias }} as {{ dbt.type_string() }})
= cast({{ col.name }}_option.property_option_value as {{ dbt.type_string() }})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was one of the reasons why I opted to have it work only on defined fields. I will add a note to the readme. I had a prior version where I did allow both passthrough all columns and passthrough all labels on, but removed it. There were a few reasons:

  1. This took a very, very long time to run on I think, for example company, and the end results weren't great either. Not all hs_property fields have useful labels, so the result was most columns had null labels. I think if a user needs to bring in labels, it will only be useful for certain columsn.
  2. It was a bit of a nightmare for the staging models that use some of the hs_cols as defacto fields, tho this is a lesser issue than above.

Comment on lines +67 to +68
), joined as (
{{ add_property_labels('hubspot__company_pass_through_columns', cte_name='fields') }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also talked about this in response to your comment on macros/add_property_labels. But I purposefully opted not to allow this. Also see scoping comments in the ticket, but I'll paste the section that also influenced me:

Other Considerations
This becomes a lot more complicated for people who have hubspot__pass_through_all_columns set to true. The customer who opened the ticket specifically asked for this to be incorporated into the individual hubspot__[table_name]_pass_through_columns vars, so perhaps we should just address this first and then address the other scenario if a customer opens a ticket asking for it. Open to opinions!

@@ -0,0 +1,40 @@
{{ config(enabled=var('hubspot_sales_enabled', False)) }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The property tables are part of the sales schema, but thinking about it, it should have its own variable. Not everyone has this table, but maybe should not be tied to sales as a whole. I have updated vars, readme and changelog with this. I had initially agreed with the scoping:

Source package

  1. Add property_option table in (make staging models, seed files, etc)
  2. This table is are part of the Sales schema, so add {{ config(enabled=var('hubspot_sales_enabled', True)) }} to the staging models(tmp and non-_tmp) and the source config.enabled. We may want to also add a hubspot_property_option_enabled variable (2/3 of customers have the property tables, but something tells me the sales_enabled flag may cover this)

models/stg_hubspot__ticket.sql Show resolved Hide resolved
Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-catfritz This looks good to me following the recent updates!! I am happy to approve this, but have one request to change the CHANGELOG to fold the breaking change callout into the feature updates but have a small 🚨 breaking notice for users who do not have the property tables and also it seems the version bump was not updated in the README. Be sure to update that before opening for release review.

Great job on this PR!

CHANGELOG.md Outdated
Comment on lines 5 to 11
- When including a passthrough `property_hs_*` column, you now have the option to include the corresponding, human-readable label in the staging models.
- See the [Adding property label section](https://github.com/fivetran/dbt_hubspot_source#adding-property-label) of the README for instructions on how to enable this feature!
- This update applies to models:
- `stg_hubspot__company`
- `stg_hubspot__contact`
- `stg_hubspot__deal`
- `stg_hubspot__ticket`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay I see what you mean. I would actually request this be folded into the feature updates and then provide a blurb that leveraging this new feature you will require the property tables and if you do not have them then you will need to disable the tables and you cannot leverage this feature.

When I see breaking change I think something from the previous build is now different and can change my downstream transformations. Reading through this I don't get the impression this is a 🚨 Breaking 🚨 change, but rather a feature that will result in a 0.X.0 version bump.

macros/add_property_labels.sql Show resolved Hide resolved
models/stg_hubspot__ticket.sql Show resolved Hide resolved
@@ -104,6 +104,7 @@ vars:
hubspot_engagement_note_enabled: false
hubspot_engagement_task_enabled: false
hubspot_owner_enabled: false
hubspot_property_enabled: false # Disables property and property_option tables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to bump the version in the README

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated.

@fivetran-catfritz fivetran-catfritz merged commit ac7706b into main Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants