Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query stuck in PLANNING indefinitely #4638

Closed
rmgpinto opened this issue Jul 30, 2020 · 4 comments · Fixed by #4635
Closed

Query stuck in PLANNING indefinitely #4638

rmgpinto opened this issue Jul 30, 2020 · 4 comments · Fixed by #4635
Labels
bug Something isn't working RELEASE-BLOCKER

Comments

@rmgpinto
Copy link

Hi,

As @findepi requested on slack, I'm creating this issue.

I'm executing a query and it's stuck on PLANNING.

dump.log: https://gist.github.com/rmgpinto/68d6e1bcf4c986fde3995a35a1b28a6b
query:

SELECT
    'insights_' || ms.account_id || '_' || ms.campaign_id || '.json' AS filename,
    CAST(ms.date_start AS DATE) AS date_id,
    5 AS marketing_platform_id,
    CAST(ms.account_id AS VARCHAR) AS marketing_ad_account_id,
    CAST(ms.campaign_id AS VARCHAR) AS marketing_campaign_id,
    CASE WHEN ev_mapping.original_event_id IS NOT NULL AND DATE_DIFF('day', DATE_PARSE(ev_mapping.original_event_date_from, '%Y%m%d'), CAST(ms.date_start AS DATE)) <= 0
         THEN oe.event_id
         WHEN CAST(ms.date_start AS DATE) >= next_year_ev.event_date AND two_years_ahead_ev.event_id != 'unknown'
         THEN two_years_ahead_ev.event_id
         WHEN CAST(ms.date_start AS DATE) >= e.event_date AND next_year_ev.event_id != 'unknown'
         THEN next_year_ev.event_id
         WHEN CAST(ms.date_start AS DATE) < prev_year_ev.event_date AND prev_year_ev.event_id != 'unknown'
         THEN prev_year_ev.event_id
         ELSE COALESCE(e.event_id, 'unknown')
         END AS event_id,
    COALESCE(mc.classification_id, 'unknown_unknown') AS classification_id,
    cur.currency_id AS currency_id,
    COALESCE(cnt.country_id, '---') AS reach_country_id,
    CAST(ms.reach AS INTEGER) / COALESCE(umcg.conversions_adjustment, 1) AS reach,
    CAST(ms.inline_link_clicks AS INTEGER) / COALESCE(umcg.conversions_adjustment, 1) AS clicks,
    CAST(ms.impressions AS INTEGER) / COALESCE(umcg.conversions_adjustment, 1) AS impressions,
    CAST(ms.spend AS DOUBLE PRECISION) / COALESCE(umcg.conversions_adjustment, 1) AS campaign_spend,
    CASE WHEN cur.currency_id = 1
         THEN CAST(ms.spend AS DOUBLE PRECISION) / COALESCE(umcg.conversions_adjustment, 1)
         ELSE CAST(ms.spend AS DOUBLE PRECISION) / COALESCE(umcg.conversions_adjustment, 1) / COALESCE(ex.rate__c, 1)
         END AS campaign_spend_eur,
    mc.marketing_conversion_id AS campaign_conversion_id,
    CAST(umc.value AS INTEGER) AS campaign_conversion_value,
    CASE WHEN ev_mapping.original_event_id IS NOT NULL AND DATE_DIFF('day', DATE_PARSE(ev_mapping.original_event_date_from, '%Y%m%d'), CAST(ms.date_start AS DATE)) <= 0
         THEN DATE_DIFF('day', CAST(ms.date_start AS DATE), DATE_PARSE(ev_mapping.original_event_date_from, '%Y%m%d'))
         WHEN CAST(ms.date_start AS DATE) >= next_year_ev.event_date AND two_years_ahead_ev.event_id != 'unknown'
         THEN DATE_DIFF('day', CAST(ms.date_start AS DATE), two_years_ahead_ev.event_date)
         WHEN CAST(ms.date_start AS DATE) >= e.event_date AND next_year_ev.event_id != 'unknown'
         THEN DATE_DIFF('day', CAST(ms.date_start AS DATE), next_year_ev.event_date)
         WHEN CAST(ms.date_start AS DATE) < prev_year_ev.event_date AND prev_year_ev.event_id != 'unknown'
         THEN DATE_DIFF('day', CAST(ms.date_start AS DATE), prev_year_ev.event_date)
         ELSE DATE_DIFF('day', CAST(ms.date_start AS DATE), e.event_date)
         END AS days_to_event,
    CASE WHEN ev_mapping.original_event_id IS NOT NULL AND DATE_DIFF('day', DATE_PARSE(ev_mapping.original_event_date_from, '%Y%m%d'), CAST(ms.date_start AS DATE)) <= 0
         THEN DATE_DIFF('week', CAST(ms.date_start AS DATE), DATE_PARSE(ev_mapping.original_event_date_from, '%Y%m%d'))
         WHEN CAST(ms.date_start AS DATE) >= next_year_ev.event_date AND two_years_ahead_ev.event_id != 'unknown'
         THEN DATE_DIFF('week', CAST(ms.date_start AS DATE), two_years_ahead_ev.event_date)
         WHEN CAST(ms.date_start AS DATE) >= e.event_date AND next_year_ev.event_id != 'unknown'
         THEN DATE_DIFF('week', CAST(ms.date_start AS DATE), next_year_ev.event_date)
         WHEN CAST(ms.date_start AS DATE) < prev_year_ev.event_date AND prev_year_ev.event_id != 'unknown'
         THEN DATE_DIFF('week', CAST(ms.date_start AS DATE), prev_year_ev.event_date)
         ELSE DATE_DIFF('week', CAST(ms.date_start AS DATE), e.event_date)
         END AS weeks_to_event,
    CURRENT_DATE AS last_ingestion_date
FROM "data_lake"."facebook_ads"."insights" ms
LEFT JOIN "data_lake"."facebook_ads"."ad_accounts" a ON ms.account_id = a.account_id
LEFT JOIN "data_lake"."facebook_ads"."campaigns" c ON a.account_id = c.account_id AND ms.campaign_id = c.id
LEFT JOIN "data_lake"."staging"."marketing_campaign_name_matching" ev_mapping ON a.account_id = ev_mapping.marketing_ad_account_id AND c.id = ev_mapping.marketing_campaign_id
LEFT JOIN "data_lake"."data_warehouse"."d_events" e ON ev_mapping.event_id = e.event_id
LEFT JOIN "data_lake"."data_warehouse"."d_events" oe ON ev_mapping.original_event_id = oe.event_id
LEFT JOIN "data_lake"."data_warehouse"."d_events" next_year_ev ON e.next_year_event_id = next_year_ev.event_id
LEFT JOIN "data_lake"."data_warehouse"."d_events" two_years_ahead_ev ON next_year_ev.next_year_event_id = two_years_ahead_ev.event_id
LEFT JOIN "data_lake"."data_warehouse"."d_events" prev_year_ev ON e.previous_year_event_id = prev_year_ev.event_id
LEFT JOIN "data_lake"."data_warehouse"."i_marketing_classifications" mc ON LOWER(REPLACE(SPLIT_PART(COALESCE(REGEXP_EXTRACT(marketing_campaign_name, '(Ac.*?ty)=(.*?)(?=&)'), REGEXP_EXTRACT(marketing_campaign_name, '(Ac.*?ty)=(.*?)$')), '=', 2), ' ', '')) = LOWER(mc.activity)
LEFT JOIN "data_lake"."data_warehouse"."d_countries" cnt ON ms.country = cnt.country_iso2
LEFT JOIN "data_lake"."data_warehouse"."d_currencies" cur ON a.currency = cur.currency_iso3
LEFT JOIN "data_lake"."salesforce"."dated_exchange_rate__c" ex ON YEAR(CAST(ms.date_start AS DATE)) = YEAR(CAST(DATE_PARSE(ex.applied_month__c, '%M %Y') AS DATE))
                                                            AND MONTH(CAST(ms.date_start AS DATE)) = MONTH(CAST(DATE_PARSE(ex.applied_month__c, '%M %Y') AS DATE))
                                                            AND cur.currency_iso3 = ex.currency__c
LEFT JOIN "data_lake"."staging"."marketing_facebook_ads_conversions" umc ON ms.date_start = umc.date_start AND a.account_id = umc.account_id AND c.id = umc.campaign_id AND ms.country = umc.country
LEFT JOIN "data_lake"."staging"."marketing_facebook_ads_conversions_grouped" umcg ON ms.date_start = umcg.date_start AND a.account_id = umcg.account_id AND c.id = umcg.campaign_id AND ms.country = umcg.country
LEFT JOIN "data_lake"."data_warehouse"."d_marketing_conversions" mc ON umc.action_type = mc.marketing_conversion_id
WHERE LOWER(c.name) LIKE '%instagram%'

suspect: #4632

@findepi findepi added RELEASE-BLOCKER bug Something isn't working labels Jul 30, 2020
@findepi
Copy link
Member

findepi commented Jul 30, 2020

Thanks @rmgpinto for creating the issue.

Can you also provide schemas (SHOW CREATE TABLE) of the tables involved?

Also, would you be able to add a regression test cases in presto-product-tests/src/main/resources/sql-tests/testcases?

@findepi
Copy link
Member

findepi commented Jul 30, 2020

Also, @rmgpinto, would you be able to give #4635 a try?

@rmgpinto
Copy link
Author

@findepi that PR fixes it.
Thanks

@kasiafi
Copy link
Member

kasiafi commented Jul 31, 2020

I can confirm that #4635 fixes the issue.

It can be reduced in the following way:

ExchangeNode: output layout [b, a]
- single source node: output layout [a, a]

In UnaliasSymbolReferences, first ExchangeNode outputs are mapped to ExchangeNode inputs and symbol mapping b -> a is created. This is the mapping strategy intended for single-source ExchangeNode.

Then, symbol mappings are created within ExchangeNode output symbols. Mapping b <- a is created, because output symbols b and a both correspond to same input lists (i.e. [a]). This is the mapping strategy intended for multiple-source ExchangeNode.

After applying both mapping strategies, we get a mapping with a cycle:
a -> b, b -> a,
which causes the Optimizer loop.

The solution proposed in #4635 is to not apply both strategies. It reverts a bug introduced while refactoring the class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working RELEASE-BLOCKER
Development

Successfully merging a pull request may close this issue.

3 participants