Feat/tiktok #68

Merged
merged 29 commits into from
Dec 18, 2024

Conversation

sanjushahgupta (Collaborator) commented Dec 10, 2024

The following changes are made in this PR:

  • Added a custom_report resource to the TikTok source
  • Users can fetch custom reports based on dimensions and metrics
  • Incremental loading is supported if the user provides the stat_time dimension
  • Pagination is supported

Note:
Filter logic is not yet implemented and will be added soon.
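A minimal sketch of how the stat_time rule in the description could decide between incremental and full loading. It assumes that every time dimension starts with the prefix "stat_time" (for example stat_time_day, or stat_time_hour); incremental_cursor is a hypothetical helper, not code from this PR.

```python
from typing import Iterable, Optional

# Assumption: time dimensions share this prefix (e.g. stat_time_day, stat_time_hour).
STAT_TIME_PREFIX = "stat_time"


def incremental_cursor(dimensions: Iterable[str]) -> Optional[str]:
    """Return the stat_time dimension to use as the incremental cursor,
    or None when the report has none and must be loaded in full."""
    for dimension in dimensions:
        name = dimension.strip()
        if name.startswith(STAT_TIME_PREFIX):
            return name
    return None


print(incremental_cursor(["campaign_id", "stat_time_day"]))  # stat_time_day
print(incremental_cursor(["campaign_id"]))                   # None
```

The returned name could then serve as the cursor field of an incremental resource (for example the cursor_path given to dlt.sources.incremental). Whether the PR wires it up that way is not shown here.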

sanjushahgupta marked this pull request as ready for review on December 16, 2024 at 21:11
```python
    and "ad_id" not in dimensions
):
    raise ValueError(
        "You must provide one ID dimension. Please use one ID dimension from the following options: [AUCTION_ADVERTISER, AUCTION_AD, AUCTION_CAMPAIGN, AUCTION_ADGROUP]"
```
Contributor

I would suggest writing the actual dimension names here; this error message is very confusing.
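One way to act on this suggestion, sketched under the assumption that the accepted IDs are exactly the four dimensions the check tests (advertiser_id, campaign_id, adgroup_id, ad_id), whereas the current message lists the AUCTION_* data levels instead. require_id_dimension is a hypothetical name. Because the check only fails when none of the IDs is present, the message says "at least one".

```python
from typing import Iterable

# The ID dimensions that the check under review actually looks for.
ID_DIMENSIONS = ("advertiser_id", "campaign_id", "adgroup_id", "ad_id")


def require_id_dimension(dimensions: Iterable[str]) -> None:
    """Raise a ValueError naming the accepted dimensions when none is present."""
    if not set(ID_DIMENSIONS) & set(dimensions):
        raise ValueError(
            "You must provide at least one ID dimension. "
            f"Use one of the following: {', '.join(ID_DIMENSIONS)}"
        )


try:
    require_id_dimension(["stat_time_day"])
except ValueError as error:
    print(error)
```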

Comment on lines 1037 to 1043
```python
dimensions = fields[1].split(",")
if (
    "campaign_id" not in dimensions
    and "advertiser_id" not in dimensions
    and "adgroup_id" not in dimensions
    and "ad_id" not in dimensions
):
```
Contributor

Could the user accidentally put a space in between, e.g. custom:campaign_id ,stat_time_day:clicks? In that case this check would fail. I suggest cleaning the values up first.
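A sketch of the suggested clean-up. It assumes the table string has the form custom:<dimensions>:<metrics> with comma-separated lists, which is inferred from the fields[1].split(",") call and the reviewer's example rather than confirmed by the PR; parse_custom_table is a hypothetical name. Each entry is stripped and empty entries are dropped before any validation runs.

```python
from typing import List, Tuple


def parse_custom_table(table: str) -> Tuple[List[str], List[str]]:
    """Split an assumed custom:<dimensions>:<metrics> table string into
    cleaned dimension and metric lists."""
    fields = table.split(":")
    if len(fields) != 3 or fields[0].strip() != "custom":
        raise ValueError("Expected a table of the form custom:<dimensions>:<metrics>")
    # Strip stray whitespace and drop empty entries such as a trailing comma.
    dimensions = [d.strip() for d in fields[1].split(",") if d.strip()]
    metrics = [m.strip() for m in fields[2].split(",") if m.strip()]
    return dimensions, metrics


print(parse_custom_table("custom:campaign_id ,stat_time_day:clicks"))
# (['campaign_id', 'stat_time_day'], ['clicks'])
```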

Comment on lines 1021 to 1034
```python
if kwargs.get("interval_start"):
    start_date = ensure_pendulum_datetime(
        str(kwargs.get("interval_start"))
    ).in_tz(time_zone[0])
else:
    Default_date = pendulum.now().subtract(days=90)
    start_date = ensure_pendulum_datetime(Default_date).in_tz(time_zone[0])

if kwargs.get("interval_end"):
    end_date = ensure_pendulum_datetime(str(kwargs.get("interval_end"))).in_tz(
        time_zone[0]
    )
else:
    end_date = ensure_pendulum_datetime(pendulum.now()).in_tz(time_zone[0])
```
Contributor

Suggested change
```python
start_date = pendulum.now().subtract(days=90).in_tz(time_zone[0])
end_date = ensure_pendulum_datetime(pendulum.now()).in_tz(time_zone[0])
if kwargs.get("interval_start"):
    start_date = ensure_pendulum_datetime(kwargs.get("interval_start")).in_tz(time_zone[0])
if kwargs.get("interval_end"):
    end_date = ensure_pendulum_datetime(kwargs.get("interval_end")).in_tz(
        time_zone[0]
    )
```
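The same default-window logic as the suggestion, sketched with the standard library's datetime and tzinfo in place of pendulum so that it runs without extra dependencies. resolve_interval and _as_aware are hypothetical helpers; the sketch takes a tzinfo object rather than the time zone name the PR passes to in_tz, and it assumes naive inputs are UTC. The now parameter exists only to make the result reproducible.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional, Tuple

DEFAULT_LOOKBACK = timedelta(days=90)  # mirrors pendulum.now().subtract(days=90)


def _as_aware(value: datetime) -> datetime:
    # Assumption: naive datetimes are interpreted as UTC.
    return value if value.tzinfo is not None else value.replace(tzinfo=timezone.utc)


def resolve_interval(
    interval_start: Optional[datetime],
    interval_end: Optional[datetime],
    tz: tzinfo,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (start, end) in tz, defaulting to the last 90 days up to now."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Set the defaults first, then override them, as in the suggested change.
    start = now - DEFAULT_LOOKBACK
    end = now
    if interval_start is not None:
        start = _as_aware(interval_start)
    if interval_end is not None:
        end = _as_aware(interval_end)
    return start.astimezone(tz), end.astimezone(tz)


utc_plus_one = timezone(timedelta(hours=1))  # fixed offset, no DST
start, end = resolve_interval(
    None, None, utc_plus_one, now=datetime(2024, 12, 18, 12, 0, tzinfo=timezone.utc)
)
print(start.isoformat())  # 2024-09-19T13:00:00+01:00
print(end.isoformat())    # 2024-12-18T13:00:00+01:00
```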

```python
"data_level": data_level,
"start_date": start_time,
"end_date": end_time,
"page_size": 1000,
```
Contributor

Is this configurable? Ideally we should get it from the --page-size flag in ingestr.
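A sketch of passing the page size in instead of hard-coding 1000, assuming the value arrives from the CLI as an optional integer. build_report_params and DEFAULT_PAGE_SIZE are hypothetical names, and only the four keys visible in the excerpt are reproduced.

```python
from typing import Any, Dict, Optional

DEFAULT_PAGE_SIZE = 1000  # the value currently hard-coded in the PR


def build_report_params(
    data_level: str,
    start_time: str,
    end_time: str,
    page_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the request parameters shown in the excerpt, taking the page
    size from the caller (e.g. a --page-size option) when one is given."""
    return {
        "data_level": data_level,
        "start_date": start_time,
        "end_date": end_time,
        "page_size": DEFAULT_PAGE_SIZE if page_size is None else page_size,
    }


print(build_report_params("AUCTION_CAMPAIGN", "2024-12-01", "2024-12-18", page_size=200))
```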


```python
while True:
    self.params["page"] = current_page
    response = create_client().get(
```
Contributor

Why create the client on every iteration of the loop?

Collaborator Author

We don't need to create a client every time; that was my mistake. I have corrected it.
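A sketch of the corrected loop, with the client created once before paging starts. It assumes the client has a requests-style get(url, params=...) that returns an object with .json(), and that each response holds rows under data.list and the page count under data.page_info.total_page; both response fields are assumptions about the TikTok API. fetch_pages, FakeClient, and FakeResponse are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List


def fetch_pages(
    create_client: Callable[[], Any],
    url: str,
    params: Dict[str, Any],
) -> Iterator[List[Dict[str, Any]]]:
    """Request report pages one after another, reusing a single client."""
    client = create_client()  # created once, before the loop
    current_page = 1
    while True:
        params["page"] = current_page
        payload = client.get(url, params=params).json()
        data = payload["data"]
        yield data["list"]
        # Assumption: the response carries the total page count in page_info.
        if current_page >= data["page_info"]["total_page"]:
            break
        current_page += 1


# Usage with a fake client that serves three pages and counts its creations.
class FakeResponse:
    def __init__(self, payload: Dict[str, Any]) -> None:
        self._payload = payload

    def json(self) -> Dict[str, Any]:
        return self._payload


class FakeClient:
    instances = 0

    def __init__(self) -> None:
        FakeClient.instances += 1

    def get(self, url: str, params: Dict[str, Any]) -> FakeResponse:
        page = params["page"]
        return FakeResponse(
            {"data": {"list": [{"page": page}], "page_info": {"total_page": 3}}}
        )


pages = list(fetch_pages(FakeClient, "https://example.invalid/report", {}))
print(FakeClient.instances, len(pages))  # 1 3
```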

Comment on lines 102 to 104
```python
flat_structure(items=items, time_zone=self.time_zone)

all_items.extend(items)
```
Contributor

I suggest yielding these pages instead of collecting them; otherwise we might run into memory issues.
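A sketch of the yielding approach. flatten_row is a hypothetical stand-in for flat_structure that assumes each row nests its values under "dimensions" and "metrics" (an assumption about the response shape), and it ignores the time-zone handling the real function takes. Each page is flattened and handed on as soon as it arrives, so only the current page has to be held in memory rather than the whole report.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def flatten_row(row: Row) -> Row:
    """Hypothetical stand-in for flat_structure: merge the nested
    "dimensions" and "metrics" dicts into one flat row."""
    flat: Row = {}
    flat.update(row.get("dimensions", {}))
    flat.update(row.get("metrics", {}))
    return flat


def report_rows(pages: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Yield each flattened page as soon as it arrives instead of
    extending one all_items list with every page."""
    for items in pages:
        yield [flatten_row(row) for row in items]


# Usage: a lazy page source that records how many pages were requested.
requested: List[int] = []


def fake_pages() -> Iterator[List[Row]]:
    for n in range(1, 4):
        requested.append(n)
        yield [{"dimensions": {"campaign_id": str(n)}, "metrics": {"clicks": n}}]


rows = report_rows(fake_pages())
first_page = next(rows)
print(first_page, requested)  # [{'campaign_id': '1', 'clicks': 1}] [1]
```

Only the first page has been requested when the first batch reaches the caller. dlt resources accept a list of items per yield, so a generator shaped like report_rows should be usable as a resource body without collecting everything first.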

```
@@ -37,6 +38,7 @@
from ingestr.src.slack import slack_source
from ingestr.src.stripe_analytics import stripe_source
from ingestr.src.table_definition import table_string_to_dataclass
from ingestr.src.tiktok_ads._init_ import tiktok_source
```
Contributor

If you rename ingestr/src/tiktok_ads/_init_.py to ingestr/src/tiktok_ads/__init__.py, you will be able to treat tiktok_ads as a regular Python package.

That way, the import statement can change to:

```python
from ingestr.src.tiktok_ads import tiktok_source
```
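A self-contained demonstration of why the rename matters, using only the standard library. It builds a throwaway package in a temporary directory whose init file is named either __init__.py or _init_.py, then looks up tiktok_source on it. With _init_.py, Python 3 treats the directory as a namespace package, so the file is never executed and the name is missing. The package name and load_tiktok_source are made up for the demo.

```python
import importlib
import sys
import tempfile
import uuid
from pathlib import Path
from typing import Any, Optional


def load_tiktok_source(init_filename: str) -> Optional[Any]:
    """Create a throwaway package whose init file is named init_filename
    and return its tiktok_source attribute (None if it is not defined)."""
    root = Path(tempfile.mkdtemp())
    package_name = f"demo_pkg_{uuid.uuid4().hex}"  # unique, avoids cached imports
    package_dir = root / package_name
    package_dir.mkdir()
    (package_dir / init_filename).write_text(
        "def tiktok_source():\n    return 'loaded'\n"
    )
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        package = importlib.import_module(package_name)
        # None here corresponds to `from <package> import tiktok_source` failing.
        return getattr(package, "tiktok_source", None)
    finally:
        sys.path.remove(str(root))


print(load_tiktok_source("__init__.py")())  # loaded
print(load_tiktok_source("_init_.py"))      # None
```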

```python
endpoint = "custom_reports"

parsed_uri = urlparse(uri)
source_fields = parse_qs(parsed_uri.query)
```
Contributor

Can a TikTok access token contain =, +, or &? If so, we may need to escape those characters.
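A quick check of what parse_qs does with these characters, using only urllib.parse. The tiktok://?access_token=... URI and the token value are illustrative assumptions, not the documented ingestr format, and access_token_from_uri is a hypothetical helper. An unescaped + is decoded as a space and an & starts a new parameter, while = inside the value survives because parse_qs splits each pair on the first = only. Percent-encoding the token with quote(..., safe="") round-trips it unchanged.

```python
from urllib.parse import parse_qs, quote, urlparse


def access_token_from_uri(uri: str) -> str:
    """Extract access_token the same way the excerpt does: urlparse + parse_qs."""
    source_fields = parse_qs(urlparse(uri).query)
    return source_fields["access_token"][0]


token = "abc+def=="  # hypothetical token containing '+' and '='

raw_uri = f"tiktok://?access_token={token}"
escaped_uri = f"tiktok://?access_token={quote(token, safe='')}"

print(access_token_from_uri(raw_uri))      # abc def==  ('+' became a space)
print(access_token_from_uri(escaped_uri))  # abc+def==  (round-trips intact)
```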

Collaborator Author

I am not sure about this. We have not escaped them before, so we need to discuss it with the team.

Contributor

I agree with leaving this as is.

```python
        str(kwargs.get("interval_start"))
    ).in_tz(time_zone[0])
else:
    Default_date = pendulum.now().subtract(days=90)
```
Contributor

Do we need a default start date in order to be able to load data?

Collaborator Author

Yes, we need it in this case because the API expects both a start date and an end date.


karakanb merged commit c5f31b0 into main on Dec 18, 2024
8 checks passed

3 participants