Add Empower Connector #902
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
Empower | ||
======= | ||
|
||
******** | ||
Overview | ||
******** | ||
|
||
The Empower class allows you to interact with the Empower API. Documentation for the Empower API can be found | ||
in their `GitHub <https://github.com/getempower/api-documentation/blob/master/README.md>`_ repo. | ||
|
||
.. note:: | ||
The Empower API only has a single endpoint to access all account data. As such, it has a very high overhead. This | ||
connector employs caching in order to allow the user to specify the tables to extract without additional API calls. | ||
You can disable caching by passing ``cache=False`` when instantiating the class. | ||
|
||
========== | ||
Quickstart | ||
========== | ||
|
||
To instantiate the Empower class, you can either store your ``EMPOWER_API_KEY`` as an environment | ||
variable or pass it in as an argument: | ||
|
||
.. code-block:: python | ||
Review comment: Might be nice to have an example of disabling caching in the quickstart. |
Reply: Can do. |
||
|
||
from parsons import Empower | ||
|
||
# First approach: Use API key environment variables | ||
|
||
# In bash, set your environment variables like so: | ||
# export EMPOWER_API_KEY='MY_API_KEY' | ||
empower = Empower() | ||
|
||
# Second approach: Pass API keys as arguments | ||
empower = Empower(api_key='MY_API_KEY') | ||
|
||
You can then request tables in the following manner: | ||
|
||
.. code-block:: python | ||
|
||
tbl = empower.get_profiles() | ||
|
||
*** | ||
API | ||
*** | ||
|
||
.. autoclass :: parsons.Empower | ||
:inherited-members: |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -199,6 +199,7 @@ Indices and tables | |
crowdtangle | ||
databases | ||
donorbox | ||
empower | ||
facebook_ads | ||
freshdesk | ||
github | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
from parsons.empower.empower import Empower | ||
|
||
__all__ = ["Empower"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,256 @@ | ||
from parsons.utilities.api_connector import APIConnector | ||
from parsons.utilities import check_env | ||
from parsons.etl import Table | ||
import logging | ||
from datetime import datetime | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
EMPOWER_API_ENDPOINT = "https://api.getempower.com/v1/export" | ||
|
||
|
||
class Empower(object): | ||
""" | ||
Instantiate class. | ||
|
||
`Args:` | ||
api_key: str | ||
The Empower provided API key. Not | ||
required if ``EMPOWER_API_KEY`` env variable set. | ||
empower_uri: str | ||
The URI to access the Empower API. The default is currently set to | ||
https://api.getempower.com/v1/export. You can set an ``EMPOWER_URI`` env | ||
variable or use this URI parameter if a different endpoint is necessary. | ||
cache: boolean | ||
The Empower API returns all account data after each call. Setting cache | ||
to ``True`` stores the blob and then extracts Parsons tables for each method. | ||
Setting cache to ``False`` will download all account data for each method call. | ||
""" | ||
|
||
def __init__(self, api_key=None, empower_uri=None, cache=True): | ||
|
||
self.api_key = check_env.check("EMPOWER_API_KEY", api_key) | ||
self.empower_uri = ( | ||
check_env.check("EMPOWER_URI", empower_uri, optional=True) | ||
or EMPOWER_API_ENDPOINT | ||
) | ||
self.headers = {"accept": "application/json", "secret-token": self.api_key} | ||
self.client = APIConnector( | ||
self.empower_uri, | ||
headers=self.headers, | ||
) | ||
self.data = None | ||
self.data = self._get_data(cache) | ||
|
||
def _get_data(self, cache): | ||
""" | ||
Gets fresh data from Empower API based on cache setting. | ||
""" | ||
|
||
if not cache or self.data is None: | ||
r = self.client.get_request(self.empower_uri) | ||
logger.info("Empower data downloaded.") | ||
return r | ||
|
||
else: | ||
return self.data | ||
|
||
def _unix_convert(self, ts): | ||
Review comment: You can ignore this, Justin, but just flagging that this is a potential utility method (see #554). |
||
""" | ||
Converts UNIX timestamps to readable timestamps. | ||
""" | ||
|
||
ts = datetime.utcfromtimestamp(int(ts) / 1000) | ||
ts = ts.strftime("%Y-%m-%d %H:%M:%S UTC") | ||
return ts | ||
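The review comment above flags this helper as a candidate for a shared utility. A minimal sketch of what a standalone version might look like follows. The function name `convert_unix_ms_to_readable` is hypothetical and is not an existing Parsons utility. The sketch assumes, as the method in the diff does, that Empower's `*Mts` fields are millisecond timestamps, and it swaps in a timezone-aware `datetime.fromtimestamp` call because `datetime.utcfromtimestamp` is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone


def convert_unix_ms_to_readable(ts):
    """Convert a UNIX timestamp in milliseconds to a readable UTC string.

    Hypothetical standalone helper; not an existing Parsons utility.
    """
    # The input is in milliseconds, so divide by 1000 to get seconds.
    dt = datetime.fromtimestamp(int(ts) / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")


# One day after the epoch, expressed in milliseconds.
print(convert_unix_ms_to_readable(86400000))
```

Like the original method, this sketch does not handle missing values, so a `None` timestamp would raise a `TypeError`.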
|
||
def _empty_obj(self, obj_name): | ||
Review comment: Is there a reason this is a separate method rather than just using this code in the one place it's referenced? You could write it with just that first line - the return True and return False aren't necessary. |
||
""" | ||
Determine if a dict object is empty. | ||
""" | ||
|
||
if len(self.data[obj_name]) == 0: | ||
return True | ||
else: | ||
return False | ||
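The reviewer's point is that the comparison already evaluates to a boolean, so the `if`/`else` adds nothing. A small sketch of that simplification is below, written as a free function so it can run on its own. The name `is_empty` and the sample dictionary are hypothetical and only illustrate the idea.

```python
def is_empty(data, obj_name):
    """Return True when the named object in the export has no entries."""
    # The comparison already yields a bool, so no if/else is needed.
    return len(data[obj_name]) == 0


# Hypothetical stand-in for the downloaded account data.
sample_export = {"outreachEntries": [], "profiles": [{"eid": 1}]}
print(is_empty(sample_export, "outreachEntries"))
print(is_empty(sample_export, "profiles"))
```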
|
||
def get_profiles(self): | ||
""" | ||
Get Empower profiles. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["profiles"]) | ||
for col in ["createdMts", "lastUsedEmpowerMts", "updatedMts"]: | ||
tbl.convert_column(col, lambda x: self._unix_convert(x)) | ||
tbl.remove_column( | ||
"activeCtaIds" | ||
) # Get as a method via get_profiles_active_ctas | ||
return tbl | ||
|
||
def get_profiles_active_ctas(self): | ||
""" | ||
Get active ctas assigned to Empower profiles. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["profiles"]).long_table("eid", "activeCtaIds") | ||
Reply: Sure, this documentation includes a pretty decent example. Basically, it takes a column that has a nested JSON in it -- in this case a list of active call to action ids -- and it creates a new table that has one row for each call to action id along with the id of the profile. Then, if you are storing in a DB, you can easily join the two together. However, this transformation, like many others in Parsons, was built before JSON support in BigQuery/Redshift became more widespread. So, in a world in which you can store JSONs and query the elements in SQL, this may no longer make sense. Tldr: Should Parsons connectors be extracting nested JSONs? |
||
return tbl | ||
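The reshaping described in the review thread above (one output row per call to action id, tagged with the profile id) can be illustrated in plain Python. This is only a sketch of the idea and not how Parsons implements `long_table`; the function name `explode_list_column` and the sample rows are hypothetical, and the column names Parsons produces may differ.

```python
def explode_list_column(rows, key, column):
    """Return one row per element of each row's list column, tagged with the key."""
    long_rows = []
    for row in rows:
        # Treat a missing or None list the same as an empty one.
        for value in row.get(column) or []:
            long_rows.append({key: row[key], column: value})
    return long_rows


# Hypothetical profile records with a nested list of active CTA ids.
profiles = [
    {"eid": 101, "activeCtaIds": [1, 2]},
    {"eid": 102, "activeCtaIds": [2]},
    {"eid": 103, "activeCtaIds": []},
]

for row in explode_list_column(profiles, "eid", "activeCtaIds"):
    print(row)
```

The resulting rows can be joined back to the profile table on `eid`, which is the use case the reply describes for databases without native JSON querying.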
|
||
def get_regions(self): | ||
""" | ||
Get Empower regions. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["regions"]) | ||
tbl.convert_column("inviteCodeCreatedMts", lambda x: self._unix_convert(x)) | ||
return tbl | ||
|
||
def get_cta_results(self): | ||
""" | ||
Get Empower call to action results. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["ctaResults"]) | ||
tbl.convert_column("contactedMts", lambda x: self._unix_convert(x)) | ||
tbl = tbl.unpack_nested_columns_as_rows( | ||
"answerIdsByPromptId", key="profileEid", expand_original=True | ||
) | ||
tbl.unpack_list("answerIdsByPromptId_value", replace=True) | ||
col_list = [v for v in tbl.columns if v.find("value") != -1] | ||
tbl.coalesce_columns("answer_id", col_list, remove_source_columns=True) | ||
tbl.remove_column("uid") | ||
tbl.remove_column("answers") # Per docs, this is deprecated. | ||
Review comment: I believe that remove_column errors if the column name isn't found. Is it possible that this column name will stop getting returned? Or do you mean it's deprecated in some other way? |
Reply: This column is still being returned by Empower, but their docs indicate that it is no longer used for anything. So, I decided not to surface it for the user. |
||
return tbl | ||
|
||
def _split_ctas(self): | ||
""" | ||
Internal method to split CTA objects into tables. | ||
""" | ||
|
||
ctas = Table(self.data["ctas"]) | ||
for col in [ | ||
"createdMts", | ||
"scheduledLaunchTimeMts", | ||
"updatedMts", | ||
"activeUntilMts", | ||
]: | ||
ctas.convert_column(col, lambda x: self._unix_convert(x)) | ||
ctas.remove_column("regionIds") # Get as a table via get_cta_regions() | ||
ctas.remove_column("shareables") # Get as a table via get_cta_shareables() | ||
ctas.remove_column( | ||
"prioritizations" | ||
) # Get as a table via get_cta_prioritizations() | ||
ctas.remove_column("questions") # This column has been deprecated. | ||
cta_prompts = ctas.long_table( | ||
"id", "prompts", prepend=False, retain_original=False | ||
) | ||
cta_prompts.remove_column("ctaId") | ||
cta_prompt_answers = cta_prompts.long_table("id", "answers", prepend=False) | ||
|
||
return [ctas, cta_prompts, cta_prompt_answers] | ||
|
||
def get_ctas(self): | ||
""" | ||
Get Empower calls to action. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
return self._split_ctas()[0] | ||
|
||
def get_cta_prompts(self): | ||
""" | ||
Get Empower calls to action prompts. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
return self._split_ctas()[1] | ||
|
||
def get_cta_prompt_answers(self): | ||
""" | ||
Get Empower calls to action prompt answers. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
return self._split_ctas()[2] | ||
|
||
def get_cta_regions(self): | ||
""" | ||
Get a list of regions that each call to action is active in. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["ctas"]).long_table("id", "regionIds") | ||
return tbl | ||
|
||
def get_cta_shareables(self): | ||
""" | ||
Get a list of shareables associated with calls to action. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["ctas"]).long_table("id", "shareables") | ||
return tbl | ||
|
||
def get_cta_prioritizations(self): | ||
""" | ||
Get a list of prioritizations associated with calls to action. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["ctas"]).long_table("id", "prioritizations") | ||
return tbl | ||
|
||
def get_outreach_entries(self): | ||
""" | ||
Get outreach entries. | ||
|
||
`Returns:` | ||
Parsons Table | ||
See :ref:`parsons-table` for output options. | ||
""" | ||
if self._empty_obj("outreachEntries"): | ||
logger.info("No Outreach Entries found.") | ||
return Table([]) | ||
|
||
tbl = Table(self.data["outreachEntries"]) | ||
for col in [ | ||
"outreachCreatedMts", | ||
"outreachSnoozeUntilMts", | ||
"outreachScheduledFollowUpMts", | ||
]: | ||
tbl.convert_column(col, lambda x: self._unix_convert(x)) | ||
logger.info(f"Unable to find column {col}") | ||
return tbl |
Review comment: More of a personal question, coming from lack of experience, but wondering what caching is?
Reply: So, there is only one endpoint and it grabs a giant JSON blob that is a bit of a pain to parse through and convert to a tabular-ish format. I've broken up that JSON into multiple functions so that folks can grab the bits of data that they want.
However, that means that every single time you call a function, it is grabbing a lot of extraneous data. So, the caching just stores the blob and extracts from it rather than calling the Empower server again.
At the time, it seemed like a cute idea, but perhaps it's unnecessary.
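The caching idea described in the reply can be sketched in isolation. The class below is a hypothetical illustration and not the connector's code: `fetch` stands in for the single Empower export request, and `ExportCache`, `get`, `fake_fetch`, and the sample payload are invented names. With `cache=True` the blob is downloaded once and reused; with `cache=False` every lookup downloads it again.

```python
class ExportCache:
    """Hypothetical sketch of caching a single large export payload."""

    def __init__(self, fetch, cache=True):
        self._fetch = fetch  # callable that performs the expensive download
        self._cache = cache
        self._data = None

    def get(self, name):
        # Download when caching is off or nothing has been stored yet.
        if not self._cache or self._data is None:
            self._data = self._fetch()
        return self._data[name]


calls = []


def fake_fetch():
    # Stand-in for the API request; records each simulated download.
    calls.append(1)
    return {"profiles": [{"eid": 101}], "regions": [{"id": 7}]}


cached = ExportCache(fake_fetch, cache=True)
cached.get("profiles")
cached.get("regions")
print(len(calls))  # one download serves both lookups
```

In this sketch the decision to refetch lives in the accessor, so it is re-evaluated on every lookup. In the diff as shown, `_get_data` is called only from `__init__`, so this is one possible way to get the per-call behavior that the `cache` docstring describes.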