
Add Empower Connector #902

Closed
wants to merge 5 commits into from
47 changes: 47 additions & 0 deletions docs/empower.rst
@@ -0,0 +1,47 @@
Empower
=======

********
Overview
********

The Empower class allows you to interact with the Empower API. Documentation for the Empower API can be found
in their `GitHub <https://github.com/getempower/api-documentation/blob/master/README.md>`_ repo.

.. note::
The Empower API only has a single endpoint to access all account data. As such, it has a very high overhead. This
connector employs caching in order to allow the user to specify the tables to extract without additional API calls.
Contributor

More of a personal question, coming from lack of experience, but wondering what caching is?

Collaborator Author

So, there is only one endpoint and it grabs a giant JSON blob that is a bit of a pain to parse through and convert to a tabular-ish format. I've broken up that JSON into multiple functions so that folks can grab the bits of data that they want.

However, that means that every single time you call a function, it is grabbing a lot of extraneous data. So, the caching just stores the blob and extracts from it rather than calling the Empower server again.

At the time, it seemed like a cute idea, but perhaps it's unnecessary.

You can disable caching as an argument when instantiating the class.
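A minimal sketch of that cache-or-fetch pattern in plain Python (not the actual Parsons code; `BlobCache` and `fetch_blob` are illustrative stand-ins for the connector and its single export call):

```python
class BlobCache:
    """Illustrative cache-or-fetch wrapper around a single-endpoint API."""

    def __init__(self, fetch_blob, cache=True):
        self.fetch_blob = fetch_blob  # callable that hits the one export endpoint
        self.cache = cache
        self._blob = None

    def get_data(self):
        # With caching on, the expensive download happens only once.
        if self.cache and self._blob is not None:
            return self._blob
        blob = self.fetch_blob()
        if self.cache:
            self._blob = blob
        return blob


calls = []

def fake_fetch():
    calls.append(1)
    return {"profiles": [], "regions": []}

cached = BlobCache(fake_fetch, cache=True)
cached.get_data()
cached.get_data()
print(len(calls))  # prints 1 -- only one real fetch despite two calls
```

With `cache=False`, every `get_data()` call would hit `fetch_blob` again, mirroring the behavior described above.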

==========
Quickstart
==========

To instantiate the Empower class, you can either store your ``EMPOWER_API_KEY`` as an environment
variable or pass it in as an argument:

.. code-block:: python
Collaborator

Might be nice to have an example of disabling caching in the quickstart

Collaborator Author

Can do.


from parsons import Empower

# First approach: Use API key environment variables

# In bash, set your environment variables like so:
# export EMPOWER_API_KEY='MY_API_KEY'
empower = Empower()

# Second approach: Pass the API key as an argument
empower = Empower(api_key='MY_API_KEY')

You can then request tables in the following manner:

.. code-block:: python

tbl = empower.get_profiles()

***
API
***

.. autoclass:: parsons.Empower
:inherited-members:
1 change: 1 addition & 0 deletions docs/index.rst
@@ -199,6 +199,7 @@ Indices and tables
crowdtangle
databases
donorbox
empower
facebook_ads
freshdesk
github
1 change: 1 addition & 0 deletions parsons/__init__.py
@@ -89,6 +89,7 @@
("parsons.turbovote.turbovote", "TurboVote"),
("parsons.twilio.twilio", "Twilio"),
("parsons.zoom.zoom", "Zoom"),
("parsons.empower.empower", "Empower"),
):
try:
globals()[connector_name] = getattr(
3 changes: 3 additions & 0 deletions parsons/empower/__init__.py
@@ -0,0 +1,3 @@
from parsons.empower.empower import Empower

__all__ = ["Empower"]
256 changes: 256 additions & 0 deletions parsons/empower/empower.py
@@ -0,0 +1,256 @@
from parsons.utilities.api_connector import APIConnector
from parsons.utilities import check_env
from parsons.etl import Table
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

EMPOWER_API_ENDPOINT = "https://api.getempower.com/v1/export"


class Empower(object):
"""
Instantiate class.

`Args:`
api_key: str
The Empower-provided API key. Not required if the
``EMPOWER_API_KEY`` env variable is set.
empower_uri: str
The URI to access the Empower API. The default is currently set to
https://api.getempower.com/v1/export. You can set an ``EMPOWER_URI`` env
variable or use this URI parameter if a different endpoint is necessary.
cache: boolean
The Empower API returns all account data after each call. Setting cache
to ``True`` stores the blob and then extracts Parsons tables for each method.
Setting cache to ``False`` will download all account data for each method call.
"""

def __init__(self, api_key=None, empower_uri=None, cache=True):

self.api_key = check_env.check("EMPOWER_API_KEY", api_key)
self.empower_uri = (
check_env.check("EMPOWER_URI", empower_uri, optional=True)
or EMPOWER_API_ENDPOINT
)
self.headers = {"accept": "application/json", "secret-token": self.api_key}
self.client = APIConnector(
self.empower_uri,
headers=self.headers,
)
self.data = None
self.data = self._get_data(cache)

def _get_data(self, cache):
"""
Gets fresh data from Empower API based on cache setting.
"""

if not cache or self.data is None:
r = self.client.get_request(self.empower_uri)
logger.info("Empower data downloaded.")
return r

else:
return self.data

def _unix_convert(self, ts):
Collaborator

You can ignore this, Justin, but just flagging that this is a potential utility method (see #554 )

"""
Converts UNIX timestamps to readable timestamps.
"""

ts = datetime.utcfromtimestamp(int(ts) / 1000)
ts = ts.strftime("%Y-%m-%d %H:%M:%S UTC")
return ts
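The conversion above can be exercised standalone (millisecond epoch in, UTC string out; `unix_convert` below is a free-function copy of the method for illustration):

```python
from datetime import datetime

def unix_convert(ts):
    # Mirrors the method above: Empower timestamps are in milliseconds.
    ts = datetime.utcfromtimestamp(int(ts) / 1000)
    return ts.strftime("%Y-%m-%d %H:%M:%S UTC")

print(unix_convert(1672531200000))  # 2023-01-01 00:00:00 UTC
```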

def _empty_obj(self, obj_name):
Collaborator

Is there a reason this is a separate method rather than just using this code in the one place it's referenced? You could write it with just that first line - the return True and return False aren't necessary.

"""
Determine if a dict object is empty.
"""

if len(self.data[obj_name]) == 0:
return True
else:
return False
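As the reviewer notes, the comparison is already a boolean, so the body can collapse to one line. A sketch using a plain dict in place of `self.data`:

```python
data = {"outreachEntries": [], "profiles": [{"eid": 1}]}

def empty_obj(obj_name):
    # Equivalent to the if/else above: len(...) == 0 is already True/False.
    return len(data[obj_name]) == 0

print(empty_obj("outreachEntries"), empty_obj("profiles"))  # True False
```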

def get_profiles(self):
"""
Get Empower profiles.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["profiles"])
for col in ["createdMts", "lastUsedEmpowerMts", "updatedMts"]:
tbl.convert_column(col, lambda x: self._unix_convert(x))
tbl.remove_column(
"activeCtaIds"
) # Get as a method via get_profiles_active_ctas
return tbl

def get_profiles_active_ctas(self):
"""
Get active ctas assigned to Empower profiles.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["profiles"]).long_table("eid", "activeCtaIds")
Collaborator

long_table is a new one on me! Mind sharing what it is and why you're using it here? Thank you!

Collaborator Author

Sure, this documentation includes a pretty decent example. Basically, it takes a column that has a nested JSON in it -- in this case a list of active call to action ids -- and it creates a new table that is one row for each call to action id and the id of the profile. Then, if you are storing in a DB, you can easily join the two together.

However, this transformation, and many others in Parsons, was built before JSON support in BigQuery/Redshift became widespread. So, in a world in which you can store JSONs and query their elements in SQL, this may no longer make sense. Tl;dr: should Parsons connectors be extracting nested JSONs?
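For readers unfamiliar with `long_table`, a plain-Python sketch of the unnesting it performs (not the Parsons implementation; the data values are made up for illustration):

```python
profiles = [
    {"eid": 1, "activeCtaIds": [10, 11]},
    {"eid": 2, "activeCtaIds": [12]},
]

# One output row per (profile, cta id) pair -- easy to join back on eid.
long_rows = [
    {"eid": row["eid"], "activeCtaIds": cta_id}
    for row in profiles
    for cta_id in row["activeCtaIds"]
]

print(long_rows)
# [{'eid': 1, 'activeCtaIds': 10}, {'eid': 1, 'activeCtaIds': 11},
#  {'eid': 2, 'activeCtaIds': 12}]
```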

return tbl

def get_regions(self):
"""
Get Empower regions.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["regions"])
tbl.convert_column("inviteCodeCreatedMts", lambda x: self._unix_convert(x))
return tbl

def get_cta_results(self):
"""
Get Empower call to action results.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["ctaResults"])
tbl.convert_column("contactedMts", lambda x: self._unix_convert(x))
tbl = tbl.unpack_nested_columns_as_rows(
"answerIdsByPromptId", key="profileEid", expand_original=True
)
tbl.unpack_list("answerIdsByPromptId_value", replace=True)
col_list = [v for v in tbl.columns if v.find("value") != -1]
tbl.coalesce_columns("answer_id", col_list, remove_source_columns=True)
tbl.remove_column("uid")
tbl.remove_column("answers") # Per docs, this is deprecated.
Collaborator

I believe that remove_column errors if the column name isn't found. Is it possible that this column name will stop getting returned? Or do you mean it's deprecated in some other way?

Collaborator Author

This column is still being returned by Empower, but their docs indicate that it isn't being used for anything any longer. So, I decided to not surface it for the user.

return tbl

def _split_ctas(self):
"""
Internal method to split CTA objects into tables.
"""

ctas = Table(self.data["ctas"])
for col in [
"createdMts",
"scheduledLaunchTimeMts",
"updatedMts",
"activeUntilMts",
]:
ctas.convert_column(col, lambda x: self._unix_convert(x))
ctas.remove_column("regionIds") # Get as a table via get_cta_regions()
ctas.remove_column("shareables") # Get as a table via get_cta_shareables()
ctas.remove_column(
"prioritizations"
) # Get as a table via get_cta_prioritizations()
ctas.remove_column("questions") # This column has been deprecated.
cta_prompts = ctas.long_table(
"id", "prompts", prepend=False, retain_original=False
)
cta_prompts.remove_column("ctaId")
cta_prompt_answers = cta_prompts.long_table("id", "answers", prepend=False)

return [ctas, cta_prompts, cta_prompt_answers]

def get_ctas(self):
"""
Get Empower calls to action.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

return self._split_ctas()[0]

def get_cta_prompts(self):
"""
Get Empower calls to action prompts.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

return self._split_ctas()[1]

def get_cta_prompt_answers(self):
"""
Get Empower calls to action prompt answers.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

return self._split_ctas()[2]

def get_cta_regions(self):
"""
Get a list of regions that each call to action is active in.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["ctas"]).long_table("id", "regionIds")
return tbl

def get_cta_shareables(self):
"""
Get a list of shareables associated with calls to action.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["ctas"]).long_table("id", "shareables")
return tbl

def get_cta_prioritizations(self):
"""
Get a list of prioritizations associated with calls to action.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["ctas"]).long_table("id", "prioritizations")
return tbl

def get_outreach_entries(self):
"""
Get outreach entries.

`Returns:`
Parsons Table
See :ref:`parsons-table` for output options.
"""
if self._empty_obj("outreachEntries"):
logger.info("No Outreach Entries found.")
return Table([])

tbl = Table(self.data["outreachEntries"])
for col in [
"outreachCreatedMts",
"outreachSnoozeUntilMts",
"outreachScheduledFollowUpMts",
]:
if col in tbl.columns:
tbl.convert_column(col, lambda x: self._unix_convert(x))
else:
logger.info(f"Unable to find column {col}")
return tbl