Recently we conducted several customer development interviews, asking our potential clients how they analyze their customer feedback (say, their product reviews) and what kind of tool could be helpful for this. We realized that we need to build a very simple product that would allow them to track the experience of their customers by giving them a dashboard that displays metrics based on their customer feedback.
To build an MVP we gathered a dataset based on public data, representing Google Play and Apple App Store reviews for various mobile apps popular in the UK. We applied our machine learning algorithms to this data to extract the themes of each review (various aspects of our client's business, like payments, delivery, product quality, etc.) and the sentiment of each theme.
For our MVP we need to build a frontend with a single screen that displays a couple of charts, and a backend that serves an API for our frontend.
You received dataset.zip, which contains a JSON representation of our dataset. Please use these files to populate the data storage of your choice.
The dashboard screen is the only screen of our frontend, and it contains filters and charts. Here are the filters we need to implement:
- Filter by theme
- Filter by category
- Filter by phrase in a product review
In the theme and category filters, the user can select one of the human-readable options from a dropdown. The phrase filter is a text input where we can type an arbitrary substring of a product review's comment field and thus filter for reviews containing that substring. When we add or remove a filter, our charts get updated. We need to display two charts: one displays average sentiment with a breakdown by category, and the other by theme. We recommend using Highcharts JS to display these charts.
Besides the API that your frontend needs, you will need to implement one additional API endpoint that will be used by our clients to provide new reviews. This method creates a new review in your database, and its request will contain a comment and themes with their sentiments. We will use this endpoint to populate your database with more reviews and test how your API performs for bigger datasets than the one we provided.
Feel free to choose any API protocol that feels right to you for this task, as well as any database.
Keep in mind that one of our highest priorities is the responsiveness of the charts: this is an analytical tool that will be used by our clients, and they will be playing with different combinations of filters to find the insights they need, so they expect to see results right after updating a filter.
Another obvious requirement is data durability: our clients rely on us to understand how their customers feel, and we think every single customer should be heard. For us this means we need to reduce the probability of losing any events in our system to the bare minimum.
This application is composed of 4 components:
- Dashboard - A React app that presents the charts and filters as requested.
- API Service - An Elixir application using the Phoenix framework which provides the endpoints for the frontend. It assumes you have Elixir 1.9.4 and Erlang/OTP 22.
- Message Broker - A RabbitMQ server to handle review creation.
- Relational Database - A PostgreSQL server to store the data.
The Phoenix application is an umbrella app under the service directory. This means the real applications are under the service/apps directory. There you will find two applications:
- chattermill_review_service -> Domain application which contains the business logic and domain objects. Here you will also find the review_amqp_worker, responsible for publishing and consuming the messages from RabbitMQ (sketched just below this list).
- chattermill_review_service_web -> Holds the controller logic for the endpoints we provided. This app depends on the domain objects from chattermill_review_service.
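To give an idea of what that worker does, here is a minimal sketch of an AMQP consumer using the amqp Elixir library; the module name and the create_review/1 helper are illustrative assumptions, not necessarily the exact names in the repo:

```elixir
# Minimal sketch of an AMQP consumer along the lines of review_amqp_worker.
# Module name and the create_review/1 helper are assumptions for illustration.
defmodule ChattermillReviewService.ReviewConsumer do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(_) do
    {:ok, conn} = AMQP.Connection.open(host: "localhost", port: 30_003)
    {:ok, chan} = AMQP.Channel.open(conn)
    {:ok, _} = AMQP.Queue.declare(chan, "chattermill_review_dev", durable: true)
    {:ok, _consumer_tag} = AMQP.Basic.consume(chan, "chattermill_review_dev")
    {:ok, chan}
  end

  # Each delivery carries a JSON review payload; decode, persist, then ack.
  def handle_info({:basic_deliver, payload, %{delivery_tag: tag}}, chan) do
    payload
    |> Jason.decode!()
    |> ChattermillReviewService.create_review()

    AMQP.Basic.ack(chan, tag)
    {:noreply, chan}
  end

  # Ignore the consume/cancel confirmations RabbitMQ sends to the consumer process.
  def handle_info(_msg, chan), do: {:noreply, chan}
end
```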
The dashboard app is a simple React app. The API requests it makes point to http://localhost:4000 by default. You can change that in the dashboard/.env.development file. You can run the tests with the yarn test command.
You can start the app with the following commands:
yarn install
yarn start
It should open the app in your browser at http://localhost:3000.
You will need RabbitMQ and Postgres services running to run the applications. The default configuration expects Postgres on localhost:5432 and RabbitMQ on localhost:30003. You can override that with env vars, as follows:
AMQP_HOST=localhost
AMQP_PORT=30003
AMQP_QUEUE=chattermill_review_dev
PG_HOST=localhost
PG_PORT=5432
The AMQP_PORT points to 30003 because I'm running RabbitMQ from a Kubernetes cluster whose port range starts at 30000.
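As a sketch of how those variables can be picked up in the dev config (the actual keys in the service's config files may differ):

```elixir
# Sketch of reading the env vars in config/dev.exs; the real config keys may differ.
use Mix.Config

config :chattermill_review_service, :amqp,
  host: System.get_env("AMQP_HOST") || "localhost",
  port: String.to_integer(System.get_env("AMQP_PORT") || "30003"),
  queue: System.get_env("AMQP_QUEUE") || "chattermill_review_dev"

config :chattermill_review_service, ChattermillReviewService.Repo,
  hostname: System.get_env("PG_HOST") || "localhost",
  port: String.to_integer(System.get_env("PG_PORT") || "5432")
```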
To run the tests you have to set up the database like this:
cd service
mix ecto.create
mix ecto.migrate
mix test
To start the application, you can load the seed data and start the Phoenix server like this:
cd service # assuming you're not there already
mix run apps/chattermill_review_service/priv/repo/seeds.exs
mix phx.server
The seeds.exs script will load the provided datasets. mix phx.server will start the API endpoints at http://localhost:4000. I enabled CORS to accept requests from all hosts, just to simplify.
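In spirit, the seed script just reads the provided JSON dataset files and inserts each review, roughly like the sketch below; the file path and the create_review/1 helper are assumptions, so check the actual seeds.exs for the details.

```elixir
# Rough sketch of what seeds.exs does: load the provided JSON dataset files and
# insert each review. The glob path and create_review/1 helper are assumptions.
"apps/chattermill_review_service/priv/repo/dataset/*.json"
|> Path.wildcard()
|> Enum.each(fn file ->
  file
  |> File.read!()
  |> Jason.decode!()
  |> Enum.each(&ChattermillReviewService.create_review/1)
end)
```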
I configured a Kubernetes cluster for this application. It is not optimised for production, but it is good enough to showcase some of my skills. You will need Docker and kubectl up and running on your machine. I used the Docker Desktop Kubernetes cluster for this challenge.
To build the local Docker image for the dashboard, run the following commands:
cd dashboard
docker image build -t chattermill-dashboard:0.1 .
To build the local Docker image for the API service, run:
cd service
docker image build -t chattermill-api:0.1 .
After that you can start the pods and services in Kubernetes with the following command:
kubectl apply -k .
The Kubernetes configuration starts with the kustomization.yml file. It just loads the YAML files for our 4 services.
resources:
- dashboard/dashboard.yml
- service/api.yml
- rabbitmq.yml
- postgresql.yml
There you can see the environment variables and the configured ports for our applications. I used pods directly instead of Deployments for simplicity, so it is not ready for production.
The ports exposed to the host machine are as follows:
- Dashboard -> 30001
- RabbitMQ Management UI -> 30002
- RabbitMQ amqp -> 30003
- Postgres -> 30004
- API service -> 30005
To view the charts you just have to navigate to the dashboard and use the filters. The default address is http://localhost:30001 when running it from Kubernetes.
I created an endpoint to receive new reviews at POST /api/reviews, which should receive a JSON body like so:
{
  "review": {
    "comment": "Felipero is excellent keep it up",
    "themes": [
      {
        "theme_id": 6374,
        "sentiment": 1
      }
    ],
    "created_at": "2019-07-18T23:18:53.000Z",
    "id": 99457782
  }
}
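That payload maps onto the domain roughly as a review with many themed sentiments. A sketch of what the Ecto schemas could look like (module and field names are assumptions; the repo's actual schemas may differ):

```elixir
# Sketch of how the payload maps onto Ecto schemas; names are assumptions.
defmodule ChattermillReviewService.Reviews.Review do
  use Ecto.Schema

  schema "reviews" do
    field :comment, :string
    field :created_at, :utc_datetime
    has_many :review_themes, ChattermillReviewService.Reviews.ReviewTheme
  end
end

defmodule ChattermillReviewService.Reviews.ReviewTheme do
  use Ecto.Schema

  schema "review_themes" do
    field :sentiment, :integer
    field :theme_id, :integer
    belongs_to :review, ChattermillReviewService.Reviews.Review
  end
end
```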
I tried to present the simplest architecture for an MVP. Obviously there are considerations to be made.
In real life I'd start without the AMQP server in the first version and add it after seeing a need for it. I decided to add it here because I felt there was a hint for it in the code challenge text, when you mention this:
Another obvious requirement is data durability: our clients rely on Chattermill to understand how their customers feel, and we think every single customer should be heard. For us this means we need to reduce the probability of losing any events in our system to the bare minimum.
I thought to myself: They are asking for a message broker.
Probably not. I recommend sending new reviews directly to the RabbitMQ server, so the Phoenix app can consume them. You can use the endpoint just as an example, but sending data through the HTTP endpoint just to forward it to the AMQP server won't help.
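For example, a producer could push a review payload straight onto the queue with something like this (a sketch using the amqp library, matching the defaults documented above):

```elixir
# Sketch: publishing a review payload directly to RabbitMQ, bypassing the HTTP
# endpoint. Queue name and port match the defaults documented above.
{:ok, conn} = AMQP.Connection.open(host: "localhost", port: 30_003)
{:ok, chan} = AMQP.Channel.open(conn)

payload =
  Jason.encode!(%{
    "review" => %{
      "id" => 99_457_782,
      "comment" => "Felipero is excellent keep it up",
      "themes" => [%{"theme_id" => 6374, "sentiment" => 1}],
      "created_at" => "2019-07-18T23:18:53.000Z"
    }
  })

# Publish to the default exchange, routed by queue name; persistent for durability.
:ok = AMQP.Basic.publish(chan, "", "chattermill_review_dev", payload, persistent: true)
```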
The biggest bottleneck I see here is in reading the averages. I created a cool query to read them directly from Postgres, but it is full of joins and aggregates the average directly in the query. This is far from ideal and won't support too many records.
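Roughly, the average-by-theme read is in the spirit of the sketch below (table and field names are assumptions, and the real query also applies the selected filters):

```elixir
# Sketch of the join-and-aggregate style of query used for average sentiment by
# theme; table/field names are assumptions and the filters are omitted here.
import Ecto.Query

query =
  from rt in "review_themes",
    join: t in "themes",
    on: t.id == rt.theme_id,
    group_by: t.name,
    select: %{theme: t.name, average_sentiment: avg(rt.sentiment)}

ChattermillReviewService.Repo.all(query)
```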
My next step would be to create a faster way to read the data. It could be either a view in Postgres, updated by a trigger on every insert, or an indexing database like Elasticsearch. If I had more time to work on it, that would be the next step.
Too much. I dedicated around 5 days to this. Partly because I was missing a good reference implementation for myself, and it is good to have one, but mostly to research what would be good to show these days. For example, how not to use an ILIKE query in Postgres/Ecto, which is when I came up with using the tsquery contains operator.
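The full-text variant of the phrase filter looks something like the sketch below; the exact operator, fragment, and schema names in the repo may differ:

```elixir
# Sketch of the phrase filter using Postgres full-text search instead of ILIKE;
# the exact fragment used in the repo may differ.
defmodule PhraseFilter do
  import Ecto.Query

  # Narrows a reviews query to comments matching the given phrase.
  def apply(query, phrase) do
    from r in query,
      where: fragment(
        "to_tsvector('english', ?) @@ plainto_tsquery('english', ?)",
        r.comment,
        ^phrase
      )
  end
end
```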
There was also some back and forth in terms of what endpoints I would need, so if you look at the git history you'll see I started with some scaffolding and ended up deleting a bunch of code.
Anyway, I hope you like it.