Visualizing data is an integral part of data literacy. Data literacy allows a learner to ask and answer meaningful questions by collecting, analyzing and making sense of the data encountered in real life. Many students, data analysts, data scientists and machine learning enthusiasts can analyze data using plots and determine the most appropriate way to visualize information. Yet however easily they comprehend the data, some find it difficult to write the code for this task, especially people without a programming background. Moreover, with a plethora of libraries available in the same language, going through the installation procedures and following the code step by step becomes a daunting task.
PlotBot is a solution that keeps users from getting overwhelmed. People with little to no knowledge of plotting need a quick way to map any dataset onto a graph. PlotBot is quick and easy to use, with a friendly interaction platform that caters to a specific task instead of offering a million solutions. The bot makes it easy for users to choose the kind of plot and also provides details on how to plot their data. One of the most widely used Python libraries for plotting graphs is Seaborn, which is built on Matplotlib. Users working with Python may find it difficult to navigate the extensive documentation to find just the right functions and library dependencies, so the bot provides the required libraries. PlotBot also provides sample code for the kind of plot the user wants, so that they can try it out on their own system. For users who want their plot made without going through the lines of code themselves, PlotBot offers the additional feature of plotting the graph from the data they provide. Finally, users can retrieve their past plots through PlotBot if they never stored them and want to see their previous outputs before making new changes to their data.
To sum up:
The bot provides users with the correct functions and libraries through sample source code. The user can also check the plot by visualizing their own input data on a plot generated by the bot. The bot supports the following plot types:
- Scatterplot
- Box-plot
- Bar-plot
The bot’s primary features cover scatterplots, boxplots and barplots: the bot can provide a code snippet to the user, and it can generate the plot itself from a given dataset. The bot can also retrieve plots that have already been generated. More details and screenshots of each implemented feature are given below.
Provide the user with a code snippet for the required type of graph
The user can request a sample code snippet and a sample plot for a particular plot type. In the screenshot below, the user requested a sample of a box plot, then a scatterplot and then a bar plot, and each time the bot replied with the code snippet and a sample of how the graph would look.
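As a rough illustration of this feature, the sketch below maps a requested plot type to a small Seaborn sample script. It is not PlotBot's actual implementation: the names `SAMPLE_CALLS`, `TEMPLATE` and `sample_code` are hypothetical, and the choice of Seaborn's `tips` example dataset is an assumption made only for the illustration.

```python
# Hypothetical lookup table: one Seaborn call per supported plot type.
SAMPLE_CALLS = {
    "scatterplot": 'sns.scatterplot(data=tips, x="total_bill", y="tip")',
    "boxplot": 'sns.boxplot(data=tips, x="day", y="total_bill")',
    "barplot": 'sns.barplot(data=tips, x="day", y="total_bill")',
}

# Script skeleton that the bot could send back to the user.
TEMPLATE = """\
import matplotlib.pyplot as plt
import seaborn as sns

# Example dataset that seaborn fetches from its online data repository.
tips = sns.load_dataset("tips")
{call}
plt.show()
"""


def sample_code(plot_type):
    """Return a sample script for a plot type such as 'Box-plot' or 'bar plot'."""
    # Normalise spellings like "Box-plot" or "box plot" to "boxplot".
    key = plot_type.lower().replace("-", "").replace(" ", "")
    if key not in SAMPLE_CALLS:
        raise ValueError(f"unsupported plot type: {plot_type!r}")
    return TEMPLATE.format(call=SAMPLE_CALLS[key])


if __name__ == "__main__":
    print(sample_code("Box-plot"))
```

Keeping the snippets in one table means a new plot type would need only one more entry, which fits the bot's narrow focus on a few plot types.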
Plot the graph for the user with their custom data
The user can request a plot either by uploading a dataset or by using a previously uploaded dataset, and by providing the axis information (i.e. the X axis and Y axis labels). The bot responds with the generated plot.
In the screenshots below, the user requested a plot and provided a dataset along with the axis information; the bot replied with the graph computed from the dataset and the axis information. The three screenshots show the user plotting a scatterplot, a barplot and a boxplot, either by uploading a dataset or by using an existing dataset that has already been uploaded.
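The request flow described above could look roughly like the following sketch. It assumes the dataset arrives as CSV text and that the axis labels are column names in that file; neither detail is stated in this document. The function names `validate_request` and `render_plot` are hypothetical, and the plotting libraries are imported inside `render_plot` so that the validation step can run without them.

```python
import csv
import io

PLOT_TYPES = ("scatterplot", "boxplot", "barplot")


def validate_request(csv_text, plot_type, x, y):
    """Check the plot type and axis columns; return the CSV header row."""
    if plot_type not in PLOT_TYPES:
        raise ValueError(f"unsupported plot type: {plot_type!r}")
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if header is None:
        raise ValueError("dataset is empty")
    missing = [col for col in (x, y) if col not in header]
    if missing:
        raise ValueError(f"columns not in dataset: {missing}")
    return header


def render_plot(csv_text, plot_type, x, y, out_path):
    """Draw the requested plot from CSV text and save it to out_path."""
    validate_request(csv_text, plot_type, x, y)

    # Imported here so that validation works without the plotting stack.
    import matplotlib
    matplotlib.use("Agg")  # headless backend; a bot has no display
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.read_csv(io.StringIO(csv_text))
    fig, ax = plt.subplots()
    # sns.scatterplot, sns.boxplot and sns.barplot all accept data, x, y and ax.
    getattr(sns, plot_type)(data=df, x=x, y=y, ax=ax)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Validating the axis names before plotting lets the bot answer with a clear message when a requested column does not exist, instead of surfacing a library error.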
Let the user retrieve their plots
The user can retrieve their previous plots by fetching a single plot with its plot ID, by requesting all of their previous graphs, or by requesting the graphs plotted between two timestamps. In the screenshots below, the user first asks for their plots from a time range (first case), and the bot returns a zip file containing all the matching plots. Next, the user asks for all of their plots (second case), and the bot replies with every plot stored for that user.
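The three retrieval modes can be sketched with a small in-memory store. This is only an illustration: the project stores plots in a database, whose schema is not described here, so `StoredPlot`, `PlotStore` and `zip_plots` are hypothetical names, and the plots are assumed to be PNG bytes.

```python
import io
import zipfile
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StoredPlot:
    plot_id: str
    user: str
    created: datetime
    png: bytes  # assumed image format


class PlotStore:
    """In-memory stand-in for the plot database."""

    def __init__(self):
        self._plots = []

    def add(self, plot):
        self._plots.append(plot)

    def by_id(self, user, plot_id):
        """Return the user's plot with this ID, or None if there is none."""
        matches = (p for p in self._plots
                   if p.user == user and p.plot_id == plot_id)
        return next(matches, None)

    def all_for(self, user):
        """Return every plot stored for the user, in insertion order."""
        return [p for p in self._plots if p.user == user]

    def between(self, user, start, end):
        """Return the user's plots created in the inclusive range [start, end]."""
        return [p for p in self.all_for(user) if start <= p.created <= end]


def zip_plots(plots):
    """Pack the given plots into a zip archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for p in plots:
            zf.writestr(f"{p.plot_id}.png", p.png)
    return buf.getvalue()
```

A time-range request would then reduce to `zip_plots(store.between(user, start, end))`, and an "all plots" request to `zip_plots(store.all_for(user))`.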
Following good architecture patterns, implementing testing practices, following scrum-ban practices and having configuration management in place helped us develop our project very efficiently. Because the pipe-and-filter architecture keeps all components as separate services, it is easy to change any module without breaking functionality in other modules. Thinking about the design and architecture patterns beforehand, while proposing the idea, gave us a clear picture of what exactly we wanted to do, what was and was not possible to achieve, and how to implement it efficiently. In our project, we use a database to store all the generated plots and fetch them when the user wants to retrieve them. We compared various database implementations and went ahead with the design that would work for all our use cases. It was clear that we needed a centralised datastore, so we chose a data-centered repository architecture over a blackboard architecture. Similarly, we chose a pipe-and-filter architecture over batch-sequential because we wanted to stream and process every user request independently.
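To make the pipe-and-filter idea concrete, here is a minimal sketch in which each filter is an independent function and the pipe simply feeds one filter's output to the next. The stage names `parse`, `classify` and `respond` are hypothetical and are not PlotBot's actual services; in the project the filters run as separate services rather than as functions in one process.

```python
from functools import reduce

PLOT_TYPES = ("scatterplot", "boxplot", "barplot")


def parse(message):
    """Filter 1: turn the raw chat message into a request record."""
    return {"text": message, "words": message.lower().split()}


def classify(request):
    """Filter 2: attach the requested plot type, or None if none is named."""
    found = next((t for t in PLOT_TYPES if t in request["words"]), None)
    return {**request, "plot_type": found}


def respond(request):
    """Filter 3: produce the reply text for the user."""
    if request["plot_type"] is None:
        return "Sorry, I can plot a scatterplot, boxplot or barplot."
    return f"Here is a sample {request['plot_type']}."


def pipeline(*filters):
    """Connect filters so each one consumes the previous one's output."""
    def run(value):
        return reduce(lambda acc, f: f(acc), filters, value)
    return run


handle = pipeline(parse, classify, respond)

if __name__ == "__main__":
    print(handle("Show me a boxplot"))
```

Because each filter depends only on the shape of its input, a stage can be replaced or extended without touching the others, which is the property described above.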
Integration testing was implemented using Puppeteer to verify all our use cases. Having web automation in place was very helpful: it let us send various user requests and verify that the output was as expected. These automated tests helped us catch and fix bugs as early as possible.
Following the various scrum-ban practices was the best part of the project and made managing the tasks a lot easier. Instead of pushing all the work to the last minute and worrying about fixing things as they broke, scrum-ban practices helped us estimate deadlines correctly and start work early. Regular scrum meetings reduced the communication gap between team members and increased productivity. Code reviews helped us produce more efficient and improved code. Above all, visualizing the tasks on the GitHub project Kanban board was more useful than only discussing them in scrum meetings. Using the Kanban board, we could see what others were working on and how far along they were, and we could go to the right person in the team for help when dependencies were involved.
With configuration management in place, it became easy to deploy our application anywhere and manage it without worrying about the required dependencies on every deployment. We also used Jenkins, which runs the integration tests in a build job that is triggered whenever a commit is made to the repository. This ensures that newly pushed changes do not break other functionality in our application. Having this method of continuous integration in place helped us detect bugs early and fix them before they had a heavy impact on our application.
- PlotBot targets the Python Seaborn library to draw plots for users. As future work, the bot could show visualizations using other Python libraries such as Plotly and ggplot.
- As an additional feature, PlotBot could plot graphs that users can customize, for instance by changing the color and styling of the plots. That information would be taken from the user as parameters.
- Specifically for use case 3, data upload could be supported through Google Drive in addition to the local machine.