Home
It's a laborious task to collect and synthesize the perspectives of customers. There's an immense amount of customer data from a variety of digital channels: survey data, StackOverflow, Reddit, email, etc. Even for internal tools teams at Microsoft, there are at least 10,000 user feedback documents generated per quarter.
To help solve this problem, BrowseCloud is an application that summarizes feedback data via smart word clouds, called counting grids. On a traditional word cloud, the size of each word simply scales with its frequency, and the words are scattered randomly. In BrowseCloud, the position of each word matters too: as the user scans across the visualization, themes transition smoothly into one another.
Go to https://aka.ms/browsecloud-demo to give our web app a try! You can also download the app to run it locally via the command line, or you can set up the infrastructure needed for the full experience.
Explore Workflow
Uploading a new data set is only supported by the internal version of BrowseCloud at Microsoft for now.
Getting Started Workflow
Upload Workflow
Use the issues tab or email browsecloud-team@microsoft.com
We are using GitHub to store and manage the project's source code! https://github.com/microsoft/browsecloud/
If you work at Microsoft, use https://browsecloud-client.azurewebsites.net, add your own data, and train on the site. If you do not work at Microsoft, clone the source code and run dumpCountingGrids.py on your new data. You can then visualize the result using the demo Angular app: put your model files in the browsecloud-client/assets/demo directory and run the app.
We assume that there is some space into which a set of tight distributions is embedded, and that these distributions are then combined using a windowing operation to create a resultant distribution from which the observed bags of words or features are generated. However, we do not assume that the mapping is given a priori. For simplicity, we assume that the space is a discrete grid of counts, but of arbitrary dimension (we experimented with 2- and 3-dimensional grids) and we consider iterative estimation of counts on this grid and the mapping of the data to the overlapping windows on it. Our experiments indicate that the thematic shifts are indeed present in a variety of datasets, and as a result, our model outperforms standard topic mixing (LDA) there. We analyzed a wide variety of data types, including text, images, gene expression and viral peptides, and used the learned counting grids to perform regression or classification.
The mapping of documents to locations on the grid is a set of latent variables. To learn it, we run generalized expectation maximization, alternating between estimating where each document maps on the grid and updating the counts of the words on the grid.
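The windowing operation and the EM loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in dumpCountingGrids.py: the function names (`window_average`, `e_step`, `m_step`, `fit_counting_grid`), the 2-D grid with wraparound at the edges, the uniform prior over window positions, and the small smoothing constant `eps` are all choices made for this example.

```python
import numpy as np

def window_average(pi, w):
    """Average the per-cell word distributions over a w x w window anchored
    at every grid cell (assumption: the grid wraps around at its edges)."""
    h = np.zeros_like(pi)
    for dx in range(w):
        for dy in range(w):
            h += np.roll(pi, shift=(-dx, -dy), axis=(0, 1))
    return h / (w * w)

def doc_window_loglik(counts, h):
    """Log-likelihood of each document's bag of words under each window, shape (T, K)."""
    n_words = h.shape[-1]
    return counts @ np.log(h.reshape(-1, n_words)).T

def e_step(counts, h):
    """Posterior over window positions for each document (uniform prior assumed)."""
    ll = doc_window_loglik(counts, h)
    ll = ll - ll.max(axis=1, keepdims=True)  # stabilize before exponentiating
    q = np.exp(ll)
    return q / q.sum(axis=1, keepdims=True)

def m_step(counts, pi, h, q, w, eps=1e-12):
    """Multiplicative update of the grid counts from the current posteriors."""
    e1, e2, n_words = pi.shape
    # Expected word counts assigned to each window, divided by the window distribution.
    ratio = (q.T @ counts).reshape(e1, e2, n_words) / h
    # Each cell collects contributions from every window that covers it.
    s = np.zeros_like(pi)
    for dx in range(w):
        for dy in range(w):
            s += np.roll(ratio, shift=(dx, dy), axis=(0, 1))
    new_pi = pi * s + eps  # eps keeps every entry strictly positive
    return new_pi / new_pi.sum(axis=-1, keepdims=True)

def log_likelihood(counts, h):
    """Marginal log-likelihood of all documents, summing out the window position."""
    ll = doc_window_loglik(counts, h)
    m = ll.max(axis=1, keepdims=True)
    per_doc = m[:, 0] + np.log(np.exp(ll - m).sum(axis=1)) - np.log(ll.shape[1])
    return float(per_doc.sum())

def fit_counting_grid(counts, grid=(5, 5), w=2, iters=10, seed=0):
    """Alternate E and M steps; returns the grid, the document posteriors,
    and the log-likelihood recorded before each update."""
    rng = np.random.default_rng(seed)
    pi = rng.random(grid + (counts.shape[1],)) + 0.1
    pi /= pi.sum(axis=-1, keepdims=True)
    history = []
    for _ in range(iters):
        h = window_average(pi, w)
        history.append(log_likelihood(counts, h))
        q = e_step(counts, h)
        pi = m_step(counts, pi, h, q, w)
    q = e_step(counts, window_average(pi, w))
    return pi, q.reshape((-1,) + grid), history
```

Here `counts` is a documents-by-vocabulary matrix of word counts. For each document, the returned posterior `q[t]` is a map over the grid showing where that document most likely sits, which is what lets nearby positions share related themes.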
Link to Paper: https://arxiv.org/ftp/arxiv/papers/1202/1202.3752.pdf