Social_Media_Corpus_on_COVID-19

Welcome to our bilingual social media corpus collected from Twitter and Weibo in March 2020. English tweets and Chinese Weibo posts related to COVID-19 are the focus of this corpus.

Setting up the web interface

Please follow these instructions in order to use our web interface.

Downloads

The following files need to be downloaded from the repository and saved to the same directory:

keyword_search_backend.py: the backend Python code.
keyword_search_frontend.css: the style sheet that controls how the HTML page is displayed.
keyword_search_frontend.html: the HTML page that the browser renders.
keyword_search_frontend.js: the JavaScript code that connects the HTML page to the backend Python code.
SourceHanSansSC-Regular.otf: the font file required for generating word clouds.

It is important to save them all in the same ../user_dir directory!
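As a quick sanity check before starting the backend, the sketch below lists any of the five files that are missing from a directory. It is not part of the repository; the helper name missing_files is hypothetical, and only the file names come from the list above.

```python
from pathlib import Path

# File names taken from the download list in this README.
REQUIRED_FILES = [
    "keyword_search_backend.py",
    "keyword_search_frontend.css",
    "keyword_search_frontend.html",
    "keyword_search_frontend.js",
    "SourceHanSansSC-Regular.otf",
]


def missing_files(directory="."):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run it from inside the directory where you saved the files; an empty result means everything is in place.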

Dependent packages

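Some dependencies may already be available on your computer: random, io, and http are part of the Python standard library, while nltk, pandas, and numpy are third-party packages that you may need to install with pip if they are missing. Please ensure the following packages are also installed: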

jieba: pip install jieba
hanzidentifier: pip install hanzidentifier
wordcloud: pip install wordcloud
matplotlib: pip install matplotlib
PIL: pip install Pillow
requests: pip install requests
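To see which of these packages are still missing, a small check like the following may help. It is a sketch rather than part of the repository: the helper name missing_modules is hypothetical, and the list uses import names, so Pillow appears as PIL.

```python
import importlib.util

# Import names (not pip names): the Pillow package is imported as PIL.
REQUIRED_MODULES = [
    "nltk", "pandas", "numpy",
    "jieba", "hanzidentifier", "wordcloud", "matplotlib", "PIL", "requests",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages can be imported.")
```

Anything reported as not installed can be added with the matching pip command (for PIL, pip install Pillow).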

Executing the code

Navigate to the ../user_dir directory in a terminal or command-line interface and run python keyword_search_backend.py. Wait until you see "Ready for query"; this means the web interface is ready to load in the browser.

Now open a browser (preferably Chrome) and enter localhost:9999 in the address bar. The search page should load.

Troubleshooting

If you encounter a problem with the port, try changing the port number 9999 in line 308 of keyword_search_backend.py to any unused port in your local system.
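To find out whether a port is free before editing the backend, you can try binding to it. The sketch below is an illustration only: the helper name port_is_free is hypothetical, and it assumes the backend listens on the local machine (127.0.0.1).

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a new socket can bind to host:port, i.e. the port is unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
        return True


if __name__ == "__main__":
    for candidate in (9999, 8000, 8080):
        state = "free" if port_is_free(candidate) else "in use"
        print(candidate, state)
```

Pick any port reported as free and use it in place of 9999, both in the backend file and in the browser address.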

How to use the web interface:

Search by keyword

Enter the keyword in English or Chinese in the text box and press the Submit button to see a list of returned posts and a word cloud that matches the query.

Please note:

Keywords must be a single word with no spaces. For example, a search for "virus " (with a trailing space) will not return any posts, whereas a search for "virus" finds a total of 3,052 posts and displays a random sample of 10 of them. A Chinese word can consist of multiple characters, and the same no-space rule applies: a search for "学校" (meaning "school") returns 59 posts, but "学 校" returns none.
Do not press the Enter key; the query will not be sent. Click Submit instead.
There may be some lag after submitting a keyword because of word cloud generation. Please be patient.
If more than 10 posts were found, click the Submit button again to see a different random sample.
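The notes above can be summarized in a short sketch of the search behavior. This is not the code in keyword_search_backend.py: the function names are hypothetical, and it assumes that posts are already tokenized into lists of words and that a keyword matches only when it equals a whole token.

```python
import random


def is_valid_keyword(keyword):
    """A keyword is valid if it is non-empty and contains no whitespace."""
    return bool(keyword) and not any(ch.isspace() for ch in keyword)


def search(keyword, tokenized_posts, sample_size=10):
    """Return (total_matches, sample) for posts that contain `keyword` as a token.

    `tokenized_posts` is assumed to be a list of token lists, one per post.
    """
    if not is_valid_keyword(keyword):
        # Keywords with spaces never match, mirroring the no-space rule.
        return 0, []
    matches = [post for post in tokenized_posts if keyword in post]
    # Draw a fresh random sample on every call, at most `sample_size` posts.
    sample = random.sample(matches, min(sample_size, len(matches)))
    return len(matches), sample


if __name__ == "__main__":
    posts = [["the", "virus", "spreads"], ["学校", "停课"], ["stay", "home"]]
    print(search("virus", posts))
    print(search("学 校", posts))
```

Because the sample is redrawn on each call, repeating the same query corresponds to clicking Submit again to see different posts.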
