Final_Project: Google Scholar Comparator

Regenerated from my previous git folder called "Final_Project"

What it does/What to expect:

Accepts multiple Google Scholar search terms separated by commas (up to 5 fit well in the graphic). Returns a graphic of boxplots showing the distribution of citation counts for the top 50 Google Scholar results for each term. Also returns a table of the top 5 Google Scholar publications for each term, including the title, authors, year, journal, link to the paper, and number of citations.
Adds every unique entry to a database of publications. Two tables are created:
1. A table of search terms
2. A table of publications, with a foreign key connected to search terms
Writes a CSV file of results for each search term (e.g. if you type Memristors, ebola virus, reduced order models, hepatocellular carcinoma, HIV you will see five files: MEMRISTORS.csv, EBOLA VIRUS.csv, REDUCED ORDER MODELS.csv, HEPATOCELLULAR CARCINOMA.csv, HIV.csv, with 50 results each).
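To make the CSV naming concrete, here is a minimal sketch, assuming each publication carries the six fields listed above. PaperRecord and write_term_csv are hypothetical names, not the project's actual Paper class or functions, and the real column order may differ.

```python
import csv
import os


class PaperRecord:
    """Hypothetical stand-in for the project's Paper class (fields taken from the README)."""

    FIELDS = ["title", "authors", "year", "journal", "link", "citations"]

    def __init__(self, title, authors, year, journal, link, citations):
        self.title = title
        self.authors = authors
        self.year = year
        self.journal = journal
        self.link = link
        self.citations = int(citations)

    def csv_row(self):
        # One value per field, in the same order as FIELDS
        return [getattr(self, field) for field in self.FIELDS]


def write_term_csv(term, papers, folder="csv_files"):
    """Hypothetical helper: write one CSV per search term, named after the upper-cased term."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, term.strip().upper() + ".csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(PaperRecord.FIELDS)  # header row
        for paper in papers:
            writer.writerow(paper.csv_row())
    return path
```

For example, write_term_csv("ebola virus", papers) would produce csv_files/EBOLA VIRUS.csv.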

Program Requirements:

You must use a Mac (the matplotlib backend was changed to TkAgg; it should also work on Windows, but this has not been tested).
Please use Chrome, because the cache-clearing function doesn't work in Safari.
You must connect through MWireless or public Wi-Fi to avoid overloading Google Scholar with requests. If your requests are overloaded on a public IP address, search fewer terms at a time until the requests succeed. Once those terms are cached, you can try comparing all of them again.
If you encounter an error, reduce the number of search terms; you have most likely overloaded requests (see above).
The program requires Python 3.6 with the packages listed in requirements.txt.
Make sure to update config.py with the appropriate database name and database user

How to run:

Install the packages listed in requirements.txt (e.g. pip install -r requirements.txt).
Update config.py with the database name and database user. The database name can be anything, but the database must already exist. Store the database name in the variable db_name and the user in db_user.
Type python SI507F17_finalproject.py runserver at the command prompt to start the local server.
Open localhost:5000 in Chrome.
Type multiple Google Scholar search terms into the search bar, separated by commas.
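For reference, a config.py following the steps above might look like this. Only the variable names db_name and db_user come from this README; the values are placeholders.

```python
# config.py: placeholder values (hypothetical); replace them with your own.
# db_name must name a database that already exists; the program does not create it.
db_name = "scholar_comparator"
db_user = "your_username"
```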

Screenshots of software: What to Expect

Google Scholar Comparator Landing Page. Type search terms here separated by a comma

Google Scholar Comparator's Citation Results Visual

List of top Google Scholar Search Results Table

Technical Details: What Happens When the Code Runs

The browser cache for the static folder is cleared. The user visits localhost:5000 and sees a landing page with a search bar. The page is served by Flask, which renders a template called interface.html.
Search terms are split on commas and stripped of surrounding whitespace. The Google Scholar results for each term are first looked up in the cache file and requested from Google Scholar only if they are not cached; newly requested results are then added to the cache.
The data is parsed to keep only publications and information about them. A class called Paper parses the data and also returns it in formats suited for CSV output and for Flask.
All incoming data is inserted into a database by wrapper_call(). The database is first set up with setup_database(). The insert() function inserts the data into the "Publications" table; information sent to the "Subjects" table is handled directly by wrapper_call().
The number of citations for each publication in each subject is saved for plotting.
A CSV file of each search term's data is also saved.
The plotdata() function takes the citation information from each search term and creates one graph with multiple boxplots based on the number of citations for each search term.
Information about the top 5 publications for each search term is sent from the Paper class (method: package_html()) to results.html. The figure (saved in the static folder) is also embedded in the results page.
The user can refresh the page to start again.
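The term splitting and cache lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the cache layout (a JSON object mapping a key to a timestamp and the stored response), the seven-day expiry, and the helper names split_terms, cache_key, load_cache and get_with_cache are all hypothetical. The network call is passed in as fetch, so the sketch makes no real requests to Google Scholar.

```python
import json
from datetime import datetime, timedelta

CACHE_FNAME = "cache_file.json"
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_terms(raw):
    """Split the search-bar input on commas and strip whitespace, dropping blanks."""
    return [term.strip() for term in raw.split(",") if term.strip()]


def cache_key(base_url, params):
    """Hypothetical unique key: base URL plus the query parameters in sorted order."""
    return "{}?{}".format(base_url, "&".join("{}={}".format(k, params[k]) for k in sorted(params)))


def load_cache(fname=CACHE_FNAME):
    """Return the cache dict from disk, or an empty dict if the file is missing or invalid."""
    try:
        with open(fname, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def get_with_cache(base_url, params, fetch, cache, expire_days=7, fname=CACHE_FNAME):
    """Return fresh cached data if present; otherwise call fetch() and store the result."""
    key = cache_key(base_url, params)
    entry = cache.get(key)
    if entry is not None:
        saved = datetime.strptime(entry["timestamp"], DATETIME_FORMAT)
        if datetime.now() - saved < timedelta(days=expire_days):
            return entry["values"]  # cache hit that has not expired
    values = fetch(base_url, params)  # e.g. an HTTP request (supplied by the caller)
    cache[key] = {"timestamp": datetime.now().strftime(DATETIME_FORMAT), "values": values}
    with open(fname, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return values
```

Passing fetch in as a parameter keeps the caching logic separate from the HTTP code, which also makes it easy to test offline.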

SUMMARY

Input: Search terms separated by commas, typed into the landing page search bar

Output:

CSV files for top 50 Google Scholar results from each search term (Number of CSV files = Number of search terms). CSV files will be contained in a folder called csv_files
Two Database tables: "Publications" (containing information about Publications), and "Search Terms" (containing all previously entered search terms)
An HTML results page including a boxplot graph of the distribution of citations per search term and the top 5 Google Scholar hits for each search term. The boxplot is saved as fig1.png in a folder called static.
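The two tables could be modeled roughly as below. This sketch swaps in Python's built-in sqlite3 module as a stand-in for the project's actual database (configured through db_name and db_user); the table and column names, and the choice of title as the key for detecting duplicate publications, are assumptions.

```python
import sqlite3


def setup_tables(conn):
    """Hypothetical schema: search terms, and publications linked to them by a foreign key."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS SearchTerms (
               id INTEGER PRIMARY KEY,
               term TEXT UNIQUE NOT NULL)"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Publications (
               id INTEGER PRIMARY KEY,
               title TEXT UNIQUE NOT NULL,
               authors TEXT,
               year INTEGER,
               journal TEXT,
               link TEXT,
               citations INTEGER,
               term_id INTEGER REFERENCES SearchTerms(id))"""
    )


def insert_publication(conn, term, title, authors, year, journal, link, citations):
    """Insert the term and publication, skipping rows that already exist (INSERT OR IGNORE)."""
    conn.execute("INSERT OR IGNORE INTO SearchTerms (term) VALUES (?)", (term.upper(),))
    term_id = conn.execute(
        "SELECT id FROM SearchTerms WHERE term = ?", (term.upper(),)
    ).fetchone()[0]
    conn.execute(
        """INSERT OR IGNORE INTO Publications
               (title, authors, year, journal, link, citations, term_id)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (title, authors, year, journal, link, citations, term_id),
    )
    conn.commit()
```

INSERT OR IGNORE is SQLite-specific; another database would need its own way of skipping duplicates (for example, PostgreSQL's ON CONFLICT DO NOTHING).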

Acknowledgments and Citations:

Complex caching system with expiration date courtesy of: Anand Doshi, nytimes.py
Unique Cache Key Function (params_unique_combination) courtesy of: Jackie Cohen, Runestone text
matplotlib example courtesy of: http://blog.bharatbhole.com/creating-boxplots-with-matplotlib/
setup_database() function courtesy of Jackie Cohen, Project-6
insert() function for database courtesy of Anand Doshi, section-week-11

elisawarner/Google_Scholar_Comparator
