GitHub - abbylane/data-structures-project: Data Structures Spring 2018 Final Project: Wikipedia Trick

Data Structures Spring 2018 Final Project: Wikipedia Trick

This is the gitlab repository for our final data structures project. We utilized python, C++ and the MacPorts graphviz library to graphically represent the path taken by opening the first non-bolded, non-italicized link in consecutive Wikipedia pages, until a loop was reached.

Setup

Unfortunately, our project can only be run on machines with Mac OS X. Before running, users must install Xcode and MacPorts on their machine. Instructions to do that are at "Chapter 2" on this site. Once MacPorts is installed, the user must run the following commands to install the ports used for this project:

sudo port install graphviz sudo port install graphviz-gui

We linked the various parts of our project in one shell script "run.sh". This script can be run on the command line with ./run.sh -n (# of keywords) (Keywords separated by spaces)

It will save a file called graph.dot.png to the same directory where you ran the script which contains a visual representation of the pages visted.

Files:

Makefile

Compiles philo.cpp and tests program against sample plain text output.

crawl.py

Uses Python request capabilities to get code from user requested Wikipedia page. Parses HTML for first link and opens link. Recrusively opens links on each consecutive page until a loop is reached in pages (one page is leading to the same cycle of page openings)

output1, output3, output5

Expected output of program for set keywords to crawl.

philo.cpp

Takes in output from crawl.py and pushes traversed pages into a map, where the key is the word and the value is its list of neighbors. Outputs the correct formatted list to a "graph.dot" file so that a visual graphic may be created.

run.sh

Combines Python and C++ progams, and creates a png from .dot file.

Members

Abby Lane (alane3, Section 01)

Abigail Mrenna (amrenna, Section 02)

Margaret West (mwest6, Section02)

Contributions

Abby Lane:

Creation of functions in crawl.py
creation of Makefile
creation of run.sh
benchmarking

Margaret West:

Planning overall strategy and design choices for each part of the project
work on crawl.py, especially with HTML parsing
Research on use of Graphviz
work on philo.cpp, especially with formatting output

Abigail Mrenna:

creation of philo.cpp
general debugging
benchmarking
Extensive help manually checking correctness of crawl.py

Demonstration Video

https://drive.google.com/file/d/1WDw_X9t3u2Yc6kWg7fqUwTKvOFlBHJPW/view?usp=sharing

Presentation Slides

https://docs.google.com/presentation/d/1ESmDPcPllamQNsFswE3y96GqDRV6hgMkoI5nBZmALfY/edit?usp=sharing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Structures Spring 2018 Final Project: Wikipedia Trick

Setup

Files:

Makefile

crawl.py

output1, output3, output5

philo.cpp

run.sh

Members

Contributions

Demonstration Video

Presentation Slides

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
Makefile		Makefile
README.md		README.md
crawl.py		crawl.py
output1		output1
output3		output3
output5		output5
philo.cpp		philo.cpp
run.sh		run.sh

abbylane/data-structures-project

Folders and files

Latest commit

History

Repository files navigation

Data Structures Spring 2018 Final Project: Wikipedia Trick

Setup

Files:

Makefile

crawl.py

output1, output3, output5

philo.cpp

run.sh

Members

Contributions

Demonstration Video

Presentation Slides

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages