Project for Advanced Software Engineering, Master Data Science

TeodorChiaburu/Webscraper_IMDB

Webscraper_IMDB

The program scrapes data about the current top 100 most popular movies on IMDb. We are particularly interested in the release year, IMDb score, increase or decrease in popularity, and number of votes; this information is stored in a CSV file (see imdb_top100.csv for an example).
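The scrape-then-store flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it parses a small inline HTML table whose markup is hypothetical (IMDb's real page structure differs and changes over time), and it uses the standard library's `html.parser` instead of a third-party HTML parser so that it runs without extra dependencies. The names `TableParser`, `films_to_csv`, `FIELDS` and `SAMPLE_HTML` are assumptions made for this example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td>Film A</td><td>2019</td><td>8.1</td><td>+3</td><td>120000</td></tr>
  <tr><td>Film B</td><td>2020</td><td>7.4</td><td>-1</td><td>45000</td></tr>
</table>
"""

# Assumed column names for the CSV file.
FIELDS = ["title", "year", "score", "popularity_change", "votes"]


class TableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            # Map the collected cells onto the column names.
            self.rows.append(dict(zip(FIELDS, self._row)))
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def films_to_csv(html):
    """Parse the table rows and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return buffer.getvalue()


csv_text = films_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; replacing `io.StringIO()` with an opened file would produce a file similar in spirit to imdb_top100.csv.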

Tasks

  • UML diagrams

  • Metrics (via SonarQube, see node definition)

    • reliability
    • security
    • maintainability
    • duplications
    • coverage: SonarQube sets a high bar for test coverage (min. 80%). Because TestWebscraper is fairly small and checks only three test cases, my code reached only a little over 9% coverage. Here is a link to the XML file of the generated coverage analysis.
    • others: such as lines of code, percent of comment lines, cyclomatic and cognitive complexity, number of (open) issues

    An overview of the metrics can be seen here and here.

  • Clean Code Development

    Refer to the code for the library and the main script:

    • there is no useless/commented out code
    • sufficient documentation at the library level, where the user should get an insight into what each class does
    • readability: code can be read as plain English (e.g. lines 155 or 179 in library)
    • precise naming of variables (e.g. top_url, list_csv, soup, row_pop, dict_row, title_col) and functions/methods (e.g. replace_brackets, get_table, iterate_films, add_films, test_shape, test_isnan)
    • tests for states of variables: see class TestWebscraper
    • fields define state: temporary variables are only declared within local scope (e.g. dict_row on line 67)
    • correct exception handling: in method add_films of class Webscraper and testing methods of class TestWebscraper
    • avoid negative conditionals (e.g. line 95 in library)
    • DRY: there are no pieces of code that repeat themselves (no duplications in SonarQube)
    • KISS: simple function definitions (e.g. line 16 in library)
    • 'divide and conquer': no long method chaining (e.g. lines 76-79, 110-112)
    • assertions: in all the testing methods of TestWebscraper
    • split long methods: see methods add_films and iterate_films in class Webscraper (also definition of replace_brackets outside the class)
    • design and implementation do not overlap: there are two separate files for the classes and their instantiation
    • consistency: use of term 'webscraper' in the name of the class TestWebscraper to match the tested class Webscraper; also both methods that are applied on film data have the term 'film' in them: add_films and iterate_films
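Two of the points above, exception handling and positive conditionals, can be illustrated with a short sketch. It is not taken from the library: `add_film` is a hypothetical single-row helper, and the behaviour shown for `replace_brackets` (stripping parentheses from a year such as "(2019)") is an assumption; only the names `replace_brackets`, `list_csv` and `dict_row` come from the list above.

```python
def replace_brackets(text):
    """Strip parentheses, e.g. '(2019)' -> '2019' (assumed behaviour)."""
    return text.replace("(", "").replace(")", "")


def add_film(list_csv, dict_row):
    """Append one cleaned film row; return True if the row was added."""
    try:
        year = int(replace_brackets(dict_row["year"]))
    except (KeyError, ValueError):
        # A missing or malformed year is handled here instead of crashing.
        return False

    has_title = bool(dict_row.get("title"))
    if has_title:  # positive condition, easier to read than a negated one
        list_csv.append({"title": dict_row["title"], "year": year})
    return has_title


list_csv = []
added_ok = add_film(list_csv, {"title": "Film A", "year": "(2019)"})
added_bad = add_film(list_csv, {"title": "Film B", "year": "n/a"})
print(list_csv, added_ok, added_bad)
```

Catching only `KeyError` and `ValueError` keeps unrelated bugs visible, which is usually preferable to a bare `except`.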
  • Build Management

    I used Maven (see the successful build in Jenkins here and further evidence in pom.xml).

  • Unit Tests

    The unit tests are integrated in Maven; take a look at the test script.
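As a rough idea of what shape and NaN checks like `test_shape` and `test_isnan` might look like, here is a minimal sketch. It assumes Python's `unittest` and a plain list of dictionaries as stand-in data; the class name `TestWebscraperSketch` and the `ROWS` sample are hypothetical and do not reproduce the project's test script.

```python
import math
import unittest

# Hypothetical stand-in for the scraped table.
ROWS = [
    {"title": "Film A", "score": 8.1, "votes": 120000},
    {"title": "Film B", "score": 7.4, "votes": 45000},
]


class TestWebscraperSketch(unittest.TestCase):
    def test_shape(self):
        # Expect two rows with three columns each.
        self.assertEqual(len(ROWS), 2)
        for row in ROWS:
            self.assertEqual(len(row), 3)

    def test_isnan(self):
        # No score should be missing (NaN).
        for row in ROWS:
            self.assertFalse(math.isnan(row["score"]))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWebscraperSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the suite through a loader and runner, rather than `unittest.main()`, makes the sketch usable from an interactive session such as Spyder without exiting the interpreter.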

  • Continuous Delivery

    See Jenkinsfile and Jenkins Pipeline. The build was successful (proof1), and so was the SonarQube integration (proof2).

  • IDE

    For coding this project I used Spyder (snapshot here). My favourite keyboard shortcuts are Ctrl+S (save), Ctrl+1 (comment/uncomment), (Shift+)Tab (indent/unindent), Ctrl+F (find), Ctrl+R (replace), F5 (run).

  • DSL

    I wrote a simple DSL example. It prints a greeting message by calling the function defined in module_introduction.py with the arguments written in dsl_source.
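The general idea of such a DSL, a text source whose arguments are handed to a Python function, might look like the sketch below. The syntax (`command key=value ...`), the function `introduce`, the `COMMANDS` table and `run_dsl` are all assumptions made for illustration; the actual contents of module_introduction.py and dsl_source are not reproduced here.

```python
def introduce(name, greeting="Hello"):
    """Build a greeting message (hypothetical stand-in for the module's function)."""
    return f"{greeting}, {name}!"


# Maps DSL command names to the Python functions that implement them.
COMMANDS = {"introduce": introduce}


def run_dsl(source):
    """Interpret each non-empty, non-comment line as 'command key=value ...'."""
    outputs = []
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        command, *pairs = line.split()
        kwargs = dict(pair.split("=", 1) for pair in pairs)
        outputs.append(COMMANDS[command](**kwargs))
    return outputs


# Hypothetical DSL program, standing in for the dsl_source file.
DSL_SOURCE = """
# greet two people
introduce name=Ada
introduce name=Grace greeting=Welcome
"""

messages = run_dsl(DSL_SOURCE)
for message in messages:
    print(message)
```

Dispatching through a dictionary keeps the interpreter small: supporting a new command only requires adding another entry to `COMMANDS`.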

  • Functional Programming

    • lambda function: line 16 in the library
    • function passed as parameter: line 58 in the library
    • even the construction of the library itself, that contains classes with methods that encompass all the program directives, adheres to the principles of functional programming
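The first two points, a lambda function and a function passed as a parameter, can be shown together in a small sketch. It does not reproduce lines 16 or 58 of the library; `to_int`, `apply_to_column` and the sample rows are hypothetical names and data chosen for this example.

```python
# A lambda that turns a vote count such as "120,000" into an integer.
to_int = lambda text: int(text.replace(",", ""))


def apply_to_column(rows, column, func):
    """Return new rows with func applied to one column; the input is not modified."""
    return [{**row, column: func(row[column])} for row in rows]


rows = [
    {"title": "Film A", "votes": "120,000"},
    {"title": "Film B", "votes": "45,000"},
]

# The function object to_int is passed as an argument.
cleaned = apply_to_column(rows, "votes", to_int)
print(cleaned)
```

Returning new dictionaries instead of changing `rows` in place avoids side effects, which is one of the habits functional programming encourages.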

Bonus

  • Logical Solver: the testing methods in TestWebscraper already have more or less the structure of a logical solver
