
Web scraping of data from Latin American women writers and network graph visualization 🕸️

DataCritica/escritoras-latinas


Latin American Women Writers

This repository contains the code and data behind the story: Una constelación de escritoras latinoamericanas (nacidas en el siglo XX).

The analysis scrapes Wikipedia entries for Latin American women writers and visualizes them as a network graph, which is presented in a web application.
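As a rough illustration of the scrape-then-graph idea described above, the sketch below extracts internal `/wiki/` links from HTML and turns them into an adjacency map. It is not the repository's implementation: the names `WikiLinkParser`, `extract_links`, `build_network`, and the placeholder pages `Writer_A` to `Writer_C` are assumptions made for the example, and it works on inline HTML strings rather than on live Wikipedia requests.

```python
from collections import defaultdict
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect the targets of internal /wiki/ links found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Keep article links only; skip namespaced pages such as "File:".
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href[len("/wiki/"):])


def extract_links(html):
    """Return the article titles linked from one HTML string."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links


def build_network(pages):
    """Build an undirected adjacency map from a {title: html} mapping.

    Only links that point to another title in `pages` become edges.
    """
    graph = defaultdict(set)
    for title, html in pages.items():
        graph[title]  # touch the key so isolated nodes still appear
        for target in extract_links(html):
            if target in pages and target != title:
                graph[title].add(target)
                graph[target].add(title)
    return dict(graph)


# Hypothetical placeholder pages, not real Wikipedia content.
SAMPLE_PAGES = {
    "Writer_A": '<p>Friend of <a href="/wiki/Writer_B">B</a>, '
                '<a href="/wiki/File:x.jpg">portrait</a>.</p>',
    "Writer_B": '<p>Born in <a href="/wiki/Some_City">a city</a>.</p>',
    "Writer_C": "<p>No links.</p>",
}

network = build_network(SAMPLE_PAGES)
```

The resulting adjacency map could then be handed to a graph library for layout and rendering; which library the project uses is not stated here.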


Directory Structure

├── app.py                              # Streamlit app file
├── assets                              # Resources for the project
│   ├── datacritica
│   ├── imgs
│   ├── imgs_processed
│   ├── mosaics
│   ├── targets
│   └── targets_processed
├── data                                # Categorized data
│   ├── processed                       # Cleaned data
│   │   ├── escritoras_wiki.csv
│   │   └── escritores_destacados.csv
│   └── raw                             # Original data
│       └── escritoras.csv
├── Dockerfile                          # Commands to build a docker image
├── docs                                # Explanatory materials
│   ├── data-dictionary.md              # Information about the data
│   └── references                      # Papers, manuals, articles, etc.
├── escritoras_latinas                  # Python package
│   ├── data                            # Functions to manipulate data
│   │   ├── analyze.py                  # Module to analyze data
│   │   ├── export.py                   # Module to save exports
│   │   ├── load.py                     # Module to load data and paths
│   │   └── process.py                  # Module to process data
│   └── utils                           # Functions to make common patterns
│       └── paths.py                    # Module to generate relative paths
├── LICENSE                             # Project license
├── notebooks                           # Jupyter notebooks
│   ├── 0.0-scrapping-text.ipynb
│   ├── 0.1-scrapping-text.ipynb
│   ├── 0.2-scrapping-images.ipynb
│   ├── 1.0-annotate-data.ipynb
│   ├── 1.1-process-images.ipynb
│   ├── 2.0-visualize-network.ipynb
│   ├── 2.1-visualize-network.ipynb
│   └── 2.2-visualize-donut-chart.ipynb
├── outputs                             # Exports generated by notebooks
│   ├── figures                         # Generated graphics, maps, etc.
│   │   └── index.html
│   ├── networks                        # Generated graph network
│   │   └── index.html
│   └── tables                          # Generated pivot tables
│   ├── LICENSE
│   ├── photomosaics
│   │   ├── photomosaics.py
│   │   ├── run.py
│   │   └── scrape.py
│   ├── README.md
│   └── requirements.txt
├── Pipfile                             # Project dependencies
├── Pipfile.lock                        # Specific versions of packages on Pipfile
├── README.md                           # Top-level README for this project
├── README-ES.md                        # README in Spanish
├── requirements.txt                    # Project dependencies
├── setup.py                            # Import project as a python module
└── style.css                           # Styles for streamlit app
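The tree lists `escritoras_latinas/utils/paths.py` as a module for generating relative paths. Its contents are not shown here, so the following is only a guess at what such a helper might look like, using `pathlib`. The function name `make_dir_function` and the use of `"."` as the project root are assumptions for the example.

```python
from pathlib import Path


def make_dir_function(base_dir, *default_parts):
    """Return a function that builds paths under base_dir/default_parts."""
    root = Path(base_dir).joinpath(*default_parts)

    def dir_path(*parts):
        # Join any further parts onto the preconfigured root directory.
        return root.joinpath(*parts)

    return dir_path


# Assumption: the current directory stands in for the project root.
data_dir = make_dir_function(".", "data")
outputs_dir = make_dir_function(".", "outputs")

raw_csv = data_dir("raw", "escritoras.csv")
network_html = outputs_dir("networks", "index.html")
```

Building paths through one helper keeps notebooks and modules independent of the directory they are launched from, provided the root is resolved consistently.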

License

This project is released under the MIT License.


This repository was generated with cookiecutter using a data-journalism template for Python.
