GitHub - Chan2k20/Data-Collection-using-R: Performed data collection using the R programming language by scraping and crawling the required fields from a given website for any year the user requests. The scraped data is stored in a text file, as required.

Project Title: Data Collection using R

Description/Problem Statement: The project involved developing two scripts, Scrapping.R and After.R, to collect article data from the journal Mobile DNA. Scrapping.R uses the R packages stringr, xml2, XML, writexl, and rvest to retrieve article information (title, authors, affiliations, publication date, abstract, keywords, and more) for a specified year. The collected data is organized into a dataframe and exported to both a plain text file (Summary.txt) and an Excel file (summary.xlsx); the full text is excluded because of Excel's character limit.
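
As a rough illustration of the extraction step described above, the sketch below parses one article page with rvest and gathers a few fields into a one-row dataframe. It is not the code from Scrapping.R: the inline HTML, the CSS selectors, and the function name extract_article are assumptions made for the example, and the real Mobile DNA markup will differ.

```r
library(rvest)

# Hypothetical stand-in for one article page (assumption, not real journal markup).
html <- '<html><body>
  <h1 class="article-title">Example article title</h1>
  <ul><li class="author">A. Author</li><li class="author">B. Author</li></ul>
  <div id="abstract"><p>Example abstract text.</p></div>
</body></html>'

# Extract a few fields from a parsed page into a one-row dataframe.
# All selectors below are illustrative assumptions.
extract_article <- function(doc) {
  title    <- html_text(html_element(doc, "h1.article-title"), trim = TRUE)
  authors  <- html_text(html_elements(doc, "li.author"), trim = TRUE)
  abstract <- html_text(html_element(doc, "#abstract p"), trim = TRUE)
  keywords <- html_text(html_elements(doc, "li.keyword"), trim = TRUE)

  data.frame(
    Title    = title,
    Authors  = paste(authors, collapse = "; "),
    Abstract = abstract,
    # A field with no matching nodes is recorded as NA rather than an empty string.
    Keywords = if (length(keywords) > 0) paste(keywords, collapse = "; ") else NA_character_,
    stringsAsFactors = FALSE
  )
}

article <- extract_article(read_html(html))
print(article)
```

Collapsing multi-valued fields such as authors into a single delimited string keeps one row per article, which suits both the text and Excel exports.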

Skills Utilized:

Data Scraping
R Programming
Data Manipulation
Package Management (stringr, xml2, XML, writexl, rvest)

Solution:

The Scrapping.R script first retrieves the URLs for all articles published in Mobile DNA for a specified year provided by the user.
It then crawls each URL to extract relevant information about each article, such as title, authors, affiliations, publication date, abstract, and keywords.
The collected data is structured into a dataframe and exported to a plain text file (Summary.txt) and an Excel file (summary.xlsx), omitting the full-text due to Excel's character limit.
To collect data for multiple years, the After.R script prompts the user for a starting year, then iterates over every year from that year through 2020, calling Scrapping.R for each one.
Upon completion, Summary.txt contains the collected information for all Mobile DNA articles published from the input year through 2020.
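
The multi-year flow above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the contents of After.R: scrape_year is a hypothetical placeholder for whatever Scrapping.R does for one year, collect_years is an illustrative name, the starting year is hard-coded where the real script would prompt for it (for example with readline()), and the output goes to a temporary directory.

```r
# Hypothetical stand-in for Scrapping.R: returns one row per article for a year.
scrape_year <- function(year) {
  data.frame(
    Year  = year,
    Title = paste("Placeholder article from", year),
    stringsAsFactors = FALSE
  )
}

# Collect every year from start_year through end_year into one dataframe.
collect_years <- function(start_year, end_year = 2020) {
  stopifnot(start_year <= end_year)
  yearly <- lapply(seq(start_year, end_year), scrape_year)
  do.call(rbind, yearly)
}

# Fixed starting year for illustration; an interactive script would ask the user.
articles <- collect_years(2018)

# Tab-separated plain-text export to a temporary location.
out_file <- file.path(tempdir(), "Summary.txt")
write.table(articles, out_file, sep = "\t", row.names = FALSE, quote = FALSE)

# An Excel export could follow the same pattern if writexl is installed:
# writexl::write_xlsx(articles, file.path(tempdir(), "summary.xlsx"))
```

Building the full dataframe first and writing it once avoids repeating the header row that appending to the file year by year would produce.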

Repository files (7 commits):

After.R
README.md
Scrapping.R
summary.txt
summary.xlsx
