A Python and SQL implementation to find patterns in 7000 publication metadata from SCOPUS

akifnu/Wicked-Problem-UNU

Nexus_Wicked-Problems-UNU

A Python and SQL implementation to find patterns in 7000 publication metadata from SCOPUS

This is part of the Wicked Science project at United Nations University - FLORES, Dresden.

We were curious about how the problem structure keywords (Wicked, Complex, Uncertain and Conflict) trend together with the Social Science Dimension (Policy and Governance) in the Resource Nexus (Water, Soil, Food, Waste and Energy) across different regions of the world.

We first downloaded the csv data from Scopus using the following keyword combination:

( AUTHKEY ( wicked* ) OR AUTHKEY ( uncertain* ) OR AUTHKEY ( complex* ) OR AUTHKEY ( conflict* ) AND AUTHKEY ( "Water" ) OR AUTHKEY ( "Soil" ) OR AUTHKEY ( "Waste" ) OR AUTHKEY ( "Energy" ) OR AUTHKEY ( "Food" ) AND AUTHKEY ( "Governance" ) OR AUTHKEY ( "Policy" ) ) OR ( TITLE ( wicked* ) OR TITLE ( uncertain* ) OR TITLE ( complex* ) OR TITLE ( conflict* ) AND TITLE ( "Water" ) OR TITLE ( "Soil" ) OR TITLE ( "Waste" ) OR TITLE ( "Energy" ) OR TITLE ( "Food" ) AND TITLE ( "Governance" ) OR TITLE ( "Policy" ) ) OR ( ABS ( wicked* ) OR ABS ( uncertain* ) OR ABS ( complex* ) OR ABS ( conflict* ) AND ABS ( "Water" ) OR ABS ( "Soil" ) OR ABS ( "Waste" ) OR ABS ( "Energy" ) OR ABS ( "Food" ) AND ABS ( "Governance" ) OR ABS ( "Policy" ) ) AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) ) AND ( EXCLUDE ( PUBYEAR , 2021 ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )

This query returned 7041 results.

[image]

We downloaded the following data: [image]

Data Cleaning and Concatenation (Tool: Python, Environment: Google Colab)

However, we mostly need the author names, author keywords, title, abstract and affiliation. Scopus allows exporting only 2000 records at a time, so the data was downloaded in 4 chunks split by publication year:

  1. 1953-2010.csv
  2. 2011-2014.csv
  3. 2015-2017.csv
  4. 2018-2020.csv

A Python joiner was built with the pandas library to join the files. The joined file is saved as 1953-2020.csv and 1953-2020.xlsx.

An index is then assigned and a CSV file is output for pattern matching in SQL.
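The joining and indexing step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual joiner: the function name `join_chunks`, the index name `id`, and the toy columns and rows are assumptions made here so the example runs without the real Scopus exports.

```python
import io

import pandas as pd


def join_chunks(sources, index_name="id"):
    """Concatenate CSV exports and assign a 1-based running index.

    `sources` may be file paths or file-like objects.
    The name `id` for the index column is an assumption.
    """
    frames = [pd.read_csv(src) for src in sources]
    joined = pd.concat(frames, ignore_index=True)
    joined.index = joined.index + 1  # start numbering at 1 instead of 0
    joined.index.name = index_name
    return joined


# Tiny in-memory stand-ins for two chunk files (made-up rows and columns).
chunk_a = io.StringIO("Authors,Title,Year\nDoe J.,Water governance,2009\n")
chunk_b = io.StringIO(
    "Authors,Title,Year\n"
    "Roe A.,Energy policy,2012\n"
    "Lee K.,Soil conflict,2013\n"
)

joined = join_chunks([chunk_a, chunk_b])
csv_text = joined.to_csv()  # pass a path instead to write a file
print(csv_text)
```

With the real exports, the same function could be called with the four chunk file names and the result written with `joined.to_csv("1953-2020.csv")`.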

SQL Pattern Matching (Tool: PostgreSQL)

Structure

The pattern matching is intended to find co-occurrences, that is, the intersections of the keywords of Problem Structure, Social Science Dimensions and Resource Nexus.

The implementation

Problem Structure:

  1. Wicked
  2. Conflict
  3. Complex
  4. Uncertain

Social Science Dimensions:

  1. Governance
  2. Policy

Resource Nexus:

  1. Water
  2. Soil
  3. Food
  4. Energy
  5. Waste

The details of column creation, pattern-matching code and formulas are listed in this file.

The results are included in the results folder.
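The co-occurrence idea behind the pattern matching can be sketched as below. The project itself uses PostgreSQL; to keep the example self-contained, this sketch runs the SQL through Python's built-in `sqlite3` module instead. The table name `publications`, the column `text`, the helper `count_cooccurrence` and the three sample rows are all assumptions for illustration and are not taken from the repository.

```python
import sqlite3

PROBLEM_STRUCTURE = ["wicked", "conflict", "complex", "uncertain"]
SOCIAL_DIMENSIONS = ["governance", "policy"]
RESOURCE_NEXUS = ["water", "soil", "food", "energy", "waste"]


def count_cooccurrence(conn, problem, social, resource):
    """Count rows whose text contains all three keywords as substrings."""
    sql = (
        "SELECT COUNT(*) FROM publications "
        "WHERE text LIKE ? AND text LIKE ? AND text LIKE ?"
    )
    patterns = [f"%{kw}%" for kw in (problem, social, resource)]
    return conn.execute(sql, patterns).fetchone()[0]


# Hypothetical table with made-up rows standing in for title/abstract/keywords.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publications (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO publications (id, text) VALUES (?, ?)",
    [
        (1, "Wicked problems in water governance"),
        (2, "Uncertainty in energy policy"),
        (3, "Soil conflict and complex food governance"),
    ],
)

# One count per (problem structure, social dimension, resource) triple.
counts = {
    (p, s, r): count_cooccurrence(conn, p, s, r)
    for p in PROBLEM_STRUCTURE
    for s in SOCIAL_DIMENSIONS
    for r in RESOURCE_NEXUS
}
```

SQLite's `LIKE` is case-insensitive for ASCII text by default. In PostgreSQL, `LIKE` is case-sensitive, so `ILIKE` (or lower-casing both sides) would be needed for the same behavior.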

Country/Regional Diffusion (Tool: Google Sheets)

We were curious about how the keywords diffused across different regions of the world over time, in order to find trends in publication. The results were generated by detecting countries in the affiliation column and joining them with the SQL outputs.
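The project did this step in Google Sheets. For illustration only, the same idea might be sketched in pandas as below. The sketch assumes that each affiliation string lists affiliations separated by semicolons and that each one ends with the country after the last comma. The column names (`id`, `Year`, `Affiliations`, `keyword`), the helper `extract_countries` and all rows are hypothetical.

```python
import pandas as pd


def extract_countries(affiliations):
    """Return the sorted unique countries in an affiliation string.

    Assumes ';' separates affiliations and the country is the last
    comma-separated part of each one.
    """
    if not isinstance(affiliations, str):
        return []  # missing affiliation
    countries = set()
    for affiliation in affiliations.split(";"):
        parts = [p.strip() for p in affiliation.split(",") if p.strip()]
        if parts:
            countries.add(parts[-1])
    return sorted(countries)


# Made-up publication records and made-up keyword matches from the SQL step.
records = pd.DataFrame({
    "id": [1, 2, 3],
    "Year": [2009, 2012, 2012],
    "Affiliations": [
        "Dept. A, University X, Dresden, Germany",
        "Institute B, Delhi, India; School C, Bonn, Germany",
        None,
    ],
})
matches = pd.DataFrame({
    "id": [1, 2, 3],
    "keyword": ["wicked", "uncertain", "complex"],
})

# Join on the shared index, then count publications per year, country, keyword.
merged = records.merge(matches, on="id")
merged["country"] = merged["Affiliations"].apply(extract_countries)
exploded = merged.explode("country").dropna(subset=["country"])
diffusion = (
    exploded.groupby(["Year", "country", "keyword"])
    .size()
    .reset_index(name="publications")
)
print(diffusion)
```

A publication with authors from several countries is counted once per country here. Whether that matches the counting used in the original spreadsheet is not stated in the source.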
