anchardo/PGCC

PGCC

(Parsing-Grouping-Cleaning-Clustering)

A Python (v3) script to Parse, Group, Clean and Cluster user queries from digital catalogues, using Piwik raw data*.

The source code is reusable and includes a concrete example based on the State Archives of Belgium. This case study will be presented at the 15th International Symposium of Information Science (ISI 2017: http://isi2017.ib.hu-berlin.de/, proceedings: https://t.co/GTRP2GgHIZ), Humboldt-Universität zu Berlin.

Code written by Anne Chardonnens, Simon Hengchen, Raphaël Hubain - Université libre de Bruxelles.

*Here is an example of input data (visitID, visitorID, timestamp, URL):

A4EB7F66122DFB4B 649796 2016-01-01 01:00:41 search.arch.be/nl/zoeken-naar-archieven/zoekresultaat/index/index/zoekterm/antwerpen/findaidstatus/verified-complete-draft/dao/1/lang/nl
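
To illustrate the parsing step, here is a minimal sketch of how a record like the sample above could be split into its four fields and how the search term could be pulled out of the URL. It is not the code from this repository. The names `Visit`, `parse_line` and `extract_query` are hypothetical, the fields are assumed to be separated by whitespace, and the query is assumed to follow a `zoekterm` path segment as in the Dutch-language sample (other interface languages may use a different key).

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Visit(NamedTuple):
    """One raw record: visitID, visitorID, timestamp, URL (hypothetical container)."""
    visit_id: str
    visitor_id: str
    timestamp: str
    url: str


def parse_line(line: str) -> Visit:
    # Assumption: fields are whitespace-separated, so the timestamp
    # arrives as two tokens (date and time) that are joined back together.
    visit_id, visitor_id, date, time, url = line.split(maxsplit=4)
    return Visit(visit_id, visitor_id, f"{date} {time}", url.strip())


def extract_query(url: str, key: str = "zoekterm") -> Optional[str]:
    # The sample URL encodes parameters as alternating /key/value path
    # segments, so the value is the segment right after the key.
    segments = url.split("/")
    for name, value in zip(segments, segments[1:]):
        if name == key:
            return unquote(value)  # decode percent-encoded characters
    return None  # no search term in this URL


sample = (
    "A4EB7F66122DFB4B 649796 2016-01-01 01:00:41 "
    "search.arch.be/nl/zoeken-naar-archieven/zoekresultaat/index/index/"
    "zoekterm/antwerpen/findaidstatus/verified-complete-draft/dao/1/lang/nl"
)
visit = parse_line(sample)
print(visit.visit_id, visit.timestamp, extract_query(visit.url))
```

Records parsed this way could then be grouped by `visit_id` before cleaning and clustering the extracted queries.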
