From c8c57f2e6420c80ec684ef71407d20ea2ed69f08 Mon Sep 17 00:00:00 2001
From: user624086
Date: Thu, 23 May 2019 12:20:25 +0100
Subject: [PATCH 1/2] Updated System Performance section (System Requirements)

---
 README.md | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 9ddbe4a..1abf3bf 100644
--- a/README.md
+++ b/README.md
@@ -46,19 +46,9 @@ pyGrams.py has been developed to work on both Windows and MacOS. To install:
 
 This will install all the libraries and run some tests. If the tests pass, the app is ready to run. If any of the tests fail, please email [ons.patent.explorer@gmail.com](mailto:ons.patent.explorer@gmail.com) with a screenshot of the failure so that we may get back to you, or alternatively open a [GitHub issue here](https://github.com/datasciencecampus/pyGrams/issues).
 
-### System requirements
+### System Performance
 
-We have stress-tested `pygrams.py` using Windows 10 (64-bit) with 8GB memory (VM hosted on 2.1GHz Xeon E5-2620). We observed a linear increase in both execution time and memory usage in relation to number of documents analysed, resulting in:
-
-- Processing time: 41.2 documents/sec
-- Memory usage: 236.9 documents/MB
-
-For the sample files, this was recorded as:
-
-- 1,000 documents: 0:00:37
-- 10,000 documents: 0:04:45 (285s); 283MB
-- 100,000 documents: 0:40:10 (2,410s); 810MB
-- 500,000 documents: 3:22:08 (12,128s); 2,550MB
+The system performance was tested using a 2.7GHz Intel Core i7 16GB MacBook Pro using 3.2M US patent abstracts from approximately 2005 to 2018. It initially takes about 6 hours to produce a specially optimised 100,000 term TFIDF Dictionary with a file size under 100MB. Once this is created however, it takes approximately 1 minute to run a pygrams popular terminology query, or approximately 7 minutes for an emerging terminology query.
 
 ## User guide
 

From a94de01aa76235b17390ae556b30013af4272f4f Mon Sep 17 00:00:00 2001
From: user624086
Date: Thu, 23 May 2019 13:25:30 +0100
Subject: [PATCH 2/2] minor mods

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1abf3bf..81d713a 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ pyGrams.py has been developed to work on both Windows and MacOS. To install:
 
 ### System Performance
 
-The system performance was tested using a 2.7GHz Intel Core i7 16GB MacBook Pro using 3.2M US patent abstracts from approximately 2005 to 2018. It initially takes about 6 hours to produce a specially optimised 100,000 term TFIDF Dictionary with a file size under 100MB. Once this is created however, it takes approximately 1 minute to run a pygrams popular terminology query, or approximately 7 minutes for an emerging terminology query.
+The system performance was tested using a 2.7GHz Intel Core i7 16GB MacBook Pro using 3.2M US patent abstracts from approximately 2005 to 2018. Indicatively, it initially takes about 6 hours to produce a specially optimised 100,000 term TFIDF Dictionary with a file size under 100MB. Once this is created however, it takes approximately 1 minute to run a pyGrams popular terminology query, or approximately 7 minutes for an emerging terminology query.
 
 ## User guide
 