Exponential emergence #306

user624086 · 2019-08-06T09:04:42Z

exponential-like emergence score

…into develop

fix bug in stationary emergence

IanGrimstead

Few things

IanGrimstead · 2019-08-06T10:15:15Z

scripts/algorithms/emergence.py

+        # todo: Modify not to use weekly_values
+        # todo: Create -exp parameter, e.g. power of weight function (currently linear = 1)
+
+        # exponential like emergence score


Suggested change

# exponential like emergence score

I've moved this to the function documentation line to make clear that it's not a real exponential
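One way to read the todo about a "power of weight function (currently linear = 1)" and the "not a real exponential" remark is a score whose weights grow as a power of position rather than exponentially. The sketch below only illustrates that idea; `power_weights`, `weighted_score` and the formula itself are hypothetical and are not taken from `emergence.py`.

```python
def power_weights(n, power=1.0):
    """Return n weights rising towards the most recent point, normalised to sum to 1.

    power=1 gives a linear ramp; larger powers favour recent points more
    strongly, but growth stays polynomial rather than truly exponential.
    (Hypothetical helper, not pyGrams code.)
    """
    raw = [((i + 1) / n) ** power for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_score(values, power=1.0):
    """Weighted average of `values` using power_weights (hypothetical)."""
    weights = power_weights(len(values), power)
    return sum(w * v for w, v in zip(weights, values))


# A series whose only activity is in the final period.
series = [0, 0, 0, 10]
linear = weighted_score(series, power=1)  # last weight is 4/10
cubic = weighted_score(series, power=3)   # last weight is 64/100
```

Raising the power shifts weight towards the end of the series, which is the effect an `-exp` style parameter would presumably control.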

scripts/algorithms/emergence.py

IanGrimstead · 2019-08-06T10:24:20Z

scripts/pipeline.py

-            if em.init_vars(row_indices, row_values, porter=not curves):
-                escore = em.calculate_escore() if not curves else em.escore2()
+            if em.init_vars(row_indices, row_values):
+                if exponential:


Move weekly_value definition here (or move work into escore_exponential method)

Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>

codecov · 2019-08-06T11:59:51Z

Codecov Report

Merging #306 into develop will increase coverage by 0.14%.
The diff coverage is 70%.

@@             Coverage Diff             @@
##           develop     #306      +/-   ##
===========================================
+ Coverage    58.49%   58.64%   +0.14%     
===========================================
  Files           38       38              
  Lines         2848     2875      +27     
===========================================
+ Hits          1666     1686      +20     
- Misses        1182     1189       +7
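As a quick sanity check on the table above, the percentages can be reproduced from the hit and line counts. The sketch assumes the figures are truncated rather than rounded to two decimals; that assumption fits these numbers but is an inference from them, not something the report states. `truncated_percent` is a hypothetical helper.

```python
from fractions import Fraction
from math import floor


def truncated_percent(ratio):
    """Convert a ratio to a percentage truncated (not rounded) to two decimals."""
    return floor(ratio * 10000) / 100


before = Fraction(1666, 2848)  # hits / lines on develop
after = Fraction(1686, 2875)   # hits / lines after merging #306

print(truncated_percent(before))          # 58.49
print(truncated_percent(after))           # 58.64
print(truncated_percent(after - before))  # 0.14
```

With rounding instead of truncation, the develop figure would come out as 58.50% rather than the reported 58.49%. Note also that 20/27 (about 74%) is not the same quantity as the reported 70% diff coverage, which is presumably measured over the lines the diff touches rather than the net line change.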

IanGrimstead

Very neat! 👍
The only stylistic change I'd suggest for the tests is to drop the comments and just use blank lines to split each test into 3 'paragraphs', which are then assumed to be arrange, act, assert; that saves having to write the comments. But stick with the comments at least in the short term while you're getting used to it.
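For illustration, a test laid out in that three-paragraph style might look like the sketch below. The `mean` function and the test name are hypothetical stand-ins, not code from this PR.

```python
import unittest


def mean(values):
    """Toy function under test (hypothetical, not from pyGrams)."""
    return sum(values) / len(values)


class MeanTests(unittest.TestCase):
    def test_mean_of_three_values(self):
        values = [1, 2, 6]

        actual = mean(values)

        self.assertEqual(3, actual)
```

The blank lines alone mark where setup ends, where the call under test happens, and where the checks begin.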

* argschecker updated #178
* Reverted to latest pmdarima (#212)
* Removed erroneous 2nd arima fit (#212) arima fits on construction, don't need to explicitly call 'fit'
* Reverted to original test specification (#212)
* Added version debug code (#212) As requested on pmdarima bug reporting page
* Report warnings and errors (#212) May have accidentally been suppressing errors - that could be reporting why the test fails
* resolves #217
* test
* changed the tests with real data to check if random numbers were confusing the models, hence the big discrepancies
* Updated Arima to use pmdarima rather than pyramid-arima (#212)
* Test pmdarima 1.0.0 to test windows (#212) Seeing if an earlier version of pmdarima works in windows
* emtech report to file!
* Reverted to latest pmdarima (#212)
* Removed erroneous 2nd arima fit (#212) arima fits on construction, don't need to explicitly call 'fit'
* Reverted to original test specification (#212)
* Added version debug code (#212) As requested on pmdarima bug reporting page
* Report warnings and errors (#212) May have accidentally been suppressing errors - that could be reporting why the test fails
* analyzer ngrams processing was not stopping unigrams :)
* adjusted tests to reflect bug fixes in stoplists processing
* added a check on the returned tuple for stopwords. That will enable users to optimize list without having to re-compute tf-idf
* pmdarima>=110
* added a check on the returned tuple for stopwords. That will enable users to optimize list without having to re-compute tf-idf
* rid of vectorizer. Only vocabulary needed
* 225 ridof pmdarima (#226)
* rid of vectorizer. Only vocabulary needed
* rid of pmd. Also realized that two of our test series were identical. No need to test them twice :)
* pmd left.
* just to check why one excepts and other doesn't
* rid of vectorizer. Only vocabulary needed
* scipy was the problem, in the end. Has to be >=1.2.1
* 223 pipeline bug (#224)
* rid of vectorizer. Only vocabulary needed
* pickle-depickle tfidf test now represents different executions (#223) WordAnalyser reset between calls to main() - will catch if stopwords etc not populated
* Travis now reports python packages in use Added `pip freeze` to travis.yml
* Corrected pip listing of packages
* 228 data path (#229)
* Removed override to 'data' path and added date info #228 Now reports date range of patents in use
* Removed 2nd construction of WordAnalyser #228
* 230 arima failing (#231)
* Alternative method to annoy ARIMA #230
* 227 bug csv date (#233)
* Testing python 3.7.3 via pip and *correctly* switch to Xenial linux (#227)
* Checks if DF date column is a string and converts to datetime #227
* Oops. Test failing as date_column not always corrected to datetime #227
* csv dates come as strings. Type-check to see what's going on and conv… (#232)
* moved things around a bit. type check after df creation inside not read from pickle clause. If read from pickle, that should have been taken care of..
* Remove leading zero trimming (#235) (#239)
* added argument for embeddings threshold
* resolves #250 (#251)
* scipy==1.2.1 else breaks
* new gensim breaks windows! Force 3.4.0
* filtering rows now gets rid of corresponding rows in df (#249)
* filtering rows now gets rid of corresponding rows in df
* gensim & scipy version limited due to introduced instability in current versions
* Update pygrams.py Co-Authored-By: emily-tew <38726410+emily-tew@users.noreply.github.com>
* 248 tfidf filter (#254)
* Added prefilter of terms (#248)
* del
* Update README.md Missing `.` on `pip install -e .`
* Corrected check for empty CPC list (#261)
* cache 2 initial commit! (#269)
* cache 2 initial commit!
* fix-imports was calling the properties and populating tfidf_mat. Disabled it. Plus some cosmetics
* helper function to safeguard from None idf or tfidf
* 257 add nmf code (#271) Added NMF output
* resolves #272 (#275)
* Dictionary used to store CPC rather than list inside data frame
* 273 dates as ints (#277)
* Dates now pickled as integer array to save space (#273) Tidied up date related utilities - added to date_utils from utils Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso Column filter removed from DocumentsFilter Removed time and CPC document weighting
* Update README.md
* 279 small adjustments (#280)
* Dates now pickled as integer array to save space (#273)
* Tidied up date related utilities - added to date_utils from utils
* Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso
* Column filter removed from DocumentsFilter
* Removed time and CPC document weighting
* Removed unused parameters and synchronised variable names (#273)
* Added timing report and progress reports
* 278 move mask (#283)
* resolves #278
* Changed folders for cached outputs (#281) (#284)
* 285 data uspto (#286)
* error checks change...
* resolves #286
* 287 update system requirements section (#288)
* Updated System Performance section (System Requirements)
* minor mods
* Small bug (#289)
* threshold not a list
* save time series to file (#270)
* Update README.md -it option was outdated
* 291 bug (#292) resolves #291
* 294 fb (#295)
* resolves #294
* 296 emtech facelift (#297)
* resolves #296
* 298 nltk installation (#299) NLTK data now downloaded during execution of `pip install` (fixes #298)
* 256 tech report 2 (#301) resolves # 256
* Ch comments (#304)
* ch comments
* Checking changes were propagated correctly #256 (#305)
* Checking changes were propagated correctly #256
* Checking changes were propagated correctly - more missing #256
* Few american spellings caught #256
* Exponential emergence (#306)
* add exponential emergence
* #255 convert r scripts (#308) state space model resolves #308 #255
* General facelift
* Refactoring for readability
* Corrected issue with calculation of Porter (was using head not tail of dataset)
* State space (#317)
* cache state-space data!
* two-stage grid search
* Corrected test with duplicated args (good spot...) Now copes if min/max time series dates are not defined
* If smoothing not requested, ensure None is returned for smoothed dictionary
* Default predictor set now excludes LSTMs
* 319 cache (#320)
* #319 updated code and tests to reflect new cache usage
* 321 test stopwords (#322)
* #321 added stopwords to test folder for test specific variant
* #319 consistent cmd line args, GloVe can now be placed anywhere
* 315 clamp redo (#323)
* #315 clamp smoothed values at 0
* cast smoothed data back to lists (from numpy arrays) for consistency
* command line args now restricted to available smoothing and emergence
* added simple test for holt-winters to confirm -ve values not handled
* 326 mpq (#327)
* mpq tweak and cached data
* #328 added tests for example command line (#329)
* #328 added tests for example command line
* fixed: date not defined when not required causes failure
* #328 corrected execution folder for README tests
* Corrected merge
* Whitespace changes ready for merge to master
* Cleanup state space modelling
* Whups. Now checks tests again and only runs on travis... and not win32
* 324 state space predictions (#325)
* #324 create table from state space results - work in progress
* tests TBA
* first commit
* #324 create table from state space results - with tests
* Trimmed SD not implemented
* #324 Trimmed SD implemented
* #324 report window size to HTML
* #324 WIP - needs refinement, but works for non-test. Test may blow graph generation.
* #328 multiplot added as option
* Cleanup state space modelling
* Whups. Now checks tests again and only runs on travis... and not win32
* Merge issue with SSM

user624086 added 9 commits May 23, 2019 12:20

Updated System Performance section (System Requirements)

c8c57f2

minor mods

a94de01

Merge branch 'develop' of https://github.com/datasciencecampus/pyGrams …

77dd1ca

…into develop

Merge branch 'develop' of https://github.com/datasciencecampus/pyGrams …

533dca1

…into develop

Merge branch 'develop' of https://github.com/datasciencecampus/pyGrams …

8b8fdea

…into develop

Merge branch 'develop' of https://github.com/datasciencecampus/pyGrams …

49884ad

…into develop

add exponential emergence -exp parameter

d3bc238

fix bug in stationary emergence

escore_exponential function (not integrated yet)

c15fa72

integrate escore_exponential function

0e1f9a6

user624086 requested a review from IanGrimstead August 6, 2019 09:04

IanGrimstead suggested changes Aug 6, 2019

user624086 and others added 4 commits August 6, 2019 11:42

Update scripts/algorithms/emergence.py

a7398bc

Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>

Update scripts/algorithms/emergence.py

97a90d6

Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>

Update scripts/algorithms/emergence.py

8987631

Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>

Update scripts/algorithms/emergence.py

7b74c7d

Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>

#306 addressed review comments

36de363

user624086 requested a review from IanGrimstead August 8, 2019 06:04

user624086 added 9 commits August 9, 2019 07:31

exponential emergence code update

b879b40

expand examples of escores

577d740

include function description

f8c3c97

added unit tests

8ccd1a4

account for 52.1775 weeks in a year (miss 53rd week)

4a728d6

comment out save time series to file

3f6b6e2

#306 allowed non-integer weeks per year (52.1775)

2861d72

#306 allowed non-integer weeks per year (52.1775)

5550994

#306 allowed non-integer weeks per year (52.1775)

44177d8
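The figure 52.1775 in the commits above matches the mean Gregorian year of 365.2425 days divided by 7. The sketch below shows why a non-integer value matters over multi-year windows; `weeks_in_years` is a hypothetical helper, and how pyGrams actually applies the constant is not shown in this conversation.

```python
DAYS_PER_GREGORIAN_YEAR = 365.2425            # mean length of a Gregorian year
WEEKS_PER_YEAR = DAYS_PER_GREGORIAN_YEAR / 7  # approximately 52.1775


def weeks_in_years(years):
    """Whole number of weeks closest to `years` years (hypothetical helper)."""
    return round(years * WEEKS_PER_YEAR)


# Compare against the naive assumption of exactly 52 weeks per year.
for years in (1, 5, 10):
    print(years, weeks_in_years(years), years * 52)
```

Over ten years the naive 52-week assumption falls two weeks short (520 against 522), which is the kind of drift a fractional constant avoids.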

IanGrimstead previously approved these changes Aug 16, 2019

#306 updated readme and function description

8b45ffa

user624086 dismissed IanGrimstead’s stale review via 8b45ffa August 20, 2019 08:31

IanGrimstead merged commit ef8d654 into develop Aug 20, 2019

IanGrimstead deleted the exponential_emergence branch August 20, 2019 08:51
