Exponential emergence #306

Merged
merged 24 commits into from
Aug 20, 2019

user624086
Collaborator

Exponential-like emergence score

@user624086 user624086 requested a review from IanGrimstead August 6, 2019 09:04
Contributor

@IanGrimstead IanGrimstead left a comment

A few things:

# todo: Modify not to use weekly_values
# todo: Create -exp parameter, e.g. power of weight function (currently linear = 1)

# exponential like emergence score
Contributor

Suggested change
# exponential like emergence score

Collaborator Author

I've moved this to the function's documentation line to make clear that it's not a real exponential.
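
The todo in this thread mentions a power for the weight function (currently linear, i.e. power 1). As a rough illustration of that idea only, here is a minimal sketch. It is not the project's formula: the function name `weighted_growth_score`, the `power` argument, and the choice to weight period-on-period differences by their position are all assumptions made for this example.

```python
from typing import Sequence


def weighted_growth_score(values: Sequence[float], power: float = 1.0) -> float:
    """Hypothetical 'exponential-like' score (not a true exponential).

    Each period-on-period change is weighted by (position ** power), so
    recent changes count for more. power=1 gives linear weights.
    """
    n = len(values)
    if n < 2:
        # Not enough points to measure any change.
        return 0.0

    # Change between consecutive periods.
    diffs = [values[i + 1] - values[i] for i in range(n - 1)]
    # Later changes receive larger weights; a higher power sharpens the bias.
    weights = [(i + 1) ** power for i in range(n - 1)]

    # Weighted mean of the changes.
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
```

With linear weights, a series that jumps at the end (`[0, 0, 0, 3]`) scores 1.5, while the same jump at the start (`[0, 3, 3, 3]`) scores 0.5; raising `power` widens that gap.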

scripts/algorithms/emergence.py: six outdated review threads, all resolved
if em.init_vars(row_indices, row_values, porter=not curves):
    escore = em.calculate_escore() if not curves else em.escore2()
if em.init_vars(row_indices, row_values):
    if exponential:
Contributor

Move weekly_value definition here (or move work into escore_exponential method)

user624086 and others added 4 commits August 6, 2019 11:42
Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>
Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>
Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>
Co-Authored-By: IanGrimstead <38883454+IanGrimstead@users.noreply.github.com>
codecov bot commented Aug 6, 2019

Codecov Report

Merging #306 into develop will increase coverage by 0.14%.
The diff coverage is 70%.

@@             Coverage Diff             @@
##           develop     #306      +/-   ##
===========================================
+ Coverage    58.49%   58.64%   +0.14%     
===========================================
  Files           38       38              
  Lines         2848     2875      +27     
===========================================
+ Hits          1666     1686      +20     
- Misses        1182     1189       +7

@user624086 user624086 requested a review from IanGrimstead August 8, 2019 06:04
IanGrimstead previously approved these changes Aug 16, 2019
Contributor

@IanGrimstead IanGrimstead left a comment

Very neat! 👍
The only stylistic change I'd suggest for the tests is to drop the comments and use blank lines to split each test into three 'paragraphs', which are then assumed to be arrange, act, assert; that saves having to write the comments. But stick with the comments, at least in the short term, while you're getting used to it.
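
For illustration, a test laid out in that three-paragraph style might look like the sketch below. The function under test, `total_count`, is a hypothetical stand-in and not part of this project; only the layout is the point.

```python
import unittest


def total_count(counts):
    """Hypothetical function under test: sums a list of weekly counts."""
    return sum(counts)


class TestTotalCount(unittest.TestCase):

    def test_sums_all_weeks(self):
        counts = [1, 2, 3]

        actual = total_count(counts)

        self.assertEqual(6, actual)
```

The blank lines alone mark the arrange, act, and assert steps, so no `# arrange` / `# act` / `# assert` comments are needed.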

@IanGrimstead IanGrimstead merged commit ef8d654 into develop Aug 20, 2019
@IanGrimstead IanGrimstead deleted the exponential_emergence branch August 20, 2019 08:51
thanasions pushed a commit that referenced this pull request Sep 25, 2019
* argschecker updated #178

* Reverted to latest pmdarima (#212)

* Removed erroneous 2nd arima fit (#212)

arima fits on construction, don't need to explicitly call 'fit'

* Reverted to original test specification (#212)

* Added version debug code (#212)

As requested on pmdarima bug reporting page

* Report warnings and errors (#212)

May have accidentally been suppressing errors - that could be reporting why the test fails

* resolves #217

* test

* test

* test

* changed the tests with real data to check if random numbers were confusing the models, hence the big discrepancies

* Updated Arima to use pmdarima rather than pyramid-arima (#212)

* Test pmdarima 1.0.0 to test windows (#212)

Seeing if an earlier version of pmdarima works in windows

* emtech report to file!

* Reverted to latest pmdarima (#212)

* Removed erroneous 2nd arima fit (#212)

arima fits on construction, don't need to explicitly call 'fit'

* Reverted to original test specification (#212)

* Added version debug code (#212)

As requested on pmdarima bug reporting page

* Report warnings and errors (#212)

May have accidentally been suppressing errors - that could be reporting why the test fails

* analyzer ngrams processing was not stopping unigrams :)

* adjusted tests to reflect bug fixes in stoplists processing

* added a check on the returned tuple for stopwords. That will enable users to optimize list without having to re-compute tf-idf

* pmdarima>=110

* added a check on the returned tuple for stopwords. That will enable users to optimize list without having to re-compute tf-idf

* rid of vectorizer. Only vocabulary needed

* 225 ridof pmdarima (#226)

* rid of vectorizer. Only vocabulary needed

* rid of pmd. Also realized that two of our test series were identical. No need to test them twice :)

* pmd left.

* just to check why one excepts and other doesn't

* rid of vectorizer. Only vocabulary needed

* scipy was the problem, in the end. Has to be >=1.2.1

* 223 pipeline bug (#224)

* rid of vectorizer. Only vocabulary needed

* pickle-depickle tfidf test now represents different executions (#223)
WordAnalyser reset between calls to main() - will catch if stopwords
etc not populated

* Travis now reports python packages in use

Added `pip freeze` to travis.yml

* Corrected pip listing of packages

* 228 data path (#229)

* Removed override to 'data' path and added date info #228
Now reports date range of patents in use
* Removed 2nd construction of WordAnalyser #228

* 230 arima failing (#231)

* Alternative method to annoy ARIMA #230

* 227 bug csv date (#233)

* Testing python 3.7.3 via pip and *correctly* switch to Xenial linux (#227)

* Checks if DF date column is a string and converts to datetime #227

* Oops. Test failing as date_column not always corrected to datetime #227

* csv dates come as strings. Type-check to see what's going on and conv… (#232)

* moved things around a bit. type check after df creation inside not read from pickle clause. If read from pickle, that should have been taken care of..

* Remove leading zero trimming (#235) (#239)

* added argument for embeddings threshold

* resolves #250 (#251)

* scipy==1.2.1 else breaks

* new gensim breaks windows! Force 3.4.0

* filtering rows now gets rid of corresponding rows in df (#249)

* filtering rows now gets rid of corresponding rows in df
* gensim & scipy version limited due to introduced instability in current versions

* Update pygrams.py

Co-Authored-By: emily-tew <38726410+emily-tew@users.noreply.github.com>

* 248 tfidf filter (#254)

* Added prefilter of terms (#248)

* del

* Update README.md

Missing `.` on `pip install -e .`

* Corrected check for empty CPC list (#261)

* cache 2 initial commit! (#269)

* cache 2 initial commit!

* fix-imports was calling the properties and populating tfidf_mat. Disabled it. Plus some cosmetics

* helper function to safeguard from None idf or tfidf

* 257 add nmf code (#271)

Added NMF output

* resolves #272 (#275)

* Dictionary used to store CPC rather than list inside data frame

* 273 dates as ints (#277)

* Dates now pickled as integer array to save space (#273)
Tidied up date related utilities - added to date_utils from utils
Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso
Column filter removed from DocumentsFilter
Removed time and CPC document weighting

* Update README.md

* 279 small adjustments (#280)

* Dates now pickled as integer array to save space (#273)
* Tidied up date related utilities - added to date_utils from utils
* Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso
* Column filter removed from DocumentsFilter
* Removed time and CPC document weighting
* Removed unused parameters and synchronised variable names (#273)
* Added timing report and progress reports

* 278 move mask (#283)

* resolves #278

* Changed folders for cached outputs (#281) (#284)

* 285 data uspto (#286)


* error checks change...
* resolves #286

* 287 update system requirements section (#288)

* Updated System Performance section (System Requirements)

* minor mods

* Small bug (#289)

* threshold not a list

* save time series to file (#270)

* Update README.md

-it option was outdated

* 291 bug (#292)

resolves #291

* 294 fb (#295)



* resolves #294

* 296 emtech facelift (#297)

* resolves #296

* 298 nltk installation (#299)

NLTK data now downloaded during execution of `pip install` (fixes #298)

* 256 tech report 2 (#301)

resolves #256

* Ch comments (#304)

* ch comments

* Checking changes were propagated correctly #256 (#305)

* Checking changes were propagated correctly #256

* Checking changes were propagated correctly - more missing #256

* Few american spellings caught #256

* Exponential emergence (#306)

* add exponential emergence

* #255 convert r scripts (#308)

state space model resolves #308 #255

* General facelift

* Refactoring for readability
* Corrected issue with calculation of Porter (was using head not tail of dataset)

* State space (#317)

* cache state-space data!

* two-stage grid search

* Corrected test with duplicated args (good spot...)
Now copes if min/max time series dates are not defined

* If smoothing not requested, ensure None is returned for smoothed dictionary

* Default predictor set now excludes LSTMs

* 319 cache (#320)

* #319 updated code and tests to reflect new cache usage

* 321 test stopwords (#322)


* #321 added stopwords to test folder for test specific variant

* #319 consistent cmd line args, GloVe can now be placed anywhere

* 315 clamp redo (#323)

* #315 clamp smoothed values at 0
* cast smoothed data back to lists (from numpy arrays) for consistency
* command line args now restricted to available smoothing and emergence
* added simple test for holt-winters to confirm -ve values not handled

* 326 mpq (#327)

* mpq tweak and cached data

* #328 added tests for example command line (#329)

* #328 added tests for example command line
* fixed: date not defined when not required causes failure
* #328 corrected execution folder for README tests

* Corrected merge

* Whitespace changes ready for merge to master

* Cleanup state space modelling

* Whups. Now checks tests again and only runs on travis... and not win32

* 324 state space predictions (#325)

* #324 create table from state space results - work in progress
* tests TBA

* first commit

* #324 create table from state space results - with tests
* Trimmed SD not implemented

* #324 Trimmed SD implemented

* #324 report window size to HTML

* #324 WIP - needs refinement, but works for non-test. Test may blow graph generation.

* #328 multiplot added as option

* Cleanup state space modelling

* Whups. Now checks tests again and only runs on travis... and not win32

* Merge issue with SSM