Emergent mode fails with AttributeError: 'str' object has no attribute 'isocalendar' #227

Closed
pjlegato opened this issue Apr 3, 2019 · 5 comments

@pjlegato

pjlegato commented Apr 3, 2019

Describe the bug
When running with -emt, CSV input, and a date field given in the required format YYYY/MM/DD, it crashes with:

  File "/Users/paul/src/pyGrams/scripts/tfidf_reduce.py", line 77, in <listcomp>
    [d.isocalendar() for d in dates]]
AttributeError: 'str' object has no attribute 'isocalendar'

The code does not appear to be parsing the input date string as a date object; it attempts to call date methods on a string.
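
The diagnosis above can be checked in isolation. The following is a minimal sketch, not code from pyGrams: the sample value `raw` is hypothetical, and `datetime.strptime` is only one possible way to parse the YYYY/MM/DD format.

```python
from datetime import datetime

raw = "2019/04/03"  # hypothetical value, as it might arrive from the CSV date column

# A plain string has no isocalendar(), which matches the reported traceback.
try:
    raw.isocalendar()
except AttributeError as err:
    print(err)

# Parsing the string first yields a datetime, which does provide isocalendar().
parsed = datetime.strptime(raw, "%Y/%m/%d")
iso_year, iso_week, iso_weekday = parsed.isocalendar()
print(iso_year, iso_week, iso_weekday)
```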

To Reproduce
Steps to reproduce the behavior:

  1. Create a CSV input file with a date field in YYYY/MM/DD format
  2. Run python pygrams.py -emt -ds=posts.csv -dh='date' -th text > emergent.out
  3. Crashes with above error
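
For step 1, an input file could be generated along these lines. This is only a sketch: the rows are invented, and the column names `date` and `text` are assumed from the `-dh` and `-th` arguments in step 2.

```python
import csv

# Hypothetical sample rows; column names are assumed from the -dh/-th arguments.
ROWS = [
    {"date": "2019/01/15", "text": "first example post"},
    {"date": "2019/04/03", "text": "second example post"},
]


def write_sample_csv(path):
    """Write the sample rows to path with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["date", "text"])
        writer.writeheader()
        writer.writerows(ROWS)


if __name__ == "__main__":
    write_sample_csv("posts.csv")
```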

Expected behavior
Should not crash; should calculate emerging terms.

Desktop (please complete the following information):

  • OS: Mac OS
  • Python 3.7.2
@thanasions
Contributor

Thanks for raising the issue. We are currently looking into this!

thanasions added the bug label (Something isn't working) on Apr 5, 2019
@IanGrimstead
Contributor

Created branch 227-bug-csv-date

@IanGrimstead
Contributor

Python 3.7.3 is running under the Xenial Linux dist; AppVeyor should be OK (it worked with 3.7), awaiting build...
So we should be fine with 3.7 and later.

IanGrimstead pushed a commit that referenced this issue Apr 5, 2019
thanasions pushed a commit that referenced this issue Apr 5, 2019
* Testing python 3.7.3 via pip and *correctly* switch to Xenial linux (#227)

* Checks if DF date column is a string and converts to datetime #227

* Oops. Test failing as date_column not always corrected to datetime #227
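
The second commit message describes checking whether the DataFrame date column holds strings and converting it to datetime. A rough sketch of that idea follows. It is an illustration under assumptions, not the committed code: the helper name `ensure_datetime_column`, the column name, and the sample data are all hypothetical.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def ensure_datetime_column(df, date_column, date_format="%Y/%m/%d"):
    """Return df with date_column as datetimes; string columns are parsed into a copy."""
    if is_datetime64_any_dtype(df[date_column]):
        return df  # already datetime, e.g. restored from a pickle
    converted = df.copy()
    converted[date_column] = pd.to_datetime(converted[date_column], format=date_format)
    return converted


# Hypothetical data: a CSV read leaves the date column as plain strings.
df = pd.DataFrame({"date": ["2019/01/15", "2019/04/03"], "text": ["a", "b"]})
fixed = ensure_datetime_column(df, "date")

# The list comprehension from the traceback now works on Timestamp objects.
iso_dates = [d.isocalendar() for d in fixed["date"]]
```

Checking the dtype first means a column that is already datetime is passed through untouched.
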
@thanasions
Contributor

The bug is now fixed on the develop branch.

@IanGrimstead
Contributor

Thanks for reporting the issue - a new build has been pushed to develop and master, so should now be fixed. Anything else not working, please raise it & we'll take a look.

mshodge pushed a commit that referenced this issue May 8, 2019
* Testing python 3.7.3 via pip and *correctly* switch to Xenial linux (#227)

* Checks if DF date column is a string and converts to datetime #227

* Oops. Test failing as date_column not always corrected to datetime #227
thanasions pushed a commit that referenced this issue Sep 25, 2019
* argschecker updated #178

* Reverted to latest pdmarima (#212)

* Removed erroneous 2nd arima fit (#212)

arima fits on construction, don't need to explicitly call 'fit'

* Reverted to original test specification (#212)

* Added version debug code (#212)

As requested on pmdarima bug reporting page

* Report warnings and errors (#212)

May have accidentally been suppressing errors - that could be reporting why the test fails

* resolves #217

* test

* test

* test

* test

* test

* test

* changed the tests with real data to check if random numbers were confusing the models, hence the big discrepancies

* Updated Arima to use pmdarima rather than pyramid-arima (#212)

* Test pmdarima 1.0.0 to test windows (#212)

Seeing if an earlier version of pmdarima works in windows

* emtech report to file!

* analyzer ngrams processing was not stopping unigrams :)

* adjusted tests to reflect bug fixes in stoplists processing

* added a check on the returned tuple for stopwords. That will enable users to optimize list without having to re-compute tf-idf

* pmdarima>=110

* rid of vectorizer. Only vocabulary needed

* 225 ridof pmdarima (#226)

* rid of vectorizer. Only vocabulary needed

* rid of pmd. Also realized that two of our test series were identical. No need to test them twice :)

* pmd left.

* just to check why one excepts and other doesn't

* rid of vectorizer. Only vocabulary needed

* scipy was the problem, in the end. Has to be >=1.2.1

* 223 pipeline bug (#224)

* rid of vectorizer. Only vocabulary needed

* pickle-depickle tfidf test now represents different executions (#223)
WordAnalyser reset between calls to main() - will catch if stopwords
etc not populated

* Travis now reports python packages in use

Added `pip freeze` to travis.yml

* Corrected pip listing of packages

* 228 data path (#229)

* Removed override to 'data' path and added date info #228
Now reports date range of patents in use
* Removed 2nd construction of WordAnalyser #228

* 230 arima failing (#231)

* Alternative method to annoy ARIMA #230

* 227 bug csv date (#233)


* Testing python 3.7.3 via pip and *correctly* switch to Xenial linux (#227)

* Checks if DF date column is a string and converts to datetime #227

* Oops. Test failing as date_column not always corrected to datetime #227

* csv dates come as strings. Type-check to see what's going on and conv… (#232)

* moved things around a bit. Type check now happens after df creation, inside the not-read-from-pickle clause. If read from pickle, that should already have been taken care of.

* Remove leading zero trimming (#235) (#239)

* added argument for embeddings threshold

* resolves #250 (#251)

* scipy==1.2.1 else breaks

* new gensim breaks windows! Force 3.4.0

* filtering rows now gets rid of corresponding rows in df (#249)

* filtering rows now gets rid of corresponding rows in df
* gensim & scipy version limited due to introduced instability in current versions

* Update pygrams.py

Co-Authored-By: emily-tew <38726410+emily-tew@users.noreply.github.com>

* 248 tfidf filter (#254)

* Added prefilter of terms (#248)

* del

* Update README.md

Missing `.` on `pip install -e .`

* Corrected check for empty CPC list (#261)

* cache 2 initial commit! (#269)

* cache 2 initial commit!

* fix-imports was calling the properties and populating tfidf_mat. Disabled it. Plus some cosmetics

* helper function to safeguard from None idf or tfidf

* 257 add nmf code (#271)

Added NMF output

* resolves #272 (#275)

* Dictionary used to store CPC rather than list inside data frame

* 273 dates as ints (#277)

* Dates now pickled as integer array to save space (#273)
Tidied up date related utilities - added to date_utils from utils
Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso
Column filter removed from DocumentsFilter
Removed time and CPC document weighting
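
The commit above mentions storing dates as integers under the name 'year_week'. The actual encoding is not shown here, so the following is purely an illustrative guess at how a year and week number could be packed into one integer. The function names and the `year * 100 + week` layout are assumptions, and the use of `isocalendar()` for the week number may differ from what the project does.

```python
from datetime import date


def to_year_week(d):
    """Pack a date into a single integer, e.g. year 2019 and week 14 become 201914."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + iso_week


def from_year_week(value):
    """Split the packed integer back into a (year, week) pair."""
    return divmod(value, 100)


packed = to_year_week(date(2019, 4, 3))
year, week = from_year_week(packed)
```

A plain integer array of such values is compact to pickle and still sorts chronologically.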

* Update README.md

* 279 small adjustments (#280)

* Dates now pickled as integer array to save space (#273)
* Tidied up date related utilities - added to date_utils from utils
* Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso
* Column filter removed from DocumentsFilter
* Removed time and CPC document weighting
* Removed unused parameters and synchronised variable names (#273)
* Added timing report and progress reports

* 278 move mask (#283)

* resolves #278

* Changed folders for cached outputs (#281) (#284)

* 285 data uspto (#286)


* error checks change...
* resolves #286

* 287 update system requirements section (#288)

* Updated System Performance section (System Requirements)

* minor mods

* Small bug (#289)

* threshold not a list

* save time series to file (#270)

* Update README.md

-it option was outdated

* 291 bug (#292)

resolves #291

* 294 fb (#295)



* resolves #294

* 296 emtech facelift (#297)

* resolves #296

* 298 nltk installation (#299)

NLTK data now downloaded during execution of `pip install` (fixes #298)

* 256 tech report 2 (#301)

resolves # 256

* Ch comments (#304)

* ch comments

* Checking changes were propagated correctly #256 (#305)

* Checking changes were propagated correctly #256

* Checking changes were propagated correctly - more missing #256

* Few american spellings caught #256

* Exponential emergence (#306)

* add exponential emergence

* #255 convert r scripts (#308)

state space model resolves #308 #255

* General facelift

* Refactoring for readability
* Corrected issue with calculation of Porter (was using head not tail of dataset)

* State space (#317)

* cache state-space data!

* two-stage grid search

* Corrected test with duplicated args (good spot...)
Now copes if min/max time series dates are not defined

* If smoothing not requested, ensure None is returned for smoothed dictionary

* Default predictor set now excludes LSTMs

* 319 cache (#320)

* #319 updated code and tests to reflect new cache usage

* 321 test stopwords (#322)


* #321 added stopwords to test folder for test specific variant

* #319 consistent cmd line args, GloVe can now be placed anywhere

* 315 clamp redo (#323)

* #315 clamp smoothed values at 0
* cast smoothed data back to lists (from numpy arrays) for consistency
* command line args now restricted to available smoothing and emergence
* added simple test for holt-winters to confirm -ve values not handled

* 326 mpq (#327)

* mpq tweak and cached data

* #328 added tests for example command line (#329)

* #328 added tests for example command line
* fixed: date not defined when not required causes failure
* #328 corrected execution folder for README tests

* Corrected merge

* Whitespace changes ready for merge to master

* Cleanup state space modelling

* Whups. Now checks tests again and only runs on travis... and not win32

* 324 state space predictions (#325)

* #324 create table from state space results - work in progress
* tests TBA

* first commit

* #324 create table from state space results - with tests
* Trimmed SD not implemented

* #324 Trimmed SD implemented

* #324 report window size to HTML

* #324 WIP - needs refinement, but works for non-test. Test may blow graph generation.

* #328 multiplot added as option

* Merge issue with SSM