Remove `pattern` dependency #3012

mpenkov · 2020-12-18T06:39:02Z

Fix "Cannot use gensim 3.8.x when nltk package is installed" #2697

mpenkov · 2020-12-26T12:48:08Z

Is there anything left to do here? I think this may be ready to merge.

piskvorky · 2020-12-26T20:10:14Z

Yes, looks ready, but all tests are failing.

mpenkov · 2020-12-27T04:39:09Z

Those failures are unrelated to this PR. Between my opening this PR a week ago and the build triggered by your changes to segment_wiki.py yesterday, there was a new sklearn release, and they deprecated one of the modules we were relying on.

$ python
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn.datasets.base
/Users/misha/envs/gensim/lib/python3.8/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.datasets.base module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.datasets. Anything that cannot be imported from sklearn.datasets is now part of the private API.
  warnings.warn(message, FutureWarning)

I'll disable the failing tests and open a ticket to deal with the problem properly.
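
As a rough sketch of how code could adapt to the warning above (this is not code from the PR): look a name up in the public module first and only fall back to a legacy path if needed. The helper name `resolve` is hypothetical, and since the thread does not say which names gensim imported from `sklearn.datasets.base`, the sklearn call is shown only as a commented-out placeholder.

```python
import importlib


def resolve(name, *module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module missing or removed in this release: try the next candidate.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError("cannot find %r in any of %r" % (name, module_paths))


# Runnable demonstration with the standard library only:
# the first candidate does not exist, so the second one is used.
OrderedDict = resolve("OrderedDict", "no_such_module_xyz", "collections")

# Assumed usage with sklearn (placeholder name, public location listed first):
# load_iris = resolve("load_iris", "sklearn.datasets", "sklearn.datasets.base")
```

Listing the public module first means the deprecated path is never touched on releases where the public import works.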

mpenkov · 2020-12-27T07:42:12Z

#3016

gojomo · 2020-12-27T20:49:10Z

This is another case where, for the sake of maintaining the tests as a trustworthy indicator of whether the code is ready for release, I would not be so quick to disable them, even with a note-linking-to-a-pending-issue.

If previous functionality is now broken due to outside library changes – and also broken on develop, not just this PR branch where it's 1st noticed – the tests should remain failing until either (a) a true fix/update is made; or (b) an explicit decision is made/announced/added-to-release-notes that such functionality is no longer supported (or temporarily not supported).

If test-disablement is chosen, because that functionality can be ignored for a while, it'd be best done in a standalone PR that's applied to develop separately, rather than mixed with a different set of in-process changes.

piskvorky · 2020-12-27T21:40:36Z

@mpenkov can you please look into removing sklearn, as the proper fix, independent of this PR? (part of #2852, due for 4.0.0 anyway).

I agree the changes to unit tests do not belong here.

mpenkov · 2020-12-28T01:00:14Z

Sure, I'll have a look at it, most likely in the New Year.

I disabled the tests because it was the least bad alternative out of the following list:

1. Do not merge until tests are properly fixed (I don't know when that will be, and chances are this PR will be stale by then)
2. Merge with broken tests (this is bad form, because now other contributors will trip over the same failures)
3. Disable unit tests temporarily

If you disagree with my decision, I can re-enable the tests in a separate PR. It's simple.
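
For readers unfamiliar with the third option, here is a minimal sketch of temporarily disabling a test with the standard `unittest` skip decorator, with the reason pointing at the tracking issue. The test class and method names are hypothetical and are not taken from the gensim test suite.

```python
import unittest


class TestSklearnIntegration(unittest.TestCase):  # hypothetical test case
    @unittest.skip("broken by an upstream sklearn release; see #3016")
    def test_depends_on_removed_module(self):
        # Never executed while the skip decorator is in place.
        raise AssertionError("would fail until the underlying problem is fixed")

    def test_unrelated_behaviour(self):
        # Tests that do not touch the broken dependency keep running.
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])


def run_suite():
    """Run the test case above and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSklearnIntegration)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A skipped test is reported separately from passes and failures, so the skip reason stays visible in test output until the test is re-enabled.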

gojomo · 2020-12-29T01:24:21Z

> Merge with broken tests (this is bad form, because now other contributors will trip over the same failures)

If this PR's changes caused the test-failures (even via bumping dependency versions), I'd agree that merging it would be a problem. But, I'm pretty sure develop without this PR would also be failing, given the upstream changes. The develop trunk only looks 'healthy' because no build/test has been triggered, and this PR was only "unlucky" to be the 1st experiencing the side-effects of external changes.

Given that, merging this PR, with that known-but-unrelated problem, wouldn't create any new problems for others, so I think (2) would be OK. With regard to passing tests, I think the precise policy could be: "any PR shouldn't break any tests that were working before the PR - but it doesn't have to fix everything that's currently broken".
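
One way to make that policy concrete, as a sketch only: compare the set of tests failing on the base branch with the set failing on the PR branch, and flag only the difference. The function name `newly_broken` and the test names are hypothetical; no such tooling is described in this thread.

```python
def newly_broken(baseline_failures, pr_failures):
    """Return tests that fail on the PR branch but were not already failing on the base branch."""
    return sorted(set(pr_failures) - set(baseline_failures))


# Hypothetical failure sets: the same upstream breakage shows up on both branches,
# so the PR introduces no new failures.
develop_failures = {"test_sklearn_api.TestPipeline"}
pr_branch_failures = {"test_sklearn_api.TestPipeline"}

print(newly_broken(develop_failures, pr_branch_failures))  # prints []
```

Under this rule a PR is blocked only when the returned list is non-empty, which requires actually running the tests on the base branch rather than assuming it is green.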

get rid of pattern dependency

6437e87

mpenkov added the breaks backward-compatibility label Dec 18, 2020

mpenkov requested review from gojomo and piskvorky December 18, 2020 06:39

mpenkov added 4 commits December 18, 2020 15:50

get rid of six import in mmreader.pyx

7f39e2d

bump cython version to 0.29.21

41582f6

Trying to work around "has no attribute '__reduce_cython__'" problem

add six to list of dependencies

4e52814

Why was it removed? Parts of the code still need it.

rm removed file from docs

4f59c7d

piskvorky reviewed Dec 18, 2020

gensim/scripts/segment_wiki.py (resolved)

setup.py (outdated, resolved)

mpenkov added 6 commits December 19, 2020 09:02

Revert "add six to list of dependencies"

22d6441

This reverts commit 4e52814.

remove unused six import

577a84a

add friendly message

6fd6d03

update gitignore to include cython output

0c66fcf

update gitignore

c9d7884

fix build

013d9f0

mpenkov commented Dec 19, 2020

docs/src/scripts/make_wiki_online.rst (outdated, resolved)

mpenkov added 2 commits December 19, 2020 10:36

Update docs/src/scripts/make_wiki_online.rst

c96d12f

more friendliness

1afe676

piskvorky reviewed Dec 26, 2020

gensim/scripts/segment_wiki.py (outdated, resolved)

gensim/scripts/segment_wiki.py (outdated, resolved)

piskvorky added 2 commits December 26, 2020 14:45

Update gensim/scripts/segment_wiki.py

120b0ae

Update gensim/scripts/segment_wiki.py

ee8b4f2

skip broken tests

fe2eb5d

flake8 fix

ac4c70e

Update CHANGELOG.md

cf51910

mpenkov changed the title ~~get rid of pattern dependency~~ remove pattern dependency Jan 17, 2021

mpenkov merged commit 67f45da into develop Jan 17, 2021

mpenkov deleted the rmpattern branch January 17, 2021 07:14

piskvorky changed the title ~~remove pattern dependency~~ Remove pattern dependency Mar 21, 2021

piskvorky mentioned this pull request Sep 17, 2021

ImportError: cannot import name 'lemmatize' from 'gensim.utils' #3239

Closed
