Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue282 handling syntax errors in projects config #300

Merged
merged 6 commits into from
Aug 23, 2019

Conversation

juhoinkinen
Copy link
Member

@juhoinkinen juhoinkinen commented Jul 17, 2019

Raise ConfigurationException instead of full traceback in following situations:

  1. when trying to use (train/suggest) a project without a backend set in projects.cfg:

    $ annif train nobackend -p tests/projects.cfg any_trainingdata.tsv 
    Error: Project 'nobackend': backend setting is missing
    

    This is in line with missing vocab error: "Error: Project 'maui-fi': vocab setting is missing".

  2. when trying to use projects.cfg that has duplicated sections or entries:

    $ annif train any_backend -p tests/projects_invalid.cfg any_trainingdata.tsv 
    Error: Error: While reading from 'tests/projects_invalid.cfg' [line 10]: section 'duplicatedproject' already exists
    

This closes #282.

@juhoinkinen juhoinkinen marked this pull request as ready for review August 8, 2019 10:23
@osma osma mentioned this pull request Aug 8, 2019
@osma
Copy link
Member

osma commented Aug 8, 2019

Travis Python 3.5 build is failing, I think VW needs to be upgraded first. Maybe merge #296?

@osma
Copy link
Member

osma commented Aug 8, 2019

failing build log: https://travis-ci.org/NatLibFi/Annif/jobs/559957266

@codecov
Copy link

codecov bot commented Aug 8, 2019

Codecov Report

Merging #300 into master will increase coverage by 19.59%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #300       +/-   ##
===========================================
+ Coverage    79.8%   99.39%   +19.59%     
===========================================
  Files          55       55               
  Lines        2842     2969      +127     
===========================================
+ Hits         2268     2951      +683     
+ Misses        574       18      -556
Impacted Files Coverage Δ
annif/default_config.py 100% <100%> (ø) ⬆️
tests/test_project.py 100% <100%> (+7.75%) ⬆️
annif/project.py 100% <100%> (+1.21%) ⬆️
annif/corpus/subject.py 100% <0%> (ø) ⬆️
annif/corpus/types.py 100% <0%> (ø) ⬆️
annif/corpus/document.py 100% <0%> (ø) ⬆️
annif/corpus/convert.py 100% <0%> (ø) ⬆️
tests/test_corpus.py 100% <0%> (ø) ⬆️
annif/analyzer/__init__.py 92.59% <0%> (+3.7%) ⬆️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d13f369...47d4f2a. Read the comment docs.

@osma
Copy link
Member

osma commented Aug 12, 2019

Looks good, but there is one further case that could be covered. If analyzer is left unspecified, I get this:

  File "/home/oisuomin/git/Annif/annif/project.py", line 188, in _create_vectorizer
    tokenizer=self.analyzer.tokenize_words)
AttributeError: 'NoneType' object has no attribute 'tokenize_words'

Should this case be covered too, or left for another issue/PR?

@juhoinkinen
Copy link
Member Author

Looks good, but there is one further case that could be covered. If analyzer is left unspecified, I get this:

  File "/home/oisuomin/git/Annif/annif/project.py", line 188, in _create_vectorizer
    tokenizer=self.analyzer.tokenize_words)
AttributeError: 'NoneType' object has no attribute 'tokenize_words'

Should this case be covered too, or left for another issue/PR?

I'll check if this can be easily covered too and implement, otherwise I'll make a new issue about this.

@juhoinkinen
Copy link
Member Author

Looks good, but there is one further case that could be covered. If analyzer is left unspecified, I get this:

  File "/home/oisuomin/git/Annif/annif/project.py", line 188, in _create_vectorizer
    tokenizer=self.analyzer.tokenize_words)
AttributeError: 'NoneType' object has no attribute 'tokenize_words'

Should this case be covered too, or left for another issue/PR?

Implemented raising ConfigurationError for this in the previous commit. For the test I changed the [noanalyzer] project to use TFIDF backend because dummy backend does not need analyzer and would not raise the ConfigurationError. That project was included for issue #148. I did not see any reason for the backend to remain dummy, but could @osma confirm?

Copy link
Member

@osma osma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, changing [noanalyzer] to use tfidf should be fine. It was originally added in this commit where the intent was to verify that it's possible to have a project without an analyzer. But it might be necessary to add more project configurations e.g. "fasttext-noanalyzer", "vw_multi-noanalyzer" etc. to cover all the situations where an analyzer is needed but is missing.

annif/project.py Outdated Show resolved Hide resolved
annif/project.py Show resolved Hide resolved
@osma osma merged commit b660eed into master Aug 23, 2019
@osma osma deleted the issue282-handling-syntax-errors-in-projects-config branch August 23, 2019 12:17
@juhoinkinen juhoinkinen added this to the 0.42 milestone Sep 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling errors in projects.cfg
2 participants