Potential bug in rasa/data.py #5106

Closed
jehnes opened this issue Jan 22, 2020 · 4 comments · Fixed by #5183
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework stale type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

jehnes commented Jan 22, 2020

Hello Rasa team,

I have the following problem running Rasa and, as a consequence, Rasa-X on my setup. I was able to fix it in the Rasa code, but in Rasa-X the fix is not so straightforward, as it involves some fiddling with Docker files.

My setup:

I use a Linux workstation (Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)) with Python 3.6 as my Rasa server as well as the Docker host for Rasa-X. I usually connect to this workstation remotely from my MacBook Pro (macOS 10.15.2), and I do all the editing (story files, domain files, etc.) on this Mac.

The problem:

Rasa has problems importing my story files. Here is an excerpt of the logs from Rasa-X, as I'm currently trying to get Rasa-X working as shown in Ep #9 - Rasa Masterclass:

rasa-x_1 | Job "GitService.run_background_synchronization (trigger: cron[minute='*'], next run at: 2020-01-22 09:48:00 UTC)" raised an exception
rasa-x_1 | Traceback (most recent call last):
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
rasa-x_1 | retval = job.func(*job.args, **job.kwargs)
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasax/community/services/git_service.py", line 742, in run_background_synchronization
rasa-x_1 | git_service.synchronize_project(force_data_injection)
rasa-x_1 | File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasax/community/services/git_service.py", line 599, in synchronize_project
rasa-x_1 | await self._inject_data()
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasax/community/services/git_service.py", line 629, in _inject_data
rasa-x_1 | str(self.repository_path()), str(data_path), self.session, SYSTEM_USER
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasax/community/initialise.py", line 299, in inject_files_from_disk
rasa-x_1 | story_files, nlu_files = rasa.data.get_core_nlu_files([data_path])
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasa/data.py", line 92, in get_core_nlu_files
rasa-x_1 | path
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasa/data.py", line 115, in _find_core_nlu_files_in_directory
rasa-x_1 | elif is_story_file(full_path):
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasa/data.py", line 153, in is_story_file
rasa-x_1 | _is_story_file = any(_contains_story_pattern(l) for l in f)
rasa-x_1 | File "/usr/local/lib/python3.6/site-packages/rasa/data.py", line 153, in <genexpr>
rasa-x_1 | _is_story_file = any(_contains_story_pattern(l) for l in f)
rasa-x_1 | File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
rasa-x_1 | (result, consumed) = self._buffer_decode(data, self.errors, final)
rasa-x_1 | UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte

I assume that the file encoding from the Mac trips up the file loader on Linux.
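
The failure can be reproduced outside Rasa. The sketch below assumes, purely for illustration, that a story file contained a degree sign saved as Latin-1, where that character is the single byte 0xb0. The actual character and encoding of the original file are not known, and the file content and the helper `try_read_utf8` are made up for this example.

```python
import os
import tempfile

# Hypothetical story line; "°" is the single byte 0xb0 in Latin-1,
# which is not a valid start byte in UTF-8.
raw = "## weather story, 20°C\n".encode("latin-1")


def try_read_utf8(data: bytes):
    """Write bytes to a temporary .md file and read it back strictly as UTF-8.

    Returns (text, None) on success or (None, exception) on a decode error.
    """
    with tempfile.NamedTemporaryFile(suffix=".md", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        with open(path, encoding="utf-8") as f:
            return f.read(), None
    except UnicodeDecodeError as exc:
        return None, exc
    finally:
        os.remove(path)


text, error = try_read_utf8(raw)
print(error)
```

If this is what happened, the problem is less about macOS versus Linux and more about the file not being saved as UTF-8 in the first place.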

My Fix for Rasa:

I was able to fix this for Rasa by adding the parameter errors="surrogateescape" to the with open(...) line (line 152) of the function is_story_file(file_path: Text) -> bool in the file python3.6/site-packages/rasa/data.py:

```python
def is_story_file(file_path: Text) -> bool:
    """Checks if a file is a Rasa story file.

    Args:
        file_path: Path of the file which should be checked.

    Returns:
        `True` if it's a story file, otherwise `False`.
    """
    _is_story_file = False

    if file_path.endswith(".md"):
        # with open(file_path, encoding=DEFAULT_ENCODING) as f:  (JE: This is the original)
        with open(file_path, encoding=DEFAULT_ENCODING, errors="surrogateescape") as f:
            _is_story_file = any(_contains_story_pattern(l) for l in f)

    return _is_story_file
```

More about these error handlers in Python 3's text processing here: http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html
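
To illustrate what the `surrogateescape` handler does, here is a minimal sketch using only the standard library. The byte string is a made-up stand-in for a story file line that contains the offending byte 0xb0.

```python
# Made-up stand-in for a line of a story file containing the byte 0xb0.
raw = b"## story with \xb0 byte\n"

# Strict decoding (the default) raises on the invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, the undecodable byte 0xb0 is mapped to the
# lone surrogate code point U+DCB0 instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")
has_story_marker = text.startswith("##")

# The mapping is reversible: encoding with the same handler
# restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

One caveat: the resulting string contains a lone surrogate, so encoding it strictly (for example when writing it to a UTF-8 file without the same handler) raises a `UnicodeEncodeError`. For a check that only looks for a story pattern this does not matter, and `errors="replace"` would serve equally well there.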

Conclusion:

From my point of view, this looks like a bug in Rasa: the files saved on the MacBook look fine, even when opened in an editor on Linux. And even if a file were somehow invalid, Rasa should not crash with an exception but point out the error instead.

I would suggest that you add this additional parameter to the open function in rasa's data.py file to make rasa more robust. It would be great if you could also fix that in the docker files of Rasa-X.
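
As an alternative to silently tolerating bad bytes, the check could report the problem and carry on. The following is only a sketch of that idea, not Rasa's actual code or the fix that was eventually merged: the helper `is_probably_story_file` is hypothetical, and the `##` prefix test is an assumed stand-in for whatever `_contains_story_pattern` really checks.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def is_probably_story_file(file_path: str, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if any line starts with '##'.

    Files that cannot be decoded are reported with a warning and
    treated as non-story files instead of raising.
    """
    if not file_path.endswith(".md"):
        return False
    try:
        with open(file_path, encoding=encoding) as f:
            # Assumed stand-in for Rasa's story pattern check.
            return any(line.lstrip().startswith("##") for line in f)
    except UnicodeDecodeError as exc:
        logger.warning("Skipping '%s': not valid %s (%s).", file_path, encoding, exc)
        return False


# Usage: one valid UTF-8 file and one containing the stray byte 0xb0.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.md")
    bad = os.path.join(tmp_dir, "bad.md")
    with open(good, "wb") as f:
        f.write("## greet\n* hello\n".encode("utf-8"))
    with open(bad, "wb") as f:
        f.write(b"## temp 20\xb0C\n")
    good_result = is_probably_story_file(good)
    bad_result = is_probably_story_file(bad)
```

The trade-off is that a badly encoded story file is skipped rather than loaded, but the warning names the file so the user can re-save it as UTF-8.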

Best regards,
Jochen

@sara-tagger
Collaborator

Thanks for the issue, @tmbo will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@tmbo tmbo added area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. labels Feb 4, 2020
tmbo added a commit that referenced this issue Feb 4, 2020
@tmbo
Member

tmbo commented Feb 4, 2020

Thanks a lot for the suggestion, I've made the file loading more robust 👍

tmbo added a commit that referenced this issue Feb 10, 2020
tmbo added a commit that referenced this issue Feb 10, 2020
tmbo added a commit that referenced this issue Feb 10, 2020
tmbo added a commit that referenced this issue Feb 11, 2020
@stale

stale bot commented May 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 6, 2020
@stale

stale bot commented May 13, 2020

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.

@stale stale bot closed this as completed May 13, 2020

3 participants