KoWikiText LM data generation issue #203

Open
Beomi opened this issue May 1, 2021 · 1 comment

Beomi commented May 1, 2021

env

  • korpora == 0.2.0
  • python ~= 3.8

Issue

command

Running the command below raises an error.

korpora lmdata \
  --corpus all \
  --output_dir ~/works/lmdata

Error log

Create train data from kowikitext: 0it [00:00, ?it/s]

| Done | Corpus name               | Num sents  | File name |
| ---- | ------------------------- | ---------- | --------- |
|  x   | kcbert                    |   86246284 | all.train |
|  x   | korean_chatbot_data       |      23646 | all.train |
|  x   | korean_hate_speech        |    2042260 | all.train |
|  x   | korean_parallel_koen_news |      97123 | all.train |
|  x   | korean_petitions          |     867262 | all.train |
|  x   | kornli                    |    1900708 | all.train |
|  x   | korsts                    |      17256 | all.train |
|      | kowikitext                |  -         |           |
|      | namuwikitext              |  -         |           |
|      | naver_changwon_ner        |  -         |           |
|      | nsmc                      |  -         |           |
|      | question_pair             |  -         |           |
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.train.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.train
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.test.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.test
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.dev.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.dev
Create train data from kowikitext: 0it [00:02, ?it/s]
Traceback (most recent call last):
  File "/home/beomi/anaconda3/envs/deepspeed/bin/korpora", line 8, in <module>
    sys.exit(main())
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/cli.py", line 64, in main
    task_function(args)
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/task_lmdata.py", line 47, in create_lmdata
    for i_sent, sent in enumerate(sent_iterator):
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/task_lmdata.py", line 180, in iterate_kowikitext
    with open(path, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/beomi/Korpora//kowiki/kowikitext_20200920.train'

For ko wiki, the corpus is installed under kowikitext/kowikitext_....., but the LM data code looks for it under /kowiki/kowikitext_...., which appears to be a typo in the directory name.
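
Until a fixed release is available, one possible stopgap is to make the directory name from the traceback resolve to the installed one. The following is a minimal sketch, not the maintainers' fix: the helper `link_expected_dir` is hypothetical, and it assumes the corpus root is `~/Korpora`, that the installed directory is `kowikitext`, and that the lmdata step looks in `kowiki`, as the log above suggests. It does not touch Korpora itself.

```python
import os
from pathlib import Path


def link_expected_dir(root, installed="kowikitext", expected="kowiki"):
    """Make `root/expected` resolve to `root/installed` through a symlink.

    Hypothetical helper: the directory names are taken from the log in this
    issue and are assumptions, not part of the Korpora API.
    """
    root = Path(root).expanduser()
    src = root / installed
    dst = root / expected
    if not src.is_dir():
        raise FileNotFoundError(f"installed corpus directory not found: {src}")
    # Leave anything that already sits at the target path untouched.
    if not (dst.exists() or dst.is_symlink()):
        os.symlink(src, dst, target_is_directory=True)
    return dst
```

Calling `link_expected_dir("~/Korpora")` once before rerunning the `korpora lmdata` command should let the path from the traceback resolve. The symlink can be removed again after upgrading to a release that contains the fix.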

Beomi commented May 1, 2021

This appears to have already been addressed in issue #187. Please close this issue once 0.3.0 is released.
