[Corpus] AIHub: translation data (번역데이터) loaders #136
lovit added 2 commits that referenced this issue on Oct 15, 2020
This is the local test code, @ratsgo:

```python
import contextlib
import os
import sys

# Make the local Korpora checkout (one directory up) importable.
sys.path.insert(0, '../')

from colored import attr, fg
from Korpora.korpus_aihub_translation import (
    AIHubTranslationKorpus,
    AIHubSpokenTranslationKorpus,
    AIHubConversationTranslationKorpus,
    AIHubNewsTranslationKorpus,
    AIHubKoreanCultureTranslationKorpus,
    AIHubDecreeTranslationKorpus,
    AIHubGovernmentWebsiteTranslationKorpus,
)

## SET ARGUMENT ##
# Set to '' to skip the custom-directory check.
CUSTOM_DIR = 'path/to/AIHub_Translation/'


@contextlib.contextmanager
def nostdout():
    """Silence stdout inside the block; always restores it and closes devnull."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
        yield


# (loader class, expected number of training examples)
korpus_class_lengths = [
    (AIHubTranslationKorpus, 1602418),
    (AIHubSpokenTranslationKorpus, 400000),
    (AIHubConversationTranslationKorpus, 100000),
    (AIHubNewsTranslationKorpus, 801387),
    (AIHubKoreanCultureTranslationKorpus, 100646),
    (AIHubDecreeTranslationKorpus, 100298),
    (AIHubGovernmentWebsiteTranslationKorpus, 100087),
]

for korpus_class, length in korpus_class_lengths:
    # korpus_class is already a class, so use its own __name__
    # (korpus_class.__class__.__name__ would always be 'type').
    classname = korpus_class.__name__
    with nostdout():
        corpus = korpus_class()  # default root directory
    assert len(corpus.train) == length
    print(f'{fg(2)} passed {classname} with default dir {attr(0)}')
    if CUSTOM_DIR:
        with nostdout():
            corpus = korpus_class(CUSTOM_DIR)
        assert len(corpus.train) == length
        print(f'{fg(2)} passed {classname} with custom dir {attr(0)}')
    print('LENGTH', len(corpus.train))
```
lovit added 3 commits that referenced this issue on Oct 15, 2020
lovit added a commit that referenced this issue on Nov 1, 2020
lovit added a commit that referenced this issue on Nov 2, 2020: Unify variable name: `root_dir_or_paths` -> `root_dir` (#136)
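Subcorpora (`.xlsx` files):
- Spoken (구어체)
- Conversation (대화체)
- Written, news (문어체 뉴스)
- Written, Korean culture (문어체 한국문화)
- Written, ordinances (문어체 조례)
- Written, local government websites (문어체 지자체웹사이트)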
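The category labels above appear to line up one-to-one with the six subcorpus loaders imported in the test script. The mapping below is an assumption inferred from the class names, not taken from the Korpora documentation. `SUBCORPUS_LABELS` and `loader_for` are hypothetical names used only for illustration.

```python
# Hypothetical mapping inferred from the class names in the test script:
# Korean category label -> (English gloss, assumed loader class name).
SUBCORPUS_LABELS = {
    "구어체": ("spoken", "AIHubSpokenTranslationKorpus"),
    "대화체": ("conversation", "AIHubConversationTranslationKorpus"),
    "문어체 뉴스": ("written, news", "AIHubNewsTranslationKorpus"),
    "문어체 한국문화": ("written, Korean culture", "AIHubKoreanCultureTranslationKorpus"),
    "문어체 조례": ("written, ordinances", "AIHubDecreeTranslationKorpus"),
    "문어체 지자체웹사이트": ("written, local government websites",
                        "AIHubGovernmentWebsiteTranslationKorpus"),
}


def loader_for(label: str) -> str:
    """Return the assumed loader class name for a label such as '(문어체 뉴스)'."""
    # Drop surrounding parentheses/spaces; inner spaces are kept.
    key = label.strip("() ")
    return SUBCORPUS_LABELS[key][1]


print(loader_for("(문어체 뉴스)"))  # AIHubNewsTranslationKorpus
```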