code case
from Korpora import Korpora
corpus = Korpora.load("kcbert")
terminal case
korpora fetch --corpus kowikitext
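Environment:
macOS
Python 3.8
Korpora installed with pip install korpora
Error output:
Korpora only provides a convenient way to download and use corpora
that others have shared for research purposes.
We thank everyone who shared these corpora, and we provide a description and license for each one.
For details about this corpus, see the description below;
if you use this corpus for research or commercial purposes, see the license below.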
Description
Author : beomi@github
Repository : https://github.com/Beomi/KcBERT/
References :
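Most publicly released Korean BERT models were trained on well-curated data such as Korean Wikipedia, news articles, and books.
In contrast, comment-style datasets such as NSMC are uncurated, colloquial, and full of neologisms,
and expressions that do not appear in formal writing, such as typos, occur frequently.
To handle datasets with these characteristics, KcBERT was built by collecting comments and replies from Naver News
and training both the tokenizer and the BERT model from scratch as a pretrained BERT model.
KcBERT can be loaded and used easily through Huggingface's Transformers library.
(No separate file download is required.)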
License
MIT License
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1350, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1424, in connect
self.sock = self._context.wrap_socket(self.sock,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1124)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/hyungrakkim/Google Drive/practice_study/ko_bigbird/data_preprocessing/korpora_dataset/korpora_data_test.py", line 10, in <module>
corpus = Korpora.load("kcbert")
File "/Users/hyungrakkim/Library/Caches/pypoetry/virtualenvs/bigbird-J4TmDNlf-py3.8/lib/python3.8/site-packages/Korpora/loader.py", line 47, in load
corpora = [KORPUS[corpus_name](root_dir, force_download) for corpus_name in corpus_names]
File "/Users/hyungrakkim/Library/Caches/pypoetry/virtualenvs/bigbird-J4TmDNlf-py3.8/lib/python3.8/site-packages/Korpora/loader.py", line 47, in <listcomp>
corpora = [KORPUS[corpus_name](root_dir, force_download) for corpus_name in corpus_names]
File "/Users/hyungrakkim/Library/Caches/pypoetry/virtualenvs/bigbird-J4TmDNlf-py3.8/lib/python3.8/site-packages/Korpora/korpus_kcbert.py", line 50, in __init__
fetch_kcbert(root_dir, force_download)
File "/Users/hyungrakkim/Library/Caches/pypoetry/virtualenvs/bigbird-J4TmDNlf-py3.8/lib/python3.8/site-packages/Korpora/korpus_kcbert.py", line 74, in fetch_kcbert
fetch(info['url'], local_path, 'kcbert', force_download)
File "/Users/hyungrakkim/Library/Caches/pypoetry/virtualenvs/bigbird-J4TmDNlf-py3.8/lib/python3.8/site-packages/Korpora/utils.py", line 215, in fetch
web_download(remote_path, destination, corpus_name, force_download)
File "/Users/hyungrakkim/Library/Caches/pypoetry/virtualenvs/bigbird-J4TmDNlf-py3.8/lib/python3.8/site-packages/Korpora/utils.py", line 110, in web_download
site = request.urlopen(url)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1393, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1353, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1124)>
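The traceback shows the failure inside the TLS handshake started by `request.urlopen`, before any corpus-specific code runs. As a possible way to narrow this down, here is a minimal diagnostic sketch that uses only the standard-library `ssl` module to report where this interpreter looks for trusted CA certificates. The function names `describe_ca_paths` and `count_loaded_ca_certs` are hypothetical helpers, not part of Korpora. A missing CA file would be consistent with "unable to get local issuer certificate", but that is an assumption about the cause, not something this report confirms.

```python
import os
import ssl


def describe_ca_paths():
    """Report where this interpreter looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,  # None when the default CA file does not exist
        "capath": paths.capath,  # None when the default CA directory does not exist
        "openssl_cafile": paths.openssl_cafile,  # path compiled into OpenSSL
        "cafile_exists": os.path.exists(paths.openssl_cafile),
    }


def count_loaded_ca_certs():
    """Count CA certificates already loaded into a default client context.

    Certificates that live in a capath directory are loaded lazily, so a
    count of 0 is only a hint, not proof that verification will fail.
    """
    context = ssl.create_default_context()
    return context.cert_store_stats()["x509_ca"]


if __name__ == "__main__":
    print(describe_ca_paths())
    print("CA certificates loaded:", count_loaded_ca_certs())
```

If `cafile_exists` is false and no certificates are loaded, the interpreter likely has no CA bundle to verify the download server against. The interpreter path in the traceback (`/Library/Frameworks/Python.framework`) suggests a python.org installer build; those installers ship an `Install Certificates.command` script that may be worth running before retrying `Korpora.load("kcbert")`.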