(EDA) Analysis of foreign-language characters and special symbols #20

Open
dnjdsxor21 opened this issue May 16, 2023 · 0 comments

Background

Many Hanja (Chinese characters), Japanese characters, Arabic characters, and special symbols are converted to [UNK] by the tokenizer.
Screenshot 2023-05-16 14 01 53
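
One way to measure how often each kind of character appears in the data is sketched below. This is only an illustration, not the analysis used in this issue: the helper names `script_of` and `count_scripts` are hypothetical, and the script label is guessed from the Unicode character name, which is a rough heuristic.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Return a rough script label based on the Unicode character name."""
    if ch.isascii():
        return "ASCII"
    name = unicodedata.name(ch, "")
    # (name prefix, label) pairs; the list is an assumption and can be extended.
    for prefix, label in (
        ("HANGUL", "HANGUL"),
        ("CJK", "HANJA"),
        ("HIRAGANA", "HIRAGANA"),
        ("KATAKANA", "KATAKANA"),
        ("ARABIC", "ARABIC"),
    ):
        if name.startswith(prefix):
            return label
    return "OTHER"  # special symbols and every other script


def count_scripts(texts) -> Counter:
    """Count non-whitespace characters per script label over an iterable of strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            if not ch.isspace():
                counts[script_of(ch)] += 1
    return counts


if __name__ == "__main__":
    print(count_scripts(["한자 漢字", "カタカナ ★"]))
```

Labels with high counts outside HANGUL and ASCII are the likely sources of [UNK] tokens and are candidates for the conversions discussed in this issue.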

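Convert Hanja and Japanese to their Korean pronunciation

  • Hanja: the hanja library (link)
  • Japanese: built a kana-to-Hangul dictionary covering hiragana and katakana (with the help of ChatGPT)
Screenshot 2023-05-16 14 01 29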
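
A minimal sketch of the two conversions above follows. The kana table is a small illustrative subset written for this example, not the dictionary built for this issue, and the function names are hypothetical. The Hanja step assumes the hanja package is installed and uses its `hanja.translate(text, "substitution")` call; if the import fails, the text is returned unchanged.

```python
try:
    import hanja  # third-party package; optional in this sketch
except ImportError:
    hanja = None

# Illustrative subset only; a full table would cover every kana,
# voiced sounds, and combined sounds such as きゃ.
HIRAGANA_TO_HANGUL = {
    "あ": "아", "い": "이", "う": "우", "え": "에", "お": "오",
    "か": "카", "き": "키", "く": "쿠", "け": "케", "こ": "코",
    "さ": "사", "し": "시", "す": "스", "せ": "세", "そ": "소",
    "ら": "라", "り": "리", "る": "루", "れ": "레", "ろ": "로",
}


def kana_to_hangul(text: str) -> str:
    """Replace kana found in the table with Hangul; leave other characters as-is."""
    out = []
    for ch in text:
        # Katakana U+30A1..U+30F6 sits exactly 0x60 above the matching hiragana.
        if "\u30a1" <= ch <= "\u30f6":
            ch = chr(ord(ch) - 0x60)
        out.append(HIRAGANA_TO_HANGUL.get(ch, ch))
    return "".join(out)


def hanja_to_hangul(text: str) -> str:
    """Replace Hanja with its Hangul reading when the hanja package is available."""
    if hanja is None:
        return text
    return hanja.translate(text, "substitution")


if __name__ == "__main__":
    print(kana_to_hangul("さくら サクラ"))
    print(hanja_to_hangul("大韓民國"))
```

Folding katakana onto hiragana first means only one table has to be maintained; kana missing from the table pass through unchanged, so gaps in the dictionary stay visible.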

Other foreign languages and special symbols

  • The unidecode library (link)
  • It looks like it can convert a wide variety of special symbols in a uniform way
Screenshot 2023-05-16 14 02 57
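
The idea could be sketched as follows, under a few assumptions. `normalize_symbols` is a hypothetical helper, and keeping Hangul untouched is a choice made for this example, since unidecode would otherwise romanize Korean text as well. When unidecode is not installed, the sketch falls back to NFKD normalization with non-ASCII characters dropped, which is a cruder substitute and not equivalent to unidecode.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party package; optional in this sketch
except ImportError:
    unidecode = None


def to_ascii(text: str) -> str:
    """Transliterate text to ASCII with unidecode, or a rough stdlib fallback."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback (not equivalent): decompose, then drop whatever is still non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize_symbols(text: str) -> str:
    """Keep ASCII and Hangul as-is; transliterate every other character."""
    out = []
    for ch in text:
        if ch.isascii() or unicodedata.name(ch, "").startswith("HANGUL"):
            out.append(ch)
        else:
            out.append(to_ascii(ch))
    return "".join(out)


if __name__ == "__main__":
    print(normalize_symbols("café 한글 ★"))
```

If Hanja and kana should become Hangul rather than Latin letters, this step would need to run after those conversions, because unidecode transliterates them to ASCII as well.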