Modu Corpus (모두의 말뭉치): morphological analysis corpus loader #122

Closed
lovit opened this issue Oct 10, 2020 · 5 comments

lovit commented Oct 10, 2020

(snapshot)

{
    "id": "NXMP1902008040",
    "metadata": {
    },
    "document": [
        {
            "id": "NWRW1800000022.417",
            "metadata": {
            },
            "sentence": [
                {
                    "id": "NWRW1800000022.417.1.1",
                    "form": "[제주·서울] \"세계환경수도 조성위해 10개년 실천계획 만들겠다\" 김태환 지사 밝혀",
                    "word": [
                        {
                            "id": 1,
                            "form": "[제주·서울]",
                            "begin": 0,
                            "end": 7
                        },
                        {
                            "id": 2,
                            "form": "\"세계환경수도",
                            "begin": 8,
                            "end": 15
                        },
                        {
                            "id": 3,
                            "form": "조성위해",
                            "begin": 16,
                            "end": 20
                        },
                        {
                            "id": 4,
                            "form": "10개년",
                            "begin": 21,
                            "end": 25
                        },
                        {
                            "id": 5,
                            "form": "실천계획",
                            "begin": 26,
                            "end": 30
                        },
                        {
                            "id": 6,
                            "form": "만들겠다\"",
                            "begin": 31,
                            "end": 36
                        },
                        {
                            "id": 7,
                            "form": "김태환",
                            "begin": 37,
                            "end": 40
                        },
                        {
                            "id": 8,
                            "form": "지사",
                            "begin": 41,
                            "end": 43
                        },
                        {
                            "id": 9,
                            "form": "밝혀",
                            "begin": 44,
                            "end": 46
                        }
                    ],
                    "morpheme": [
                        {
                            "id": 1,
                            "form": "[",
                            "label": "SS",
                            "word_id": 1,
                            "position": 1
                        },
                        {
                            "id": 2,
                            "form": "제주",
                            "label": "NNP",
                            "word_id": 1,
                            "position": 2
                        },
                        {
                            "id": 3,
                            "form": "·",
                            "label": "SP",
                            "word_id": 1,
                            "position": 3
                        },
                        {
                            "id": 4,
                            "form": "서울",
                            "label": "NNP",
                            "word_id": 1,
                            "position": 4
                        },
                        {
                            "id": 5,
                            "form": "]",
                            "label": "SS",
                            "word_id": 1,
                            "position": 5
                        },
                        {
                            "id": 6,
                            "form": "\"",
                            "label": "SS",
                            "word_id": 2,
                            "position": 1
                        },
                        {
                            "id": 7,
                            "form": "세계",
                            "label": "NNG",
                            "word_id": 2,
                            "position": 2
                        },
                        {

lovit commented Oct 10, 2020

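The corpus does not describe its morpheme tags (e.g., NNG: common noun), so we need to enumerate every tag that appears in the data and then verify what each one means.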
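
The enumeration described above could be sketched roughly as follows. This is a minimal sketch, not the code actually used for this issue: it assumes each corpus file has the `document` → `sentence` → `morpheme` layout shown in the snapshot, and the helper names `load_corpus`, `count_morpheme_tags`, and `format_tag_counts` are hypothetical. The percentage format (four significant digits) is a guess at how a tag/count/percentage listing might be rendered.

```python
import json
from collections import Counter


def load_corpus(path):
    """Parse one corpus JSON file (hypothetical helper; path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_morpheme_tags(corpus):
    """Count how often each morpheme label occurs in one parsed corpus file."""
    counter = Counter()
    for document in corpus.get("document", []):
        for sentence in document.get("sentence", []):
            for morpheme in sentence.get("morpheme", []):
                counter[morpheme["label"]] += 1
    return counter


def format_tag_counts(counter):
    """Render 'TAG count (percentage%)' lines, most frequent tag first."""
    total = sum(counter.values())
    return [
        f"{tag} {count} ({100 * count / total:.4g}%)"
        for tag, count in counter.most_common()
    ]


# Tiny in-memory sample shaped like the snapshot above. A real run would call
# load_corpus() on every file and add the resulting counters together.
sample = {
    "id": "NXMP1902008040",
    "document": [{
        "id": "NWRW1800000022.417",
        "sentence": [{
            "id": "NWRW1800000022.417.1.1",
            "morpheme": [
                {"id": 1, "form": "[", "label": "SS", "word_id": 1, "position": 1},
                {"id": 2, "form": "제주", "label": "NNP", "word_id": 1, "position": 2},
                {"id": 3, "form": "·", "label": "SP", "word_id": 1, "position": 3},
                {"id": 4, "form": "서울", "label": "NNP", "word_id": 1, "position": 4},
                {"id": 5, "form": "]", "label": "SS", "word_id": 1, "position": 5},
            ],
        }],
    }],
}

counts = count_morpheme_tags(sample)
for line in format_tag_counts(counts):
    print(line)
```

Because `Counter` objects support `+` and `update()`, per-file counts can be merged into a corpus-wide total without holding every file in memory at once.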

lovit commented Oct 10, 2020

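  • The morpheme part-of-speech tags are described in a National Institute of Korean Language (국립국어원) document dated 2019-01-23, publication registration number 11-1371028-000776-01; see the attached file.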

lovit commented Oct 10, 2020

Tag Count (percentage)
NNG 1562168 (24.06%)
VV 403501 (6.214%)
EC 386549 (5.953%)
ETM 298970 (4.604%)
EF 258542 (3.982%)
JKB 256604 (3.952%)
JX 251864 (3.879%)
NNB 244696 (3.769%)
SF 228243 (3.515%)
NNP 224329 (3.455%)
SS 209491 (3.226%)
JKO 207376 (3.194%)
MAG 182313 (2.808%)
XSV 178325 (2.746%)
JKS 167515 (2.58%)
EP 157874 (2.431%)
VA 146731 (2.26%)
SN 140913 (2.17%)
XSN 107643 (1.658%)
IC 100695 (1.551%)
VX 100233 (1.544%)
VCP 95837 (1.476%)
JKG 86943 (1.339%)
NP 76386 (1.176%)
SP 68049 (1.048%)
MMD 53649 (0.8263%)
JC 32886 (0.5065%)
NR 32627 (0.5025%)
MAJ 31093 (0.4789%)
XSA 24422 (0.3761%)
ETN 23355 (0.3597%)
JKQ 20562 (0.3167%)
SL 20508 (0.3158%)
MMN 20204 (0.3112%)
SW 18773 (0.2891%)
XPN 16159 (0.2489%)
JKC 11436 (0.1761%)
VCN 10049 (0.1548%)
NA 8141 (0.1254%)
SH 7722 (0.1189%)
SO 7529 (0.116%)
MMA 5437 (0.08374%)
SE 4806 (0.07402%)
XR 560 (0.008625%)
NAP 555 (0.008548%)
NF 517 (0.007962%)
JKV 181 (0.002788%)
NV 89 (0.001371%)

ratsgo commented Oct 12, 2020

Tag table

[Image: screenshot, 2020-10-12 8:53:09 PM]

[Image: screenshot, 2020-10-12 8:53:22 PM]

lovit commented Oct 12, 2020

The tagmap below was written based on the contents of the table above.

ModuMorphemeKorpus.tagmap
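
For readers unfamiliar with the idea, a tag map is just a lookup from tag to description. The sketch below is illustrative only: the actual contents of `ModuMorphemeKorpus.tagmap` are not shown in this thread, so `EXAMPLE_TAGMAP` and `describe_morphemes` are hypothetical names, and apart from NNG (given earlier in the thread) the English glosses are assumptions based on the commonly used Sejong-style tag names.

```python
# Hypothetical excerpt for illustration; NOT the real ModuMorphemeKorpus.tagmap.
EXAMPLE_TAGMAP = {
    "NNG": "common noun",        # stated earlier in this thread (일반명사)
    "NNP": "proper noun",        # assumed gloss
    "VV": "verb",                # assumed gloss
    "EC": "connective ending",   # assumed gloss
}


def describe_morphemes(morphemes, tagmap):
    """Pair each morpheme form with its tag and a readable description."""
    return [
        (m["form"], m["label"], tagmap.get(m["label"], "unknown"))
        for m in morphemes
    ]


# Morpheme entries copied from the snapshot earlier in the thread.
morphemes = [
    {"id": 2, "form": "제주", "label": "NNP", "word_id": 1, "position": 2},
    {"id": 7, "form": "세계", "label": "NNG", "word_id": 2, "position": 2},
    {"id": 3, "form": "·", "label": "SP", "word_id": 1, "position": 3},
]

for row in describe_morphemes(morphemes, EXAMPLE_TAGMAP):
    print(row)
```

Tags missing from the map fall back to "unknown" instead of raising, which makes it easy to spot labels that still need an entry.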

lovit closed this as completed Oct 12, 2020