Create dataset loader for IJELID #29

Closed
SamuelCahyawijaya opened this issue Nov 9, 2023 · 1 comment · Fixed by #45

@SamuelCahyawijaya
Collaborator

Dataloader name: ijelid/ijelid.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?ijelid

Dataset ijelid
Description This is a code-mixed Indonesian-Javanese-English dataset for token-level language identification. We named this dataset IJELID (Indonesian-Javanese-English Language Identification). It contains tokenized tweets, with each token paired with its language label. There are seven language labels in the dataset: ID (Indonesian), JV (Javanese), EN (English), MIX_ID_EN (mixed Indonesian-English), MIX_ID_JV (mixed Indonesian-Javanese), MIX_JV_EN (mixed Javanese-English), and OTH (Other).
Subsets -
Languages ind, jav, eng
License Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)
Homepage https://github.com/fathanick/Code-mixed-Indonesian-Javanese-English-Twitter-Data
HF URL -
Paper URL https://peerj.com/articles/cs-1312/
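
As a rough illustration of what the loader has to handle, the seven-label tag set above can be kept as a fixed class list, and each tweet read as parallel token/label sequences. This is only a sketch under a stated assumption: it supposes, hypothetically, that the source files hold one `token<TAB>label` pair per line with a blank line between tweets (a CoNLL-style layout). The real file format in the homepage repository has not been checked here, and `IJELID_LABELS` and `parse_conll_lines` are illustrative names, not part of the SEACrowd loader.

```python
from typing import Iterable, Iterator, List, Tuple

# The seven token-level language labels listed in the dataset description.
IJELID_LABELS: List[str] = [
    "ID",         # Indonesian
    "JV",         # Javanese
    "EN",         # English
    "MIX_ID_EN",  # mixed Indonesian-English
    "MIX_ID_JV",  # mixed Indonesian-Javanese
    "MIX_JV_EN",  # mixed Javanese-English
    "OTH",        # other
]
LABEL_SET = frozenset(IJELID_LABELS)


def parse_conll_lines(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (tokens, labels) for each tweet.

    ASSUMPTION (hypothetical format, not verified against the repo):
    one "token<TAB>label" pair per line, blank line between tweets.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current tweet, if one is open.
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        # Split on the last tab so the label is always the final field.
        token, label = line.rsplit("\t", 1)
        if label not in LABEL_SET:
            raise ValueError(f"unknown label {label!r} for token {token!r}")
        tokens.append(token)
        labels.append(label)
    # Flush the last tweet when the input does not end with a blank line.
    if tokens:
        yield tokens, labels


if __name__ == "__main__":
    # Made-up sample lines, not taken from the dataset.
    demo = ["saya\tID\n", "love\tEN\n", "\n", "mangan\tJV\n"]
    for toks, labs in parse_conll_lines(demo):
        print(list(zip(toks, labs)))
```

Validating every label against the fixed set makes a format mismatch fail loudly instead of silently producing an eighth class.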

SamuelCahyawijaya converted this from a draft issue Nov 9, 2023
@ljvmiranda921
Collaborator

#self-assign
