Generic spreadsheet (and Describo) driven tool for creating an LDaCA-ready profile-compliant language corpus that can be loaded into Oni

Language-Research-Technology/corpus-tools-ro-crate

corpus-tools-spreadsheet

Generic spreadsheet (and Crate-O) driven tool for creating an LDaCA-ready profile-compliant language corpus that can be loaded into Oni

Assumptions

Users will use Crate-O to describe the Collection and sub-collection structure of their data, as per the Selected Mode.

This tool will allow people to describe RepositoryObjects (e.g. PARADISEC or Alveo ITEMS).

Conventions:

Each object will be in its own directory:

-- object1/
      1.txt
      1.wav
      1.elan
-- objectn/
      ...

ro-crate-objects.xlsx

ro-crate-object.xlsx will have an objects tab with object-level metadata.

The ro-crate-object sheet can be auto-generated by the tool.
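As an illustration of what auto-generating the objects sheet could involve, here is a minimal Python sketch that walks the directory layout described above and builds one row per object directory. It is not the tool's actual implementation. The function name `build_object_rows` and the `@id` and `hasPart` columns are assumptions made for this example, and writing the rows out to a spreadsheet is left out.

```python
from pathlib import Path


def build_object_rows(corpus_dir):
    """Return one row (a dict) per object directory found under corpus_dir."""
    rows = []
    for entry in sorted(Path(corpus_dir).iterdir()):
        if not entry.is_dir():
            continue  # skip loose files such as the spreadsheet itself
        files = sorted(p.name for p in entry.iterdir() if p.is_file())
        rows.append({
            "@id": entry.name,            # assumed column: directory name as identifier
            "@type": "RepositoryObject",  # the default type for every object
            "hasPart": files,             # assumed column: files inside the object dir
        })
    return rows
```

Each returned dict corresponds to one row of the objects tab; further metadata columns would then be filled in by hand.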

Structure of the spreadsheet

Multiple values can always be handled. The current method is to use [1,2,3] in a cell where there are multiple values. An alternative is to allow more columns with additional indexes: language, language__1, language__2. The aim is to handle both.
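The two conventions for multiple values can be handled together. The Python sketch below is illustrative only: the function names are hypothetical, and the exact cell syntax (unquoted, comma-separated values inside square brackets) is an assumption, as is the rule that a column named `name__N` contributes to property `name`.

```python
import re

# Matches indexed columns such as "language__1" (assumed suffix convention).
_INDEXED = re.compile(r"^(?P<name>.+?)__(?P<index>\d+)$")


def split_cell(cell):
    """Split one cell into a list of values; '[a,b,c]' is read as a list."""
    if cell is None:
        return []
    text = str(cell).strip()
    if not text:
        return []
    if text.startswith("[") and text.endswith("]"):
        return [part.strip() for part in text[1:-1].split(",") if part.strip()]
    return [text]


def collect_values(row):
    """Merge bracketed cells and indexed columns into {property: [values]}."""
    merged = {}
    for column, cell in row.items():
        match = _INDEXED.match(column)
        name = match.group("name") if match else column
        merged.setdefault(name, []).extend(split_cell(cell))
    return merged
```

With this approach a row may mix both styles, for example a `language` column holding `[a,b]` next to a `language__1` column holding a third value.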

Any property can be added by making one or more columns for it; if it does not match a vocab there will not be an error. In rocxl you CAN add properties, classes and defined terms in their own sheets. TODO: see if we can use Describo to make a vocab/ dir that has their special terms.

There is no need to repeat the type RepositoryObject on every row; RepositoryObject is assumed. You may need OTHER types as well, so a @type column is still needed.
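A small Python sketch of this defaulting rule follows. The function name `resolve_types` is hypothetical, and separating extra types in the @type column with commas is an assumption made for the example.

```python
DEFAULT_TYPE = "RepositoryObject"


def resolve_types(row):
    """Return the object's types, always starting with the assumed default."""
    extra = str(row.get("@type") or "")
    types = [DEFAULT_TYPE]
    for name in extra.split(","):  # assumed: extra types are comma-separated
        name = name.strip()
        if name and name not in types:
            types.append(name)
    return types
```

A row with an empty or missing @type column then yields just RepositoryObject, and listing RepositoryObject explicitly does not duplicate it.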

Other columns, like language, need to be there, or inherit from the collection structure (which will be alongside, in Describo), OR you can just put in a name or a code and we'll look it up.
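The fallback order described here (value in the row, otherwise inherit from the collection, with lookup by name or code) might look like the following Python sketch. The two-entry `LANGUAGES` table is a stand-in for a real language registry, and the function names and the shape of the returned dict are assumptions for illustration.

```python
# Stand-in lookup table; a real tool would consult a full language registry.
LANGUAGES = {
    "eng": "English",
    "fra": "French",
}


def lookup_language(value):
    """Find a language by code or by name (case-insensitive); None if unknown."""
    wanted = value.strip().lower()
    for code, name in LANGUAGES.items():
        if wanted in (code.lower(), name.lower()):
            return {"code": code, "name": name}
    return None


def resolve_language(row, collection):
    """Use the row's language if given, otherwise inherit from the collection."""
    value = str(row.get("language") or "").strip()
    if value:
        return lookup_language(value)
    return collection.get("language")  # assumed: already resolved upstream
```

Returning None for an unknown name or code lets the caller decide whether to warn or to keep the raw value.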

cd portal
npm run dev
