AI Cookbook for Libraries
CENL-AI-WG edited this page Aug 16, 2021 · 19 revisions
The recipes are grouped by technical domain, and each recipe lists typical library use cases.
Recipe #1: Named entity extraction & linking
- enrichment of digital collections (creating new metadata such as person names, organizations, locations) for information retrieval, scientific objectives, etc.
- establishing links between documents and authority data
- understanding large collections of unstructured text documents (text mining use case)
- enrichment of digital collections with topics (information retrieval)
- enrichment of digitized collections with genres (novel, poetry, science, ...) or other classification schemes (Dewey, ...)
- cataloguing of born-digital materials
Recipe #4: Language models
- creation and use of language models for NLP tasks
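As a toy illustration of what "creation and use" of a language model can mean, the sketch below trains a bigram model and scores sentences with it. It assumes whitespace tokenization and add-one smoothing, and the function names and the two-sentence corpus are invented for this example. The models in the recipe itself are likely much larger.

```python
import math
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the model (add-one smoothing)."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        followers = bigrams.get(prev, {})
        count = followers.get(cur, 0)
        context = sum(followers.values())
        total += math.log((count + 1) / (context + vocab_size))
    return total


# Hypothetical mini-corpus, for illustration only.
corpus = ["the library holds old maps", "the library holds rare books"]
unigrams, bigrams = train_bigram_model(corpus)

plausible = log_prob("the library holds old books", unigrams, bigrams)
scrambled = log_prob("books old holds library the", unigrams, bigrams)
print(plausible > scrambled)
```

A score of this kind can be used to rank alternative readings of a text, with word order seen in the training corpus scoring higher than scrambled order.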
Recipe #5: OCR Post-correction
- correction of OCRed collections
- HTR for full text indexing
- HTR for transcription
- attribution of authorship based on handwriting
- attribution of writing style (uncial, Carolingian, etc.)
- pre-treatment of uncatalogued collections (filtering, preindexing, ...) based on the document type (letter, typewritten, map, etc.)
- enrichment of digital catalogued collections with document types
Recipe #2: Page Segmentation
- extraction of text from heritage documents for full text indexing
- segmentation of illustrations from heritage documents
Recipe #3: Article Recognition for newspapers, dictionaries, sales catalogues (arts, coins/medals...)
Recipe #1: Image Classification
- pre-treatment of uncatalogued collections (filtering, preindexing, ...)
- enrichment of digital catalogued collections (creating new metadata) for information retrieval, scientific objectives, etc.
Recipe #2: Object Detection and Face Detection, Instance Search
- enrichment of digital collections
- data analysis, visual studies
- information retrieval based on visual similarity
- navigating massive digital collections
- curation of digital collections (duplicate detection, variant detection)
- pre-treatment of uncatalogued collections (cutting into sequences, ...)
- enrichment of digital catalogued collections (creating new metadata): object detection, scene classification, etc.
- subtitle transcription (OCR)
- speech to text
- speaker detection
To add a new AI recipe, use the recipe template