# Datasets
1. **Google Speech Commands Dataset**
- Description: A set of one-second .wav audio files, each containing a single spoken English word.
- [Link to the Dataset](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html)
2. **VisualWakeWords Dataset**
- Description: A dataset tailored for tinyML vision applications, consisting of images with a binary label indicating whether a person is present.
- [Link to the Dataset](https://github.com/tensorflow/models/tree/master/research/slim#preparing-the-visualwakewords-dataset)
3. **EMNIST Dataset**
- Description: A dataset of 28x28 pixel images of handwritten characters and digits, extending the MNIST dataset to include letters.
- [Link to the Dataset](https://www.nist.gov/itl/products-and-services/emnist-dataset)
4. **UCI Machine Learning Repository: Human Activity Recognition Using Smartphones**
- Description: A dataset of recordings from 30 study participants performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.
- [Link to the Dataset](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones)
5. **PlantVillage Dataset**
- Description: A dataset comprising images of healthy and diseased crop leaves, categorized by crop type and disease type, which could be used in a tinyML agricultural project.
- [Link to the Dataset](https://github.com/spMohanty/PlantVillage-Dataset)
6. **Gesture Recognition using 3D Motion Sensing (3D Gesture Database)**
- Description: This dataset contains 3D gesture data recorded using a Leap Motion Controller, which might be useful for gesture recognition projects.
- [Link to the Dataset](https://lttm.dei.unipd.it/downloads/gesture/)
7. **Multilingual Spoken Words Corpus**
- Description: A dataset containing recordings of common spoken words in various languages, useful for speech recognition projects targeting multiple languages.
- [Link to the Dataset](https://mlcommons.org/en/multilingual-spoken-words/)
Remember to verify each dataset's license or terms of use to ensure it can be used for your intended purpose.
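
After downloading an audio dataset such as Google Speech Commands, it can help to confirm that clips really are about one second long before training on them. The following is a minimal sketch using only Python's standard `wave` module. The helper names `wav_duration_seconds` and `write_silent_wav` are hypothetical, and the 16 kHz, mono, 16-bit format of the synthetic clip is an assumption made for illustration; check the actual format of the files you download.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def write_silent_wav(path, seconds=1.0, sample_rate=16000):
    """Write a silent mono 16-bit clip (a stand-in for a real dataset file)."""
    num_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono (assumed)
        wav_file.setsampwidth(2)        # 2 bytes per sample = 16-bit (assumed)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)


if __name__ == "__main__":
    # Demonstrate on a synthetic clip; replace the path with a downloaded file.
    clip_path = os.path.join(tempfile.mkdtemp(), "example_clip.wav")
    write_silent_wav(clip_path)
    print(f"{clip_path}: {wav_duration_seconds(clip_path):.2f} s")
```

In practice you would loop over the dataset directory and flag any file whose duration differs noticeably from the expected length, then decide whether to pad, trim, or drop it.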