When we make splits now for learning curves, we do not clip the vectors we use for bookkeeping so that they are a consistent length; i.e., we do not train on fixed durations.

This logic lived on the WindowDataset class in version 0.x, in crop_spect_vectors_keep_classes: https://github.com/vocalpy/vak/blob/0.8/src/vak/datasets/window_dataset.py#L246

I just rewrote some of this logic for the BioSoundSegBench dataset, here: vocalpy/CMACBench@f8a6b28

In doing so I realized that the duration as measured in seconds of audio can differ from the duration as measured in number of spectrogram time bins, and that this difference varies depending on the method used to compute the spectrogram. I ended up using some hacks so that we get indexing vectors of (mostly) consistent lengths, but it's annoyingly fragile.
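To make the mismatch concrete, here is a minimal sketch (not vak code; the sampling rate, FFT size, and hop length are made-up illustrative values) showing how the same one second of audio maps to a different number of time bins depending on how the spectrogram is computed:

```python
import numpy as np
import librosa
import scipy.signal

SR, N_FFT, HOP = 32000, 512, 64
audio = np.random.default_rng(0).normal(size=SR)  # exactly 1.0 second of audio

# librosa pads and centers frames: 1 + (n_samples // hop_length) frames
S_librosa = librosa.stft(audio, n_fft=N_FFT, hop_length=HOP, center=True)

# scipy only takes full windows: (n_samples - nperseg) // (nperseg - noverlap) + 1 frames
_, t, S_scipy = scipy.signal.spectrogram(audio, fs=SR, nperseg=N_FFT, noverlap=N_FFT - HOP)

print(S_librosa.shape[1])  # 501 time bins
print(S_scipy.shape[1])    # 493 time bins, for the same 1.0 s of audio
```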
Probably the better way to do this from first principles is to clip the audio in such a way that we get the target duration in seconds, while keeping all classes present in the dataset, and then let the spectrogram code do whatever it wants.
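A hypothetical sketch of that approach, assuming per-file segment onsets and labels are available (the function name and signature are made up for illustration, not vak or vocalpy APIs, and a real implementation would have to reason over a whole split rather than one file):

```python
# Do the bookkeeping in seconds by clipping the audio itself, refusing to drop
# any label class; the spectrogram code can then do whatever it wants with the clip.
def clip_audio_keep_classes(audio, samplerate, onsets_s, labels, target_dur_s):
    # classes whose first onset falls after the clip boundary would be lost
    kept = {label for onset, label in zip(onsets_s, labels) if onset < target_dur_s}
    lost = set(labels) - kept
    if lost:
        raise ValueError(f"clipping at {target_dur_s} s would drop classes: {lost}")
    n_samples = int(round(target_dur_s * samplerate))
    return audio[:n_samples]
```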
Probably the better way to do this from first principles is to clip the audio (vocalpy/vocalpy#149) in such a way that we get the target duration in seconds, while keeping all classes present in the dataset, and then let the spectrogram code do whatever it wants.
A question here is whether we want to copy the audio to the prepared dataset. Especially if we clip it, I would want to save the clipped audio along with metadata about the source audio that produced the clip. The trade-off is that this increases the size of the dataset, so we would probably make it an option specific to learning curves and not do it by default.
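To make the copying idea concrete, here is a hedged sketch of saving a clip alongside provenance metadata; soundfile is used only for illustration, and the file layout and field names are made up here, not what vak's dataset prep actually writes:

```python
import json
from pathlib import Path

import soundfile as sf

def save_clip_with_metadata(clip, samplerate, source_path, start_s, stop_s, dst_dir):
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    clip_path = dst_dir / f"{Path(source_path).stem}.clip.wav"
    sf.write(clip_path, clip, samplerate)
    # record which source audio produced this clip, and where it was cut,
    # so the prepared dataset stays traceable back to the raw audio
    metadata = {
        "source_audio_path": str(source_path),
        "clip_start_s": start_s,
        "clip_stop_s": stop_s,
        "samplerate": samplerate,
    }
    clip_path.with_suffix(".json").write_text(json.dumps(metadata, indent=2))
    return clip_path
```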