I've been trying to test the repo on a small local dataset and have downloaded the weights. (Side note: the size of the downloaded weights was shown as 10000 x 8XXX, which doesn't match the weight size suggested in the documentation.)
Attached below is the code we are trying to run. Any ideas on what is going wrong? There's a lot of guesswork involved in the process.
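On the side note about the weight size: the tensor shapes stored in the checkpoint can be printed directly and compared against the documentation before the model is even built. A minimal sketch, assuming the .pt file is a flat state dict of tensors (which the loading code below also assumes):

import torch

# Print every tensor stored in the checkpoint together with its shape.
# Note: for a torch.nn.Linear layer, 'fc.weight' has shape (out_features, in_features),
# i.e. (num_classes, emb_dim) in the loading code below.
ckpt = torch.load('orfium-bytecover.pt', map_location='cpu')
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))

If the reported 10000 x 8XXX is the shape of fc.weight, it would describe the classification head (num_classes x emb_dim) rather than the embedding size on its own.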
import librosa
import numpy as np
import torch
# Resnet50 and Bottleneck come from the repo's model code (import path omitted here)

device = torch.device("cpu")  # assuming CPU inference for this test

file_path = 'orfium-bytecover.pt'
# Load the checkpoint
loaded_weights = torch.load(file_path, map_location=device)
# Instantiate the model with the same parameters as in the loaded weights
model = Resnet50(
    Bottleneck,
    emb_dim=loaded_weights['fc.weight'].shape[1],
    num_channels=1,
    num_classes=loaded_weights['fc.weight'].shape[0],
    sr=22050,
    hop_lenght=512,
    n_bins=84,
    bins_per_octave=12,
    window="hann",
    compress_ratio=20,
    tempo_factors=None,
)
# Load the weights into the model
model.load_state_dict(loaded_weights)
# Load the audio files into a batch for the forward pass
audio_data, sr_original = librosa.load("songs/videogames_1.mp3", sr=None)
x_1 = librosa.resample(y=audio_data, orig_sr=sr_original, target_sr=22050)[0:220503]
audio_data, sr_original = librosa.load("songs/videogames_2.mp3", sr=None)
x_2 = librosa.resample(y=audio_data, orig_sr=sr_original, target_sr=22050)[0:220503]
x_batch = torch.stack([torch.tensor(x_1), torch.tensor(x_2)], dim=0)
# Apply the model (forward pass)
y_batch = model(x_batch)
# Similarity computation
f_c_batch = y_batch['f_c'].detach().numpy()
y_1 = f_c_batch[0,:]
y_2 = f_c_batch[1,:]
print(np.linalg.norm(y_1 - y_2))  # <--- this number is way too large, and cosine similarity is no better
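For reference, a minimal sketch of the cosine-similarity comparison mentioned in the comment above, computed on the same two embeddings (assuming y_1 and y_2 are the 1-D NumPy vectors extracted above):

# Cosine similarity between the two embeddings; values close to 1 indicate the
# vectors point in nearly the same direction, regardless of their magnitudes.
cos_sim = np.dot(y_1, y_2) / (np.linalg.norm(y_1) * np.linalg.norm(y_2))
print(cos_sim)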