Same index is present multiple times in FPS output during sample selection #128

Closed
Luthaf opened this issue May 2, 2022 · 1 comment · Fixed by #129
Labels
bug Something isn't working

Comments


Luthaf commented May 2, 2022

Running this code

import numpy as np
import skcosmo.sample_selection

X = np.load("power-spectrum.npy")
print("shape =", X.shape)
print()

fps = skcosmo.sample_selection.FPS(n_to_select=35)
fps.fit(X)
print("selected_idx_ =", fps.selected_idx_)

outputs

shape = (640, 320)

selected_idx_ = [  0 247  45 105  16  56 176  25   9  38 152  72  54 192 131  88  64 168
 212 189 166 202  83 145  96 113 142 125 217 226 233   9   9   9   9]

There are enough samples to select more than 35 of them, but the last four entries of the output all repeat index 9, which was already selected earlier. This is unexpected. I'll double check the data to see whether we are creating the same sample multiple times for some reason.
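For anyone trying to confirm the repetition programmatically, here is a minimal sketch that counts how often each index appears. The array is copied by hand from the output above; `selected_idx` and `repeated` are names chosen for this example and are not part of skcosmo.

```python
import numpy as np

# Indices copied from the `selected_idx_` output printed above
selected_idx = np.array([
    0, 247, 45, 105, 16, 56, 176, 25, 9, 38, 152, 72, 54, 192, 131, 88, 64, 168,
    212, 189, 166, 202, 83, 145, 96, 113, 142, 125, 217, 226, 233, 9, 9, 9, 9,
])

# Count occurrences of each index and keep those that appear more than once
values, counts = np.unique(selected_idx, return_counts=True)
repeated = {int(v): int(c) for v, c in zip(values, counts) if c > 1}

print("requested:", selected_idx.size)  # 35
print("unique:   ", values.size)        # 31
print("repeated: ", repeated)           # {9: 5}
```

The same check should work on `fps.selected_idx_` directly after `fit`, since it is printed as a NumPy integer array above.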

power-spectrum.npy.zip

Luthaf added the bug label May 2, 2022

Luthaf commented May 3, 2022

So the data does contain the same entry multiple times, in blocks of 8 samples (the input structures contain 8 atoms per frame). This script:

start = 0
for _ in range(X.shape[0] // 8):
    stop = start + 8
    x = X[start:stop, :]

    print(f"{np.linalg.norm(x):.4}    {np.linalg.norm(x - x[0]):.4}")
    start = stop

outputs something like

243.6    1.634e-12
227.3    1.377e-12
210.5    1.213e-12
194.1    1.185e-12
178.9    1.275e-12
165.4    1.699e-12
154.2    1.409e-12
145.3    1.646e-12
138.7    1.114e-12
134.1    6.263e-13
130.7    1.202e-12
127.7    1.208e-12
125.3    1.173e-12
123.4    9.692e-13
121.8    7.84e-13
120.2    1.432e-12
118.6    1.155e-12
116.9    1.12e-12
114.8    1.026e-12
112.1    6.875e-13
108.7    1.746e-12
       ...

I.e. each block of 8 is different, but all samples in a block contain the same data, up to some numerical noise.
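To estimate how many samples are actually distinct without assuming the block size of 8, something like the following sketch could be used. It is not taken from skcosmo: `distinct_row_indices` is a hypothetical helper, the tolerance `atol=1e-8` is an assumption picked to sit well above the ~1e-12 noise seen above, and `X_demo` is synthetic data standing in for the attached file.

```python
import numpy as np


def distinct_row_indices(X, atol=1e-8):
    """Indices of rows farther than `atol` (Euclidean norm) from every row kept so far."""
    kept = []
    for i, row in enumerate(X):
        # Keep the row only if it is not a near-duplicate of an already kept row
        if all(np.linalg.norm(row - X[j]) > atol for j in kept):
            kept.append(i)
    return kept


# Synthetic stand-in: 5 distinct rows, each repeated 8 times with tiny noise
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 4))
X_demo = np.repeat(base, 8, axis=0) + 1e-13 * rng.normal(size=(40, 4))

print(distinct_row_indices(X_demo))  # [0, 8, 16, 24, 32]
```

This greedy comparison is quadratic in the number of distinct rows, which should be fine for a 640-row array but is only meant as a diagnostic.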
