Result to a numpy / pandas frame? #134

Closed
jjkoehorst opened this issue Aug 16, 2024 · 1 comment · Fixed by #135

Comments

@jjkoehorst

I could not find it in the documentation, but what is the best / most efficient way of converting the object to a pandas DataFrame or a numpy array? Currently I write the file to disk and load it again, but there might be a better way?

Thanks a lot for the extremely fast program!

lkeegan added a commit that referenced this issue Aug 19, 2024
- add read-only property `lt_array` to `Dataset` that provides the raw distances data as a 1-d numpy array
- add example of use to readme
- bump deps
- add python 3.13
- bump version
- resolves #134
lkeegan added a commit that referenced this issue Aug 19, 2024
- add read-only property `lt_array` to `Dataset` that provides the raw distances data as a 1-d numpy array
- add example of use to readme
- bump deps
- add python 3.13
- temporarily skip tests for Python 3.13 wheel on linux due to numpy import error
- bump version
- resolves #134
@lkeegan
Member

lkeegan commented Aug 20, 2024

@jjkoehorst With version 1.3.0 you should be able to access the distances directly as a 1-d numpy array:

```python
import hammingdist

data = hammingdist.from_fasta("example.fasta")
lt_array = data.lt_array  # 1-d numpy array of pairwise distances
```
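If you only need a few distances, you may not need a matrix at all. The sketch below assumes the lower-triangular ordering described in the next paragraph and uses two hypothetical helpers (`n_seq_from_lt` and `pair_distance`, not part of hammingdist). The first recovers the number of sequences from the array length, which should be n*(n-1)/2. The second indexes a single pair: row i is preceded by 1 + 2 + ... + (i-1) = i*(i-1)/2 entries. A small made-up array stands in for `data.lt_array`.

```python
import math

import numpy as np


def n_seq_from_lt(lt_array: np.ndarray) -> int:
    """Hypothetical helper: n such that len(lt_array) == n * (n - 1) // 2."""
    m = len(lt_array)
    # Invert m = n(n-1)/2  =>  n = (1 + sqrt(1 + 8m)) / 2, using an exact integer sqrt.
    n = (1 + math.isqrt(1 + 8 * m)) // 2
    if n * (n - 1) // 2 != m:
        raise ValueError(f"length {m} is not a triangular number")
    return n


def pair_distance(lt_array: np.ndarray, i: int, j: int) -> int:
    """Hypothetical helper: distance between sequences i and j read from the 1-d array."""
    if i == j:
        return 0  # a sequence has Hamming distance 0 to itself
    row, col = max(i, j), min(i, j)
    # Rows 1..row-1 contribute row*(row-1)/2 entries before this row starts.
    return int(lt_array[row * (row - 1) // 2 + col])


# Made-up stand-in for data.lt_array: 4 sequences -> 6 pairwise distances.
lt_array = np.array([3, 5, 2, 7, 4, 1])
print(n_seq_from_lt(lt_array))        # 4
print(pair_distance(lt_array, 0, 2))  # (row=2,col=0) is index 1 -> 5
print(pair_distance(lt_array, 3, 1))  # (row=3,col=1) is index 4 -> 4
```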

The elements in this array are in lower-triangular order, i.e. they correspond to the 2-d indices (row=1,col=0), (row=2,col=0), (row=2,col=1), ...
These indices can be generated using the numpy `tril_indices` function, e.g. to construct the lower-triangular distance matrix from the 1-d `lt_array`:

```python
import numpy as np

lt_matrix = np.zeros((n_seq, n_seq))  # n_seq = number of sequences in the FASTA file
lt_matrix[np.tril_indices(n_seq, -1)] = lt_array
```
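To get the pandas DataFrame asked about in the issue, one option is to mirror the lower triangle into a full symmetric matrix and wrap it. This is a sketch rather than a hammingdist API: `n_seq`, the stand-in `lt_array` values and the `seq0`, `seq1`, ... labels are made up for illustration; with real data you would use `data.lt_array` and your own sequence names.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for data.lt_array: 4 sequences -> 4*3/2 = 6 distances.
n_seq = 4
lt_array = np.array([3, 5, 2, 7, 4, 1])

# Fill the strictly lower triangle in the same row-major order as lt_array.
full = np.zeros((n_seq, n_seq), dtype=lt_array.dtype)
rows, cols = np.tril_indices(n_seq, -1)
full[rows, cols] = lt_array
# Mirror into the upper triangle; the diagonal stays 0.
full[cols, rows] = lt_array

# Hypothetical labels; in practice these might be the FASTA sequence ids.
labels = [f"seq{k}" for k in range(n_seq)]
df = pd.DataFrame(full, index=labels, columns=labels)
print(df)
```

Writing both triangles with the same index arrays gives the same result as `lt_matrix + lt_matrix.T`, since the diagonal is zero, and keeps the original integer dtype.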

Hopefully that is helpful; feel free to re-open this issue if not.
