Loading directly from Hugging Face #3

Closed
pengzhangzhi opened this issue Nov 25, 2024 · 3 comments

pengzhangzhi commented Nov 25, 2024

Hi,
In the current codebase, we have to download the checkpoint to a local path and load it as follows:

# download "{model}.safetensors" to a local path
# and load it like below
model = ESM2.from_pretrained("{model}.safetensors", device=0)

I wonder whether we could load the checkpoint directly from Hugging Face, for example:

model = ESM2.from_pretrained("facebook/esm2_t30_150M_UR50D", device=0)

That way, it would be more straightforward to swap ESM2 in an existing codebase for a flash-attention version.

It seems doable to me, because esm-efficient shares the same model architecture as ESM2 except for the use of flash attention?

Would love to hear your thoughts!

MuhammedHasan self-assigned this Nov 25, 2024
MuhammedHasan added the question (Further information is requested) label Nov 25, 2024

pengzhangzhi (Author) commented

I'm working on it. I guess the work is mostly converting the names defined in esm-efficient to those defined in ESM2, which are the standard Hugging Face names? Let me know if you are interested! We can talk more about that!
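
To make the name-conversion idea concrete, here is a minimal sketch of renaming checkpoint keys with regex rules. The rules in `RENAME_RULES` and the target names are hypothetical placeholders for illustration only; the actual key names in esm-efficient and in the Hugging Face ESM2 checkpoints would have to be read from the two state dicts.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical (pattern, replacement) rules. The real key names in either
# codebase may differ and must be checked against the actual state dicts.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^esm\.encoder\.layer\.(\d+)\.", r"layers.\1."),
    (r"\.attention\.self\.query\.", ".attn.q."),
]


def rename_keys(state_dict: Dict[str, Any],
                rules: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Return a new dict whose keys have every rule applied in order."""
    renamed: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        if new_key in renamed:
            # Two source keys collapsing to one target would silently drop a tensor.
            raise ValueError(f"duplicate key after renaming: {new_key}")
        renamed[new_key] = value
    return renamed
```

Keys that match no rule pass through unchanged, so comparing the renamed keys against the target model's expected keys is a quick way to spot rules that are still missing.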

MuhammedHasan (Collaborator) commented

Just so you know, pull requests are welcome. Please make sure any change passes the test cases. I renamed and created safetensors from the checkpoints because pickles are not reliable in my experience.

Given that the weights are on Hugging Face at https://huggingface.co/mhcelik/esm-efficient/tree/main, they need to be fetched to implement something like:

model = ESM2.from_pretrained("esm-efficient/esm2_8M", device=0)

I plan to support this at some point, but pull requests are appreciated and would make it happen sooner.
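
One possible shape for the fetching step, as a sketch rather than the project's actual implementation (the fix that landed in #9 may look quite different). It uses `hf_hub_download` from the `huggingface_hub` package. The helper name `resolve_checkpoint` is hypothetical, and the assumption that each model is stored in the repo as `{model}.safetensors` is taken from the loading example earlier in the thread, not verified against the repo contents.

```python
import os
from typing import Callable, Optional

# Repo id taken from the URL above.
HUB_REPO_ID = "mhcelik/esm-efficient"


def _hub_download(repo_id: str, filename: str) -> str:
    # Imported lazily so that local paths work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename)


def resolve_checkpoint(
    name: str,
    download: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Return a local path for `name`, which is a file path or a short hub name."""
    if os.path.isfile(name):
        return name  # already a local checkpoint
    fetch = download or _hub_download
    model = name.split("/")[-1]  # "esm-efficient/esm2_8M" -> "esm2_8M"
    # Assumption: the repo stores each model as "{model}.safetensors".
    filename = model if model.endswith(".safetensors") else f"{model}.safetensors"
    return fetch(HUB_REPO_ID, filename)


# Intended use (requires network access and the esm-efficient package):
# model = ESM2.from_pretrained(resolve_checkpoint("esm-efficient/esm2_8M"), device=0)
```

Keeping the download function injectable means the name resolution can be tested without network access, and existing calls that pass a local `.safetensors` path keep working unchanged.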

MuhammedHasan added the enhancement (New feature or request) label Dec 20, 2024

MuhammedHasan (Collaborator) commented

Fixed by #9
