EmbeddedLLM
Repositories
- LLM_Sizing_Guide (Public, forked from qoofyk/LLM_Sizing_Guide)
  A calculator to estimate the memory footprint, capacity, and latency of LLM deployments on NVIDIA, AMD, and Intel hardware. A back-of-the-envelope sizing sketch follows this list.
- infinity-executable (Public, forked from michaelfeil/infinity)
  Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks. A minimal client sketch follows this list.
- JamAIBase (Public)
  The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real time. Work together seamlessly to build and iterate on AI applications.
- SageAttention-rocm (Public, forked from thu-ml/SageAttention)
  ROCm port of SageAttention, a quantized attention kernel that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models. A usage sketch follows this list.
- torchac_rocm (Public, forked from LMCache/torchac_cuda)
  ROCm implementation of torchac_cuda from LMCache.
- LMCache-ROCm (Public, forked from LMCache/LMCache)
  ROCm support for LMCache: ultra-fast and cheaper long-context LLM inference. A toy illustration of its KV-cache-reuse idea follows this list.
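Sizing sketch (LLM_Sizing_Guide). As a rough illustration of the kind of arithmetic such a calculator performs (this is not the repository's own code), the snippet below estimates weight memory, KV-cache memory, and a bandwidth-bound decode latency from a model's hyperparameters. All model and hardware numbers are example assumptions.

```python
# Back-of-the-envelope LLM memory and latency sizing (illustrative only,
# not the LLM_Sizing_Guide implementation). All numbers are assumptions.

def weight_memory_gb(num_params_b: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the model weights (FP16/BF16 by default)."""
    return num_params_b * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                context_len: int, batch_size: int,
                bytes_per_elem: int = 2) -> float:
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * tokens * batch."""
    elems = 2 * num_layers * num_kv_heads * head_dim * context_len * batch_size
    return elems * bytes_per_elem / 1e9

def decode_ms_per_token(weights_gb: float, hbm_bandwidth_gbs: float) -> float:
    """Memory-bandwidth-bound estimate: each decoded token reads all weights once."""
    return weights_gb / hbm_bandwidth_gbs * 1000

# Example: a 70B-parameter model with assumed Llama-style hyperparameters.
weights = weight_memory_gb(num_params_b=70)                       # ~140 GB in FP16
kv = kv_cache_gb(num_layers=80, num_kv_heads=8, head_dim=128,
                 context_len=8192, batch_size=8)                  # ~21 GB
latency = decode_ms_per_token(weights, hbm_bandwidth_gbs=5300)    # assumed ~5.3 TB/s HBM
print(f"weights ~{weights:.0f} GB, KV cache ~{kv:.1f} GB, decode ~{latency:.1f} ms/token")
```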
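Client sketch (infinity-executable). Assuming an Infinity server is already running locally and exposing its OpenAI-compatible /embeddings route, a client call might look like the following; the URL, port, and model name are placeholders, not guaranteed defaults.

```python
# Minimal client sketch for an Infinity embedding server (assumed to be
# running locally with an OpenAI-compatible /embeddings route).
# The URL and model name below are placeholders.
import requests

resp = requests.post(
    "http://localhost:7997/embeddings",
    json={
        "model": "BAAI/bge-small-en-v1.5",
        "input": ["embed this sentence", "and this one too"],
    },
    timeout=30,
)
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```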
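Usage sketch (SageAttention-rocm). SageAttention is meant as a drop-in replacement for scaled-dot-product attention. The sketch below shows the kind of call site where such a kernel slots in; the sageattn name and keyword arguments follow the upstream SageAttention README, but treat the exact signature as an assumption and check the ROCm fork's documentation.

```python
# Illustrative drop-in use of a quantized attention kernel in place of
# torch's scaled_dot_product_attention. The sageattn signature is taken
# from the upstream README and should be verified against the ROCm fork.
import torch
from sageattention import sageattn

batch, heads, seq_len, head_dim = 2, 16, 4096, 128
q = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Baseline: PyTorch's built-in attention.
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)

# Quantized attention; "HND" layout = (batch, heads, seq_len, head_dim).
out = sageattn(q, k, v, tensor_layout="HND", is_causal=True)
print(torch.nn.functional.cosine_similarity(ref.flatten(), out.flatten(), dim=0))
```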
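Toy illustration (LMCache-ROCm). LMCache speeds up long-context inference by storing and reusing KV caches for repeated text, such as shared prompt prefixes, instead of recomputing them. The snippet below is a deliberately simplified toy version of that idea and is not the LMCache API; real deployments hook into the serving engine and manage paged KV blocks.

```python
# Toy illustration of prefix KV-cache reuse (the core idea behind LMCache).
# NOT the LMCache API: real integrations plug into engines such as vLLM.
import hashlib
import torch

kv_store: dict[str, tuple[torch.Tensor, torch.Tensor]] = {}

def prefix_key(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def get_or_compute_kv(prefix: str, compute_kv):
    """Return cached (K, V) for a prompt prefix, computing it only on a miss."""
    key = prefix_key(prefix)
    if key not in kv_store:
        kv_store[key] = compute_kv(prefix)   # expensive prefill happens only once
    return kv_store[key]

def fake_prefill(prefix: str) -> tuple[torch.Tensor, torch.Tensor]:
    """Stand-in for the model's prefill pass (random tensors for illustration)."""
    seq_len = len(prefix.split())
    return torch.randn(seq_len, 128), torch.randn(seq_len, 128)

system_prompt = "You are a helpful assistant."
k1, v1 = get_or_compute_kv(system_prompt, fake_prefill)   # computed
k2, v2 = get_or_compute_kv(system_prompt, fake_prefill)   # served from cache
assert k1 is k2
```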