diff --git a/reading-notes/conference/icml-2024.md b/reading-notes/conference/icml-2024.md
index 3492dce..5607808 100644
--- a/reading-notes/conference/icml-2024.md
+++ b/reading-notes/conference/icml-2024.md
@@ -6,9 +6,25 @@ Homepage: [https://icml.cc/Conferences/2024](https://icml.cc/Conferences/2024)
 
 ### Papers
 
-### Large Language Models (LLMs)
+### Serving Large Language Models (LLMs)
 
 * HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment \[[Personal Notes](../miscellaneous/arxiv/2023/hexgen.md)] \[[arXiv](https://arxiv.org/abs/2311.11514)] \[[Code](https://github.com/Relaxed-System-Lab/HexGen)]
   * HKUST & ETH & CMU
-  * Support _asymmetric_ tensor model parallelism and pipeline parallelism under the _heterogeneous_ setting (i.e., each pipeline parallel stage can be assigned a different number of layers and a different tensor model parallel degree).
-  * Propose _a heuristic-based evolutionary algorithm_ to search for the optimal layout.
+  * Support _asymmetric_ tensor model parallelism and pipeline parallelism under the _heterogeneous_ setting (i.e., each pipeline parallel stage can be assigned a different number of layers and a different tensor model parallel degree).
+  * Propose _a heuristic-based evolutionary algorithm_ to search for the optimal layout.
+
+### Multimodality
+
+* Video generation
+  * VideoPoet: A Large Language Model for Zero-Shot Video Generation \[[Paper](https://proceedings.mlr.press/v235/kondratyuk24a.html)] \[[Homepage](https://sites.research.google/videopoet/)]
+    * Google & CMU
+    * Employ a decoder-only transformer architecture that processes multimodal inputs, including images, videos, text, and audio.
+    * The pre-trained LLM is adapted to a range of video generation tasks.
+* Image retrieval
+  * MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions \[[Paper](https://proceedings.mlr.press/v235/zhang24an.html)] \[[Homepage](https://open-vision-language.github.io/MagicLens/)] \[[Code](https://github.com/google-deepmind/magiclens)]
+    * OSU & Google DeepMind
+    * Enable multimodality-to-image, image-to-image, and text-to-image retrieval.
+
+## References
+
+* [Google DeepMind at ICML 2024, 2024/07/19](https://deepmind.google/discover/blog/google-deepmind-at-icml-2024/)