AI Knowledge Base: Multimodal
- Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration (arXiv:2412.13180)
  https://arxiv.org/abs/2412.13180
- Token Activation Map to Visually Explain Multimodal LLMs (arXiv:2506.23270)
  https://arxiv.org/abs/2506.23270
- GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Training (arXiv:2507.01006)
  https://arxiv.org/abs/2507.01006
- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers (arXiv:2506.23918)
  https://arxiv.org/abs/2506.23918
- Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations (arXiv:2506.18898)
  https://arxiv.org/abs/2506.18898