LLM Encoder/Decoder - Search News

Tencent trains a vision encoder from a text-only LLM, handles charts and long videos, and reaches SOTA among open-source small models!

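Breaking the usual multimodal pattern of stitching vision onto language! Tencent has open-sourced Penguin-VL, which trains its vision encoder directly from a text-only LLM. The work departs from the conventional route of starting with a traditional vision backbone and then attaching a language model, and instead initializes the vision encoder directly from a text-only LLM. At compact 2B/8B parameter scales, on complex tasks such as document understanding and long-video temporal grounding, it shows ...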

Semiconductor Engineering

NPU Acceleration For Multimodal LLMs

Transformer-based models have rapidly spread from text to speech, vision, and other modalities. This has created challenges for the development of Neural Processing Units (NPUs). NPUs must now ...

InfoQ

Multi-Modal LLM NExT-GPT Handles Text, Images, Videos, and Audio

