Description
The latest generation vision-language MoE model in the Qwen series with comprehensive upgrades to visual perception, spatial reasoning, and video understanding.
Stats
103K Downloads
15 stars
Last updated: November 4

README
Delivers superior vision-language performance across diverse tasks including document analysis, visual question answering, video understanding, and agentic interactions. The MoE architecture provides excellent efficiency while maintaining high-quality outputs.
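For the visual question answering and document analysis tasks mentioned above, local runtimes commonly expose models like this one behind an OpenAI-compatible chat endpoint. The sketch below builds such a multimodal request payload; the model name `qwen3-vl` and the PNG input are placeholder assumptions, and actually sending the request to your runtime's `/v1/chat/completions` endpoint is left to the caller.

```python
# Sketch: building an OpenAI-style chat payload that pairs one image with
# one question, for use against an OpenAI-compatible endpoint. The model
# name "qwen3-vl" is a placeholder -- use whatever name your runtime lists.
import base64
import json

def build_vqa_request(model: str, image_bytes: bytes, question: str) -> dict:
    """Return a chat-completions payload with one image and one text part."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image is inlined as a base64 data URL, the common
                    # convention for OpenAI-style multimodal messages.
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

# Example with dummy PNG header bytes standing in for a real image file.
payload = build_vqa_request("qwen3-vl", b"\x89PNG\r\n\x1a\n",
                            "What trend does this chart show?")
print(json.dumps(payload)[:80])
```

POSTing this JSON body to the runtime's chat endpoint would return the model's answer in the standard `choices[0].message.content` field.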
Parameters
Custom configuration options included with this model
Sources
The underlying model files this model uses
Based on