๐Ÿฅ MedVidBench Leaderboard

MedVidBench is a comprehensive benchmark for evaluating Video-Language Models on medical and surgical video understanding. It covers 8 tasks across 8 surgical datasets with 6,245 test samples, evaluated with 10 metrics, including LLM-based caption judging.
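The temporal grounding metrics in the table below (STG_mIoU, TAG_mIoU@0.3, TAG_mIoU@0.5) are built on temporal intersection-over-union (tIoU) between predicted and ground-truth time segments. The sketch below is illustrative only and is not the official MedVidBench scorer; the function names are ours, and it assumes one predicted segment per ground-truth segment, paired by index.

```python
# Illustrative sketch of temporal IoU and thresholded matching,
# as commonly used for temporal grounding metrics like mIoU@0.3/0.5.
# NOT the official MedVidBench evaluation code.

def temporal_iou(pred, gt):
    """IoU of two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, gts, threshold):
    """Fraction of ground-truth segments whose paired prediction
    reaches tIoU >= threshold (index-paired, for illustration)."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

if __name__ == "__main__":
    preds = [(10.0, 20.0), (30.0, 35.0)]
    gts = [(12.0, 22.0), (33.0, 40.0)]
    print(round(temporal_iou(preds[0], gts[0]), 3))  # overlap 8s / union 12s = 0.667
    print(recall_at_iou(preds, gts, 0.3))            # only the first pair clears 0.3
```

Higher thresholds (e.g. @0.5 vs @0.3) demand tighter localization, which is why scores at 0.5 are consistently lower in the table.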

📄 Paper   🌐 Project Page   💾 Dataset   🤖 Model   💻 GitHub   🎮 Demo

Official Rankings (Verified)

Models on this leaderboard have been independently verified by the benchmark maintainers: we evaluate top community submissions by requesting model API access and running our evaluation pipeline directly. This ensures reproducible and trustworthy results.

| Rank | Model | Team | CVS_acc | NAP_acc | SA_acc | STG_mIoU | TAG_mIoU@0.3 | TAG_mIoU@0.5 | DVC_F1 | DVC_llm | VS_llm | RC_llm | Verified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 1 | uAI-NEXUS-MedVLM-1.0b-4B-RL | 🏷️ UII | 0.898 | 0.473 | 0.285 | 0.176 | 0.504 | 0.441 | 0.480 | 3.950 | 4.227 | 3.861 | 2026-04-15 |
| 🥈 2 | uAI-NEXUS-MedVLM-1.0c-4B-SFT | 🏷️ UII | 0.897 | 0.576 | 0.354 | 0.190 | 0.482 | 0.429 | 0.451 | 3.741 | 4.238 | 3.746 | 2026-04-15 |
| 🥉 3 | uAI-NEXUS-MedVLM-1.0a-7B-RL | 🏷️ UII | 0.896 | 0.405 | 0.254 | 0.202 | 0.216 | 0.156 | 0.214 | 3.797 | 4.184 | 3.442 | 2026-04-15 |
| 4 | uAI-NEXUS-MedVLM-1.0b-4B-SFT | 🏷️ UII | 0.895 | 0.466 | 0.270 | 0.133 | 0.465 | 0.403 | 0.435 | 3.862 | 4.180 | 3.752 | 2026-04-15 |
| 5 | uAI-NEXUS-MedVLM-1.0a-7B-SFT | 🏷️ UII | 0.894 | 0.442 | 0.218 | 0.177 | 0.142 | 0.091 | 0.165 | 3.665 | 3.596 | 2.757 | 2026-04-15 |
| 6 | Qwen3.5-4B | 🏷️ Qwen AI | 0.309 | 0.231 | 0.276 | 0.051 | 0.074 | 0.040 | 0.142 | 2.699 | 3.491 | 3.037 | 2026-04-15 |
| 7 | Gemini-3.1-flash-lite | 🏷️ Google | 0.242 | 0.406 | 0.225 | 0.059 | 0.072 | 0.049 | 0.174 | 3.198 | 3.737 | 3.492 | 2026-04-15 |
| 8 | GPT-5.4 | 🏷️ OpenAI | 0.164 | 0.393 | 0.267 | 0.004 | 0.086 | 0.055 | 0.178 | 3.403 | 3.976 | 3.714 | 2026-04-15 |
| 9 | Qwen2.5VL-7B | 🏷️ Qwen AI | 0.105 | 0.151 | 0.010 | 0.020 | 0.006 | 0.068 | 0.075 | 2.512 | 2.452 | 2.090 | 2026-04-15 |
| 10 | Gemini-2.5-Flash | 🏷️ Google | 0.101 | 0.228 | 0.107 | 0.047 | 0.045 | 0.021 | 0.084 | 2.387 | 2.352 | 1.912 | 2026-04-15 |
| 11 | GPT-4.1 | 🏷️ OpenAI | 0.018 | 0.250 | 0.087 | 0.014 | 0.096 | 0.005 | 0.101 | 2.438 | 2.490 | 2.080 | 2026-04-15 |
| 12 | Qwen3VL-4B | 🏷️ Qwen AI | 0.000 | 0.178 | 0.006 | 0.000 | 0.039 | 0.034 | 0.128 | 1.939 | 2.926 | 2.853 | 2026-04-15 |
| 13 | Qwen2.5VL-7B-Surg-CholecT50 | 🏷️ NVIDIA | 0.000 | 0.302 | 0.000 | 0.000 | 0.019 | 0.013 | 0.051 | 1.945 | 2.101 | 2.986 | 2026-04-15 |
| 14 | VideoChat-R1.5-7B | 🏷️ OpenGVLab | 0.000 | 0.270 | 0.006 | 0.000 | 0.009 | 0.005 | 0.026 | 1.723 | 3.034 | 3.086 | 2026-04-15 |

■ Best   ■ 2nd Best   🏷️ = Evaluation run by benchmark maintainers   ✅ = User submission verified by maintainers via model API


How to get on the Official Leaderboard

  1. Submit your model predictions via the "Community Submissions" tab
  2. Top performers will be contacted by the benchmark maintainers
  3. Provide model API access so we can independently verify results
  4. Once verified, your model is added to the Official Leaderboard
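The steps above start from a predictions file. As a purely hypothetical illustration (the field names `sample_id`, `task`, and `prediction` are ours; follow the actual schema given on the "Community Submissions" tab), a predictions file might be assembled like this:

```python
import json

# Hypothetical predictions file for a community submission.
# Field names are illustrative only -- the authoritative schema is
# the one published on the "Community Submissions" tab.
predictions = [
    # A temporal grounding prediction: (start, end) in seconds.
    {"sample_id": "sample_0001", "task": "TAG", "prediction": [12.0, 22.0]},
    # A captioning prediction: free-text, later scored by an LLM judge.
    {"sample_id": "sample_0002", "task": "DVC",
     "prediction": "The instrument retracts tissue to expose the target structure."},
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```

One file covering all tasks keeps the verification step simple: the maintainers can re-run the same samples through your model API and diff the outputs against this file.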

For questions, contact us via GitHub.