v0.10.0rc2

What's Changed

  • [Model] use AutoWeightsLoader for bart by @calvin0327 in https://github.com/vllm-project/vllm/pull/18299
  • [Model] Support VLMs with transformers backend by @zucchini-nlp in https://github.com/vllm-project/vllm/pull/20543
  • [bugfix] fix syntax warning caused by backslash by @1195343015 in https://github.com/vllm-project/vllm/pull/21251
  • [CI] Cleanup modelscope version constraint in Dockerfile by @yankay in https://github.com/vllm-project/vllm/pull/21243
  • [Docs] Add RFC Meeting to Issue Template by @simon-mo in https://github.com/vllm-project/vllm/pull/21279
  • Add the instruction to run e2e validation manually before release by @huydhn in https://github.com/vllm-project/vllm/pull/21023
  • [Bugfix] Fix missing placeholder in logger debug by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21280
  • [Model][1/N] Support multiple poolers at model level by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21227
  • [Docs] Fix hardcoded links in docs by @hmellor in https://github.com/vllm-project/vllm/pull/21287
  • [Docs] Make tables more space efficient in supported_models.md by @hmellor in https://github.com/vllm-project/vllm/pull/21291
  • [Misc] unify variable for LLM instance by @andyxning in https://github.com/vllm-project/vllm/pull/20996
  • Add Nvidia ModelOpt config adaptation by @Edwardf0t1 in https://github.com/vllm-project/vllm/pull/19815
  • [Misc] Add sliding window to flashinfer test by @WoosukKwon in https://github.com/vllm-project/vllm/pull/21282
  • [CPU] Enable shared-memory based pipeline parallel for CPU backend by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/21289
  • [BugFix] make utils.current_stream thread-safe (#21252) by @simpx in https://github.com/vllm-project/vllm/pull/21253
  • [Misc] Add dummy maverick test by @minosfuture in https://github.com/vllm-project/vllm/pull/21199
  • [Attention] Clean up iRoPE in V1 by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/21188
  • [DP] Fix Prometheus Logging by @robertgshaw2-redhat in https://github.com/vllm-project/vllm/pull/21257
  • Fix bad lm-eval fork by @mgoin in https://github.com/vllm-project/vllm/pull/21318
  • [perf] Speed up align sum kernels by @hj-mistral in https://github.com/vllm-project/vllm/pull/21079
  • [v1][sampler] Inplace logprobs comparison to get the token rank by @houseroad in https://github.com/vllm-project/vllm/pull/21283
  • [XPU] Enable external_launcher to serve as an executor via torchrun by @chaojun-zhang in https://github.com/vllm-project/vllm/pull/21021
  • [Doc] Fix CPU doc format by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/21316
  • [Intel GPU] Ray Compiled Graph avoid NCCL for Intel GPU by @ratnampa in https://github.com/vllm-project/vllm/pull/21338
  • Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)" by @minosfuture in https://github.com/vllm-project/vllm/pull/21334
  • [Core] Minimize number of dict lookup in _maybe_evict_cached_block by @Jialin in https://github.com/vllm-project/vllm/pull/21281
  • [V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible by @tdoublep in https://github.com/vllm-project/vllm/pull/21300
  • [Refactor] Fix Compile Warning #1444-D by @yewentao256 in https://github.com/vllm-project/vllm/pull/21208
  • Fix kv_cache_dtype handling for out-of-tree HPU plugin by @kzawora-intel in https://github.com/vllm-project/vllm/pull/21302
  • [Misc] DeepEPHighThroughput - Enable Inductor pass by @varun-sundar-rabindranath in https://github.com/vllm-project/vllm/pull/21311
  • [Bug] DeepGemm: Fix Cuda Init Error by @yewentao256 in https://github.com/vllm-project/vllm/pull/21312
  • Update fp4 quantize API by @wenscarl in https://github.com/vllm-project/vllm/pull/21327
  • [Feature][eplb] add verify ep or tp or dp by @lengrongfu in https://github.com/vllm-project/vllm/pull/21102
  • Add arcee model by @alyosha-swamy in https://github.com/vllm-project/vllm/pull/21296
  • [Bugfix] Fix cached block eviction logic by @simon-mo in https://github.com/vllm-project/vllm/pull/21357
  • [Misc] Remove deprecated args in v0.10 by @kebe7jun in https://github.com/vllm-project/vllm/pull/21349
  • [Core] Optimize update checks in LogitsProcessor by @Jialin in https://github.com/vllm-project/vllm/pull/21245
  • [benchmark] Port benchmark request sent optimization to benchmark_serving by @Jialin in https://github.com/vllm-project/vllm/pull/21209
  • [Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool by @Jialin in https://github.com/vllm-project/vllm/pull/21222
  • [Misc] unify variable for LLM instance v2 by @andyxning in https://github.com/vllm-project/vllm/pull/21356
  • [perf] Add fused MLA QKV + strided layernorm by @mickaelseznec in https://github.com/vllm-project/vllm/pull/21116
  • [feat]: add SM100 support for cutlass FP8 groupGEMM by @djmmoss in https://github.com/vllm-project/vllm/pull/20447
  • [Perf] Cuda Kernel for Per Token Group Quant by @yewentao256 in https://github.com/vllm-project/vllm/pull/21083
  • Adds parallel model weight loading for runai_streamer by @bbartels in https://github.com/vllm-project/vllm/pull/21330
  • [feat] Enable mm caching for transformers backend by @zucchini-nlp in https://github.com/vllm-project/vllm/pull/21358
  • Revert "[Refactor] Fix Compile Warning #1444-D (#21208)" by @yewentao256 in https://github.com/vllm-project/vllm/pull/21384
  • Add tokenization_kwargs to encode for embedding model truncation by @Receiling in https://github.com/vllm-project/vllm/pull/21033
  • [Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers by @ariG23498 in https://github.com/vllm-project/vllm/pull/21353
  • [CI/Build] Fix test failure due to updated model repo by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21375
  • Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num by @xinli-git in https://github.com/vllm-project/vllm/pull/21325
  • [Model] Add Qwen3CoderToolParser by @ranpox in https://github.com/vllm-project/vllm/pull/21396
  • [Misc] Copy HF_TOKEN env var to Ray workers by @ruisearch42 in https://github.com/vllm-project/vllm/pull/21406
  • [BugFix] Fix ray import error mem cleanup bug by @joerunde in https://github.com/vllm-project/vllm/pull/21381
  • [CI/Build] Fix model executor tests by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21387
  • [Bugfix][ROCm][Build] Fix build regression on ROCm by @gshtras in https://github.com/vllm-project/vllm/pull/21393
  • Simplify weight loading in Transformers backend by @hmellor in https://github.com/vllm-project/vllm/pull/21382
  • [BugFix] Update python to python3 calls for image; fix prefix & input calculations. by @ericehanley in https://github.com/vllm-project/vllm/pull/21391
  • [BUGFIX] deepseek-v2-lite failed due to fused_qkv_a_proj name update by @xuechendi in https://github.com/vllm-project/vllm/pull/21414
  • [Bugfix][CUDA] fix supported CUDA FP8 kv cache dtype by @elvischenv in https://github.com/vllm-project/vllm/pull/21420
  • Changing "amdproduction" allocation. by @Alexei-V-Ivanov-AMD in https://github.com/vllm-project/vllm/pull/21409
  • [Bugfix] Fix nightly transformers CI failure by @Isotr0py in https://github.com/vllm-project/vllm/pull/21427
  • [Core] Add basic unit test for maybe_evict_cached_block by @Jialin in https://github.com/vllm-project/vllm/pull/21400
  • [Cleanup] Only log MoE DP setup warning if DP is enabled by @mgoin in https://github.com/vllm-project/vllm/pull/21315
  • add clear messages for deprecated models by @youkaichao in https://github.com/vllm-project/vllm/pull/21424
  • [Bugfix] ensure tool_choice is popped when tool_choice:null is passed in json payload by @gcalmettes in https://github.com/vllm-project/vllm/pull/19679
  • Fixed typo in profiling logs by @sergiopaniego in https://github.com/vllm-project/vllm/pull/21441
  • [Docs] Fix bullets and grammars in tool_calling.md by @windsonsea in https://github.com/vllm-project/vllm/pull/21440
  • [Sampler] Introduce logprobs mode for logging by @houseroad in https://github.com/vllm-project/vllm/pull/21398
  • Fix Mamba V2 test not asserting failures by @fabianlim in https://github.com/vllm-project/vllm/pull/21379
  • [Misc] fixed nvfp4_moe test failures due to invalid kwargs by @chenyang78 in https://github.com/vllm-project/vllm/pull/21246
  • [Docs] Clean up v1/metrics.md by @windsonsea in https://github.com/vllm-project/vllm/pull/21449
  • [Model] add Hunyuan V1 Dense Model support. by @kzjeef in https://github.com/vllm-project/vllm/pull/21368
  • [V1] Check all pooling tasks during profiling by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21299
  • [Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models. by @sighingnow in https://github.com/vllm-project/vllm/pull/21364
  • [Tests] Add tests for headless internal DP LB by @njhill in https://github.com/vllm-project/vllm/pull/21450
  • [Core][Model] PrithviMAE Enablement on vLLM v1 engine by @christian-pinto in https://github.com/vllm-project/vllm/pull/20577
  • Add test case for compiling multiple graphs by @sarckk in https://github.com/vllm-project/vllm/pull/21044
  • [TPU][TEST] Fix the downloading issue in TPU v1 test 11. by @QiliangCui in https://github.com/vllm-project/vllm/pull/21418
  • [Core] Add reload_weights RPC method by @22quinn in https://github.com/vllm-project/vllm/pull/20096
  • [V1] Fix local chunked attention always disabled by @sarckk in https://github.com/vllm-project/vllm/pull/21419
  • [V0 Deprecation] Remove Prompt Adapters by @mgoin in https://github.com/vllm-project/vllm/pull/20588
  • [Core] Freeze gc during cuda graph capture to speed up init by @mgoin in https://github.com/vllm-project/vllm/pull/21146
  • feat(gguf_loader): accept HF repo paths & URLs for GGUF by @hardikkgupta in https://github.com/vllm-project/vllm/pull/20793
  • [Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding by @deven-labovitch in https://github.com/vllm-project/vllm/pull/21374
  • [Misc] Add dummy maverick test to CI by @minosfuture in https://github.com/vllm-project/vllm/pull/21324
  • [XPU][UT] increase intel xpu CI test scope by @Liangliang-Ma in https://github.com/vllm-project/vllm/pull/21492
  • [Bugfix] Fix casing warning by @MatthewBonanni in https://github.com/vllm-project/vllm/pull/21468
  • [Bugfix] Fix example disagg_example_p2p_nccl_xpyd.sh zombie process by @david6666666 in https://github.com/vllm-project/vllm/pull/21437
  • [BugFix]: Batch generation from prompt_embeds fails for long prompts by @KazusatoOoko in https://github.com/vllm-project/vllm/pull/21390
  • [BugFix] Fix KVConnector TP worker aggregation by @njhill in https://github.com/vllm-project/vllm/pull/21473
  • [DP] Internal Load Balancing Per Node [one-pod-per-node] by @robertgshaw2-redhat in https://github.com/vllm-project/vllm/pull/21238
  • Dump input metadata on crash for async scheduling by @WoosukKwon in https://github.com/vllm-project/vllm/pull/21258
  • [BugFix] Set CUDA_VISIBLE_DEVICES before spawning the subprocesses by @yinghai in https://github.com/vllm-project/vllm/pull/21211
  • Add think chunk by @juliendenize in https://github.com/vllm-project/vllm/pull/21333

New Contributors

  • @chaojun-zhang made their first contribution in https://github.com/vllm-project/vllm/pull/21021
  • @alyosha-swamy made their first contribution in https://github.com/vllm-project/vllm/pull/21296
  • @bbartels made their first contribution in https://github.com/vllm-project/vllm/pull/21330
  • @Receiling made their first contribution in https://github.com/vllm-project/vllm/pull/21033
  • @ariG23498 made their first contribution in https://github.com/vllm-project/vllm/pull/21353
  • @xinli-git made their first contribution in https://github.com/vllm-project/vllm/pull/21325
  • @ranpox made their first contribution in https://github.com/vllm-project/vllm/pull/21396
  • @ericehanley made their first contribution in https://github.com/vllm-project/vllm/pull/21391
  • @sergiopaniego made their first contribution in https://github.com/vllm-project/vllm/pull/21441
  • @hardikkgupta made their first contribution in https://github.com/vllm-project/vllm/pull/20793
  • @deven-labovitch made their first contribution in https://github.com/vllm-project/vllm/pull/21374
  • @MatthewBonanni made their first contribution in https://github.com/vllm-project/vllm/pull/21468
  • @david6666666 made their first contribution in https://github.com/vllm-project/vllm/pull/21437
  • @KazusatoOoko made their first contribution in https://github.com/vllm-project/vllm/pull/21390

Full Changelog: https://github.com/vllm-project/vllm/compare/v0.10.0rc1...v0.10.0rc2