What's Changed

  • Deduplicate Transformers backend code using inheritance by @hmellor in https://github.com/vllm-project/vllm/pull/21461
  • [Bugfix][ROCm] Fix for warp_size uses on host by @gshtras in https://github.com/vllm-project/vllm/pull/21205
  • [TPU][Bugfix] fix moe layer by @yaochengji in https://github.com/vllm-project/vllm/pull/21340
  • [v1][Core] Clean up usages of SpecializedManager by @zhouwfang in https://github.com/vllm-project/vllm/pull/21407
  • [Misc] Fix duplicate FusedMoEConfig debug messages by @njhill in https://github.com/vllm-project/vllm/pull/21455
  • [Core] Support model loader plugins by @22quinn in https://github.com/vllm-project/vllm/pull/21067
  • Remove incorrect GLM-4 quantization code by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/21435
  • Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 by @okdshin in https://github.com/vllm-project/vllm/pull/20544
  • [Misc] Improve comment for DPEngineCoreActor._set_cuda_visible_devices() by @ruisearch42 in https://github.com/vllm-project/vllm/pull/21501
  • [Feat] Allow custom naming of vLLM processes by @chaunceyjiang in https://github.com/vllm-project/vllm/pull/21445
  • bump flashinfer to v0.2.8 by @cjackal in https://github.com/vllm-project/vllm/pull/21385
  • [Attention] Optimize FlashInfer MetadataBuilder Build call by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/21137
  • [Model] Officially support Emu3 with Transformers backend by @hmellor in https://github.com/vllm-project/vllm/pull/21319
  • [Bugfix] Fix CUDA arch flags for MoE permute by @minosfuture in https://github.com/vllm-project/vllm/pull/21426
  • [Fix] Update mamba_ssm to 2.2.5 by @elvischenv in https://github.com/vllm-project/vllm/pull/21421
  • [Docs] Update Tensorizer usage documentation by @sangstar in https://github.com/vllm-project/vllm/pull/21190
  • [Docs] Rewrite Distributed Inference and Serving guide by @crypdick in https://github.com/vllm-project/vllm/pull/20593
  • [Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access by @yewentao256 in https://github.com/vllm-project/vllm/pull/21465
  • Update flashinfer CUTLASS MoE Kernel by @wenscarl in https://github.com/vllm-project/vllm/pull/21408
  • [XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform by @chaojun-zhang in https://github.com/vllm-project/vllm/pull/21036
  • [P/D] Move FakeNixlWrapper to test dir by @ruisearch42 in https://github.com/vllm-project/vllm/pull/21328
  • [P/D] Support CPU Transfer in NixlConnector by @juncgu in https://github.com/vllm-project/vllm/pull/18293
  • [Docs][minor] Fix broken gh-file link in distributed serving docs by @crypdick in https://github.com/vllm-project/vllm/pull/21543
  • [Docs] Add Expert Parallelism Initial Documentation by @simon-mo in https://github.com/vllm-project/vllm/pull/21373
  • update flashinfer to v0.2.9rc1 by @weireweire in https://github.com/vllm-project/vllm/pull/21485
  • [TPU][Test] Set HF_HUB_DISABLE_XET=1 for test 3 by @QiliangCui in https://github.com/vllm-project/vllm/pull/21539
  • [MoE] More balanced expert sharding by @WoosukKwon in https://github.com/vllm-project/vllm/pull/21497
  • [Frontend] run-batch supports V1 by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21541
  • [Docs] Fix site_url for RunLLM by @hmellor in https://github.com/vllm-project/vllm/pull/21564
  • [Bug] Fix DeepGemm Init Error by @yewentao256 in https://github.com/vllm-project/vllm/pull/21554
  • Fix GLM-4 missing layer when using PP by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/21531
  • [Kernel] adding fused_moe configs for upcoming granite4 by @bringlein in https://github.com/vllm-project/vllm/pull/21332
  • [Bugfix] DeepGemm utils : Fix hardcoded type-cast by @varun-sundar-rabindranath in https://github.com/vllm-project/vllm/pull/21517
  • [DP] Support api-server-count > 0 in hybrid DP LB mode by @njhill in https://github.com/vllm-project/vllm/pull/21510
  • [TPU][Test] Temporarily suspend this MoE model in test_basic.py. by @QiliangCui in https://github.com/vllm-project/vllm/pull/21560
  • [Docs] Add requirements/common.txt to run unit tests by @zhouwfang in https://github.com/vllm-project/vllm/pull/21572
  • Integrate TensorSchema with shape validation for Phi3VImagePixelInputs by @bbeckca in https://github.com/vllm-project/vllm/pull/21232
  • [CI] Update CODEOWNERS for CPU and Intel GPU by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/21582
  • [Bugfix] fix modelscope snapshot_download serialization by @andyxning in https://github.com/vllm-project/vllm/pull/21536
  • [Model] Support tensor parallel for timm ViT in Deepseek_vl2 by @wzqd in https://github.com/vllm-project/vllm/pull/21494
  • [Model] Fix a check for None when the return value is an empty list in Gemma3 MM vision_embeddings by @hfan in https://github.com/vllm-project/vllm/pull/21479
  • [Misc][Tools] make max-model-len a parameter in auto_tune script by @yaochengji in https://github.com/vllm-project/vllm/pull/21321
  • [CI/Build] fix cpu_extension for apple silicon by @ignaciosica in https://github.com/vllm-project/vllm/pull/21195
  • [Misc] Removed undefined cmake variables MOE_PERMUTE_ARCHS by @chenyang78 in https://github.com/vllm-project/vllm/pull/21262
  • [TPU][Bugfix] fix OOM issue in CI test by @yaochengji in https://github.com/vllm-project/vllm/pull/21550
  • [Tests] Harden DP tests by @njhill in https://github.com/vllm-project/vllm/pull/21508
  • Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct by @Xu-Wenqing in https://github.com/vllm-project/vllm/pull/21598
  • [Bugfix] GGUF: fix AttributeError: 'PosixPath' object has no attribute 'startswith' by @kebe7jun in https://github.com/vllm-project/vllm/pull/21579
  • [Quantization] Enable BNB support for more MoE models by @jeejeelee in https://github.com/vllm-project/vllm/pull/21370
  • [V1] Get supported tasks from model runner instead of model config by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21585
  • [Bugfix][Logprobs] Fix logprobs op to support more backend by @MengqingCao in https://github.com/vllm-project/vllm/pull/21591
  • [Model] Fix Ernie4.5MoE e_score_correction_bias parameter by @xyxinyang in https://github.com/vllm-project/vllm/pull/21586
  • [MODEL] New model support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B by @bigshanedogg in https://github.com/vllm-project/vllm/pull/20931
  • [Frontend] Add request_id to the Request object so requests can be controlled better via external load balancers by @kouroshHakha in https://github.com/vllm-project/vllm/pull/21009
  • [Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel by @cyang49 in https://github.com/vllm-project/vllm/pull/20839
  • [ROCm][AITER] Enable fp8 kv cache on rocm aiter backend. by @fsx950223 in https://github.com/vllm-project/vllm/pull/20295
  • [Kernel] Improve machete memory bound perf by @czhu-cohere in https://github.com/vllm-project/vllm/pull/21556
  • Add support for Prithvi in Online serving mode by @mgazz in https://github.com/vllm-project/vllm/pull/21518
  • [CI] Unifying Dockerfiles for ARM and X86 Builds by @kebe7jun in https://github.com/vllm-project/vllm/pull/21343
  • [Docs] add auto-round quantization readme by @wenhuach21 in https://github.com/vllm-project/vllm/pull/21600
  • [TPU][Test] Rollback PR-21550. by @QiliangCui in https://github.com/vllm-project/vllm/pull/21619
  • Add Unsloth to RLHF.md by @danielhanchen in https://github.com/vllm-project/vllm/pull/21636
  • [Perf] Cuda Kernel for Int8 Per Token Group Quant by @yewentao256 in https://github.com/vllm-project/vllm/pull/21476
  • Add interleaved RoPE test for Llama4 (Maverick) by @sarckk in https://github.com/vllm-project/vllm/pull/21478
  • [Bugfix] Fix sync_and_slice_intermediate_tensors by @ruisearch42 in https://github.com/vllm-project/vllm/pull/21537
  • [Bugfix] Always set RAY_ADDRESS for Ray actor before spawn by @ruisearch42 in https://github.com/vllm-project/vllm/pull/21540
  • [TPU] Update ptxla nightly version to 20250724 by @yaochengji in https://github.com/vllm-project/vllm/pull/21555
  • [Feature] Add support for MoE models in the calibration-free RTN-based quantization by @sakogan in https://github.com/vllm-project/vllm/pull/20766
  • [Model] Ultravox: Support Llama 4 and Gemma 3 backends by @farzadab in https://github.com/vllm-project/vllm/pull/17818
  • [Docs] Add offline serving multi-modal video input example for Qwen2.5-VL by @david6666666 in https://github.com/vllm-project/vllm/pull/21530
  • Correctly kill vLLM processes after finishing serving benchmarks by @huydhn in https://github.com/vllm-project/vllm/pull/21641
  • [Bugfix] Fix isinstance check for tensor types in _load_prompt_embeds to use dtype comparison by @Mitix-EPI in https://github.com/vllm-project/vllm/pull/21612
  • [TPU][Test] Divide TPU v1 Test into 2 parts. by @QiliangCui in https://github.com/vllm-project/vllm/pull/21431
  • Support Intern-S1 by @lvhan028 in https://github.com/vllm-project/vllm/pull/21628
  • [Misc] remove unused try-except in pooling config check by @reidliu41 in https://github.com/vllm-project/vllm/pull/21618
  • [Take 2] Correctly kill vLLM processes after benchmarks by @huydhn in https://github.com/vllm-project/vllm/pull/21646
  • Migrate AriaImagePixelInputs to TensorSchema for shape validation by @bbeckca in https://github.com/vllm-project/vllm/pull/21620
  • Migrate AyaVisionImagePixelInputs to TensorSchema for shape validation by @bbeckca in https://github.com/vllm-project/vllm/pull/21622
  • [Bugfix] Investigate Qwen2-VL failing test by @Isotr0py in https://github.com/vllm-project/vllm/pull/21527
  • Support encoder-only models without KV-Cache by @maxdebayser in https://github.com/vllm-project/vllm/pull/21270
  • [Bug] Fix has_flashinfer_moe Import Error when it is not installed by @yewentao256 in https://github.com/vllm-project/vllm/pull/21634
  • [Misc] Improve memory profiling debug message by @yeqcharlotte in https://github.com/vllm-project/vllm/pull/21429
  • [BugFix] Fix shared storage connector to load KV only for attention layers by @david6666666 in https://github.com/vllm-project/vllm/pull/21428
  • [Refactor] Remove moe_align_block_size_triton by @yewentao256 in https://github.com/vllm-project/vllm/pull/21335
  • [Bugfix][Apple Silicon] fix missing symbols when build from source on Mac with Apple Silicon by @zhouyeju in https://github.com/vllm-project/vllm/pull/21380
  • [CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI by @yeqcharlotte in https://github.com/vllm-project/vllm/pull/21355
  • [NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels by @kaixih in https://github.com/vllm-project/vllm/pull/21411
  • Remove xformers requirement for Mistral-format Pixtral and Mistral3 by @wenchen76 in https://github.com/vllm-project/vllm/pull/21154
  • support torch.compile for bailing moe by @jinzhen-lin in https://github.com/vllm-project/vllm/pull/21664
  • Migrate Blip2ImagePixelInputs and Blip2ImageEmbeddingInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21656
  • Migrate DeepseekVL2ImageInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21658
  • Migrate FuyuImagePatchInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21662
  • Migrate ChameleonImagePixelInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21657
  • [VLM] Support HF format Phi-4-MM model by @Isotr0py in https://github.com/vllm-project/vllm/pull/17121
  • Handle non-serializable objects in vllm bench by @huydhn in https://github.com/vllm-project/vllm/pull/21665
  • [CI/Build][Doc] Clean up more docs that point to old bench scripts by @yeqcharlotte in https://github.com/vllm-project/vllm/pull/21667
  • Refactor: Remove numpy dependency from LoggingStatLogger by @skyloevil in https://github.com/vllm-project/vllm/pull/20529
  • [Misc] add default value for file pattern arg by @andyxning in https://github.com/vllm-project/vllm/pull/21659
  • Migrate Florence2ImagePixelInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21663
  • [VLM] Add video support for Intern-S1 by @Isotr0py in https://github.com/vllm-project/vllm/pull/21671
  • [Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor by @yewentao256 in https://github.com/vllm-project/vllm/pull/21631
  • Fix CUDA permute/unpermute for use with DeepGemm Moe by @CalebDu in https://github.com/vllm-project/vllm/pull/17934
  • [Misc] Refactor vllm config str by @andyxning in https://github.com/vllm-project/vllm/pull/21666
  • [Attention] Make CutlassMLA the default backend for SM100 (blackwell) by @alexm-redhat in https://github.com/vllm-project/vllm/pull/21626
  • [Deprecation][2/N] Replace --task with --runner and --convert by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21470
  • Fix typo for limit-mm-per-prompt in docs by @joa-stdn in https://github.com/vllm-project/vllm/pull/21697
  • Fix GLM tool parser by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/21668
  • [Misc] Add fused_moe configs for Qwen3-Coder-480B-A35B-Instruct-FP8 by @jeejeelee in https://github.com/vllm-project/vllm/pull/21700
  • [V1] Exception Handling when Loading KV Cache from Remote Store by @liuyumoye in https://github.com/vllm-project/vllm/pull/21534
  • [Model] Support TP/PP/mamba2 kernel for PLaMo2 by @Alnusjaponica in https://github.com/vllm-project/vllm/pull/19674
  • [FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel by @tjtanaa in https://github.com/vllm-project/vllm/pull/21242
  • Migrate Gemma3ImagePixelInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21676
  • Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21678
  • Migrate GLMVImagePixelInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21679
  • Migrate GraniteSpeechAudioInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21682
  • Migrate Idefics3ImagePixelInputs and Idefics3ImageEmbeddingInputs to … by @bbeckca in https://github.com/vllm-project/vllm/pull/21683
  • [Bugfix] [issue-21565] Fix the incompatibility between streaming and named function calling when thinking is disabled by @hsliuustc0106 in https://github.com/vllm-project/vllm/pull/21573
  • [Bugfix] Fix profiling impacting benchmark results by @lengrongfu in https://github.com/vllm-project/vllm/pull/21507
  • [Bugfix] Fix shape checking for Fuyu by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21709
  • [Bugfix] fix max-file-size type from str to int by @andyxning in https://github.com/vllm-project/vllm/pull/21675
  • [BugFix] Fix ChunkedLocalAttention when the hybrid kv-cache is disabled by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/21707
  • [v1][mamba] Added mamba_type into MambaSpec by @Josephasafg in https://github.com/vllm-project/vllm/pull/21715
  • Migrate KeyeImageInputs and KeyeVideoInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21686
  • [Model] Prioritize Transformers fallback over suffix matching by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21719
  • [Feature] Log non-default args in LLM by @lengrongfu in https://github.com/vllm-project/vllm/pull/21680
  • [Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts by @jeejeelee in https://github.com/vllm-project/vllm/pull/21717
  • [Bugfix] Fix environment variable setting in CPU Dockerfile by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/21730
  • [Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme by @Isotr0py in https://github.com/vllm-project/vllm/pull/21744
  • [PD] let p2p nccl toy proxy handle /chat/completions by @chaunceyjiang in https://github.com/vllm-project/vllm/pull/21734
  • [Ernie 4.5] Name Change for Base 0.3B Model by @vasqu in https://github.com/vllm-project/vllm/pull/21735
  • [Bugfix] Improve JSON extraction in LlamaToolParser by @key4ng in https://github.com/vllm-project/vllm/pull/19024
  • [Docs] Add revision date to rendered docs by @hmellor in https://github.com/vllm-project/vllm/pull/21752
  • [Bugfix] Check health when the engine core process exits unexpectedly by @wuhang2014 in https://github.com/vllm-project/vllm/pull/21728
  • [Bugfix][CI/Build] Update peft version in test requirement by @Isotr0py in https://github.com/vllm-project/vllm/pull/21729
  • [Logs] Change flashinfer sampler logs to once by @mgoin in https://github.com/vllm-project/vllm/pull/21759
  • [Misc] Reduce logs for model resolution by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21765
  • [Bugfix] Mistral crashes on tool with no description by @HugoMichard in https://github.com/vllm-project/vllm/pull/21167
  • [CI/Build] Fix plugin tests by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21758
  • [XPU] IPEX-optimized Punica Wrapper on XPU by @chaojun-zhang in https://github.com/vllm-project/vllm/pull/21703
  • [Bugfix] Fix granite speech shape validation by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21762
  • [P/D] Log warnings related to prefill KV expiry by @njhill in https://github.com/vllm-project/vllm/pull/21753
  • Use metavar to list the choices for a CLI arg when custom values are also accepted by @hmellor in https://github.com/vllm-project/vllm/pull/21760
  • update flashinfer to v0.2.9rc2 by @weireweire in https://github.com/vllm-project/vllm/pull/21701
  • [AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile by @rasmith in https://github.com/vllm-project/vllm/pull/21350
  • [Bug] Enforce contiguous input for dynamic_scaled_fp8_quant and static_scaled_fp8_quant by @yewentao256 in https://github.com/vllm-project/vllm/pull/21773
  • [AMD][CI/Build] Fix the AMD issue caused by inappropriate symbol exposure by @houseroad in https://github.com/vllm-project/vllm/pull/21647
  • Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" by @KuntaiDu in https://github.com/vllm-project/vllm/pull/21778
  • [Bugfix] DeepGEMM is not enabled on B200 due to _lazy_init() by @smarterclayton in https://github.com/vllm-project/vllm/pull/21472
  • [Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels by @nikhil-arm in https://github.com/vllm-project/vllm/pull/17112
  • [Perf] Disable chunked local attention by default with llama4 by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/21761
  • [Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning by @LyrisZhong in https://github.com/vllm-project/vllm/pull/20396
  • [Docs] Minimize spacing for supported_hardware.md table by @mgoin in https://github.com/vllm-project/vllm/pull/21779
  • [Refactor] Merge Compressed Tensor FP8 CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod by @yewentao256 in https://github.com/vllm-project/vllm/pull/21775
  • [CI] Parallelize Kernels MoE Test by @mgoin in https://github.com/vllm-project/vllm/pull/21764
  • skip fusedmoe layer for start_load_kv by @calvin0327 in https://github.com/vllm-project/vllm/pull/21378
  • [AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM by @gshtras in https://github.com/vllm-project/vllm/pull/21766
  • Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21684
  • [Misc] Rework process titles by @njhill in https://github.com/vllm-project/vllm/pull/21780
  • [Doc] Link to RFC for pooling optimizations by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21806
  • [Model]: Fused MoE for nomic-embed-text-v2-moe by @Isotr0py in https://github.com/vllm-project/vllm/pull/18321
  • [V0 deprecation] Guided decoding by @rzabarazesh in https://github.com/vllm-project/vllm/pull/21347
  • [Model] Refactor JambaForCausalLM by @jeejeelee in https://github.com/vllm-project/vllm/pull/21394
  • [Docs] Fix the outdated URL for installing from vLLM binaries by @yankay in https://github.com/vllm-project/vllm/pull/21523
  • [KVCache] Make KVCacheSpec hashable by @heheda12345 in https://github.com/vllm-project/vllm/pull/21791
  • [Doc] Update compatibility matrix for pooling and multimodal models by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21831
  • [Bugfix] VLLM_V1 supports passing other compilation levels by @zou3519 in https://github.com/vllm-project/vllm/pull/19340
  • [Docs] Merge design docs for a V1 only future by @hmellor in https://github.com/vllm-project/vllm/pull/21832
  • [TPU] Add an optimization doc on TPU by @bvrockwell in https://github.com/vllm-project/vllm/pull/21155
  • [Bugfix] Fix mixed bits and visual language model quantization in AutoRound by @wenhuach21 in https://github.com/vllm-project/vllm/pull/21802
  • [Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend by @elvischenv in https://github.com/vllm-project/vllm/pull/21525
  • [Docs] use uv in GPU installation docs by @davidxia in https://github.com/vllm-project/vllm/pull/20277
  • [Doc] Add FusedMoE Modular Kernel Documentation by @varun-sundar-rabindranath in https://github.com/vllm-project/vllm/pull/21623
  • [Doc] update Contributing page's testing section by @davidxia in https://github.com/vllm-project/vllm/pull/18272
  • Add flashinfer_python to CUDA wheel requirements by @mgoin in https://github.com/vllm-project/vllm/pull/21389
  • docker: docker-aware precompiled wheel support by @dougbtv in https://github.com/vllm-project/vllm/pull/21127
  • Revert "[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647)" by @gshtras in https://github.com/vllm-project/vllm/pull/21850
  • [BugFix] Fix interleaved sliding window not set for Gemma3n by @sarckk in https://github.com/vllm-project/vllm/pull/21863
  • [ci] add b200 test placeholder by @simon-mo in https://github.com/vllm-project/vllm/pull/21866
  • [ci] mark blackwell test optional for now by @simon-mo in https://github.com/vllm-project/vllm/pull/21878
  • [Bugfix] Correct max tokens for non-contiguous embeds by @milesial in https://github.com/vllm-project/vllm/pull/21798
  • [v1][attention] Support Hybrid Allocator + FlashInfer by @heheda12345 in https://github.com/vllm-project/vllm/pull/21412
  • [Docs] Switch to better markdown linting pre-commit hook by @hmellor in https://github.com/vllm-project/vllm/pull/21851
  • [DOC] Fix path of v1 related figures by @heheda12345 in https://github.com/vllm-project/vllm/pull/21868
  • [Docs] Update docker.md with HF_TOKEN, new model, and podman fix by @mgoin in https://github.com/vllm-project/vllm/pull/21856
  • Expose PyTorch profiler configuration to environment variables by @Csrayz in https://github.com/vllm-project/vllm/pull/21803
  • [Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization by @sydarb in https://github.com/vllm-project/vllm/pull/21808
  • [Bugfix] Fix comment typo of get_num_common_prefix_blocks() by @MingzhenHan in https://github.com/vllm-project/vllm/pull/21827
  • [Bugfix] Actually disable processing cache when API server is scaled out by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21839
  • [Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant by @yewentao256 in https://github.com/vllm-project/vllm/pull/21867
  • [Frontend] Add LLM.reward specific to reward models by @noooop in https://github.com/vllm-project/vllm/pull/21720 (usage sketch after this list)
  • [XPU] use ZE_AFFINITY_MASK for device select on xpu by @jikunshang in https://github.com/vllm-project/vllm/pull/21815
  • Add @sighingnow as maintainer of qwen's related files. by @sighingnow in https://github.com/vllm-project/vllm/pull/21895
  • [CI/Build] Fix pre-commit failure in docs by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21897
  • [Docs] Expand introduction to Ray in Multi-node deployment section by @crypdick in https://github.com/vllm-project/vllm/pull/21584
  • Update vLLM Benchmark Suite for Xeon based on 0.9.2 release by @louie-tsai in https://github.com/vllm-project/vllm/pull/21486
  • [Misc] Remove redundant config definitions by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21891
  • [Doc] Update Intern-S1 info by @jeejeelee in https://github.com/vllm-project/vllm/pull/21908
  • [CI] rollback lint-and-deploy pipeline using amd machine by @kebe7jun in https://github.com/vllm-project/vllm/pull/21912
  • [Tests] Fixing bug inside MultiModalProfiler. by @shenoyvvarun in https://github.com/vllm-project/vllm/pull/21842
  • [Model] Remove DSV2 unused code by @jeejeelee in https://github.com/vllm-project/vllm/pull/21903
  • [benchmark] add max-concurrency in result table by @panpan0000 in https://github.com/vllm-project/vllm/pull/21095
  • [Doc] Update partial support by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21916
  • [Docs] Fix the example code of streaming chat completions in reasoning by @hsliuustc0106 in https://github.com/vllm-project/vllm/pull/21825
  • Add @patrickvonplaten as maintainer of mistral's related files. by @patrickvonplaten in https://github.com/vllm-project/vllm/pull/21928
  • [Hardware][CPU] Build fix for ARM without BF16 by @ericcurtin in https://github.com/vllm-project/vllm/pull/21848
  • [Feature][EPLB] Add eplb support for Qwen3 by @aladerran in https://github.com/vllm-project/vllm/pull/20815
  • [Doc] Remove vLLM prefix and add citation for PagedAttention by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21910
  • [Bugfix] We should use metavar, not choices by @lengrongfu in https://github.com/vllm-project/vllm/pull/21902
  • [Feature] Support multiple api keys in server by @Yanpas in https://github.com/vllm-project/vllm/pull/18548
  • [misc] skip p2p check by default by @youkaichao in https://github.com/vllm-project/vllm/pull/21904
  • [Test] Add Benchmark and Unit Test for per_token_group_quant by @yewentao256 in https://github.com/vllm-project/vllm/pull/21860
  • [CI/Build] Only run markdownlint in CI by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21892
  • Reduce time wasted in GitHub Actions using concurrency by @hmellor in https://github.com/vllm-project/vllm/pull/21919
  • [Misc] Improve code readability of KVCacheManager by @tanruixiang in https://github.com/vllm-project/vllm/pull/21673
  • [NVIDIA] Fix Llama4 Scout FP4 functionality issues by @nvpohanh in https://github.com/vllm-project/vllm/pull/21499
  • [Docs] Reduce the size of the built docs by @hmellor in https://github.com/vllm-project/vllm/pull/21920
  • [Bugfix] Fix OOM tests in initialization test by @Isotr0py in https://github.com/vllm-project/vllm/pull/21921
  • [Bugfix] Fix multi-api server not working for text models by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21933
  • Override attention metadata for fast prefill in some KV sharing setups by @sarckk in https://github.com/vllm-project/vllm/pull/21590
  • [Bugfix] Fix TypeError in scheduler when comparing mixed request_id types by @chi2liu in https://github.com/vllm-project/vllm/pull/21816
  • [CI/Build] Fix registry tests by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21934
  • [Bugfix] SharedStorage Connector for V1 PD multimodal by @fake0fan in https://github.com/vllm-project/vllm/pull/21611
  • feat(distributed): add get_required_kvcache_layout class method to kv connector api by @wxsms in https://github.com/vllm-project/vllm/pull/20433
  • [TPU] Support Pathways in vLLM by @wenxindongwork in https://github.com/vllm-project/vllm/pull/21417
  • [Misc] Support more collective_rpc return types by @njhill in https://github.com/vllm-project/vllm/pull/21845
  • For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted by @dougbtv in https://github.com/vllm-project/vllm/pull/21964
  • [Misc] Use dracut on CentOS and skip clone if repo exists for EP kernel installation by @minosfuture in https://github.com/vllm-project/vllm/pull/21635
  • [Feature] Add async tensor parallelism for scaled mm by @cascade812 in https://github.com/vllm-project/vllm/pull/20155
  • [Bugfix] Fix None value handling in trace span creation for cancelled requests by @br4mm in https://github.com/vllm-project/vllm/pull/20272
  • [Core] Move EngineCoreRequest to Request conversion out of EngineCore by @linzebing in https://github.com/vllm-project/vllm/pull/21627
  • [Example] Add async_llm_streaming.py example for AsyncLLM streaming in python by @mgoin in https://github.com/vllm-project/vllm/pull/21763
  • [Bugfix] Relax lang pin for voxtral by @sanchit-gandhi in https://github.com/vllm-project/vllm/pull/21833
  • [UX] Rename CUTLASS_MLA_VLLM_V1 to CUTLASS_MLA by @mgoin in https://github.com/vllm-project/vllm/pull/21966
  • [Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels by @jeejeelee in https://github.com/vllm-project/vllm/pull/21818
  • [CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes by @mgoin in https://github.com/vllm-project/vllm/pull/21973
  • [Bugfix]: fix metadata file copy in test_sharded_state_loader by @andyxning in https://github.com/vllm-project/vllm/pull/21830
  • [Deprecation] Remove deprecated args and methods by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21907
  • [CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES by @dtrifiro in https://github.com/vllm-project/vllm/pull/21599
  • [Model][CI] Let more pooling models support v1 by @noooop in https://github.com/vllm-project/vllm/pull/21747
  • [BugFix] Fix case where collective_rpc returns None by @njhill in https://github.com/vllm-project/vllm/pull/22006
  • [NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend by @amirkl94 in https://github.com/vllm-project/vllm/pull/21458
  • [Model] Add step3 vl by @Oliver-ss in https://github.com/vllm-project/vllm/pull/21998
  • [ez] Remove a trailing space from compilation/decorators.py by @zhxchen17 in https://github.com/vllm-project/vllm/pull/22028
  • fix(setup): improve precompiled wheel setup for Docker builds by @dougbtv in https://github.com/vllm-project/vllm/pull/22025
  • Removing amdproduction Tests by @Alexei-V-Ivanov-AMD in https://github.com/vllm-project/vllm/pull/22027
  • Update torch_xla pin to 20250730 by @vanbasten23 in https://github.com/vllm-project/vllm/pull/21956
  • [Meta] Official Eagle mm support, first enablement on llama4 by @morgendave in https://github.com/vllm-project/vllm/pull/20788
  • [Misc] Add unit tests for chunked local attention by @sarckk in https://github.com/vllm-project/vllm/pull/21692
  • [Bugfix] Fix MTP weight loading by @benchislett in https://github.com/vllm-project/vllm/pull/21941
  • Add FlashInfer allreduce RMSNorm Quant fusion by @ilmarkov in https://github.com/vllm-project/vllm/pull/21069
  • [Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 by @yewentao256 in https://github.com/vllm-project/vllm/pull/21639
  • Add DeepGEMM to Dockerfile in vllm-base image by @MatthewBonanni in https://github.com/vllm-project/vllm/pull/21533
  • Move flashinfer-python to optional extra vllm[flashinfer] by @mgoin in https://github.com/vllm-project/vllm/pull/21959
  • [Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM by @yewentao256 in https://github.com/vllm-project/vllm/pull/21787
  • [Bugfix] Fix multi LoRAs with tp >= 2 and LRU cache by @charent in https://github.com/vllm-project/vllm/pull/20873
  • [Misc] Automatically resolve HF processor init kwargs by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22005
  • [BugFix] fix: aot passes kvcache dtype information by @mickaelseznec in https://github.com/vllm-project/vllm/pull/19750
  • [Model] [Quantization] Support quantization for Gemma3n by @kylesayrs in https://github.com/vllm-project/vllm/pull/21974
  • [Doc] Add Voxtral to Supported Models page by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22059
  • Update sampling_metadata.py by @Aviadr-neureality in https://github.com/vllm-project/vllm/pull/21937
  • [Doc] Fix a syntax error of example code in structured_outputs.md by @hsliuustc0106 in https://github.com/vllm-project/vllm/pull/22045
  • [Bugfix] Disable multi-modal preprocessor cache for DP by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21896
  • [Core] Avoid repeated len(block_token_ids) check in hash_request_tokens by @linzebing in https://github.com/vllm-project/vllm/pull/21781
  • [Frontend] Align tool_choice="required" behavior with OpenAI when tools is empty by @n0gu-furiosa in https://github.com/vllm-project/vllm/pull/21052
  • Revert precompile wheel changes by @simon-mo in https://github.com/vllm-project/vllm/pull/22055
  • [Doc] Add example for Step3-VL by @ywang96 in https://github.com/vllm-project/vllm/pull/22061
  • [Bugfix] Add log prefix in non-dp mode engine core by @wuhang2014 in https://github.com/vllm-project/vllm/pull/21889
  • [Misc] Remove upper bound in openai package version by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22060
  • [Doc] Add warning about speculative decoding with a draft model by @david6666666 in https://github.com/vllm-project/vllm/pull/22047
  • [Quantization] Enable BNB support for InternS1 by @jeejeelee in https://github.com/vllm-project/vllm/pull/21953
  • Revert "Update sampling_metadata.py (#21937)" by @hmellor in https://github.com/vllm-project/vllm/pull/22088
  • [Speculative Decoding] Add speculators config support by @dsikka in https://github.com/vllm-project/vllm/pull/21345
  • [BUG] [ROCm] Fix import bug on ROCm by @tjtanaa in https://github.com/vllm-project/vllm/pull/22083
  • Fix get_kwargs for case where type hint is list[Union[str, type]] by @hmellor in https://github.com/vllm-project/vllm/pull/22016
  • [Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels by @mgoin in https://github.com/vllm-project/vllm/pull/21893
  • feat(multimodal): Add customizable background color for RGBA to RGB conversion by @ahengljh in https://github.com/vllm-project/vllm/pull/22052
  • [Bugfix][PD] set max_completion_tokens=1 if req has this value by @Abirdcfly in https://github.com/vllm-project/vllm/pull/21841
  • [Refactor] Fix Compile Warning #1444-D by @yewentao256 in https://github.com/vllm-project/vllm/pull/21462
  • [BugFix] Update AttnFusionPass cache key by @zou3519 in https://github.com/vllm-project/vllm/pull/21947
  • [BugFix] Don't change title of top-level process by @njhill in https://github.com/vllm-project/vllm/pull/22032
  • [Docs] use uv in CPU installation docs by @davidxia in https://github.com/vllm-project/vllm/pull/22089
  • Deprecate --disable-log-requests and replace with --enable-log-requests by @hmellor in https://github.com/vllm-project/vllm/pull/21739
  • Improve documentation of ModelConfig.try_get_generation_config to prevent future confusion by @hmellor in https://github.com/vllm-project/vllm/pull/21526
  • [Bugfix] Fix glm4.1v video inference issue by @Isotr0py in https://github.com/vllm-project/vllm/pull/22067
  • [Bugfix] Fix handling when tokenizer init is skipped by @lengrongfu in https://github.com/vllm-project/vllm/pull/21922
  • security policy: take 1 by @sidhpurwala-huzaifa in https://github.com/vllm-project/vllm/pull/21119
  • [Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch by @varun-sundar-rabindranath in https://github.com/vllm-project/vllm/pull/21837
  • Enable headless models for pooling in the Transformers backend by @hmellor in https://github.com/vllm-project/vllm/pull/21767
  • [Misc] Minor enhancement of benchmark_moe by @jeejeelee in https://github.com/vllm-project/vllm/pull/22068
  • Fix pre-commit failure for SECURITY.md by @mgoin in https://github.com/vllm-project/vllm/pull/22102
  • [compile][startup] Disable C++ compilation of symbolic shapes by @anijain2305 in https://github.com/vllm-project/vllm/pull/20836
  • Introduce RayPPCommunicator for ray-based PP by @ruisearch42 in https://github.com/vllm-project/vllm/pull/21660
  • Add lora test for tp>1 case for TPU. by @vanbasten23 in https://github.com/vllm-project/vllm/pull/21970
  • [BugFix] Harden distributed DP startup by @njhill in https://github.com/vllm-project/vllm/pull/21538
  • [CI] Initial tests for SM100 Blackwell runner by @mgoin in https://github.com/vllm-project/vllm/pull/21877
  • [Perf] Optimize reshape_and_cache_flash CUDA Kernel by @yewentao256 in https://github.com/vllm-project/vllm/pull/22036
  • feat: Add support for GPTQ MoE quantization on ROCm vllm serve by @JartX in https://github.com/vllm-project/vllm/pull/21733
  • [V1][CUDA] Full cudagraph support for FlashInfer by @fhl2000 in https://github.com/vllm-project/vllm/pull/21367
  • [Model] Qwen2.5 VL SiLU-and-Mul by @vllmellm in https://github.com/vllm-project/vllm/pull/22066
  • [Misc] VLLM_TARGET_DEVICE.lower() by @NickLucche in https://github.com/vllm-project/vllm/pull/22101
  • [Misc] DeepGemmExperts : Avoid JIT generation in the hot-path by @varun-sundar-rabindranath in https://github.com/vllm-project/vllm/pull/21955
  • [Speculators][Speculative Decoding] Add Qwen Eagle3 Support by @dsikka in https://github.com/vllm-project/vllm/pull/21835
  • [BugFix] Improve internal DP load balancing by @njhill in https://github.com/vllm-project/vllm/pull/21617
  • [Test] Add Unit Test for Batched DeepGEMM by @yewentao256 in https://github.com/vllm-project/vllm/pull/21559
  • [Attention][DBO] Add support for "splitting" the CommonAttentionMetadata by @SageMoore in https://github.com/vllm-project/vllm/pull/21153
  • [FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. by @vllmellm in https://github.com/vllm-project/vllm/pull/22069
  • [Misc] Getting and passing ray runtime_env to workers by @ruisearch42 in https://github.com/vllm-project/vllm/pull/22040
  • Fix test_kv_sharing_fast_prefill flakiness by @sarckk in https://github.com/vllm-project/vllm/pull/22038
  • [Bugfix] Mamba2 remove bugged initial state condition in chunk scan by @cyang49 in https://github.com/vllm-project/vllm/pull/22034
  • docs: remove deprecated disable-log-requests flag by @ywang96 in https://github.com/vllm-project/vllm/pull/22113
  • [PERF] Use a faster decode path in the tokenizer: avoid useless list-to-list conversion by @vadiklyutiy in https://github.com/vllm-project/vllm/pull/20000
  • Update for GLM-4.1V by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/22000
  • [Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead by @cyang49 in https://github.com/vllm-project/vllm/pull/21075
  • [Frontend] Improve error message for too many mm items by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22114
  • [V1] [Hybrid] Validate compatibility of attention backend batch reordering at init time by @tdoublep in https://github.com/vllm-project/vllm/pull/21557
  • [xpu] Support MoE models on XPU platform by @yma11 in https://github.com/vllm-project/vllm/pull/21643
  • Revert "[compile][startup] Disable C++ compilation of symbolic shapes" by @xiszishu in https://github.com/vllm-project/vllm/pull/22122
  • [Misc] Bump ray to 2.48.0 by @ruisearch42 in https://github.com/vllm-project/vllm/pull/22123
  • [Fix] Fix llama4 modelopt weight loading error by @jiahanc in https://github.com/vllm-project/vllm/pull/22107
  • [Misc] Add tensor schema test coverage for multimodal models by @Isotr0py in https://github.com/vllm-project/vllm/pull/21754
  • [Benchmark] Support ready check timeout in vllm bench serve by @yeqcharlotte in https://github.com/vllm-project/vllm/pull/21696
  • Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) by @LopezCastroRoberto in https://github.com/vllm-project/vllm/pull/21309
  • [Misc] update doc comment for send by @andyxning in https://github.com/vllm-project/vllm/pull/22026
  • [executor] feat: add supports_pp attr to executors by @eric-haibin-lin in https://github.com/vllm-project/vllm/pull/21786
  • [V1] [P/D] Refactor KV Connector Path by @sdavidbd in https://github.com/vllm-project/vllm/pull/21980
  • [Responses API] Disable response store by default by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22137
  • [CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI by falling back silu_and_mul to the torch native implementation by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/22145
  • Add chat doc in quick start by @TankNee in https://github.com/vllm-project/vllm/pull/21213 (usage sketch after this list)
  • fuse fp32 for GLM-4.5 e_score_correction_bias by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/22143
  • [Bugfix] Fix failing multimodal standard test by @Isotr0py in https://github.com/vllm-project/vllm/pull/22153
  • Use aiohttp connection pool for benchmarking by @eicherseiji in https://github.com/vllm-project/vllm/pull/21981
  • [Fix] Correct assertion syntax error in attention utils by @skyloevil in https://github.com/vllm-project/vllm/pull/22154
  • [RLHF] Fix torch.dtype not serializable in example by @22quinn in https://github.com/vllm-project/vllm/pull/22158
  • [PD] add test for chat completions endpoint by @Abirdcfly in https://github.com/vllm-project/vllm/pull/21925
  • remove duplicate code within cleanup_dist_env_and_memory by @andyxning in https://github.com/vllm-project/vllm/pull/22147
  • Add tree attention backend for v1 (part 1) by @TheEpicDolphin in https://github.com/vllm-project/vllm/pull/20401
  • [refactor] improve ConstantList exception specificity by @skyloevil in https://github.com/vllm-project/vllm/pull/22156
  • Remove index_put from MM embeddings merging by @chenxi-yang in https://github.com/vllm-project/vllm/pull/22105
  • [CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes by @tlrmchlsmth in https://github.com/vllm-project/vllm/pull/22163
  • [Misc] Modify the organization of GLM series by @jeejeelee in https://github.com/vllm-project/vllm/pull/22171
  • [feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading by @weixiao-huang in https://github.com/vllm-project/vllm/pull/21164
  • [Bugfix] Fix failing GGUF models test by @Isotr0py in https://github.com/vllm-project/vllm/pull/22174
  • [Sampler] Support returning all logprobs or logits by @22quinn in https://github.com/vllm-project/vllm/pull/21792
  • [Doc] Update pooling model docs by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22186
  • Fix Arcee model weight loading: Add custom load_weights by @alyosha-swamy in https://github.com/vllm-project/vllm/pull/21725
  • [Responses API] Ignore store=True and process the request by default by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22185
  • [Bug] Update auto_tune.sh to separate benchmarking and profiling. by @ericehanley in https://github.com/vllm-project/vllm/pull/21629
  • [Bugfix][V1][P/D]Fix the uneven polling issue in the toy proxy for P2pNcclConnector by @Abatom in https://github.com/vllm-project/vllm/pull/21819
  • [NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading by @nvpohanh in https://github.com/vllm-project/vllm/pull/22073
  • [Bugfix] V1 Fix the cursor leakage issue during request scheduling. by @CLFutureX in https://github.com/vllm-project/vllm/pull/21173
  • Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22223
  • [V1] reduce block size for tree attention correctness test to fix 'ou… by @TheEpicDolphin in https://github.com/vllm-project/vllm/pull/22207
  • [V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) by @lk-chen in https://github.com/vllm-project/vllm/pull/21785
  • [FEAT] Refactor ROPE into module by @tjtanaa in https://github.com/vllm-project/vllm/pull/22192
  • [ROCm][Bugfix] Compilation passes fix by @gshtras in https://github.com/vllm-project/vllm/pull/22202
  • self.gate dtype update for GLM-4.5 by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/22203
  • [Log] DeepGEMM Update Log for Unaligned Problem Size by @yewentao256 in https://github.com/vllm-project/vllm/pull/22208
  • fix: kimi_k2 return empty tool call list by @tlipoca9 in https://github.com/vllm-project/vllm/pull/22149
  • [Misc] Remove pass_config from CompilationConfig dump_json excluded by @elvischenv in https://github.com/vllm-project/vllm/pull/21911
  • [Doc] add backend to doc string of initialize_model_parallel by @andyxning in https://github.com/vllm-project/vllm/pull/22142
  • [Misc] log more detailed message for ensure_model_parallel_initialized by @andyxning in https://github.com/vllm-project/vllm/pull/22144
  • Optimize configuration access with LRU cache in custom ops by @skyloevil in https://github.com/vllm-project/vllm/pull/22204
  • [Bugfix] Misaligned params in TreeAttentionImpl by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22226
  • [UX] Fail if an invalid attention backend is specified by @mgoin in https://github.com/vllm-project/vllm/pull/22217
  • [Core] Factor out common logic for MM budget calculation by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22228
  • [Model] Pooling model activation supports per request control by PoolingParams by @noooop in https://github.com/vllm-project/vllm/pull/20538
  • [Docs][TPU] Highlight TPU Software version selection by @NickLucche in https://github.com/vllm-project/vllm/pull/22242
  • Migrate KimiVLImagePixelInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21769
  • [Feature] Non-contiguous Support for FP8 Quantization by @yewentao256 in https://github.com/vllm-project/vllm/pull/21961
  • [NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel by @elvischenv in https://github.com/vllm-project/vllm/pull/22095
  • [Misc] correct static type check for GroupCoordinator by @andyxning in https://github.com/vllm-project/vllm/pull/21946
  • [V0 Deprecation][TPU] Remove V1 flag check from tests by @NickLucche in https://github.com/vllm-project/vllm/pull/22248
  • Use UV_LINK_MODE=copy in Dockerfile to avoid hardlink fail by @mgoin in https://github.com/vllm-project/vllm/pull/22128
  • [CI/Build] Update flashinfer to 0.2.9 by @mgoin in https://github.com/vllm-project/vllm/pull/22233
  • [Refactor] Remove Unused Environment Variable VLLM_NO_DEPRECATION_WARNING by @yewentao256 in https://github.com/vllm-project/vllm/pull/22199
  • [V1] port xformers backend to v1 by @TheEpicDolphin in https://github.com/vllm-project/vllm/pull/21342
  • [bugfix] fix blackwell deepep installation by @youkaichao in https://github.com/vllm-project/vllm/pull/22255
  • [CI][TPU] Fix docker clean up by @lsy323 in https://github.com/vllm-project/vllm/pull/22271
  • [Bugfix] Remove faulty test for oot attention backend by @mgoin in https://github.com/vllm-project/vllm/pull/22286
  • [Bugfix] Fix 3D input passed into cutlass_scaled_mm by @mgoin in https://github.com/vllm-project/vllm/pull/22278
  • [Bugfix] Fix MoE BNB version by @jeejeelee in https://github.com/vllm-project/vllm/pull/22260
  • [Perf] Parallelize fill_bitmask to accelerate high-throughput guided decoding by @benchislett in https://github.com/vllm-project/vllm/pull/21862
  • [Bugfix] Skip dead and non-GPU nodes for Ray DP engine allocation by @ruisearch42 in https://github.com/vllm-project/vllm/pull/22275
  • [Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm by @gshtras in https://github.com/vllm-project/vllm/pull/22264
  • Upgrade FA3 for attention sink by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22313
  • Increase openai-python version by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22316
  • Add attention sink in attention backends by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22320
  • Update transformers to v4.55 by @hmellor in https://github.com/vllm-project/vllm/pull/21931
  • Add GPT-OSS model code and config [1/N] by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22327
  • [ROCm] Add attention sink to use_rocm_custom_paged_attention by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22329
  • [GptOss] Add GptOss reasoning parser to support structure output by @heheda12345 in https://github.com/vllm-project/vllm/pull/22322
  • [gpt-oss] flashinfer attention sink init by @zyongye in https://github.com/vllm-project/vllm/pull/22330
  • [gpt-oss] Add openai-harmony as default dependency by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22332
  • [Misc] Clean up duplicated hf overrides by @Isotr0py in https://github.com/vllm-project/vllm/pull/22311
  • [gpt-oss] Add Tool/ConversationContext classes and harmony_utils by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22340
  • [gpt-oss] add model to supported models doc by @ywang96 in https://github.com/vllm-project/vllm/pull/22336
  • [gpt-oss] Support chat completion api by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22342
  • [Minor] Fix type by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22347
  • [BugFix] Fix FA2 RuntimeError when sinks is provided by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22365
  • Add code to check the number of AMD Instinct GPUs by @zhangnju in https://github.com/vllm-project/vllm/pull/22367
  • [BugFix] Fix triton compile error in kernel_unified_attention_2/3d caused by attention sinks by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22368
  • [Bugfix] Make condition in triton kernel constexpr by @gshtras in https://github.com/vllm-project/vllm/pull/22370
  • [gpt-oss] Add loop for built-in tool call by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22374
  • [gpt-oss] attention sink init fix gemini by @zyongye in https://github.com/vllm-project/vllm/pull/22335
  • [gpt-oss] flashinfer mxfp4 by @zyongye in https://github.com/vllm-project/vllm/pull/22339
  • [v1] - Mamba1 Attention Metadata by @Josephasafg in https://github.com/vllm-project/vllm/pull/21249
  • [Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue by @yewentao256 in https://github.com/vllm-project/vllm/pull/22399
  • [gpt-oss] add demo tool server by @heheda12345 in https://github.com/vllm-project/vllm/pull/22393
  • [gpt-oss] fix model config with hf_config by @zyongye in https://github.com/vllm-project/vllm/pull/22401
  • Fix trtllm-gen attention env and add attention sink by @IwakuraRein in https://github.com/vllm-project/vllm/pull/22378
  • Update flashinfer-python==0.2.10 by @mgoin in https://github.com/vllm-project/vllm/pull/22389
  • [model] Support MiniCPM-V 4.0 by @tc-mb in https://github.com/vllm-project/vllm/pull/22166
  • Support encoder_only attention for FlexAttention by @maxdebayser in https://github.com/vllm-project/vllm/pull/22273
  • [Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/21588
  • [XPU]Fix flash_attn_varlen_func interface on xpu by @jikunshang in https://github.com/vllm-project/vllm/pull/22350
  • [Qwen3] Enable dual-chunk-attention support for Qwen3 models. by @sighingnow in https://github.com/vllm-project/vllm/pull/21924
  • [Bugfix] Fix wrong method name in Intern-S1 image processor by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22417
  • Use float32 for test_completion.py by @mgoin in https://github.com/vllm-project/vllm/pull/22385
  • [Bugfix]: Fix the streaming output for function calls in MiniMax by @qscqesze in https://github.com/vllm-project/vllm/pull/22015
  • [Bugfix] Add proper comparison for package versions by @syedmba in https://github.com/vllm-project/vllm/pull/22314
  • Update hf_xet pin to resolve hangs by @hmellor in https://github.com/vllm-project/vllm/pull/22356
  • Optimize logger init performance by using module-level constants by @skyloevil in https://github.com/vllm-project/vllm/pull/22373
  • preload heavy modules when mp method is forkserver by @lionelvillard in https://github.com/vllm-project/vllm/pull/22214
  • [gpt-oss] Convert user input to harmony format by @heheda12345 in https://github.com/vllm-project/vllm/pull/22402
  • [Bugfix] EPLB load statistics problem by @david6666666 in https://github.com/vllm-project/vllm/pull/22167
  • [CI] Skip the pooling models that do not support transformers v4.55 by @noooop in https://github.com/vllm-project/vllm/pull/22411
  • [Bench] Split serve.py:main into sync/async versions by @lk-chen in https://github.com/vllm-project/vllm/pull/22405
  • [Model] Switch to Fused RMS norm in Qwen2.5_VL model. by @vllmellm in https://github.com/vllm-project/vllm/pull/22184
  • [Frontend] Update OpenAI error response to upstream format by @msanft in https://github.com/vllm-project/vllm/pull/22099
  • [Misc] Support routing logic simulation by @minosfuture in https://github.com/vllm-project/vllm/pull/21990
  • feat: Add --enable-log-outputs flag for logging model generations by @mizadri in https://github.com/vllm-project/vllm/pull/20707
  • [Docs] Add missing dependency for docs build by @hmellor in https://github.com/vllm-project/vllm/pull/22435
  • Add H20-3e fused MoE kernel tuning configs for GLM-4.5 by @JaceyShao in https://github.com/vllm-project/vllm/pull/22433
  • [Misc] Enhance code formatting in mxfp4.py by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22423
  • [Doc] Fix link to prefix caching design by @sarckk in https://github.com/vllm-project/vllm/pull/22384
  • [Docs] Factor out troubleshooting to its own guide; add section for Ray Observability by @crypdick in https://github.com/vllm-project/vllm/pull/21578
  • [Doc] update docs for nightly benchmarks by @andrewkchan in https://github.com/vllm-project/vllm/pull/12022
  • [Docs] Update features/disagg_prefill, add v1 examples and development by @david6666666 in https://github.com/vllm-project/vllm/pull/22165
  • [Core] Store only the keys for multi-modal data in P0 by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22198
  • [Bugfix] Add missing packed_modules_mapping to DeepseekV2ForCausalLM by @fxmarty-amd in https://github.com/vllm-project/vllm/pull/22352
  • [Tool] Fix auto tool call by @heheda12345 in https://github.com/vllm-project/vllm/pull/22434
  • [gpt-oss] Generate ResponseOutputItem from Harmony Message by @heheda12345 in https://github.com/vllm-project/vllm/pull/22410
  • Fix pre-commit error in main by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22462
  • [Core] Simplify mm processing cache by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22457
  • [Frontend] Use engine argument to control MM cache size by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22441
  • Remove from_dict from SpeculativeConfig by @hmellor in https://github.com/vllm-project/vllm/pull/22451
  • [Misc] normalize multiprocessing Queue usage by @andyxning in https://github.com/vllm-project/vllm/pull/22371
  • [ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine by @tjtanaa in https://github.com/vllm-project/vllm/pull/21496
  • [PERF] Use pybase64 to more quickly decode prompt embeddings by @qthequartermasterman in https://github.com/vllm-project/vllm/pull/22469 (decode sketch after this list)
  • Add ModelOpt Qwen3 nvfp4 support by @Edwardf0t1 in https://github.com/vllm-project/vllm/pull/20101
  • Support Tensorrt-LLM MoE fp4 for low-latency by @wenscarl in https://github.com/vllm-project/vllm/pull/21331
  • Fix Flashinfer CUTLASS MOE Allgather by @wenscarl in https://github.com/vllm-project/vllm/pull/21963
  • [Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) by @0xjunhao in https://github.com/vllm-project/vllm/pull/22131
  • [Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match by @chaunceyjiang in https://github.com/vllm-project/vllm/pull/22065
  • Do not tie_word_embeddings for GLM-4.5 and GLM-4.5V by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/22460
  • Optimize MiniCPMO mask creation with vectorized implementation by @skyloevil in https://github.com/vllm-project/vllm/pull/22464
  • Fix pre-commit by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22487
  • [bugfix] Fix Llama3/4 issues caused by FlashInfer 0.2.10 by @nvpohanh in https://github.com/vllm-project/vllm/pull/22426
  • [Doc] Sleep mode documentation by @iAmir97 in https://github.com/vllm-project/vllm/pull/22310
  • [bench] Fix benchmark/serve.py to ignore unavailable results by @lk-chen in https://github.com/vllm-project/vllm/pull/22382
  • [CI/Build] Fix multimodal tests by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22491
  • [Misc] Begin deprecation of get_tensor_model_*_group by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22494
  • [Misc] fix openai version by @lengrongfu in https://github.com/vllm-project/vllm/pull/22485
  • [BugFix] Don't cancel asyncio tasks directly from destructors by @njhill in https://github.com/vllm-project/vllm/pull/22476
  • [Docs] Improve API docs (+small tweaks) by @hmellor in https://github.com/vllm-project/vllm/pull/22459
  • Remove exception for Python 3.8 typing from linter by @hmellor in https://github.com/vllm-project/vllm/pull/22506
  • [gpt-oss] triton kernel mxfp4 by @zyongye in https://github.com/vllm-project/vllm/pull/22421
  • [Benchmark] Add benchmark tool for multi turn conversations by @pliops-daniels in https://github.com/vllm-project/vllm/pull/20267
  • [gpt-oss] guard import when triton kernel is not installed by @zyongye in https://github.com/vllm-project/vllm/pull/22529
  • [Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” by @crypdick in https://github.com/vllm-project/vllm/pull/22466
  • [gpt-oss] Support tool call and implement MCP tool server by @heheda12345 in https://github.com/vllm-project/vllm/pull/22427
  • [BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/21691
  • [Misc] DeepGEMM : Avoid JIT generation in the hot-path by @varun-sundar-rabindranath in https://github.com/vllm-project/vllm/pull/22215
  • [Bugfix] Update FA commit hash by @tdoublep in https://github.com/vllm-project/vllm/pull/22546
  • Skip Qwen 1 in CI because remote code is no longer compatible with Transformers by @hmellor in https://github.com/vllm-project/vllm/pull/22536
  • [Docs] fix broken links in metrics.md by @GuyStone in https://github.com/vllm-project/vllm/pull/22315
  • [Frontend] Add unix domain socket support by @yyweiss in https://github.com/vllm-project/vllm/pull/18097
  • Extract CompilationConfig from config.py by @hmellor in https://github.com/vllm-project/vllm/pull/22524
  • Drop flaky test_healthcheck_response_time by @russellb in https://github.com/vllm-project/vllm/pull/22539
  • [XPU] upgrade to torch 2.8 for XPU by @jikunshang in https://github.com/vllm-project/vllm/pull/22300
  • [BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D by @Pradyun92 in https://github.com/vllm-project/vllm/pull/22317
  • [Bugfix] Fix ModernBert cuda graph capturing in v1 by @Isotr0py in https://github.com/vllm-project/vllm/pull/21901
  • Implicit language-model-only mode via limit-mm-per-prompt by @ywang96 in https://github.com/vllm-project/vllm/pull/22299
  • [Doc] Add usage of implicit text-only mode by @ywang96 in https://github.com/vllm-project/vllm/pull/22561
  • Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation by @tdoublep in https://github.com/vllm-project/vllm/pull/22541
  • [Log] Add Warning for Deprecation of DeepGEMM old version by @yewentao256 in https://github.com/vllm-project/vllm/pull/22194
  • [V1] [Hybrid] Support Minimax-Text-01 in V1 by @tdoublep in https://github.com/vllm-project/vllm/pull/22151
  • v1: Pass KVConnectorOutput to scheduler-side by @orozery in https://github.com/vllm-project/vllm/pull/22157
  • [Misc] Use config definitions from Transformers library by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21913
  • Fix loading of quantized BigCode models by @eldarkurtic in https://github.com/vllm-project/vllm/pull/22463
  • [TPU] Add support for online w8a8 quantization by @kyuyeunk in https://github.com/vllm-project/vllm/pull/22425
  • [ROCm][Misc] Rename the context_len to seq_len in ROCm custom paged attention kernel by @charlifu in https://github.com/vllm-project/vllm/pull/22097
  • [Bugfix] Fix failing GPT-OSS initialization test by @Isotr0py in https://github.com/vllm-project/vllm/pull/22557
  • [Bugfix] Fix CI moe kernel failure by @jeejeelee in https://github.com/vllm-project/vllm/pull/22556
  • Update docs for Minimax-Text support by @tdoublep in https://github.com/vllm-project/vllm/pull/22562
  • GLM-4.5V with new class name at transformers by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/22520
  • [CI] [Hybrid] Speed up hybrid models test by removing large models by @tdoublep in https://github.com/vllm-project/vllm/pull/22563
  • [Docs] Reduce noise in docs and --help from the JSON tip by @hmellor in https://github.com/vllm-project/vllm/pull/22567
  • Move ParallelConfig from config/__init__.py to config/parallel.py by @hmellor in https://github.com/vllm-project/vllm/pull/22565
  • [Model] Gemma3n MM by @NickLucche in https://github.com/vllm-project/vllm/pull/20495
  • [Bugfix] Fix basic models tests hanging due to mm processor creation by @Isotr0py in https://github.com/vllm-project/vllm/pull/22571
  • [FEAT] [Performance] Add triton mrope to replace the torch code path by @tjtanaa in https://github.com/vllm-project/vllm/pull/22375
  • [V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers by @tdoublep in https://github.com/vllm-project/vllm/pull/21401
  • [oss] Init gpt-oss bf16 support by @jeejeelee in https://github.com/vllm-project/vllm/pull/22508
  • [Config] add "qwen" as a native eagle3 target supported model by @lec77 in https://github.com/vllm-project/vllm/pull/22333
  • Improve fast_topk function with type hints and documentation by @skyloevil in https://github.com/vllm-project/vllm/pull/22530
  • [TPU] kv cache update kernel doesn't need slices padded to a multiple of num_slices_per_block by @yaochengji in https://github.com/vllm-project/vllm/pull/22394
  • Refactor sliding window configuration to Transformers best practice by @hmellor in https://github.com/vllm-project/vllm/pull/21927
  • [Minor] Fix pre-commit error on main by @Isotr0py in https://github.com/vllm-project/vllm/pull/22579
  • [Misc] code clean duplicate set_current_vllm_config in _set_vllm_config by @andyxning in https://github.com/vllm-project/vllm/pull/22566
  • [Doc] Fix API doc link in side navigation by @22quinn in https://github.com/vllm-project/vllm/pull/22585
  • [Misc] Further refine type annotations in parallel state by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22499
  • [Docs] Fix warnings in docs build by @hmellor in https://github.com/vllm-project/vllm/pull/22588
  • [Misc] Replace flaky image urls in pixtral test by @Isotr0py in https://github.com/vllm-project/vllm/pull/22574
  • Move CacheConfig from config/__init__.py to config/cache.py by @hmellor in https://github.com/vllm-project/vllm/pull/22586
  • [doc] add beijing meetup links by @youkaichao in https://github.com/vllm-project/vllm/pull/22596
  • [doc] add alibaba cloud as sponsor by @youkaichao in https://github.com/vllm-project/vllm/pull/22597
  • [Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel by @Isotr0py in https://github.com/vllm-project/vllm/pull/22593
  • Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks by @h-brenoskuk in https://github.com/vllm-project/vllm/pull/22534
  • Migrate LlavaNextImageInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21774
  • Remove redundant row_indices unsqueeze operation in MiniCPMO by @skyloevil in https://github.com/vllm-project/vllm/pull/22528
  • Fix TensorSchema validation test for symbolic dims by @bbeckca in https://github.com/vllm-project/vllm/pull/22366
  • enable Docker-aware precompiled wheel setup by @dougbtv in https://github.com/vllm-project/vllm/pull/22106
  • Migrate LlavaNextVideoPixelInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21843
  • Migrate LlavaImageInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21770
  • [CI/Build] Fix tensorizer test for load_format change by @22quinn in https://github.com/vllm-project/vllm/pull/22583
  • [BugFix] Fix KVConnectorOutput TPU breakage by @njhill in https://github.com/vllm-project/vllm/pull/22598
  • [Misc][gpt-oss] Add rules to label gpt-oss related PRs by @draftbk in https://github.com/vllm-project/vllm/pull/22600
  • [Misc][gpt-oss] guard triton kernel import when not up to date by @zhewenl in https://github.com/vllm-project/vllm/pull/22584
  • [BugFix] Fix logits repetition penalty cuda check by @PicoCreator in https://github.com/vllm-project/vllm/pull/22592
  • [ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. by @vllmellm in https://github.com/vllm-project/vllm/pull/22521
  • Support token_type_ids in V1 with less code changes by @maxdebayser in https://github.com/vllm-project/vllm/pull/21985
  • [Misc] benchmark_moe supports expert parallel by @jeejeelee in https://github.com/vllm-project/vllm/pull/22251
  • [BUGFIX] KeyError 'layers.14.mlp.gate.g_idx' for Qwen3-MoE with GPTQ on ROCm by @JartX in https://github.com/vllm-project/vllm/pull/22017
  • [Docs] Add comprehensive CLI reference for all large vllm subcommands by @hmellor in https://github.com/vllm-project/vllm/pull/22601
  • [Misc] Move tensor schema tests by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22612
  • [Misc] Move jsontree to utils by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22622
  • [Model] NemotronH Support by @danielafrimi in https://github.com/vllm-project/vllm/pull/22349
  • Document aarch64 CPU support works by @ericcurtin in https://github.com/vllm-project/vllm/pull/22646
  • [Misc] Further clean up some redundant config definitions by @Isotr0py in https://github.com/vllm-project/vllm/pull/22649
  • [Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale by @yewentao256 in https://github.com/vllm-project/vllm/pull/21968
  • fix: NIXL connector transfers partial block to pass full multi-modal context by @GuanLuo in https://github.com/vllm-project/vllm/pull/21074
  • [Model] Pooling models default to using chunked prefill & prefix caching if supported. by @noooop in https://github.com/vllm-project/vllm/pull/20930
  • [CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 by @Isotr0py in https://github.com/vllm-project/vllm/pull/22659
  • [BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI by @22quinn in https://github.com/vllm-project/vllm/pull/22611
  • [CI] Skip Tree Attn Test in test_max_len.py to unblock CI by @tjtanaa in https://github.com/vllm-project/vllm/pull/22664
  • Support more parallel styles in Transformers backend TP by @hmellor in https://github.com/vllm-project/vllm/pull/22651
  • [gpt-oss] Support streaming in response API by @heheda12345 in https://github.com/vllm-project/vllm/pull/22431
  • [gpt-oss] Add test for response API + harmony (but skipped) by @heheda12345 in https://github.com/vllm-project/vllm/pull/22554
  • Enable 4bit bnb prequant MOE by @py-andy-c in https://github.com/vllm-project/vllm/pull/21548
  • Re-enable Xet on TPU tests now that hf_xet has been updated by @hmellor in https://github.com/vllm-project/vllm/pull/22666
  • Upgrade FlashInfer to v0.2.11 by @nvpohanh in https://github.com/vllm-project/vllm/pull/22613
  • [CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py by @mgoin in https://github.com/vllm-project/vllm/pull/22686
  • [CI] Increase timeout for test_completion_with_image_embeds by @mgoin in https://github.com/vllm-project/vllm/pull/22670
  • Migrate MiniCPMVImageInputs to TensorSchema by @bbeckca in https://github.com/vllm-project/vllm/pull/21939
  • [gpt-oss] Fix mxfp4 support by @heheda12345 in https://github.com/vllm-project/vllm/pull/22700
  • [gpt-oss] Small bug fixes for frontend by @heheda12345 in https://github.com/vllm-project/vllm/pull/22512
  • Fix passing SpeculativeConfig from the CLI by @hmellor in https://github.com/vllm-project/vllm/pull/22652
  • [Doc] Added unmentioned required option "method" in the usage of EAGLE-3 based models by @hsliuustc0106 in https://github.com/vllm-project/vllm/pull/21737
  • [doc] Update x86 CPU-inference installation doc to reflect optionality of AVX512f by @sooraj-satheesh in https://github.com/vllm-project/vllm/pull/22707
  • [Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. by @noooop in https://github.com/vllm-project/vllm/pull/22637
  • Move SchedulerConfig from config/__init__.py to config/scheduler.py by @hmellor in https://github.com/vllm-project/vllm/pull/22626
  • [DOC] update v1_guide with INTEL HW by @xuechendi in https://github.com/vllm-project/vllm/pull/22679
  • [New Model] Support Command-A-Vision by @dongluw in https://github.com/vllm-project/vllm/pull/22660
  • [V0] Correct CUDA Graph capture for encoder-decoder models by @Sugar-zsg in https://github.com/vllm-project/vllm/pull/22630
  • [Bugfix] Fix erroneous randomly generated cases in bad word testing by @phantomlei3 in https://github.com/vllm-project/vllm/pull/22170
  • Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" by @Jun-Howie in https://github.com/vllm-project/vllm/pull/21888
  • [Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 by @RishiAstra in https://github.com/vllm-project/vllm/pull/21783
  • [LMCache][Example] Align the PYTHONHASHSEED for prefillers and decoders for KV chunks hashing by @zejunchen-zejun in https://github.com/vllm-project/vllm/pull/21161
  • [Misc] remove GH discussions link by @jeejeelee in https://github.com/vllm-project/vllm/pull/22722
  • [gpt-oss] Enable gpt-oss on ampere by @zyongye in https://github.com/vllm-project/vllm/pull/22714
  • [Docs] Improve docs navigation by @hmellor in https://github.com/vllm-project/vllm/pull/22720
  • [BugFix][Nixl][PD] Fix heterogenous TP by @NickLucche in https://github.com/vllm-project/vllm/pull/22663
  • Officially support SmolLM3 using the Transformers backend by @hmellor in https://github.com/vllm-project/vllm/pull/22665
  • [CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py by @noooop in https://github.com/vllm-project/vllm/pull/22708
  • Fix Llama4 FlashInfer FP4 MoE issues by @nvpohanh in https://github.com/vllm-project/vllm/pull/22511
  • [Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle by @NickLucche in https://github.com/vllm-project/vllm/pull/22727
  • [Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) by @pliops-daniels in https://github.com/vllm-project/vllm/pull/22730
  • Add: SupportsEagle3 interface for explicit EAGLE3 support by @rahul-tuli in https://github.com/vllm-project/vllm/pull/22642
  • Add more test scenario for tensor schema by @teekenl in https://github.com/vllm-project/vllm/pull/22733
  • [Chore] Update CODEOWNERS to include @yewentao256 for CUDA kernels, attention backends, quantization, and related tests by @yewentao256 in https://github.com/vllm-project/vllm/pull/22741
  • [Kernel][AMD] Avoid D2H copy and cumsum kernel by @mxz297 in https://github.com/vllm-project/vllm/pull/22683
  • [CI][Nixl] Check kv cache layout during handshake by @NickLucche in https://github.com/vllm-project/vllm/pull/22745
  • Fix torch version check for SM100 mxfp4 by @zifeitong in https://github.com/vllm-project/vllm/pull/22535
  • [Misc] parametrize 'dtype' in test_flash_mla by @RUTHLESS-BOT in https://github.com/vllm-project/vllm/pull/22641
  • [Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues by @frankwang28 in https://github.com/vllm-project/vllm/pull/22606
  • [Docs] Hide the navigation and toc sidebars on home page by @hmellor in https://github.com/vllm-project/vllm/pull/22749
  • Fix Transformers backend tensor parallel for multimodal models by @hmellor in https://github.com/vllm-project/vllm/pull/22673
  • [Model] Decouple glm4v by @jeejeelee in https://github.com/vllm-project/vllm/pull/22751
  • Add hardware plugins to installation doc by @mgoin in https://github.com/vllm-project/vllm/pull/22732
  • [V0 Deprecation] Remove multi-step scheduling by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22138
  • [Misc] Remove tests/multi_step/__init__.py by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22778
  • [V0 Deprecation] Remove args for multi-step scheduling by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22779
  • Fix cuda illegal mem access with Llama4 TP8 + rms_norm custom op by @nvpohanh in https://github.com/vllm-project/vllm/pull/22701
  • [Bugfix] Fix default enable for CUTLASS MLA on SM100 by @mgoin in https://github.com/vllm-project/vllm/pull/22738
  • Force TRTLLM attention for gpt-oss on SM100 by @mgoin in https://github.com/vllm-project/vllm/pull/22678
  • Remove unneeded ROCm platform import when using CUDA by @mgoin in https://github.com/vllm-project/vllm/pull/22765
  • [Bug] Fix Unexpected Keyword Argument 'w1_bias' by @yewentao256 in https://github.com/vllm-project/vllm/pull/22757
  • [Perf] Support topk softmax fused kernel for broader num_experts by @shixianc in https://github.com/vllm-project/vllm/pull/22211
  • [gpt-oss] upgrade gpt-oss to v0.0.3 and add version check by @heheda12345 in https://github.com/vllm-project/vllm/pull/22768
  • [Model] Add option to run Step3VisionEncoder in DP by @zzh142857 in https://github.com/vllm-project/vllm/pull/22697
  • [Model] Add missing prefix to glm4_1v by @zRzRzRzRzRzRzR in https://github.com/vllm-project/vllm/pull/22716
  • [Bugfix] Fix Nemotron VL image processing by @ducviet00 in https://github.com/vllm-project/vllm/pull/22739
  • [Doc] Add max_lora_rank configuration guide by @chi2liu in https://github.com/vllm-project/vllm/pull/22782
  • [V1] Add tree drafting tests for eagle spec decoding by @TheEpicDolphin in https://github.com/vllm-project/vllm/pull/22705
  • [Platform] Custom ops support for FusedMoe by @wangxiyuan in https://github.com/vllm-project/vllm/pull/22509
  • [Frontend] Add chunked processing to handle long inputs in embedding models by @x22x22 in https://github.com/vllm-project/vllm/pull/22280
  • [FEATURE] support custom vllm tuned config path by @vermouth1992 in https://github.com/vllm-project/vllm/pull/22791
  • [Nixl][CI] Fix tests by @NickLucche in https://github.com/vllm-project/vllm/pull/22806
  • [Bugfix][mamba] Fix type annotation of Mamba2Metadata by @heheda12345 in https://github.com/vllm-project/vllm/pull/22787
  • Remove unnecessary CUDA sync of qwen image and video preprocess by @cyyever in https://github.com/vllm-project/vllm/pull/22792
  • Fix GGUF loader for Qwen3 MoE. by @Gh0u1L5 in https://github.com/vllm-project/vllm/pull/22785
  • [Frontend] Multithreaded async multimodal load_bytes by @milesial in https://github.com/vllm-project/vllm/pull/22710
  • [Core] Use individual MM items in P0/P1 cache and model runner by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22570
  • [Misc] clear and separate error messages for input too long and input + max-tokens too long by @ywang96 in https://github.com/vllm-project/vllm/pull/22803
  • [Bugfix] Fix MiniCPMV Image input inference failed by @jio-H in https://github.com/vllm-project/vllm/pull/22813
  • [CI/Build] Update VLM common tests by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22841
  • [CI] Fix tests/v1/e2e/test_kv_sharing_fast_prefill.py import on test by @NickLucche in https://github.com/vllm-project/vllm/pull/22815
  • [CI/Build] Fix param mismatch in test_eagle_correctness by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22847
  • [CI/Build] Skip gpt_big model test because of broken HF model by @Isotr0py in https://github.com/vllm-project/vllm/pull/22848
  • [ROCm][Bugfix] Fix compilation error in topk softmax fused kernel by @kliuae in https://github.com/vllm-project/vllm/pull/22819
  • Move checklist in PR template by @ProExpertProg in https://github.com/vllm-project/vllm/pull/22852
  • [Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP by @Jialin in https://github.com/vllm-project/vllm/pull/22437
  • [CI/Build] Increase pooling tolerance to pass CI by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22844
  • [CI][Entrypoints]: add filter to generation to filter out invalid tool calls by @wseaton in https://github.com/vllm-project/vllm/pull/22826
  • [CI] Fix tests/distributed/test_ca_buffer_sharing.py by @ilmarkov in https://github.com/vllm-project/vllm/pull/22849
  • [CI] remove flaky v0 test by @robertgshaw2-redhat in https://github.com/vllm-project/vllm/pull/22864
  • vLLM Benchmark suite improvement by @louie-tsai in https://github.com/vllm-project/vllm/pull/22119
  • [Bugfix] Fix PixtralHFImagePixelInputs dynamic shape check by @Isotr0py in https://github.com/vllm-project/vllm/pull/22827
  • [BugFix] Threadsafe close async zmq sockets by @njhill in https://github.com/vllm-project/vllm/pull/22877
  • Remove Phi 4 Flash configuration workaround by @hmellor in https://github.com/vllm-project/vllm/pull/22723
  • [Bugfix] Add reset prefix cache for online serving by @iAmir97 in https://github.com/vllm-project/vllm/pull/22726
  • [Doc] fix dead link by @dtrifiro in https://github.com/vllm-project/vllm/pull/22898
  • [CI] Re-enable transcriptions test_long_audio_request by @NickLucche in https://github.com/vllm-project/vllm/pull/22890
  • [Perf] Dont create unnecessary pooling params by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22876
  • [Model] Modify the gate implementation of glm4_moe by @jeejeelee in https://github.com/vllm-project/vllm/pull/22832
  • [Bugfix] Replace custom Encoding class with BatchEncoding in MistralTokenizer by @ZJY0516 in https://github.com/vllm-project/vllm/pull/22786
  • [Bugfix] Fix parsing of --disable-mm-preprocessor-cache by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/22909
  • [CI] [Hybrid] Bump min transformers version for Bamba and Jamba by @tdoublep in https://github.com/vllm-project/vllm/pull/22908
  • [Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel by @jinzhen-lin in https://github.com/vllm-project/vllm/pull/22428
  • docs: update fastsafetensors usage instructions by @NirLevy98 in https://github.com/vllm-project/vllm/pull/22891
  • [CI] Temporarily disable flaky test by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22930
  • [Kernel] Add nvfp4 gemm flashinfer backends by @nvjullin in https://github.com/vllm-project/vllm/pull/22346
  • [Quantization]: Support compressed-tensors mixed-precision model loading by @dsikka in https://github.com/vllm-project/vllm/pull/22468
  • [Core] Return final response for aborted requests from AsyncLLM.generate by @njhill in https://github.com/vllm-project/vllm/pull/22283
  • [BugFix] Fix initial DP request load imbalance by @njhill in https://github.com/vllm-project/vllm/pull/22910
  • [Bugfix] use flash attn on sm90 by @zyongye in https://github.com/vllm-project/vllm/pull/22933
  • [Kernel] Add cuda kernel for gpt_oss activation by @jeejeelee in https://github.com/vllm-project/vllm/pull/22538
  • Revert "[Kernel] Add cuda kernel for gpt_oss activation" by @simon-mo in https://github.com/vllm-project/vllm/pull/22948
  • [BugFix][KVConn] Fix use of get_required_kvcache_layout by @njhill in https://github.com/vllm-project/vllm/pull/22734
  • [BugFix] Fix port lookup in internal DP LB tests by @njhill in https://github.com/vllm-project/vllm/pull/22252
  • [CI Perf] Prune tests in tests/kernels/quantization/ by @mgoin in https://github.com/vllm-project/vllm/pull/22942
  • [CI Perf] Prune tests in tests/kernels/moe/ by @mgoin in https://github.com/vllm-project/vllm/pull/22939
  • [CI Perf] Prune tests in tests/kernels/attention/ by @mgoin in https://github.com/vllm-project/vllm/pull/22936
  • refactor: Change scaling factors calculation for flashinfer FusedMoE by @amirkl94 in https://github.com/vllm-project/vllm/pull/22812
  • [Feature] Full Cuda Graph Support for Cutlass MLA and 6% E2E Throughput Improvement by @yewentao256 in https://github.com/vllm-project/vllm/pull/22763
  • [Mamba] - refactor: Renamed mamba_attn to mamba2_attn by @Josephasafg in https://github.com/vllm-project/vllm/pull/22818
  • Revert "[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module." by @tjtanaa in https://github.com/vllm-project/vllm/pull/22956
  • [P/D] Provide bucket algorithm rate limiter for proxy_server by @frankie-ys in https://github.com/vllm-project/vllm/pull/22643
  • [CI] Pooling models mteb test uses enforce_eager by @noooop in https://github.com/vllm-project/vllm/pull/22878
  • [V1] - Split Prefill and Decode for Mamba1 models by @amirai21 in https://github.com/vllm-project/vllm/pull/22653
  • [Bugfix] Unquote file uri before reading image by @sayandipdutta in https://github.com/vllm-project/vllm/pull/22912
  • [Bugfix] fix cuda 12.6 and 11.8 build by @jinzhen-lin in https://github.com/vllm-project/vllm/pull/22952
  • [MM] Allow skipping memory profiling for multimodal models. by @ywang96 in https://github.com/vllm-project/vllm/pull/22950
  • Improve multimodal hasher performance for re-used Image prompts by @p88h in https://github.com/vllm-project/vllm/pull/22825
  • [V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) by @tdoublep in https://github.com/vllm-project/vllm/pull/22928
  • [Misc] Ignore ep_kernels_workspace by @jeejeelee in https://github.com/vllm-project/vllm/pull/22807
  • [CI] Remove duplicated docs build from buildkite by @hmellor in https://github.com/vllm-project/vllm/pull/22924
  • [Frontend] Expose do_log_stats interval to env by @Csrayz in https://github.com/vllm-project/vllm/pull/22905
  • [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer by @fhl2000 in https://github.com/vllm-project/vllm/pull/20059
  • [V0 Deprecation] Remove advance_step by @WoosukKwon in https://github.com/vllm-project/vllm/pull/22969
  • [BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 by @sstamenk in https://github.com/vllm-project/vllm/pull/22369
  • [FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches by @JartX in https://github.com/vllm-project/vllm/pull/22896
  • [Benchmarks] Include image data when ShareGPT4V dataset is used. by @huachenheli in https://github.com/vllm-project/vllm/pull/22955
  • [Structured Output] Make the output of structured output example more complete by @shen-shanshan in https://github.com/vllm-project/vllm/pull/22481
  • [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. by @bnellnm in https://github.com/vllm-project/vllm/pull/22035
  • [Model] Granite-4 support loading quantized checkpoint by @cyang49 in https://github.com/vllm-project/vllm/pull/22925
  • [Log] Debug Once for Randomizing dummy data for DP Rank by @yewentao256 in https://github.com/vllm-project/vllm/pull/22860
  • [Core] direct indexing on self.block_table_np in compute_slot_mapping by @linzebing in https://github.com/vllm-project/vllm/pull/22940
  • [Bugfix] Added more env vars to hash by @nvjullin in https://github.com/vllm-project/vllm/pull/22449
  • Use regex in convert-results-json-to-markdown.py by @mgoin in https://github.com/vllm-project/vllm/pull/22989
  • [CI] Speed up Whisper tests by reusing server by @mgoin in https://github.com/vllm-project/vllm/pull/22859
  • [Fix] enable swap_ab for pplx problem size computation by @shixianc in https://github.com/vllm-project/vllm/pull/22991
  • Add PrefixRepetitionRandomDataset to vllm bench serve datasets by @eicherseiji in https://github.com/vllm-project/vllm/pull/20638
  • minor: zero workspace buffer init for flashinfer trtllm-gen attn by @yyihuang in https://github.com/vllm-project/vllm/pull/22603
  • [Attention] FA3 Attention Sinks Perf Boost by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22478
  • [BugFix] Fix regression caused by mamba state dtype PR by @tdoublep in https://github.com/vllm-project/vllm/pull/22998
  • ci: Add CUDA + arm64 release builds by @seemethere in https://github.com/vllm-project/vllm/pull/21201
  • [Structured Outputs] [Bug] Fix misalignment in apply_grammar_bitmask causing unintended masking and NaN logits by @rishitdholakia13 in https://github.com/vllm-project/vllm/pull/22963
  • [BugFix] Handle case where async utility call is cancelled by @njhill in https://github.com/vllm-project/vllm/pull/22996
  • [v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) by @orozery in https://github.com/vllm-project/vllm/pull/19728
  • Support multiple attention groups for KV sharing by @sarckk in https://github.com/vllm-project/vllm/pull/22672
  • [BugFix] Make run_once thread-safe by @oraluben in https://github.com/vllm-project/vllm/pull/22978
  • [Misc] Support passing multiple request ids at once to AsyncLLM.abort() by @njhill in https://github.com/vllm-project/vllm/pull/22944
  • [Kernel] Simplify get_kv_cache_layout and cache use_trtllm_attention env-dependent bit by @NickLucche in https://github.com/vllm-project/vllm/pull/22735
  • [Bugfix] Fix DeepSeek MTP by @benchislett in https://github.com/vllm-project/vllm/pull/22934
  • [Frontend] Avoid list copies in serving_chat.py by @njhill in https://github.com/vllm-project/vllm/pull/22947
  • [V1] support min_tokens for detokenizer by @calvin0327 in https://github.com/vllm-project/vllm/pull/22014
  • [misc] nsys profile output kernel classifier and visualizer by @gracehonv in https://github.com/vllm-project/vllm/pull/22971
  • [XPU]avoid circular import during XPU init by @jikunshang in https://github.com/vllm-project/vllm/pull/23017
  • [Build] Env var to disable sccache by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22968
  • [BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors by @qthequartermasterman in https://github.com/vllm-project/vllm/pull/22962
  • [Misc] Add --save-dir option to benchmark_moe by @jeejeelee in https://github.com/vllm-project/vllm/pull/23020
  • [Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs by @Isotr0py in https://github.com/vllm-project/vllm/pull/22867
  • [Core] Make cudagraph check cuda platform only by @yaochengji in https://github.com/vllm-project/vllm/pull/23005
  • [CI][Bugfix] Skip Ovis2 generation test because of broken remote code by @Isotr0py in https://github.com/vllm-project/vllm/pull/22954
  • Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput by @eicherseiji in https://github.com/vllm-project/vllm/pull/23012
  • [Refactor] Allow optional MultiModalKwargsItem in IPC by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/23022
  • [New Model]mBART model by @princepride in https://github.com/vllm-project/vllm/pull/22883
  • Fix handling of max_num_batched_tokens for pooling tasks by @maxdebayser in https://github.com/vllm-project/vllm/pull/23004
  • [Frontend] Added support for HermesToolParser for models without special tokens by @minpeter in https://github.com/vllm-project/vllm/pull/16890
  • [Bugfix gpt-oss] Fix float32 convert for flashinfer sink support by @mgoin in https://github.com/vllm-project/vllm/pull/23016
  • [Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions by @mgoin in https://github.com/vllm-project/vllm/pull/23028
  • [Kernel/Quant] Remove AQLM by @mgoin in https://github.com/vllm-project/vllm/pull/22943
  • [V1] Logits processors extensibility by @afeldman-nm in https://github.com/vllm-project/vllm/pull/19912
  • [Bugfix] fix qwen3 moe fp8 accuracy issue by @jinzhen-lin in https://github.com/vllm-project/vllm/pull/23031
  • [UX] Separate marlin moe config logic from triton moe by @mgoin in https://github.com/vllm-project/vllm/pull/23006
  • [Refactor] Defer tensor data construction in MultiModalKwargs by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/23030
  • [Misc] method name typo fix by @andyxning in https://github.com/vllm-project/vllm/pull/23042
  • [Kernel] Add cuda kernel for gpt_oss activation by @jeejeelee in https://github.com/vllm-project/vllm/pull/22951
  • [Bugfix] should use stack instead of concat by @947132885 in https://github.com/vllm-project/vllm/pull/22972
  • [Misc] fix typo in the multimodal doc by @KevinZeng08 in https://github.com/vllm-project/vllm/pull/23051
  • [BugFix] Fix for IMA in FA3 varlen combine by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/22967
  • [Misc] Remove dead return by @WoosukKwon in https://github.com/vllm-project/vllm/pull/23061
  • [Misc] Convert use_structured_output property into constant by @WoosukKwon in https://github.com/vllm-project/vllm/pull/23060
  • [XPU] fix xpu to set cudagraph batch sizes by @calvin0327 in https://github.com/vllm-project/vllm/pull/23044
  • fix: gptq marlin weight loading failure by @simon-mo in https://github.com/vllm-project/vllm/pull/23066

New Contributors

  • @zhouwfang made their first contribution in https://github.com/vllm-project/vllm/pull/21407
  • @juncgu made their first contribution in https://github.com/vllm-project/vllm/pull/18293
  • @weireweire made their first contribution in https://github.com/vllm-project/vllm/pull/21485
  • @bbeckca made their first contribution in https://github.com/vllm-project/vllm/pull/21232
  • @wzqd made their first contribution in https://github.com/vllm-project/vllm/pull/21494
  • @hfan made their first contribution in https://github.com/vllm-project/vllm/pull/21479
  • @ignaciosica made their first contribution in https://github.com/vllm-project/vllm/pull/21195
  • @xyxinyang made their first contribution in https://github.com/vllm-project/vllm/pull/21586
  • @bigshanedogg made their first contribution in https://github.com/vllm-project/vllm/pull/20931
  • @fsx950223 made their first contribution in https://github.com/vllm-project/vllm/pull/20295
  • @mgazz made their first contribution in https://github.com/vllm-project/vllm/pull/21518
  • @Mitix-EPI made their first contribution in https://github.com/vllm-project/vllm/pull/21612
  • @lvhan028 made their first contribution in https://github.com/vllm-project/vllm/pull/21628
  • @zhouyeju made their first contribution in https://github.com/vllm-project/vllm/pull/21380
  • @wenchen76 made their first contribution in https://github.com/vllm-project/vllm/pull/21154
  • @skyloevil made their first contribution in https://github.com/vllm-project/vllm/pull/20529
  • @joa-stdn made their first contribution in https://github.com/vllm-project/vllm/pull/21697
  • @liuyumoye made their first contribution in https://github.com/vllm-project/vllm/pull/21534
  • @hsliuustc0106 made their first contribution in https://github.com/vllm-project/vllm/pull/21573
  • @Josephasafg made their first contribution in https://github.com/vllm-project/vllm/pull/21715
  • @vasqu made their first contribution in https://github.com/vllm-project/vllm/pull/21735
  • @key4ng made their first contribution in https://github.com/vllm-project/vllm/pull/19024
  • @wuhang2014 made their first contribution in https://github.com/vllm-project/vllm/pull/21728
  • @HugoMichard made their first contribution in https://github.com/vllm-project/vllm/pull/21167
  • @smarterclayton made their first contribution in https://github.com/vllm-project/vllm/pull/21472
  • @nikhil-arm made their first contribution in https://github.com/vllm-project/vllm/pull/17112
  • @LyrisZhong made their first contribution in https://github.com/vllm-project/vllm/pull/20396
  • @rzabarazesh made their first contribution in https://github.com/vllm-project/vllm/pull/21347
  • @milesial made their first contribution in https://github.com/vllm-project/vllm/pull/21798
  • @Csrayz made their first contribution in https://github.com/vllm-project/vllm/pull/21803
  • @MingzhenHan made their first contribution in https://github.com/vllm-project/vllm/pull/21827
  • @aladerran made their first contribution in https://github.com/vllm-project/vllm/pull/20815
  • @Yanpas made their first contribution in https://github.com/vllm-project/vllm/pull/18548
  • @tanruixiang made their first contribution in https://github.com/vllm-project/vllm/pull/21673
  • @nvpohanh made their first contribution in https://github.com/vllm-project/vllm/pull/21499
  • @chi2liu made their first contribution in https://github.com/vllm-project/vllm/pull/21816
  • @fake0fan made their first contribution in https://github.com/vllm-project/vllm/pull/21611
  • @wxsms made their first contribution in https://github.com/vllm-project/vllm/pull/20433
  • @wenxindongwork made their first contribution in https://github.com/vllm-project/vllm/pull/21417
  • @br4mm made their first contribution in https://github.com/vllm-project/vllm/pull/20272
  • @linzebing made their first contribution in https://github.com/vllm-project/vllm/pull/21627
  • @sanchit-gandhi made their first contribution in https://github.com/vllm-project/vllm/pull/21833
  • @amirkl94 made their first contribution in https://github.com/vllm-project/vllm/pull/21458
  • @zhxchen17 made their first contribution in https://github.com/vllm-project/vllm/pull/22028
  • @charent made their first contribution in https://github.com/vllm-project/vllm/pull/20873
  • @Aviadr-neureality made their first contribution in https://github.com/vllm-project/vllm/pull/21937
  • @n0gu-furiosa made their first contribution in https://github.com/vllm-project/vllm/pull/21052
  • @ahengljh made their first contribution in https://github.com/vllm-project/vllm/pull/22052
  • @sidhpurwala-huzaifa made their first contribution in https://github.com/vllm-project/vllm/pull/21119
  • @anijain2305 made their first contribution in https://github.com/vllm-project/vllm/pull/20836
  • @JartX made their first contribution in https://github.com/vllm-project/vllm/pull/21733
  • @xiszishu made their first contribution in https://github.com/vllm-project/vllm/pull/22122
  • @LopezCastroRoberto made their first contribution in https://github.com/vllm-project/vllm/pull/21309
  • @TankNee made their first contribution in https://github.com/vllm-project/vllm/pull/21213
  • @TheEpicDolphin made their first contribution in https://github.com/vllm-project/vllm/pull/20401
  • @chenxi-yang made their first contribution in https://github.com/vllm-project/vllm/pull/22105
  • @weixiao-huang made their first contribution in https://github.com/vllm-project/vllm/pull/21164
  • @CLFutureX made their first contribution in https://github.com/vllm-project/vllm/pull/21173
  • @tlipoca9 made their first contribution in https://github.com/vllm-project/vllm/pull/22149
  • @zyongye made their first contribution in https://github.com/vllm-project/vllm/pull/22330
  • @zhangnju made their first contribution in https://github.com/vllm-project/vllm/pull/22367
  • @tc-mb made their first contribution in https://github.com/vllm-project/vllm/pull/22166
  • @syedmba made their first contribution in https://github.com/vllm-project/vllm/pull/22314
  • @msanft made their first contribution in https://github.com/vllm-project/vllm/pull/22099
  • @mizadri made their first contribution in https://github.com/vllm-project/vllm/pull/20707
  • @JaceyShao made their first contribution in https://github.com/vllm-project/vllm/pull/22433
  • @andrewkchan made their first contribution in https://github.com/vllm-project/vllm/pull/12022
  • @iAmir97 made their first contribution in https://github.com/vllm-project/vllm/pull/22310
  • @pliops-daniels made their first contribution in https://github.com/vllm-project/vllm/pull/20267
  • @yyweiss made their first contribution in https://github.com/vllm-project/vllm/pull/18097
  • @Pradyun92 made their first contribution in https://github.com/vllm-project/vllm/pull/22317
  • @kyuyeunk made their first contribution in https://github.com/vllm-project/vllm/pull/22425
  • @lec77 made their first contribution in https://github.com/vllm-project/vllm/pull/22333
  • @h-brenoskuk made their first contribution in https://github.com/vllm-project/vllm/pull/22534
  • @zhewenl made their first contribution in https://github.com/vllm-project/vllm/pull/22584
  • @PicoCreator made their first contribution in https://github.com/vllm-project/vllm/pull/22592
  • @danielafrimi made their first contribution in https://github.com/vllm-project/vllm/pull/22349
  • @GuanLuo made their first contribution in https://github.com/vllm-project/vllm/pull/21074
  • @sooraj-satheesh made their first contribution in https://github.com/vllm-project/vllm/pull/22707
  • @dongluw made their first contribution in https://github.com/vllm-project/vllm/pull/22660
  • @Sugar-zsg made their first contribution in https://github.com/vllm-project/vllm/pull/22630
  • @phantomlei3 made their first contribution in https://github.com/vllm-project/vllm/pull/22170
  • @RishiAstra made their first contribution in https://github.com/vllm-project/vllm/pull/21783
  • @zejunchen-zejun made their first contribution in https://github.com/vllm-project/vllm/pull/21161
  • @teekenl made their first contribution in https://github.com/vllm-project/vllm/pull/22733
  • @mxz297 made their first contribution in https://github.com/vllm-project/vllm/pull/22683
  • @RUTHLESS-BOT made their first contribution in https://github.com/vllm-project/vllm/pull/22641
  • @frankwang28 made their first contribution in https://github.com/vllm-project/vllm/pull/22606
  • @zzh142857 made their first contribution in https://github.com/vllm-project/vllm/pull/22697
  • @ducviet00 made their first contribution in https://github.com/vllm-project/vllm/pull/22739
  • @x22x22 made their first contribution in https://github.com/vllm-project/vllm/pull/22280
  • @Gh0u1L5 made their first contribution in https://github.com/vllm-project/vllm/pull/22785
  • @jio-H made their first contribution in https://github.com/vllm-project/vllm/pull/22813
  • @ZJY0516 made their first contribution in https://github.com/vllm-project/vllm/pull/22786
  • @NirLevy98 made their first contribution in https://github.com/vllm-project/vllm/pull/22891
  • @nvjullin made their first contribution in https://github.com/vllm-project/vllm/pull/22346
  • @frankie-ys made their first contribution in https://github.com/vllm-project/vllm/pull/22643
  • @amirai21 made their first contribution in https://github.com/vllm-project/vllm/pull/22653
  • @sayandipdutta made their first contribution in https://github.com/vllm-project/vllm/pull/22912
  • @yyihuang made their first contribution in https://github.com/vllm-project/vllm/pull/22603
  • @rishitdholakia13 made their first contribution in https://github.com/vllm-project/vllm/pull/22963
  • @oraluben made their first contribution in https://github.com/vllm-project/vllm/pull/22978
  • @minpeter made their first contribution in https://github.com/vllm-project/vllm/pull/16890
  • @947132885 made their first contribution in https://github.com/vllm-project/vllm/pull/22972
  • @KevinZeng08 made their first contribution in https://github.com/vllm-project/vllm/pull/23051

Full Changelog: https://github.com/vllm-project/vllm/compare/v0.10.0...v0.10.1rc1