koboldcpp-1.107.2

Added a Vulkan (Older PC) option to the oldpc builds. It provides GPU support via Vulkan without requiring any CPU intrinsics (no AVX2, no AVX), and replaces the removed CLBlast options.

Breaking Changes:
- Pipeline parallel is now enabled by default in the CLI. Disable it in the launcher or with --nopipelineparallel.
- Flash attention is now enabled by default in the CLI. Disable it in the launcher or with --noflashattention.
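For scripted launches, the two opt-out flags above can be appended conditionally. The following is a minimal sketch, not an official launcher: the executable path `./koboldcpp`, the model filename, and the helper name `build_launch_command` are assumptions for illustration; only `--nopipelineparallel` and `--noflashattention` come from these notes.

```python
import shlex


def build_launch_command(executable, model_path,
                         pipeline_parallel=True, flash_attention=True):
    """Return a CLI argument list, opting out of the new defaults on request.

    Hypothetical helper: `executable` and `model_path` are placeholders,
    and `--model` is assumed to be the flag used to select the model file.
    """
    args = [executable, "--model", model_path]
    if not pipeline_parallel:
        # Pipeline parallel is on by default as of this release.
        args.append("--nopipelineparallel")
    if not flash_attention:
        # Flash attention is on by default as of this release.
        args.append("--noflashattention")
    return args


if __name__ == "__main__":
    cmd = build_launch_command("./koboldcpp", "model.gguf",
                               pipeline_parallel=False,
                               flash_attention=False)
    # Print a shell-safe command line; pass `cmd` to subprocess.run to launch.
    print(shlex.join(cmd))
```

Leaving both keyword arguments at their defaults produces a command with neither flag, which matches the new default behavior.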
Added a few fixes for GLM 4.7 Flash. Note that this model is extremely sensitive to repetition penalty (rep pen), so disabling rep pen is recommended when using it. Make sure you use a fixed GGUF model, as some early quants were broken. It may be helpful to use the GLM4.5 NoThink template, or to enable forced thinking if you want it.
Fixes for mcp.json importing and MCP tool listing handshake (thanks @Rose22)
Changed MCP user agent string as some sites were blocking it.
Added the fractional scaling workaround fix for the GUI launcher for KDE on Wayland.
Added support for SDXS, a really fast Stable Diffusion image generation model. This model is so fast that it can generate images on pure CPU in under 10 seconds on a Raspberry Pi. Running it on a GPU allows generating images in under half a second. This makes it an excellent way to get image generation if you do not have a GPU. For convenience, a GGUF quant of SDXS is provided here.
Added support for the ESRGAN 4x upscaler. Load this as an upscaler model to upscale your generated images.
Merged Image Gen improvements and Flux Klein model support from upstream (thanks @wbruna). Get Flux Klein's image model, VAE and text encoder.
Added TAE SD support for Flux2, enable with --sdvaeauto.
Increased the hard limit on total image generation resolution from 1 megapixel to 1.6 megapixels.
Updated SDUI with some quality of life fixes by @Riztard
Updated Kobold Lite with multiple fixes and improvements:
- Added even more themes from @Rose22
- Added experimental TTS chunked streaming mode (works for all TTS APIs)
- Added customizable sampler presets from @lubumbax
- Removed manual admin state caching panel since it's made obsolete by --smartcache. The API still exists but should be unnecessary.
Merged fixes, model support, and improvements from upstream, including a Vulkan speedup from occam's coopmat1 optimization. Coopmat1 is used by GPUs with matrix cores, such as the 7000 and 9000 series AMD GPUs.

Important Notice: The CLBlast backend is fully deprecated and has been REMOVED as of this version. If you require CLBlast, you will need to use an earlier version.

Hotfix 1.107.1 - SDUI improvements, Flux2 Image Editing support, MCP cert validation fixes, KDE scaling fix, increased Z-Image cfg clamp, reduced CUDA graph spam, updated Lite with minor refactors.

Hotfix 1.107.2 - This was grouped into a hotfix as 1.107.1 was unstable. Though this release is larger and out-of-band, you're encouraged to update to it from 1.107/1.107.1 for stability reasons. Barring unforeseen circumstances, the next major release will likely be delayed.

Scaling fixes for some Linux desktops
Updated SDUI and sdcpp
Template parser fix from @Reithan
Added "error" as a possible stop reason (e.g. backend failed to generate).
Fixed SSE parsing in MCP
Added GLM4.7-NoThink adapter template
NEW: Reworked newbie help menu, added simple configs they can use
NEW: Added optional --downloaddir to specify where model downloads are stored for URL references.
Fixed GLM4 and GLM4.7 Flash coherency after shifting issues, ref ggml-org#19292
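
Clients that inspect the stop reason may want to treat the new "error" value listed above as a failure rather than a normal completion. The sketch below is illustrative only: the result dictionary shape and the field name `finish_reason` are assumptions, not confirmed by these notes, so adjust them to match the response your API endpoint actually returns.

```python
def classify_stop_reason(result, field="finish_reason"):
    """Map a generation result to 'failed', 'finished', or 'unknown'.

    Hypothetical helper: `result` is assumed to be a dict parsed from a
    JSON response, and `field` is an assumed key holding the stop reason.
    """
    reason = result.get(field)
    if reason is None:
        # The response carried no stop reason at all.
        return "unknown"
    if reason == "error":
        # New in this release: the backend failed to generate.
        return "failed"
    # Any other value is treated as a normal stop.
    return "finished"


if __name__ == "__main__":
    sample = {"text": "", "finish_reason": "error"}  # made-up sample data
    if classify_stop_reason(sample) == "failed":
        print("Generation failed; consider retrying or checking the backend log.")
```

Keeping the field name as a parameter means the same check can be reused if a different endpoint exposes the stop reason under another key.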


TLDR

KoboldCpp is AI text-generation software for GGML and GGUF models, offered as a single self-contained executable with no installation required. It supports CPU and GPU, various model formats, and includes features like image generation, speech-to-text, and text-to-speech. It also provides APIs for popular web services and a bundled KoboldAI Lite UI with editing tools and multiple modes. Ready-to-use binaries are available for Windows, macOS, and Linux, and it can also run directly on Colab, Docker, or cloud GPUs.