SuperGemma4 26B Uncensored MLX 4-bit v2

SuperGemma4 26B Uncensored MLX 4-bit v2 by Jiunsong, a text-generation model.

View model card on Hugging Face

A faster, sharper, uncensored Gemma 4 26B for Apple Silicon.

This is the text-only flagship for people who want the core trade-off to be obvious at a glance:

  • smarter than stock Gemma 4 26B IT on real local agent tasks
  • faster than the stock local 4-bit baseline on the same machine
  • uncensored, without falling apart on code, tool-use, or Korean prompts

Why this model

If you want the fast line instead of the multimodal line, this is the one to run.

  • Fast is part of the release identity, not just a minor variant
  • Uncensored behavior is preserved while practical capability goes up
  • Strong at code, browser tasks, tool-use, planning, and Korean
  • Tuned for local agent workloads on Apple Silicon MLX

Headline numbers

| Metric | Gemma 4 26B IT (original 4-bit) | SuperGemma Fast |
|---|---|---|
| Quick bench overall | 91.4 | 95.8 |
| Avg generation speed | 42.5 tok/s | 46.2 tok/s |
| Delta overall | baseline | +4.4 |
| Delta speed | baseline | +8.7% |
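As a sanity check, both deltas in the table follow directly from the raw numbers; a minimal sketch:

```python
# Recompute the headline deltas from the quick-bench numbers above.
baseline_overall = 91.4
fast_overall = 95.8
baseline_speed = 42.5   # tok/s
fast_speed = 46.2       # tok/s

delta_overall = round(fast_overall - baseline_overall, 1)
delta_speed_pct = round((fast_speed - baseline_speed) / baseline_speed * 100, 1)

print(delta_overall)    # 4.4
print(delta_speed_pct)  # 8.7
```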

Category gains vs original

| Category | Original | SuperGemma Fast | Delta |
|---|---|---|---|
| Code | 92.3 | 98.6 | +6.3 |
| Browser | 87.5 | 89.6 | +2.1 |
| Logic | 86.9 | 95.2 | +8.3 |
| System Design | 97.8 | 98.9 | +1.1 |
| Korean | 90.7 | 95.0 | +4.3 |
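The per-category deltas can be recomputed and ranked from the two score columns; a small sketch:

```python
# (original score, SuperGemma Fast score) per category, from the table above.
scores = {
    "Code": (92.3, 98.6),
    "Browser": (87.5, 89.6),
    "Logic": (86.9, 95.2),
    "System Design": (97.8, 98.9),
    "Korean": (90.7, 95.0),
}

# Delta per category, largest gain first.
gains = {cat: round(fast - orig, 1) for cat, (orig, fast) in scores.items()}
for cat, delta in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat}: +{delta}")
```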

What makes it attractive

  • Beats the stock local 4-bit baseline in both quality and speed
  • Produces stronger code, stronger reasoning, and more useful tool-oriented answers
  • Handles Korean and agent-style prompts better than the original local run
  • Keeps the uncensored feel without turning unstable or collapsing into broken outputs
  • Built to feel immediately stronger in real usage, not just in a niche benchmark

Base and format

  • Base model: google/gemma-4-26B-A4B-it
  • Format: MLX 4-bit
  • Size: about 13GB
  • Best use case: fast text-only local agent model with stronger practical capability than stock Gemma 4
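The quoted size is consistent with straight 4-bit weight packing; a back-of-envelope check, ignoring quantization scales, embeddings, and file-format overhead:

```python
# Rough weight-size estimate: 26B parameters at 4 bits (0.5 bytes) each.
params = 26e9
bytes_per_param = 4 / 8
size_gb = params * bytes_per_param / 1e9
print(size_gb)  # 13.0
```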

Why it is better than stock

  • Higher quick-bench overall score: 95.8 vs 91.4
  • Faster average generation speed: 46.2 tok/s vs 42.5 tok/s
  • Bigger gains where local agents actually benefit:
    • Code: +6.3
    • Logic: +8.3
    • Korean: +4.3
    • Browser workflows: +2.1
  • Uncensored behavior remains a core property of the release instead of being layered on after the fact

Serving

mlx_lm.server \
  --model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --port 8080

For OpenAI-compatible serving, let mlx_lm.server auto-detect the chat template bundled with the model.

Do not pass --chat-template /path/to/chat_template.jinja as a literal path on launch paths that expect the flag's value to be the template body itself; the path string then gets applied as the template and corrupts responses.
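Once the server is up, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch, assuming the launch command above (port 8080) and the standard /v1/chat/completions route; the prompt text is just an illustration:

```python
import json
import urllib.request

# Chat-completions request against a local mlx_lm.server instance.
# The "model" field is the same repo id passed to --model at launch.
payload = {
    "model": "Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2",
    "messages": [
        {"role": "user", "content": "List three tool-use best practices."}
    ],
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```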

Quick test

mlx_lm.generate \
  --model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --prompt "Write a Python function that returns prime numbers up to n." \
  --max-tokens 512
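For comparison when eyeballing the model's answer, a plain reference implementation of the prompted task (a simple Sieve of Eratosthenes, written here by hand, not model output):

```python
def primes_up_to(n):
    """Return all prime numbers <= n using a Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p starting from p*p as composite.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```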

Included files

  • benchmark_quick_bench_20260412.json
  • benchmark_quick_bench_20260412_responses.jsonl
  • SERVING_NOTES.md

Notes

  • This is the fast text-only line.
  • The earlier "reasoning is broken" report reproduced as a serving-template launch issue, not as weight corruption.
  • Re-fused and re-benchmarked locally before upload.
