Qwen3.6 27B Uncensored HauhauCS Aggressive

Qwen3.6 27B Uncensored HauhauCS Aggressive by HauhauCS, an image-text-to-text model with multimodal capabilities. Understand and compare its multimodal features, benchmarks, and capabilities.

Comparison

| Feature | Qwen3.6 27B Uncensored HauhauCS Aggressive | Interfaze |
| --- | --- | --- |
| Input Modalities | text, image, video | image, text, audio, video, document |
| Native OCR | No | Yes |
| Long Document Processing | No | Yes |
| Language Support | 201 (partial) | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | Yes | Yes |
| Context Input Size | 262.1K | 1M |
| Tool Calling | Yes | Yes, plus built-in browser, code execution, and web search |

Scaling

| Feature | Qwen3.6 27B Uncensored HauhauCS Aggressive | Interfaze |
| --- | --- | --- |
| Scaling | Self-hosted/provider-hosted with quantization | Unlimited |

View model card on Hugging Face

Join the Discord for updates, roadmaps, projects, or just to chat.

Qwen3.6-27B uncensored by HauhauCS. 0/465 Refusals. *

Not sure which variant to pick? 99.9%+ of users should use Balanced — same 0/465 refusal rate, more stable sampling, great for agentic coding / tool-use / reasoning / creative writing. Pick Aggressive only if you specifically want the model to skip its preamble on hardcore prompts.

Hugging Face's "Hardware Compatibility" widget doesn't recognize K_P quants, so it may show fewer files than actually exist. Click "View +X variants" or go to Files and versions to see all available downloads.
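
You can also skip the web UI and fetch specific files from the command line with huggingface-cli. A minimal sketch; the repo id below is an assumption, so substitute the actual one from this page:

# Fetch one quant plus the vision projector (repo id is assumed, not confirmed)
huggingface-cli download HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive \
  Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  mmproj-Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-f16.gguf \
  --local-dir .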

About

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended — just without the refusals.

These are meant to be the best lossless uncensored models out there.

Aggressive vs Balanced

Both variants hit 0/465 refusals on the benchmark. Same capability, same uncensoring outcome. The difference is how they deliver on edgy prompts:

| | Balanced (recommended default) | Aggressive (this release) |
| --- | --- | --- |
| Refusal rate | 0/465 | 0/465 |
| On hardcore prompts | Reasons out loud, occasional short disclaimer, then full answer | Delivers the raw answer directly, no preamble |
| Best for | Agentic coding, tool use, reasoning, creative writing/RP | Users who specifically want the model to skip the "talk itself into it" step |

If you don't have a strong reason to pick Aggressive, go Balanced — it's the better default.

Downloads

| File | Quant | BPW | Size |
| --- | --- | --- | --- |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf (pending) | Q8_K_P | 10.06 | |
| | Q8_0 | 8.5 | |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf (pending) | Q6_K_P | 7.07 | |
| | Q6_K | 6.6 | |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf (pending) | Q5_K_P | 6.47 | |
| | Q5_K_M | 5.7 | |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf | Q4_K_P | 5.4 | 18 GB |
| | Q4_K_M | 4.88 | |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf | IQ4_XS | 4.32 | 15 GB |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q3_K_P.gguf | Q3_K_P | 4.39 | 14 GB |
| | Q3_K_M | 3.9 | |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf | IQ3_M | 3.56 | 13 GB |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-IQ3_XS.gguf | IQ3_XS | 3.3 | 12 GB |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q2_K_P.gguf | Q2_K_P | 3.19 | 12 GB |
| Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf | IQ2_M | 2.69 | 10 GB |
| mmproj-Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-f16.gguf | mmproj (f16) | | 928 MB |

All quants generated with importance matrix (imatrix) for optimal quality preservation on abliterated weights.
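
The K_P recipes themselves are HauhauCS-specific, but for context, the generic imatrix workflow in stock llama.cpp looks roughly like this. File names are placeholders, and calibration.txt stands in for whatever corpus was actually used:

# 1) Measure tensor importance on a calibration corpus
llama-imatrix -m Qwen3.6-27B-f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize with that importance matrix guiding precision allocation
llama-quantize --imatrix imatrix.dat Qwen3.6-27B-f16.gguf Qwen3.6-27B-Q4_K_M.gguf Q4_K_M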

What are K_P quants?

K_P ("Perfect") quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile.

A K_P quant effectively bumps quality up by 1-2 quant levels at only ~5-15% larger file size than the base quant. Fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime — no special builds needed.

Note: K_P quants may show as "?" in LM Studio's quant column. This is a display issue only — the model loads and runs fine.

Specs

  • 27B dense parameters
  • 64 layers, layout: 16 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
  • 48 linear attention layers + 16 full gated-attention layers
  • Gated DeltaNet: 48 V heads / 16 QK heads, head dim 128
  • Gated Attention: 24 Q heads / 4 KV heads, head dim 256, rope dim 64
  • Hidden dim 5120, FFN dim 17408, vocab 248320
  • 262K native context, extensible to ~1M with YaRN
  • Natively multimodal (text, image, video) — ships with mmproj
  • Based on Qwen/Qwen3.6-27B

From the official Qwen authors:

Thinking mode (default) — general tasks:

  • temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Thinking mode — precise coding / WebDev:

  • temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Non-thinking (Instruct) mode:

  • temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

My personal preference: I run presence_penalty=1.5 even in thinking mode. Both values work, but with the official 0.0 it can think a lot more than it needs to. Bumping it to 1.5 reins that in without hurting output quality. Your call — try both.
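
If you run llama-server, these samplers map one-to-one onto command-line flags. A sketch using the non-thinking (Instruct) values above, with model path and context size borrowed from the Usage section:

# Instruct-mode sampling as server defaults (override per-request if needed)
llama-server -m Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja -c 131072 -ngl 99 \
  --temp 0.7 --top-p 0.80 --top-k 20 --min-p 0.0 \
  --presence-penalty 1.5 --repeat-penalty 1.0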

Important:

  • Keep at least 128K context to preserve thinking capabilities
  • Recommended output length: 32,768 tokens for most queries, up to 81,920 for competition-tier math/code
  • Use --jinja with llama.cpp for proper chat template handling
  • Vision support requires the mmproj file alongside the main GGUF
  • YaRN rope scaling is static in llama.cpp and can hurt short-context performance — only modify rope_parameters if you actually need >262K context (see the sketch after this list)
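
If you genuinely need more than the native window, here is a hedged sketch of static YaRN scaling on llama-server. The flags are stock llama.cpp; the 4x factor is an illustrative value matching the ~1M claim in the specs:

# ~4x YaRN extension (262144 native -> 1048576); expect some short-context regression
llama-server -m Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja -ngl 99 -c 1048576 \
  --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 262144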

Prompting tip: this model is a bit more sensitive to prompt clarity than Qwen3.5-35B-A3B. Spell out format, constraints, and scope (e.g., "Summarize in five one-line bullets, covering only the security findings" rather than "summarize this") — it'll stay on rails much better than with vague instructions.

Turning Thinking On/Off

Qwen3.6 ships with thinking on by default. Turn it off when you want faster, shorter replies and don't need chain-of-thought.

Heads up: Qwen3.6 does not support the /think and /no_think soft switches that Qwen3 had. You must use the chat-template kwarg below.

LM Studio

  1. Load the model
  2. Right-side settings panel → Model Settings → Prompt Template (or Chat Template Options)
  3. Set enable_thinking to false in the template kwargs
  4. Some LM Studio versions expose this as a direct "Reasoning" / "Thinking" toggle — same effect

llama.cpp

llama-server — set as default for all requests:

llama-server -m Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --mmproj mmproj-Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-f16.gguf \
  --jinja -c 131072 -ngl 99 \
  --chat-template-kwargs '{"enable_thinking": false}'

Per-request via the OpenAI-compatible API:

{
  "model": "qwen3.6-27b",
  "messages": [{"role": "user", "content": "..."}],
  "chat_template_kwargs": {"enable_thinking": false}
}

Python openai SDK:

client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

Agent scenarios — keep reasoning in context across turns:

{"chat_template_kwargs": {"preserve_thinking": true}}

This retains the reasoning block in chat history. Useful for agents where reasoning consistency across tool-call loops matters.
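
The same kwarg also works as a raw HTTP request against llama-server's OpenAI-compatible endpoint (default port 8080), assuming the server was started as shown earlier:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [{"role": "user", "content": "..."}],
    "chat_template_kwargs": {"preserve_thinking": true}
  }'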

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.

llama-cli -m Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --mmproj mmproj-Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-f16.gguf \
  --jinja -c 131072 -ngl 99


* Tested with both automated and manual refusal benchmarks — none found. If you hit one that's actually obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.
