# Qwen3.6 35B A3B Claude 4.6 Opus Reasoning Distilled GGUF

Qwen3.6 35B A3B Claude 4.6 Opus Reasoning Distilled GGUF by hesamation is a text-generation model with multimodal capabilities inherited from its base architecture. The comparison below contrasts its features, benchmarks, and capabilities with Interfaze.

## Comparison

| Feature | Qwen3.6 35B A3B Claude 4.6 Opus Reasoning Distilled GGUF | Interfaze |
| --- | --- | --- |
| Input Modalities | text, image, video | image, text, audio, video, document |
| Native OCR | No | Yes |
| Long Document Processing | No | Yes |
| Language Support | unknown | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | 32.8K | 1M |
| Tool Calling | No | Tool calling supported, plus built-in browser, code execution, and web search |

## Scaling

| Feature | Qwen3.6 35B A3B Claude 4.6 Opus Reasoning Distilled GGUF | Interfaze |
| --- | --- | --- |
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |

View model card on Hugging Face

GGUF quantizations of hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, a reasoning SFT fine-tune of Qwen/Qwen3.6-35B-A3B on Claude Opus 4.6-style chain-of-thought distillation data.

The source fine-tune is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples. Treat these GGUF files as text-generation/runtime quantizations of the merged fine-tuned checkpoint.

This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.


## Available GGUF Quantizations

This repo is intended to host the following GGUF variants. Files are uploaded as each quantization finishes.

| Quant | Typical use |
| --- | --- |
| Q4_K_M | Smallest practical general-purpose quant for local inference |
| Q5_K_M | Better quality/size balance than Q4 |
| Q6_K | Higher-quality quant when VRAM/RAM budget allows |
| Q8_0 | Largest quant here; closest to source quality among these options |
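File sizes for these quants scale roughly with bits per weight. A back-of-the-envelope sketch of what to expect for a 35B-parameter model (the bits-per-weight figures are rough community averages for llama.cpp K-quants, not measurements of these specific files):

```python
# Rough GGUF file-size estimate: parameter count x bits-per-weight / 8.
# APPROX_BPW values are approximate averages for llama.cpp quant types,
# not measurements of the files in this repo.
APPROX_BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Estimated file size in gigabytes for a given quantization."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

for quant in APPROX_BPW:
    print(f"{quant}: ~{approx_size_gb(35e9, quant):.0f} GB")
```

Treat these numbers only as a guide for picking a quant against your VRAM/RAM budget; actual file sizes depend on the exact tensor layout llama.cpp chooses.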

## Benchmark Results

The benchmark below was run on the merged source model, not separately on each GGUF quant. Quantization can change scores, especially at lower bitrates, so treat this as source-checkpoint context.

The MMLU-Pro pass used 70 total questions per model: --limit 5 across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.

| Benchmark | Harness | Samples per model | Setting | Metric | Base model | Source merged model | Delta |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU-Pro overall | lm-evaluation-harness | 70 | `--limit 5` across 14 subjects | exact_match, custom-extract | 42.86% | 75.71% | +32.85 pp |

Base model: Qwen/Qwen3.6-35B-A3B. Source merged model: hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.
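The sample count and delta in the benchmark table can be double-checked with a line of arithmetic (values copied from the table above):

```python
# Sanity-check the benchmark table's arithmetic.
subjects = 14
per_subject_limit = 5           # --limit 5
samples = subjects * per_subject_limit

base_acc = 42.86                # base model, MMLU-Pro overall (%)
merged_acc = 75.71              # source merged model (%)
delta_pp = round(merged_acc - base_acc, 2)

print(samples, delta_pp)
```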

> [!WARNING]
> **Community benchmarks welcome**
>
> To better understand this fine-tuned model and its GGUF quantizations, I welcome independent benchmark results. If you run evaluations, please include the benchmark name, harness/script, sample count, decoding settings, quant file, and raw logs or result files when possible.
>
> Share results by opening a PR/discussion or DMing @hesamation on X.

## Training Summary

```
Qwen/Qwen3.6-35B-A3B
  -> supervised fine-tuning with LoRA
  -> merged full model
  -> GGUF quantization with llama.cpp
```

| Setting | Value |
| --- | --- |
| Fine-tuning method | Supervised fine-tuning with LoRA |
| LoRA target | Attention-only modules |
| LoRA rank / alpha | 32 / 32 |
| Micro-batch size | 1 |
| Gradient accumulation | 32 |
| Epochs | 2 |
| Completed steps | 762 / 762 |
| Final reported training loss | 0.3362497625740494 |
| Dataset max tokens | 8192 |
| Max sequence length | 32768 |
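The batch settings above imply an effective batch size and a per-epoch sample count; a quick arithmetic check (assuming "steps" counts optimizer updates, as most trainers report):

```python
micro_batch = 1
grad_accum = 32
epochs = 2
total_steps = 762

# Effective batch size: examples consumed per optimizer update.
effective_batch = micro_batch * grad_accum

# Examples consumed over the whole run, then per epoch.
total_samples = total_steps * effective_batch
per_epoch = total_samples // epochs

print(effective_batch, total_samples, per_epoch)
```

The implied per-epoch count (about 12.2K) is an inference from these numbers; it sits a bit below the roughly 14.2K requested samples listed under Training Data, which would be consistent with some samples being dropped during filtering and normalization.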

## Training Data

The source model samples and normalizes reasoning conversations from three datasets, then renders them with the qwen3-thinking chat template and response-only SFT masking.

| Dataset | Requested sample count | Role |
| --- | --- | --- |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Claude Opus reasoning trajectories |
| Jackrong/Qwen3.5-reasoning-700x | 700 | Curated Qwen reasoning samples |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Additional Claude Opus reasoning examples |
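Response-only SFT masking means the loss is computed only on assistant-response tokens; prompt tokens are hidden from the objective via an ignore label. A minimal sketch of the idea (the -100 ignore index is the common convention in SFT trainers; the token IDs are made up, and this is not this repo's actual training code):

```python
IGNORE_INDEX = -100  # label value most SFT loss functions skip

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, but hide the prompt from the loss."""
    labels = list(input_ids)
    for i in range(prompt_len):
        labels[i] = IGNORE_INDEX
    return labels

# Toy example: 4 prompt tokens followed by 3 response tokens.
ids = [101, 7592, 2088, 102, 400, 401, 402]
labels = mask_prompt_labels(ids, prompt_len=4)
print(labels)  # only the response tokens keep their IDs
```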

## Intended Use

These GGUF files are intended for local or server-side text inference through runtimes that support GGUF and the Qwen3.6 architecture, such as recent llama.cpp builds. Choose the quantization based on your memory budget and quality target.

Because the fine-tune is text-only, image/video behavior should be treated as inherited from the base model rather than improved by this training run.

## Acknowledgements

Thanks to the Qwen team for the base model, Unsloth for the training stack, llama.cpp for GGUF tooling, and Jackrong for the public reasoning-distillation workflow that inspired this fine-tune.
