Qwable V1

Qwable V1 by lordx64, a text-generation model with multimodal capabilities. Understand and compare multimodal features, benchmarks, and capabilities.

Comparison

| Feature | Qwable V1 | Interfaze |
| --- | --- | --- |
| Input Modalities | text, image | image, text, audio, video, document |
| Native OCR | No | Yes |
| Long Document Processing | No | Yes |
| Language Support | unknown | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | 65.5K | 1M |
| Tool Calling | Yes | Tool calling supported + built in browser, code execution and web search |

Scaling

| Feature | Qwable V1 | Interfaze |
| --- | --- | --- |
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |


Qwen + Fable · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.


TL;DR

Qwable-v1 is a chained distill: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:

  • Thinks in explicit <think>…</think> chains-of-thought (inherited from the Opus 4.7 prior)
  • Acts like a Claude-Code-style agent when prompted as one — emits <tool_use> XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is system-prompt-conditional: it appears when you give the model an agent-style system prompt or supply a preceding <tool_result> turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See Usage for the recipe.
  • Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization

Versioning — this is v1, more iterations planned

This is the first iteration. We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available — each new corpus that materializes will feed a Qwable-v2, Qwable-v3, etc., with the chained provenance documented at every step.

Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source (Glint-Research/Fable-5-traces) is a frozen historical corpus — no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we'll incorporate them. If they don't, v1 stays the latest.

In either case, follow this model repo for updates, or check the source repo for v2+ training runs.

Honest scope

This model is not a pure single-teacher distillation. It's a chained warm-start:

Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
            └─SFT─▶ Qwable-v1   ← you are here

The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Evaluate and use this model accordingly:

  • For pure reasoning (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what's doing the work. Qwable-v1 won't beat it on those benchmarks; it'll match.
  • For agentic coding (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the <tool_use> XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7's reasoning. This is where Qwable outperforms a vanilla Qwen3.6.
  • For chat / general assistant: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).

Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly-formatted <tool_use> XML; multi-turn conversations with a prior <tool_result> continue in XML. See Limitations for the format details.

What's in the box

  • 26 shards, model-00001-of-00026.safetensors through model-00026-of-00026.safetensors: merged bf16 weights (~70 GB total)
  • tokenizer.json, chat_template.jinja, config.json — Qwen3.6 chat template, unchanged from the base
  • Adapter-only variant published at lordx64/Qwable-v1-adapter for composability with the Opus 4.7 base (~50-100 MB)

GGUF quants at lordx64/Qwable-v1-GGUF:

  • IQ4_XS (~18 GB) — runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
  • Q5_K_M (~25 GB) — better quality, fits 32-48 GB workstations
  • Q8_0 (~37 GB) — near-lossless, for reproducibility checks
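
As a rough sanity check on the sizes listed above, the sketch below derives the implied average bits per weight for each artifact. It assumes a nominal 35e9 parameters (the "35B" in the model name) and treats the listed sizes as decimal gigabytes. Both are assumptions, and real files also carry metadata and tensors kept at higher precision.

```python
# Implied bits per weight for each published artifact.
# Assumption: nominal 35e9 parameters; sizes are the approximate figures listed above.
NOMINAL_PARAMS = 35e9

SIZES_GB = {
    "bf16 (merged safetensors)": 70,
    "IQ4_XS": 18,
    "Q5_K_M": 25,
    "Q8_0": 37,
}


def implied_bits_per_weight(size_gb: float, params: float = NOMINAL_PARAMS) -> float:
    """Convert a file size in decimal GB to average bits per parameter."""
    return size_gb * 1e9 * 8 / params


BITS = {name: implied_bits_per_weight(gb) for name, gb in SIZES_GB.items()}

for name, bpw in BITS.items():
    print(f"{name}: ~{bpw:.1f} bits/weight")
```

The bf16 figure comes out at 16 bits per weight, which is consistent with two bytes per parameter; the GGUF figures are only as accurate as the rounded sizes they are derived from.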

Training recipe

| Setting | Value |
| --- | --- |
| Base (warm-start) | lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled |
| SFT dataset | lordx64/agentic-distill-fable-5-sft (4,659 rows, ~12.2M Qwen tokens, single text column in Qwen chat template) |
| Library | Unsloth FastLanguageModel + TRL SFTTrainer |
| LoRA | r=16, alpha=16, attention-only (q_proj, k_proj, v_proj, o_proj), dropout 0.0 |
| Loss masking | train_on_responses_only (gradients only flow through assistant turns, including <think> block) |
| Sequence length | 4096 tokens |
| Epochs | 2 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 2e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× nvidia-h200 x1 (141 GB) on AWS ap-northeast-2 via HF Inference Endpoints |
| Total optimizer steps | 582 (4,648 examples × 2 epochs ÷ effective batch 16; 11 of 4,659 dropped during prep for label-all-masked rows) |
| Wall-clock | 14.1h actual (vs ~7-8h projected; see note below) |
| Cost | ~$70 at $5/hr |
| Final loss | 0.804 at the last step; 0.7956 averaged over the final 20 steps |
| Final save | merged_16bit via Unsloth |
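
The step count and cost in the table can be reproduced with a few lines of arithmetic. The sketch assumes the trainer rounds a partial final accumulation batch up to a full optimizer step in each epoch. That rounding rule is an assumption chosen because it reproduces the reported 582 (a plain 4,648 × 2 ÷ 16 gives 581); it is not confirmed from training/train.py.

```python
import math

examples = 4_659 - 11          # rows kept after dropping label-all-masked rows
effective_batch = 1 * 16       # per-device batch 1 x grad-accum 16
epochs = 2

# Assumption: a partial final batch still counts as one optimizer step per epoch.
steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs

wall_clock_hours = 14.1
hourly_rate_usd = 5.0
cost_usd = wall_clock_hours * hourly_rate_usd

print(f"steps per epoch: {steps_per_epoch}, total: {total_steps}")
print(f"cost: ~${cost_usd:.2f}")
```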

The training script is training/train.py in the source repo; the submitter is training/endpoint/deploy_fable.py. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.

Training notes — slower than projected

The run took ~14h instead of the projected ~7-8h. Root cause: the HF Inference Endpoint container's flash-linear-attention and causal-conv1d builds did not compile against the runtime CUDA toolkit, so Qwen3.6's GatedDeltaNet layers fell back to a PyTorch reference implementation. The startup log noted: "The fast path is not available because one of the required library is not installed. Falling back to torch implementation." The fallback path is mathematically identical, so loss and convergence are unaffected, but it is ~2-3× slower for those layers. Step rate at full context worked out to ~83s/step instead of the ~36s/step the smoke test implied.

This is a known toolchain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be pre-baking compatible fla / causal-conv1d / triton wheels into training/endpoint/requirements.txt. We left it for v2: the slowdown affects only wall-clock time, the resulting model is the same, and the cost (~$70) is still very reasonable for a 35B distill at H200 rates.
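
A pre-flight check along these lines could flag a missing fast path before a long run starts. It is a sketch only: it assumes the packages are importable as fla, causal_conv1d, and triton, and it only tests that the modules can be found, not that their kernels were built against the runtime CUDA toolkit. The function names are hypothetical.

```python
import importlib.util

# Assumed import names for the optional fast-path packages.
FAST_PATH_MODULES = ("fla", "causal_conv1d", "triton")


def find_missing_modules(names=FAST_PATH_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(names=FAST_PATH_MODULES) -> bool:
    """Print a warning and return False if any fast-path module is missing."""
    missing = find_missing_modules(names)
    if missing:
        print(f"Fast path likely unavailable; missing modules: {', '.join(missing)}")
        return False
    print("All fast-path modules found (kernel compatibility not verified).")
    return True
```

Calling preflight() at the top of a training script and aborting on False would turn a silent slowdown into an early, visible failure.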

Dataset provenance

The SFT dataset (lordx64/agentic-distill-fable-5-sft) is a reformatted derivative of Glint-Research/Fable-5-traces. Provenance chain:

TeichAI ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
    │          (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
    │          under U.S. export-control directives)
    ▼
Glint-Research ────── extracted chain-of-thought reasoning into a per-turn `cot` field
    │          (added post-hoc; the underlying Anthropic API redacted cleartext
    │          thinking blocks via signature-only delivery on Fable-5 preview)
    ▼
lordx64/agentic- ────── reformatted into Qwen chat template, `<tool_use>` / `<tool_result>` XML
distill-fable-5-sft     serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
    │          (204 active Groq API keys redacted from upstream's session JSONLs).
    ▼
Qwable-v1 ────── SFT'd over the Opus 4.7 distill (this model)
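
The deduplication and secret-scrubbing steps named in the chain could look roughly like the following. This is an illustrative sketch, not the dataset's actual preparation code: the row layout (a dict with user and text keys), the function names, and the gsk_ key pattern are all assumptions.

```python
import hashlib
import re

# Assumed shape of a Groq API key: the "gsk_" prefix followed by alphanumerics.
GROQ_KEY_RE = re.compile(r"gsk_[A-Za-z0-9]{20,}")


def scrub_secrets(text: str) -> str:
    """Replace anything that looks like a Groq API key with a placeholder."""
    return GROQ_KEY_RE.sub("[REDACTED_GROQ_KEY]", text)


def dedupe_by_user_content(rows):
    """Keep the first row for each distinct SHA-256 of the user content."""
    seen = set()
    kept = []
    for row in rows:
        digest = hashlib.sha256(row["user"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append({**row, "text": scrub_secrets(row["text"])})
    return kept


rows = [
    {"user": "fix the port", "text": "key=gsk_" + "a" * 24},
    {"user": "fix the port", "text": "duplicate turn"},
    {"user": "add a test", "text": "no secrets here"},
]
cleaned = dedupe_by_user_content(rows)
```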

Composition: 4,659 rows, ~12.2M Qwen tokens.

  • 3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
  • 866 rows (19%) end in a pure text response

Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. Narrow — this is essentially one developer's Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.

Evaluation

🚧 Evals are in progress. This table will fill in as each suite completes; nothing here is published until verified.

| Benchmark | Setup | Tests | Score | Status |
| --- | --- | --- | --- | --- |
| GSM8K-CoT | 8-shot, multi-turn, limit 300 | Grade-school math; verify reasoning prior preserved through the second SFT round | pending | 🚧 in progress |
| MMLU-Pro | 5-shot, multi-turn, limit 500 | Hard multi-subject knowledge reasoning | pending | 🚧 in progress |
| MMLU-Pro (per-subject) | Same as above | Biology / Math / Psychology / etc. breakdown | pending | 🚧 in progress |
| GPQA Diamond | 0-shot CoT | Graduate-level STEM | pending | 🚧 in progress |
| MATH-500 | 0-shot, math_verify metric | Competition math; tests reasoning depth | pending | 🚧 in progress |
| AIME 2024 / 2025 | 0-shot CoT | Olympiad-level math; sensitivity to answer-extraction | pending | 🚧 in progress |
| HumanEval / MBPP | pass@1 / pass@10 | Pure code completion (non-agentic baseline) | pending | 🚧 in progress |
| IFEval | 0-shot | Instruction-following adherence | pending | 🚧 in progress |
| SWE-bench Lite (or BCB-Hard) | with agent harness + tool registry | The key test: agentic coding ability vs Opus 4.7 base | pending | 🚧 in progress |
| qwen3-6-distill-eval Space | 17 head-to-head prompts (12 design + 5 agentic) | Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output | pending | 🚧 in progress |

Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):

  • vLLM serving at 64k context so reasoning chains never truncate before answering
  • <think>…</think> stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
  • Per-task num_fewshot (lm-eval's single global value can't handle GSM8K-8shot + GPQA-0shot together)
  • fewshot_as_multiturn=True for chat-template fidelity
  • math_verify metric for MATH-500 and AIME (catches semantic equivalence; raw strict-match against \boxed{N} returns 0% even on correct answers because the model says **Answer: N**)
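
The think-stripping step can be sketched as below. The function names and the order in which markers are tried are hypothetical and not taken from the project's eval harness; the pattern for the bold marker follows the **Answer: N** form mentioned above.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]+)\}")
ANSWER_RE = re.compile(r"\*\*Answer:\s*(.+?)\*\*")


def strip_think(text: str) -> str:
    """Drop <think>...</think> spans so extractors only see the final answer."""
    return THINK_RE.sub("", text).strip()


def extract_answer(text: str):
    """Return the final answer string, or None if no known marker is present."""
    visible = strip_think(text)
    for pattern in (BOXED_RE, ANSWER_RE):
        match = pattern.search(visible)
        if match:
            return match.group(1).strip()
    return None


sample = "<think>Maybe 12? No, 7 + 35 = 42.</think>\n**Answer: 42**"
print(extract_answer(sample))
```

Running the extractor on the raw sample without stripping would risk picking up the 12 from inside the reasoning, which is the failure mode the bullet above describes.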

Standing rule on this project: numbers stay blank until verified. If a benchmark hits a known extraction bug we couldn't cleanly fix, the row says so and we omit the score rather than publish a misleading one.

Usage

Transformers (full bf16, ~70 GB)

Important: Qwable-v1 emits <tool_use> XML reliably only when prompted as an agent. Use a system prompt that explicitly requests the XML format (see below).

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Agent-style system prompt: this is what elicits <tool_use> XML.
SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]
# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# do_sample=True is required for temperature / top_p to take effect.
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.9)
# Decode only the newly generated tokens; keep special tokens visible.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=False))

Output starts with <think>…</think> followed by a <tool_use name="…" id="…">{json}</tool_use> block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks) — usable but not agentic.

For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic "You are a helpful AI assistant." — the model will produce reasoning + a text answer like the underlying Opus 4.7 distill.

vLLM serving

vllm serve lordx64/Qwable-v1 \
    --max-model-len 16384 \
    --tensor-parallel-size 2 \
    --trust-remote-code
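
Once the server is up, it can be queried through vLLM's OpenAI-compatible chat completions route. The sketch below uses only the standard library. The localhost:8000 address is vLLM's default and is an assumption about your deployment; the helper names are hypothetical, and the sampling values mirror the Transformers example above.

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"

AGENT_SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)


def build_payload(user_text: str, max_tokens: int = 2048) -> dict:
    """Assemble a chat completions request with the agent system prompt."""
    return {
        "model": "lordx64/Qwable-v1",
        "messages": [
            {"role": "system", "content": AGENT_SYSTEM},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.9,
    }


def chat(user_text: str, url: str = BASE_URL) -> str:
    """POST one request and return the assistant text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, chat("Read /tmp/server.py and tell me what port it listens on.") should return text containing a <tool_use> block, which your harness then has to parse and execute itself.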

llama.cpp / LM Studio (GGUF)

llama-cli -m Qwable-v1-IQ4_XS.gguf -p "Read /tmp/server.py and find the port..."

Adapter-only (compose on top of the Opus 4.7 distill)

If you already have the Opus 4.7 distill loaded:

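import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the intermediate Opus 4.7 distill, then attach the Qwable-v1 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")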

Tool-use format

The Fable-5 SFT data uses a custom XML envelope for tool calls, not Qwen's native <tool_call> token format. Properly-elicited outputs look like:

<think>
The user wants me to change the port from 8000 to 8080. I should Read the file first to see the current configuration, then Edit it.
</think>
<tool_use name="Read" id="toolu_01ABC...">
{
  "file_path": "/tmp/server.py"
}
</tool_use>

Tool results come back as:

<tool_result id="toolu_01ABC..." is_error="false">
{file contents}
</tool_result>
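
A minimal parser for this envelope might look like the following. It is a sketch under assumptions: the attribute order (name, then id) matches the examples above, the body is a JSON object, and the ToolCall type and function names are hypothetical.

```python
import json
import re
from dataclasses import dataclass

TOOL_USE_RE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>'
    r"(?P<body>.*?)</tool_use>",
    re.DOTALL,
)


@dataclass
class ToolCall:
    name: str
    id: str
    arguments: dict


def parse_tool_calls(text: str) -> list:
    """Extract every <tool_use> block whose body parses as a JSON object."""
    calls = []
    for match in TOOL_USE_RE.finditer(text):
        try:
            arguments = json.loads(match.group("body"))
        except json.JSONDecodeError:
            continue  # skip malformed bodies instead of raising
        if isinstance(arguments, dict):
            calls.append(ToolCall(match.group("name"), match.group("id"), arguments))
    return calls


def format_tool_result(call_id: str, content: str, is_error: bool = False) -> str:
    """Wrap tool output in the <tool_result> envelope shown above."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{call_id}" is_error="{flag}">\n{content}\n</tool_result>'


sample = (
    "<think>I should read the file first.</think>\n"
    '<tool_use name="Read" id="toolu_01ABC">\n'
    '{"file_path": "/tmp/server.py"}\n'
    "</tool_use>"
)
calls = parse_tool_calls(sample)
```

Silently skipping malformed bodies is a design choice for the sketch; a production harness would more likely return an is_error="true" result so the model can retry.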

Eliciting the format reliably

Two paths produce the XML format consistently:

1. Agent system prompt — the simplest, works in one-shot:

system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>
Do NOT respond with markdown code blocks. Always use <tool_use> XML.

2. Multi-turn conversation — supply a prior <tool_result> and the model continues in XML for the rest of the conversation, no system prompt needed.

Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format is learned (verified at smoke-test and full-run spot-checks), but it appears only when the conversation looks agentic.
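
The second path can be sketched as a message list that already contains one tool round trip. The role used to carry the <tool_result> turn (user here), the helper name, and the example file contents are assumptions for illustration; check how the SFT dataset serializes tool results before relying on them.

```python
def with_prior_tool_turn(request: str, tool_use_xml: str, tool_result_xml: str) -> list:
    """Build a chat history that already contains one tool round trip."""
    return [
        {"role": "user", "content": request},
        {"role": "assistant", "content": tool_use_xml},
        # Assumption: tool results are sent back to the model as a user turn.
        {"role": "user", "content": tool_result_xml},
    ]


messages = with_prior_tool_turn(
    "Change the port in /tmp/server.py from 8000 to 8080.",
    '<tool_use name="Read" id="toolu_01abc">\n{"file_path": "/tmp/server.py"}\n</tool_use>',
    '<tool_result id="toolu_01abc" is_error="false">\nPORT = 8000\n</tool_result>',
)
```

The resulting list can be passed to tok.apply_chat_template with add_generation_prompt=True in place of the system-prompt messages used in the Transformers example.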

Tool names are not bound to the Claude Code inventory

The training data uses Claude Code's tool names (Read, Edit, Bash, WebFetch, mcp__*, etc.). The merged model emits sensible-but-invented names like read_file, Replace, write_file instead. The XML envelope transferred; the vocabulary didn't bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but anything that routes calls by exact tool name needs a normalizer (e.g. read_file → Read).
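
A tool-name normalizer can be as small as a lookup table. The alias map below is illustrative: the emitted names come from the examples in this section, but which registry tool each one should map to (for instance Replace to Edit) is an assumption to adapt to your own registry.

```python
# Hypothetical alias table: emitted name (lowercased) -> registry name.
ALIASES = {
    "read_file": "Read",
    "write_file": "Write",
    "replace": "Edit",
}


def normalize_tool_name(name: str, registry: set) -> str:
    """Map an emitted tool name onto a registry name, or raise KeyError."""
    if name in registry:
        return name
    candidate = ALIASES.get(name.lower())
    if candidate in registry:
        return candidate
    raise KeyError(f"no registry entry for tool name {name!r}")


REGISTRY = {"Read", "Write", "Edit", "Bash"}
print(normalize_tool_name("read_file", REGISTRY))
```

Raising on unknown names, rather than guessing, keeps a misrouted call from silently executing the wrong tool.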

Native Qwen tool calling

This format is chat-template-agnostic and parses with a small regex. Downstream consumers wanting native Qwen <tool_call> JSON calling will need either (a) a wrapper that converts the XML to <tool_call> JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
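
Option (a) could be a thin rewrite pass like the one below. It is a sketch that assumes the target is a Hermes-style payload, a JSON object with name and arguments keys inside <tool_call> tags, as used by earlier Qwen chat templates. Confirm the exact shape against the chat_template.jinja shipped with the model before using it; the function name is hypothetical.

```python
import json
import re

TOOL_USE_RE = re.compile(
    r'<tool_use\s+name="([^"]+)"\s+id="[^"]*"\s*>(.*?)</tool_use>',
    re.DOTALL,
)


def to_qwen_tool_call(text: str) -> str:
    """Rewrite each <tool_use> block as a <tool_call> JSON block."""

    def convert(match):
        name, body = match.group(1), match.group(2)
        try:
            arguments = json.loads(body)
        except json.JSONDecodeError:
            return match.group(0)  # leave unparseable blocks untouched
        payload = json.dumps({"name": name, "arguments": arguments})
        return f"<tool_call>\n{payload}\n</tool_call>"

    return TOOL_USE_RE.sub(convert, text)


converted = to_qwen_tool_call(
    '<tool_use name="Read" id="toolu_01abc">\n{"file_path": "/tmp/server.py"}\n</tool_use>'
)
print(converted)
```

Note that this conversion drops the id attribute, so a wrapper that needs to pair results with calls has to track ids separately.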

Limitations

  • Tool-use format is system-prompt-conditional. With a generic prompt ("Fix this bug for me"), Qwable-v1 falls back to the Opus 4.7 prior — explains the fix in markdown code blocks instead of emitting <tool_use> XML. With either (a) an explicit system prompt asking for tool calls in <tool_use name="X" id="Y">…</tool_use> format, or (b) a preceding <tool_result>…</tool_result> turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
  • Tool names don't bind to the original Claude Code inventory. The model emits XML with sensible-but-invented tool names like read_file, Replace, etc., rather than the exact Claude Code tool names (Read, Edit, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue — but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
  • Narrow training distribution. ~5k rows from one developer's Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren't in the training data) will be hit-or-miss.
  • Custom tool envelope. <tool_use> XML doesn't slot into vLLM's tool-calling API automatically. Need a parser wrapper to convert to <tool_call> JSON if you want vLLM's native tool-call detection.
  • Persona drift. Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn't refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
  • Reasoning is from Opus 4.7, not Fable-5. Don't expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
  • No formal evals at v1 ship time. Pending.

License & terms

This model is released under AGPL-3.0, inherited from the upstream Glint-Research/Fable-5-traces dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).

The underlying Fable-5 thinking traces are derivative content from Anthropic's claude-fable-5 preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with Anthropic's usage policies for their specific use case before fine-tuning further or building commercial products on this model.

The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v1's AGPL designation supersedes those due to the Fable-5 data's AGPL upstream.

Citation

@misc{lordx64_qwable_v1_2026,
  title  = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
  author = {lordx64},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}

Acknowledgements

  • Glint-Research for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT — the only viable source after Anthropic's API-side redaction policy.
  • TeichAI for the upstream 953-trace collection that Glint-Research built on.
  • Anthropic for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
  • Qwen team for releasing Qwen3.6-35B-A3B under Apache 2.0.
  • Unsloth for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR #601.
  • Hugging Face for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.
