Qwable V1

Qwable V1 by lordx64 is a text-generation model with multimodal capabilities. Understand and compare its multimodal features, benchmarks, and capabilities.

Comparison

Feature	Qwable V1	Interfaze
Input Modalities	text, image	image, text, audio, video, document
Native OCR	No	Yes
Long Document Processing	No	Yes
Language Support	unknown	162+
Native Speech-to-Text	No	Yes
Native Object Detection	No	Yes
Guardrail Controls	No	Yes
Context Input Size	65.5K	1M
Tool Calling	Yes	Tool calling supported, plus built-in browser, code execution, and web search

Scaling

Feature	Qwable V1	Interfaze
Scaling	Self-hosted/Provider-hosted with quantization	Unlimited


Qwen + Fable · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

TL;DR

Qwable-v1 is a chained distill: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:

Thinks in explicit <think>…</think> chains-of-thought (inherited from the Opus 4.7 prior)
Acts like a Claude-Code-style agent when prompted as one — emits <tool_use> XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is system-prompt-conditional: it appears when you give the model an agent-style system prompt or supply a preceding <tool_result> turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See Usage for the recipe.
Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization

Versioning — this is v1, more iterations planned

This is the first iteration. We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available — each new corpus that materializes will feed a Qwable-v2, Qwable-v3, etc., with the chained provenance documented at every step.

Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source (Glint-Research/Fable-5-traces) is a frozen historical corpus — no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we'll incorporate them. If they don't, v1 stays the latest.

In either case, follow this model repo for updates, or check the source repo for v2+ training runs.

Honest scope

This model is not a pure single-teacher distillation. It's a chained warm-start:

Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
           └─SFT─▶ Qwable-v1  ← you are here

The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Eval and use this model accordingly:

For pure reasoning (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what's doing the work, so Qwable-v1 should match it on those benchmarks rather than beat it.
For agentic coding (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the <tool_use> XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7's reasoning. This is where Qwable outperforms a vanilla Qwen3.6.
For chat / general assistant: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).

Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly-formatted <tool_use> XML; multi-turn conversations with a prior <tool_result> continue in XML. See Limitations for the format details.

What's in the box

26 safetensors shards, model-00001-of-00026.safetensors through model-00026-of-00026.safetensors: merged bf16 weights (~70 GB total)
tokenizer.json, chat_template.jinja, config.json — Qwen3.6 chat template, unchanged from the base
Adapter-only variant published at lordx64/Qwable-v1-adapter for composability with the Opus 4.7 base (~50-100 MB)

GGUF quants at lordx64/Qwable-v1-GGUF:

IQ4_XS (~18 GB) — runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
Q5_K_M (~25 GB) — better quality, fits 32-48 GB workstations
Q8_0 (~37 GB) — near-lossless, for reproducibility checks
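As a rough sanity check on these sizes, a GGUF file is approximately parameter count × average bits per weight ÷ 8. The sketch below assumes ~35e9 total parameters and approximate llama.cpp bits-per-weight figures (8.5 for Q8_0, about 5.5 for Q5_K_M, 4.25 for IQ4_XS); it ignores metadata and any tensors kept at higher precision, so treat the output as an estimate, not the exact file sizes.

```python
# Rough on-disk size estimate: params * bits_per_weight / 8 bytes.
# ASSUMPTIONS: ~35e9 total parameters; approximate average bits per weight
# for each llama.cpp quant type (Q5_K_M mixes block types, so its figure
# is a ballpark). Metadata and unquantized tensors are ignored.

N_PARAMS = 35e9

BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block
    "Q5_K_M": 5.5,   # approximate mixed average
    "IQ4_XS": 4.25,
}

def size_gb(n_params: float, bpw: float) -> float:
    """Return the approximate size in GB (1e9 bytes)."""
    return n_params * bpw / 8 / 1e9

if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:7s} ~{size_gb(N_PARAMS, bpw):5.1f} GB")
```

With these assumptions the estimates land near 70, 37, 24 and 19 GB, close to the figures listed above.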

Training recipe

Setting	Value
Base (warm-start)	`lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled`
SFT dataset	`lordx64/agentic-distill-fable-5-sft` (4,659 rows, ~12.2M Qwen tokens, single `text` column in Qwen chat template)
Library	Unsloth `FastLanguageModel` + TRL `SFTTrainer`
LoRA	r=16, alpha=16, attention-only (`q_proj, k_proj, v_proj, o_proj`), dropout 0.0
Loss masking	`train_on_responses_only` (gradients only flow through assistant turns, including `<think>` block)
Sequence length	4096 tokens
Epochs	2
Effective batch size	16 (per-device 1 × grad-accum 16)
Optimizer	AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01
Learning rate	2e-5
Precision	bf16 forward + LoRA params
Random seed	3407
Hardware	1× NVIDIA H200 (141 GB) on AWS ap-northeast-2 via HF Inference Endpoints
Total optimizer steps	582 (⌈4,648 examples ÷ effective batch 16⌉ = 291 steps per epoch × 2 epochs; 11 of 4,659 rows dropped during prep because all their labels were masked)
Wall-clock	14.1h actual (vs ~7-8h projected — see note below)
Cost	~$70 at $5/hr
Final loss	0.804 at the last step; 0.7956 averaged over the final 20 steps
Final save	`merged_16bit` via Unsloth
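The loss masking and step count in the table can be sketched in plain Python. This is an illustration, not the Unsloth/TRL code path: `mask_labels`, `has_trainable_tokens` and `optimizer_steps` are hypothetical helpers, and assistant spans are assumed to be known token offsets. For example, a row whose only assistant turn starts after the 4,096-token cutoff ends up with no trainable labels. The step count assumes each epoch's final partial batch still triggers an optimizer step.

```python
import math

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_labels(input_ids, assistant_spans, max_len=4096):
    """Truncate to max_len, then keep labels only inside assistant spans.

    assistant_spans: list of (start, end) token offsets, end exclusive.
    Tokens outside those spans (system, user, tool_result turns) get IGNORE_INDEX.
    """
    ids = input_ids[:max_len]
    labels = [IGNORE_INDEX] * len(ids)
    for start, end in assistant_spans:
        for i in range(start, min(end, len(ids))):
            labels[i] = ids[i]
    return ids, labels

def has_trainable_tokens(labels):
    """A row whose labels are all masked contributes no gradient, so drop it."""
    return any(label != IGNORE_INDEX for label in labels)

def optimizer_steps(n_examples, epochs, per_device_batch, grad_accum):
    """Optimizer updates, counting each epoch's final partial batch as a step."""
    effective_batch = per_device_batch * grad_accum
    return math.ceil(n_examples / effective_batch) * epochs

if __name__ == "__main__":
    kept = 4_659 - 11  # rows left after dropping label-all-masked ones
    print("optimizer steps:", optimizer_steps(kept, epochs=2, per_device_batch=1, grad_accum=16))
    print("approx. cost ($):", round(14.1 * 5, 2))  # wall-clock hours x $/hr
```

With 4,648 rows this gives 291 steps per epoch and 582 in total, and 14.1 h at $5/hr comes to about $70.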

The training script is training/train.py in the source repo; the submitter is training/endpoint/deploy_fable.py. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.

Training notes — slower than projected

The run took ~14h instead of the projected ~7-8h. Root cause: the HF Inference Endpoint container's flash-linear-attention and causal-conv1d builds did not compile against the runtime CUDA toolkit, so Qwen3.6's GatedDeltaNet layers fell back to a PyTorch reference implementation. The startup log noted: "The fast path is not available because one of the required library is not installed. Falling back to torch implementation." The fallback path is mathematically identical, so loss and convergence are unaffected, but it is ~2-3× slower for those layers. At full context the step rate worked out to ~83s/step instead of the ~36s/step the smoke test implied.

This is a known toolkit-chain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be to pre-bake compatible fla / causal-conv1d / triton wheels into training/endpoint/requirements.txt. We left it for v2: the slowdown is documented here, the resulting model is the same, and the cost (~$70) is still very reasonable for a 35B distill at H200 rates.
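One way to catch the fallback before paying for a long run is a pre-flight import check on the training container. This is a minimal sketch, not part of training/train.py: it assumes the kernels are importable as `fla` (flash-linear-attention) and `causal_conv1d`, and it only confirms the packages are installed. A build that installs but cannot compile its kernels for the GPU would still pass, so a stricter check would also run a tiny forward pass and time it.

```python
import importlib.util

# ASSUMPTION: import names of the optional fast-path kernels.
# flash-linear-attention is imported as "fla", causal-conv1d as "causal_conv1d".
FAST_PATH_MODULES = ("fla", "causal_conv1d")

def missing_fast_path_modules(modules=FAST_PATH_MODULES):
    """Return the modules that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_fast_path_modules()
    if missing:
        print("Fast path unavailable; missing:", ", ".join(missing))
    else:
        print("fla and causal_conv1d are importable")
```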

Dataset provenance

The SFT dataset (lordx64/agentic-distill-fable-5-sft) is a reformatted derivative of Glint-Research/Fable-5-traces. Provenance chain:

TeichAI            ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
   │                       (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
   │                        under U.S. export-control directives)
   ▼
Glint-Research     ────── extracted chain-of-thought reasoning into a per-turn `cot` field
   │                       (added post-hoc; the underlying Anthropic API redacted cleartext
   │                        thinking blocks via signature-only delivery on Fable-5 preview)
   ▼
lordx64/agentic-   ────── reformatted into Qwen chat template, `<tool_use>` / `<tool_result>` XML
distill-fable-5-sft        serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
   │                       (204 active Groq API keys redacted from upstream's session JSONLs).
   ▼
Qwable-v1          ────── SFT'd over the Opus 4.7 distill (this model)
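The dedup and scrubbing steps in the chain above can be approximated in a few lines. This sketch is illustrative, not the script that produced lordx64/agentic-distill-fable-5-sft: the row schema (`user_turns`), the helper names, and the key pattern (Groq keys assumed to start with `gsk_`) are all assumptions, and a production scrubber would cover many more secret formats.

```python
import hashlib
import re

# ASSUMPTION: Groq API keys look like "gsk_" followed by a long run of
# alphanumerics. A real scrubber would match many more secret formats.
GROQ_KEY_RE = re.compile(r"gsk_[A-Za-z0-9]{20,}")

def scrub_secrets(text: str, placeholder: str = "[REDACTED_GROQ_KEY]") -> tuple[str, int]:
    """Replace Groq-style keys with a placeholder; return (clean_text, n_redacted)."""
    return GROQ_KEY_RE.subn(placeholder, text)

def user_content_hash(user_turns: list[str]) -> str:
    """SHA-256 over the user messages of one row."""
    h = hashlib.sha256()
    for turn in user_turns:
        h.update(turn.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()

def dedupe_rows(rows):
    """Keep the first row for each distinct user-content hash.

    rows: iterable of dicts with a "user_turns" list (hypothetical schema).
    """
    seen, kept = set(), []
    for row in rows:
        key = user_content_hash(row["user_turns"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```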

Composition: 4,659 rows, ~12.2M Qwen tokens.

3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
866 rows (19%) end in a pure text response
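The tool-call vs. text split can be recomputed from the `text` column. The sketch assumes rows use Qwen's `<|im_start|>` / `<|im_end|>` turn markers with `<tool_use>` blocks serialized inline, and counts a row as a tool-call ending when its last assistant turn closes with `</tool_use>`; the function names are hypothetical.

```python
import re

# ASSUMPTION: rows use ChatML-style markers (<|im_start|>role ... <|im_end|>)
# and tool calls appear inline as <tool_use ...>...</tool_use>.
ASSISTANT_TURN_RE = re.compile(r"<\|im_start\|>assistant\n(.*?)<\|im_end\|>", re.DOTALL)

def ends_in_tool_call(text: str) -> bool:
    """True if the last assistant turn finishes with a </tool_use> block."""
    turns = ASSISTANT_TURN_RE.findall(text)
    if not turns:
        return False
    return turns[-1].rstrip().endswith("</tool_use>")

def composition(texts):
    """Return (n_tool_endings, n_text_endings, tool_fraction) for a list of rows."""
    n_tool = sum(ends_in_tool_call(t) for t in texts)
    n_text = len(texts) - n_tool
    return n_tool, n_text, (n_tool / len(texts) if texts else 0.0)
```

Applied to the published counts, 3,793 / 4,659 ≈ 0.814, which rounds to the 81% above.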

Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. Narrow — this is essentially one developer's Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.

Evaluation

🚧 Evals are in progress. This table will fill in as each suite completes; nothing here is published until verified.

Benchmark	Setup	Tests	Score	Status
GSM8K-CoT	8-shot, multi-turn, limit 300	Grade-school math; verify reasoning prior preserved through the second SFT round	pending	🚧 in progress
MMLU-Pro	5-shot, multi-turn, limit 500	Hard multi-subject knowledge reasoning	pending	🚧 in progress
MMLU-Pro (per-subject)	Same as above	Biology / Math / Psychology / etc. breakdown	pending	🚧 in progress
GPQA Diamond	0-shot CoT	Graduate-level STEM	pending	🚧 in progress
MATH-500	0-shot, `math_verify` metric	Competition math; tests reasoning depth	pending	🚧 in progress
AIME 2024 / 2025	0-shot CoT	Olympiad-level math; sensitivity to answer-extraction	pending	🚧 in progress
HumanEval / MBPP	pass@1 / pass@10	Pure code completion (non-agentic baseline)	pending	🚧 in progress
IFEval	0-shot	Instruction-following adherence	pending	🚧 in progress
SWE-bench Lite (or BCB-Hard)	with agent harness + tool registry	The key test: agentic coding ability vs Opus 4.7 base	pending	🚧 in progress
`qwen3-6-distill-eval` Space	17 head-to-head prompts (12 design + 5 agentic)	Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output	pending	🚧 in progress

Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):

vLLM serving at 64k context so reasoning chains never truncate before answering
<think>…</think> stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
Per-task num_fewshot (lm-eval's single global value can't handle GSM8K-8shot + GPQA-0shot together)
fewshot_as_multiturn=True for chat-template fidelity
math_verify metric for MATH-500 and AIME (catches semantic equivalence; raw strict-match against \boxed{N} returns 0% even on correct answers because the model says **Answer: N**)
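The strip-then-extract step can be sketched as below. This is a standalone illustration, not lm-eval's filter pipeline or the math_verify metric: `strip_think` and `extract_answer` are hypothetical helpers, and the answer pattern assumes the **Answer: N** style described above. Two edge cases are handled as assumptions: a completion that begins inside a template-opened <think> (only the closing tag is present), and a truncated chain that never closes (no final answer to extract).

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# Hypothetical pattern for "Answer: N", "**Answer: N**" or "**Answer:** N".
ANSWER_RE = re.compile(r"Answer:\s*\*{0,2}\s*([^\n*]+)")

def strip_think(completion: str) -> str:
    """Remove reasoning so extractors only see the final answer."""
    text = THINK_RE.sub("", completion)
    # Template-opened reasoning: only the closing tag appears in the completion.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    # Truncated reasoning: an unclosed <think> leaves no final answer.
    if "<think>" in text:
        text = text.split("<think>", 1)[0]
    return text.strip()

def extract_answer(completion: str):
    """Return the last 'Answer: ...' value after stripping reasoning, or None."""
    matches = ANSWER_RE.findall(strip_think(completion))
    return matches[-1].strip() if matches else None
```

Without the stripping step, the first test string below would also surface the "7" written inside the reasoning.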

Standing rule on this project: numbers stay blank until verified. If a benchmark hits a known extraction bug we couldn't cleanly fix, the row says so and we omit the score rather than publish a misleading one.

Usage

Transformers (full bf16, ~70 GB)

Important: Qwable-v1 emits <tool_use> XML reliably only when prompted as an agent. Use a system prompt that explicitly requests the XML format (see below).

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]
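# return_dict=True yields input_ids + attention_mask in a BatchEncoding
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# do_sample=True so that temperature / top_p actually take effect
out = model.generate(**inputs, max_new_tokens=2048,
                     do_sample=True, temperature=0.6, top_p=0.9)
prompt_len = inputs["input_ids"].shape[1]
# skip_special_tokens=False keeps markers such as <|im_end|> in the printed text
print(tok.decode(out[0][prompt_len:], skip_special_tokens=False))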

Output starts with <think>…</think> followed by a <tool_use name="…" id="…">{json}</tool_use> block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks) — usable but not agentic.

For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic "You are a helpful AI assistant." — the model will produce reasoning + a text answer like the underlying Opus 4.7 distill.

vLLM serving

vllm serve lordx64/Qwable-v1 \
    --max-model-len 16384 \
    --tensor-parallel-size 2 \
    --trust-remote-code

llama.cpp / LM Studio (GGUF)

llama-cli -m Qwable-v1-IQ4_XS.gguf -p "Read /tmp/server.py and find the port..."

Adapter-only (compose on top of the Opus 4.7 distill)

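If you already use the Opus 4.7 distill, load it and apply the adapter on top:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16, device_map="auto",
)
# Attach the Qwable-v1 LoRA weights; model.merge_and_unload() bakes them in
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")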

Tool-use format

The Fable-5 SFT data uses a custom XML envelope for tool calls, not Qwen's native <tool_call> token format. Properly elicited outputs look like:

<think>
The user wants me to change the port from 8000 to 8080. I should Read the file first
to see the current configuration, then Edit it.
</think>

<tool_use name="Read" id="toolu_01ABC...">
{
  "file_path": "/tmp/server.py"
}
</tool_use>

Tool results come back as:

<tool_result id="toolu_01ABC..." is_error="false">
{file contents}
</tool_result>
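A harness needs to produce both envelopes: echoing the model's calls back into history and wrapping tool output so its id matches the originating call. The helpers below are a hypothetical sketch that follows the shapes shown above (JSON body on its own lines, lowercase `true`/`false` for `is_error`); they assume names and ids contain no double quotes and do no escaping.

```python
import json

def format_tool_use(name: str, tool_id: str, args: dict) -> str:
    """Serialize a call in the <tool_use name=... id=...>{json}</tool_use> envelope."""
    body = json.dumps(args, indent=2)
    return f'<tool_use name="{name}" id="{tool_id}">\n{body}\n</tool_use>'

def format_tool_result(tool_id: str, content: str, is_error: bool = False) -> str:
    """Wrap tool output so its id matches the originating <tool_use>."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{tool_id}" is_error="{flag}">\n{content}\n</tool_result>'

if __name__ == "__main__":
    print(format_tool_use("Read", "toolu_01abc", {"file_path": "/tmp/server.py"}))
    print(format_tool_result("toolu_01abc", "PORT = 8000"))
```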

Eliciting the format reliably

Two paths produce the XML format consistently:

1. Agent system prompt — the simplest, works in one-shot:

system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>
Do NOT respond with markdown code blocks. Always use <tool_use> XML.

2. Multi-turn conversation — supply a prior <tool_result> and the model continues in XML for the rest of the conversation, no system prompt needed.
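Path 2 can be set up as a short message list. This sketch assumes the earlier call sits verbatim in an `assistant` turn and the result comes back in a `user` turn; the role the training data used for tool results is not stated on this card, and the file contents are made up for illustration.

```python
# ASSUMPTION: the prior call stays verbatim in an "assistant" turn and the
# tool output is returned in a "user" turn. File contents are made up.
prior_call = (
    '<tool_use name="Read" id="toolu_01abc">\n'
    '{"file_path": "/tmp/server.py"}\n'
    "</tool_use>"
)
prior_result = (
    '<tool_result id="toolu_01abc" is_error="false">\n'
    'app.run(host="0.0.0.0", port=8000)\n'
    "</tool_result>"
)

# No system prompt: per path 2, the preceding <tool_result> sets up the agentic context.
messages = [
    {"role": "user", "content": "Change the server port in /tmp/server.py from 8000 to 8080."},
    {"role": "assistant", "content": prior_call},
    {"role": "user", "content": prior_result},
]
```

Pass `messages` to `tok.apply_chat_template(..., add_generation_prompt=True)` as in the Transformers example under Usage.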

Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format is learned (verified in the smoke test and a full-run spot check); it only appears when the conversation looks agentic.

Tool names are not bound to the Claude Code inventory

The training data uses Claude Code's tool names (Read, Edit, Bash, WebFetch, mcp__*, etc.). The merged model emits sensible-but-invented names like read_file, Replace, write_file instead. The XML envelope transferred; the vocabulary didn't bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue — but anything that routes calls by exact tool name needs a normalizer (e.g. read_file → Read).
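A normalizer can be as simple as an alias table consulted before routing. Everything in the sketch is hypothetical: apart from read_file → Read, the aliases (including Replace → Edit) are guesses to check against what the model actually emits in your harness. Unknown names return `None` so the caller can reject the call instead of misrouting it, and `mcp__*` names pass through unchanged.

```python
from typing import Optional

# Names the harness routes on (Claude Code-style, as in the training data).
CANONICAL = {"Read", "Write", "Edit", "Bash", "WebFetch"}

# HYPOTHETICAL alias table: only read_file -> Read comes from this card;
# the rest are guesses to verify against real model output.
TOOL_ALIASES = {
    "read_file": "Read",
    "write_file": "Write",
    "replace": "Edit",
    "edit_file": "Edit",
    "bash": "Bash",
    "run_command": "Bash",
    "web_fetch": "WebFetch",
}

def normalize_tool_name(name: str) -> Optional[str]:
    """Map an emitted tool name to a canonical one, or None if unknown."""
    if name in CANONICAL or name.startswith("mcp__"):
        return name  # already canonical, or an MCP tool passed through as-is
    key = name.strip().lower().replace("-", "_")
    return TOOL_ALIASES.get(key)
```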

Native Qwen tool calling

This format is chat-template-agnostic and parses with a small regex. Downstream consumers wanting native Qwen <tool_call> JSON calling will need either (a) a wrapper that converts the XML to <tool_call> JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
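A minimal version of option (a) is sketched below. It assumes the Hermes-style `<tool_call>{"name": ..., "arguments": ...}</tool_call>` layout used by Qwen2.5/Qwen3 chat templates (check the chat_template.jinja shipped with the model for the exact form); the function names are hypothetical. It strips <think> blocks first so example calls inside reasoning are not executed, skips blocks whose body is not valid JSON, and drops the `id`, since the Qwen layout has no call id.

```python
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_USE_RE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>\s*(?P<body>.*?)\s*</tool_use>',
    re.DOTALL,
)

def parse_tool_uses(text: str) -> list[dict]:
    """Extract each <tool_use> block outside <think> as {"id", "name", "arguments"}."""
    text = THINK_RE.sub("", text)
    calls = []
    for m in TOOL_USE_RE.finditer(text):
        try:
            args = json.loads(m.group("body"))
        except json.JSONDecodeError:
            continue  # malformed body: skip rather than guess
        calls.append({"id": m.group("id"), "name": m.group("name"), "arguments": args})
    return calls

def to_qwen_tool_calls(text: str) -> str:
    """Re-emit parsed calls as Qwen-style <tool_call> JSON blocks."""
    blocks = []
    for call in parse_tool_uses(text):
        payload = json.dumps({"name": call["name"], "arguments": call["arguments"]})
        blocks.append(f"<tool_call>\n{payload}\n</tool_call>")
    return "\n".join(blocks)
```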

Limitations

Tool-use format is system-prompt-conditional. With a generic prompt ("Fix this bug for me"), Qwable-v1 falls back to the Opus 4.7 prior — explains the fix in markdown code blocks instead of emitting <tool_use> XML. With either (a) an explicit system prompt asking for tool calls in <tool_use name="X" id="Y">…</tool_use> format, or (b) a preceding <tool_result>…</tool_result> turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
Tool names don't bind to the original Claude Code inventory. The model emits XML with sensible-but-invented tool names like read_file, Replace, etc., rather than the exact Claude Code tool names (Read, Edit, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue — but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
Narrow training distribution. ~5k rows from one developer's Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren't in the training data) will be hit-or-miss.
Custom tool envelope. <tool_use> XML doesn't slot into vLLM's tool-calling API automatically. You need a parser wrapper that converts it to <tool_call> JSON if you want vLLM's native tool-call detection.
Persona drift. Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn't refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
Reasoning is from Opus 4.7, not Fable-5. Don't expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
No formal evals at v1 ship time. All benchmarks listed under Evaluation are still pending.

License & terms

This model is released under AGPL-3.0, inherited from the upstream Glint-Research/Fable-5-traces dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).

The underlying Fable-5 thinking traces are derivative content from Anthropic's claude-fable-5 preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with Anthropic's usage policies for their specific use case before fine-tuning further or building commercial products on this model.

The Qwen3.6-35B-A3B base is Apache 2.0, and so is the Opus 4.7 distill (intermediate base). Qwable-v1's AGPL designation supersedes both because the Fable-5 training data is AGPL-licensed upstream.

Citation

@misc{lordx64_qwable_v1_2026,
  title  = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
  author = {lordx64},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}

Acknowledgements

Glint-Research for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT — the only viable source after Anthropic's API-side redaction policy.
TeichAI for the upstream 953-trace collection that Glint-Research built on.
Anthropic for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
Qwen team for releasing Qwen3.6-35B-A3B under Apache 2.0.
Unsloth for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR #601.
Hugging Face for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.