FastContext 1.0 4B SFT

FastContext 1.0 4B SFT by microsoft, a text-generation model. Understand and compare features, benchmarks, and capabilities.

Comparison

Feature	FastContext 1.0 4B SFT	Interfaze
Input Modalities	text	image, text, audio, video, document
Native OCR	No	Yes
Long Document Processing	No	Yes
Language Support	unknown	162+
Native Speech-to-Text	No	Yes
Native Object Detection	No	Yes
Guardrail Controls	No	Yes
Context Input Size	262.1K	1M
Tool Calling	Yes	Tool calling supported + built in browser, code execution and web search

Scaling

Feature	FastContext 1.0 4B SFT	Interfaze
Scaling	Self-hosted/Provider-hosted with quantization	Unlimited

View model card on Hugging Face

1. Model Introduction

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

Repository exploration is a major bottleneck in modern coding agents — locating relevant code consumes a large share of the token budget and pollutes the solver's context with irrelevant snippets. In our analysis of GPT-5.4 trajectories, reading and searching account for 56.2% of all tool-use turns and 46.5% of the main agent's total tokens. FastContext moves this work into a dedicated subagent so the main agent receives clean, grounded evidence rather than the long trail of exploratory reads and searches.

The model family spans 4B–30B parameters, bootstrapped from strong reference-model trajectories via supervised fine-tuning (SFT) and refined with task-grounded reinforcement learning (RL) for broad first-turn search, multi-turn evidence gathering, and precise citation generation.

Backbones: Qwen3-4B-Instruct (4B explorer) and Qwen3-Coder-30B-A3B (30B explorer)
Variants: FC-4B-SFT, FC-4B-RL (deployment targets), FC-30B-SFT (scaling reference)
Context length: up to 262K tokens
Paper: FastContext: Training Efficient Repository Explorer for Coding Agents
Code & data: https://github.com/microsoft/fastcontext

How it works

Coding Agent ──query──▶  FastContext  ──read/search──▶  Repository
     ▲                       │
     └──── file-line ────────┘
          citations

Internally, FastContext runs an exploration loop:

Query understanding — translate the issue into search intents.
Parallel tool calling — issue multiple READ / GLOB / GREP calls in a single turn to cover complementary hypotheses.
Observation-driven refinement — use tool outputs to guide the next search turn.
Final citations — return a compact <final_answer> block of file paths and line ranges.

2. Evaluation Results

End-to-end performance (Mini-SWE-Agent)

Integrating FastContext into Mini-SWE-Agent improves end-to-end resolution rates by up to 5.5% while reducing main-agent token consumption by up to 60%, with only marginal overhead. Scores, tokens, and turns are measured on the main-agent trajectory; deltas are relative to w/o Explore for the same main agent.

Main Agent	Subagent	SWE-bench Multilingual	SWE-bench Pro	SWE-QA
GPT-5.4	w/o Explore	71.7 / 457k	46.0 / 818k	81.3 / 418k
	FC-30B-SFT	75.0 (↑3.3) / 356k (↓22.1%)	49.0 (↑3.0) / 688k (↓15.9%)	82.0 (↑0.7) / 206k (↓50.7%)
	FC-4B-SFT	73.3 (↑1.6) / 364k (↓20.4%)	47.0 (↑1.0) / 689k (↓15.8%)	81.9 (↑0.6) / 213k (↓49.0%)
	FC-4B-RL	74.7 (↑3.0) / 338k (↓26.0%)	48.5 (↑2.5) / 701k (↓14.3%)	82.0 (↑0.7) / 210k (↓49.8%)
GLM-5.1	w/o Explore	72.3 / 2514k	17.5 / 2692k	72.7 / 401k
	FC-30B-SFT	73.7 (↑1.4) / 1797k (↓28.5%)	20.0 (↑2.5) / 2370k (↓12.0%)	73.3 (↑0.6) / 292k (↓27.2%)
	FC-4B-SFT	73.3 (↑1.0) / 1919k (↓23.7%)	18.0 (↑0.5) / 2279k (↓15.3%)	73.4 (↑0.7) / 306k (↓23.7%)
	FC-4B-RL	73.7 (↑1.4) / 1971k (↓21.6%)	22.5 (↑5.0) / 2210k (↓17.9%)	73.5 (↑0.8) / 302k (↓24.7%)
Kimi-K2.6	w/o Explore	76.3 / 1553k	31.0 / 2383k	71.6 / 510k
	FC-30B-SFT	76.7 (↑0.4) / 1360k (↓12.4%)	33.0 (↑2.0) / 2150k (↓9.8%)	72.8 (↑1.2) / 373k (↓26.9%)
	FC-4B-SFT	75.3 (↓1.0) / 1306k (↓15.9%)	32.5 (↑1.5) / 2159k (↓9.4%)	72.6 (↑1.0) / 402k (↓21.2%)
	FC-4B-RL	78.3 (↑2.0) / 1384k (↓10.9%)	33.5 (↑2.5) / 2158k (↓9.4%)	72.6 (↑1.0) / 378k (↓25.9%)

Score / Tokens shown per cell. Best result per main-agent block in bold.

Highlights:

FastContext improves end-to-end accuracy for every main agent and benchmark; the largest gains appear on SWE-bench Pro (e.g. GPT-5.4 +5.5, GLM-5.1 +5.0).
The biggest token savings reach 60.3% (GPT-5.4 on SWE-QA).
The compact 4B-RL explorer can outperform the larger 30B-SFT explorer — e.g. on GLM-5.1 SWE-bench Pro it reaches 22.5 vs. 20.0 while using fewer tokens.

3. Quick Start

Launch the model with an OpenAI-compatible server (e.g. SGLang). The example below serves the 4B explorer:

python3 -m sglang.launch_server \
    --model-path FastContext-1.0-4B-SFT \
    --tool-call-parser qwen \
    --context-length 262144 \
    --trust-remote-code \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 30000 \
    --tp-size 1 \
    --mem-fraction-static 0.8

FastContext exposes only three read-only tools to the model:

Tool	Purpose
`READ`	Return line-numbered file contents
`GLOB`	Path discovery by glob pattern
`GREP`	Regex search over repository text (ripgrep-style)

At each turn the explorer either issues one or more (parallel) tool calls or stops with a final <final_answer> evidence list. Wire FastContext into a coding agent (e.g. Mini-SWE-Agent) as an exploration subagent the main agent can invoke on demand.

4. Training Recipe

FastContext is trained in two stages:

Supervised fine-tuning (SFT): The exploration traces, split into three sources matching the runtime behavior of the subagent — parallel_toolcalls (broad first-turn search), multiturn_traj (multi-turn evidence gathering), and linerange (precise citation generation).
Reinforcement learning (RL): The model is rolled out as the actual subagent and optimized with GRPO using a deterministic reward combining file- and line-level F1, a bonus for bounded parallel exploration, and format penalties.

License

This project is licensed under the MIT License.