LFM2.5 8B A1B

LFM2.5 8B A1B by LiquidAI, a text-generation model. Understand and compare features, benchmarks, and capabilities.

Comparison

Feature | LFM2.5 8B A1B | Interfaze
Input Modalities | text | image, text, audio, video, document
Native OCR | No | Yes
Long Document Processing | No | Yes
Language Support | 9 partial | 162+
Native Speech-to-Text | No | Yes
Native Object Detection | No | Yes
Guardrail Controls | No | Yes
Context Input Size | 131.1K | 1M
Tool Calling | Yes | Tool calling supported + built-in browser, code execution, and web search

Scaling

Feature | LFM2.5 8B A1B | Interfaze
Scaling | Self-hosted/Provider-hosted with quantization | Unlimited

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

  • On-device personal assistant: Designed to power real-life applications, chain tool calls, and follow complex instructions on all devices.
  • Compressed performance: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
  • Unmatched throughput: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Find more information about LFM2.5-8B-A1B in our blog post.

[Figure: AA-Omniscience Index comparison]

*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.

🗒️ Model Details

Model | Parameters | Description
LFM2.5-8B-A1B-Base | 8.3B total / 1.5B active | Pre-trained base model for fine-tuning
LFM2.5-8B-A1B | 8.3B total / 1.5B active | Reasoning-tuned general-purpose model

LFM2.5-8B-A1B is a general-purpose text-only model with the following features:

  • Total parameters: 8.3B
  • Active parameters: 1.5B
  • Number of layers: 24 (18 double-gated LIV conv + 6 GQA)
  • Training budget: 38 trillion tokens
  • Context length: 131,072
  • Vocabulary size: 128,000
  • Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
  • Recommended generation parameters:
    • temperature: 0.2
    • top_k: 80
    • repetition_penalty: 1.05
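
To make the three generation parameters concrete, here is a toy pure-Python sketch of how they typically shape the choice of the next token. The function name `sample_next_token` and the toy logits are hypothetical, and the order of operations (repetition penalty, then temperature, then top-k) is an assumption; real inference frameworks apply these steps to tensors and may differ in detail.

```python
import math
import random

def sample_next_token(logits, previous_tokens, temperature=0.2, top_k=80,
                      repetition_penalty=1.05, rng=None):
    """Pick one token id from raw logits (illustrative, not a framework API)."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: make tokens that were already generated less likely.
    for token_id in set(previous_tokens):
        if scores[token_id] > 0:
            scores[token_id] /= repetition_penalty
        else:
            scores[token_id] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring candidates.
    k = min(top_k, len(scores))
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax weights over the kept candidates (shifted by the max for stability).
    top = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - top) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0]
token = sample_next_token(toy_logits, previous_tokens=[0], top_k=2,
                          rng=random.Random(0))
print(token)
```

With a temperature as low as 0.2, most of the probability mass lands on the highest-scoring token, so sampling stays close to greedy decoding while still allowing some variation.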

Model | Description
LFM2.5-8B-A1B | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang.
LFM2.5-8B-A1B-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment.
LFM2.5-8B-A1B-ONNX | ONNX Runtime format for cross-platform deployment.
LFM2.5-8B-A1B-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices.

We recommend using LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is not the best fit for heavy programming or knowledge-intensive question answering without retrieval.

Chat Template

LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant

Because LFM2.5-8B-A1B is a reasoning model, assistant turns contain an explicit chain of thought before the final answer. You can use tokenizer.apply_chat_template() to format your messages automatically.
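
For intuition, the prompt layout in the example above can be approximated by hand. This is a minimal sketch, assuming the usual ChatML newline placement (a newline after the role name and after each `<|im_end|>`); the helper name `render_chatml_like` is hypothetical, and the template shipped with the tokenizer remains the source of truth.

```python
def render_chatml_like(messages, add_generation_prompt=True):
    """Approximate the ChatML-like layout shown in the example (assumed newlines)."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
]
rendered = render_chatml_like(messages)
print(rendered)
```

In practice, prefer tokenizer.apply_chat_template(), which also handles tools and any special cases this sketch ignores.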

Tool Use

LFM2.5 supports function calling in four steps:

  1. Function definition: Provide the list of tools as a JSON object in the system prompt, or use tokenizer.apply_chat_template() with tools=....
  2. Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between <|tool_call_start|> and <|tool_call_end|> special tokens), as the assistant answer. You can override this behavior by asking the model to output JSON function calls in the system prompt.
  3. Function execution: Execute the call and return the result with the tool role.
  4. Final answer: LFM2.5 interprets the tool output and returns a plain-text answer addressing the original prompt.

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
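
Steps 2 and 3 can be sketched on the client side as follows. This is an illustrative parser, not Liquid AI's reference implementation: the helper `parse_tool_calls`, the `TOOLS` registry, and the stub `get_candidate_status` are hypothetical, and the sketch assumes keyword-only arguments with literal values, as in the example above.

```python
import ast
import json
import re

TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)

def parse_tool_calls(assistant_text):
    """Return (function_name, kwargs) pairs from a Pythonic tool-call list."""
    match = TOOL_CALL_RE.search(assistant_text)
    if match is None:
        return []
    tree = ast.parse(match.group(1).strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list of calls")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected simple calls like name(arg=value)")
        if node.args:
            raise ValueError("this sketch only handles keyword arguments")
        # literal_eval accepts literals only, so model-written code is never run.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((node.func.id, kwargs))
    return calls

# Hypothetical local stand-in for the tool declared in the system prompt.
def get_candidate_status(candidate_id):
    return {"candidate_id": candidate_id, "status": "Interview Scheduled"}

TOOLS = {"get_candidate_status": get_candidate_status}

assistant_text = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>Checking the current status of candidate ID 12345."
)

# Execute each call and wrap the result in a message with the tool role.
tool_messages = []
for name, kwargs in parse_tool_calls(assistant_text):
    result = TOOLS[name](**kwargs)
    tool_messages.append({"role": "tool", "content": json.dumps([result])})
print(tool_messages)
```

Parsing with `ast` rather than `eval` means the model's output is treated as data. Appending the resulting tool message to the conversation and generating again yields the final answer of step 4.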

🏃 Inference

LFM2.5-8B-A1B is supported by many inference frameworks. See the Inference documentation for the full list.

Name | Description
Transformers | Simple inference with direct access to model internals.
vLLM | High-throughput production deployments with GPU.
llama.cpp | Cross-platform inference with CPU offloading.
MLX | Apple's machine learning framework optimized for Apple Silicon.
LM Studio | Desktop application for running LLMs locally.

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model in bfloat16 and place it automatically on the available devices.
model_id = "LiquidAI/LFM2.5-8B-A1B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# Apply the chat template; return_dict=True yields input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Sample with the recommended generation parameters.
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

Name | Description
CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion.
CPT (Unsloth) | Continued Pre-Training using Unsloth for translation.
SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth.
SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL.
DPO (TRL) | Direct Preference Optimization with LoRA using TRL.
GRPO (Unsloth) | GRPO with LoRA using Unsloth.
GRPO (TRL) | GRPO with LoRA using TRL.

📊 Performance

Improvements over LFM2-8B-A1B

Thanks to reasoning, scaled-up pre-training, and large-scale RL, LFM2.5-8B-A1B improves over its predecessor across the board:

Benchmark | LFM2-8B-A1B | LFM2.5-8B-A1B | Δ
AA-Omniscience Index | -78.42 | -24.70 | +53.72
AA-Omniscience Accuracy | 7.33 | 8.67 | +1.34
AA-Omniscience Non-Hallucination Rate | 7.46 | 63.47 | +56.01
IFEval | 79.44 | 91.84 | +12.40
IFBench | 26.00 | 56.47 | +30.47
Multi-IF | 58.54 | 79.93 | +21.39
MATH500 | 74.80 | 88.76 | +13.96
AIME25 | 20.00 | 42.53 | +22.53
BFCLv3 | 45.07 | 64.36 | +19.29
BFCLv4 | 25.52 | 48.50 | +22.98
Tau² Telecom | 13.60 | 88.07 | +74.47
Tau² Retail | 7.02 | 39.82 | +32.80
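
The Δ column is the LFM2.5-8B-A1B score minus the LFM2-8B-A1B score. A minimal sketch that recomputes it for a few rows, with scores copied from the table above; `Decimal` is used here only to avoid floating-point noise in the two-decimal values.

```python
from decimal import Decimal

# (benchmark, LFM2-8B-A1B score, LFM2.5-8B-A1B score), copied from the table.
rows = [
    ("AA-Omniscience Index", "-78.42", "-24.70"),
    ("IFEval", "79.44", "91.84"),
    ("Tau² Telecom", "13.60", "88.07"),
]

def delta(old, new):
    """Improvement of the new score over the old one, in points."""
    return Decimal(new) - Decimal(old)

for name, old, new in rows:
    print(f"{name}: {delta(old, new):+}")
```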

Knowledge and instruction following

Model | Parameters | AA-Omni. Index | AA-Omni. Accuracy | AA-Omni. Non-Halluc. | IFEval | IFBench | Multi-IF
LFM2.5-8B-A1B | 8B/A1B | -24.70 | 8.67 | 63.47 | 91.84 | 56.47 | 79.93
Granite-4.0-H-Tiny | 7B/A1B | -75.50 | 9.37 | 6.38 | 82.23 | 21.28 | 59.00
Qwen3.5-4B | 4B | -51.53 | 17.20 | 16.99 | 87.80 | 50.38 | 67.43
Qwen3-30B-A3B-Thinking-2507 | 30.5B/3.3B | -51.31 | 18.80 | 13.87 | 90.82 | 51.11 | 79.04
Gemma-4-E2B-IT | 5.1B | -72 | 7.00 | 15.05 | 82.93 | 33.53 | 69.70
Gemma-4-E4B-IT | 8B | -50.67 | 8.10 | 36.06 | 87.74 | 39.48 | 77.58
Gemma-4-26B-A4B-IT | 26B/4B | -62.07 | 14.37 | 10.75 | 91.40 | 47.25 | 82.06
gpt-oss-20b | 21B/3.6B | -49.17 | 14.57 | 24.50 | 86.73 | 58.65 | 76.64

Math and agentic workflows

Model | Parameters | MATH500 | AIME25 | AIME26 | BFCLv3 | BFCLv4 | Tau² Telecom | Tau² Retail
LFM2.5-8B-A1B | 8B/A1B | 88.76 | 42.53 | 50.00 | 64.79 | 49.73 | 88.07 | 39.82
Granite-4.0-H-Tiny | 7B/A1B | 59.20 | 4.93 | 3.33 | 56.89 | 28.52 | 16.67 | 18.42
Qwen3.5-4B | 4B | 80.76 | 54.28 | 58.33 | 71.06 | 54.01 | 87.72 | 71.93
Qwen3-30B-A3B-Thinking-2507 | 30.5B/3.3B | 86.48 | 71.67 | 66.67 | 73.39 | 50.53 | 21.93 | 56.14
Gemma-4-E2B-IT | 5.1B | 64.00 | 26 | 30 | 56.44 | 31.91 | 22.37 | 18.95
Gemma-4-E4B-IT | 8B | 65.00 | 34.33 | 40.67 | 57.31 | 33.92 | 26.75 | 42.11

CPU Inference

[Figure: CPU inference benchmark]

GPU Inference

LFM2.5-8B-A1B is the fastest model in its size class, reaching 18.5K output tokens per second at high concurrency, or about 1.6B tokens per day on a single H100.

[Figure: GPU inference benchmark]
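
The daily figure quoted for GPU inference follows from the per-second rate. A quick check, assuming the rounded 18.5K tokens per second is sustained for a full 24 hours (the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def tokens_per_day(tokens_per_second):
    """Scale a sustained per-second throughput to one day."""
    return tokens_per_second * SECONDS_PER_DAY

daily = tokens_per_day(18_500)
print(f"{daily:,} tokens/day (~{daily / 1e9:.2f}B)")
# prints "1,598,400,000 tokens/day (~1.60B)"
```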

📬 Contact

Citation

@article{liquidAI20268BA1B,
  author  = {Liquid AI},
  title   = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
  journal = {Liquid AI Blog},
  year    = {2026},
  note    = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}
@article{liquidai2025lfm2,
  title   = {LFM2 Technical Report},
  author  = {Liquid AI},
  journal = {arXiv preprint arXiv:2511.23404},
  year    = {2025}
}
