LFM2.5 230M

LFM2.5 230M is a text-generation model by Liquid AI. Understand and compare its features, benchmarks, and capabilities.

Comparison

| Feature | LFM2.5 230M | Interfaze |
|---|---|---|
| Input Modalities | text | image, text, audio, video, document |
| Native OCR | No | Yes |
| Long Document Processing | No | Yes |
| Language Support | 10 partial | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | 32.8K | 1M |
| Tool Calling | Yes | Tool calling supported + built in browser, code execution and web search |

Scaling

| Feature | LFM2.5 230M | Interfaze |
|---|---|---|
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |

View model card on Hugging Face

LFM2.5 is a family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

  • Our most compact model yet: 230M parameters that punch above their weight, bringing real capability to the tightest memory and compute budgets.
  • Fast edge inference: Best throughput from low-cost CPUs to production GPUs, running at 213 tok/s decode speed on Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5.
  • Built for agentic tasks: Distilled from LFM2.5-350M and refined with multi-stage reinforcement learning, making it well-suited for tool use and data extraction.
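
As a rough illustration of what the decode speeds above mean in practice, the sketch below converts a throughput figure into wall-clock time for a response of a given length. It assumes a constant decode rate and ignores prompt processing (prefill) and warm-up; the helper `decode_seconds` and the 512-token response length are hypothetical choices made for this example.

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time, assuming a constant rate and no prefill cost."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Decode speeds quoted in the list above, in tokens per second.
DECODE_SPEEDS = {
    "Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}

if __name__ == "__main__":
    for device, speed in DECODE_SPEEDS.items():
        # 512 tokens is an arbitrary response length chosen for illustration.
        print(f"{device}: {decode_seconds(512, speed):.1f} s for 512 tokens")
```

Under these assumptions, a 512-token response takes about 2.4 s on the Galaxy S25 Ultra and about 12.2 s on the Raspberry Pi 5.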

Find more information about LFM2.5-230M in our blog post.

[Figure: LFM2.5-230M benchmarks]

🗒️ Model Details

| Model | Parameters | Description |
|---|---|---|
| LFM2.5-230M-Base | 230M | Pre-trained base model for fine-tuning |
| LFM2.5-230M | 230M | General-purpose instruction-tuned model |

LFM2.5-230M is a general-purpose text-only model with the following features:

  • Number of parameters: 230M
  • Number of layers: 14 (8 double-gated LIV convolution blocks + 6 GQA blocks)
  • Training budget: 19T tokens
  • Context length: 32,768 tokens
  • Vocabulary size: 65,536
  • Knowledge cutoff: Mid-2024
  • Languages: English, Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish
  • Generation parameters:
    • temperature: 0.1
    • top_k: 50
    • repetition_penalty: 1.05

| Model | Description |
|---|---|
| LFM2.5-230M | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang. |
| LFM2.5-230M-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment. |
| LFM2.5-230M-ONNX | ONNX Runtime format for cross-platform deployment. |
| LFM2.5-230M-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices. |
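
The parameter count and the quantized formats listed above allow a back-of-the-envelope estimate of weight storage. The sketch below is a simplification: it counts weights only (no KV cache, activations, or runtime overhead), treats "230M" as exactly 230,000,000 parameters, and uses nominal bit widths that are assumptions for illustration rather than the published quantization variants. Real quantized files also carry metadata and may keep some tensors at higher precision.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage: parameters times bits, converted to bytes."""
    return num_params * bits_per_weight / 8


# "230M" from the feature list; the exact parameter count may differ.
NUM_PARAMS = 230_000_000

# Nominal bit widths (assumed for illustration, not the published variants).
for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    megabytes = weight_bytes(NUM_PARAMS, bits) / 1e6
    print(f"{label}: ~{megabytes:.0f} MB of weights")
```

With these assumptions the weights come to roughly 460 MB in bf16, 230 MB at 8 bits, and 115 MB at 4 bits.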

We recommend using it for data extraction and lightweight on-device agentic pipelines. It is not recommended for reasoning-heavy workloads such as advanced math, code generation, or creative writing.

Chat Template

LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant

You can use tokenizer.apply_chat_template() to format your messages automatically.
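
To make the structure of the format concrete, here is a minimal hand-rolled sketch that renders a message list into the ChatML-like layout shown above. It is illustrative only: the function `render_chat` is hypothetical, and the newline placement is an assumption based on the usual ChatML convention. The template shipped with the tokenizer is authoritative, so prefer `tokenizer.apply_chat_template()` in real code.

```python
from typing import Dict, List

BOS = "<|startoftext|>"
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in a ChatML-like layout (illustrative, not the official template)."""
    parts = [BOS]
    for message in messages:
        # Each turn: start token, role, newline, content, end token, newline (assumed).
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
]
print(render_chat(messages))
```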

Tool Use

LFM2.5 supports function calling in four steps:

  1. Function definition: Provide the list of tools as a JSON object in the system prompt, or use tokenizer.apply_chat_template() with tools=....
  2. Function call: By default, LFM2.5 writes Pythonic function calls as the assistant answer: a Python list placed between the <|tool_call_start|> and <|tool_call_end|> special tokens. You can override this behavior by asking the model, in the system prompt, to output JSON function calls instead.
  3. Function execution: Execute the call and return the result with the tool role.
  4. Final answer: LFM2.5 interprets the tool output and returns a plain-text answer addressing the original prompt.
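
Step 1 can be sketched by serializing the tool definitions into the system prompt by hand. The helper `tools_system_prompt` is hypothetical, and the "List of tools: " prefix simply mirrors the example in this section; passing `tools=...` to `tokenizer.apply_chat_template()` is the more robust route when a tokenizer is available.

```python
import json
from typing import Any, Dict, List


def tools_system_prompt(tools: List[Dict[str, Any]]) -> str:
    """Serialize tool definitions into a 'List of tools: [...]' system prompt."""
    return "List of tools: " + json.dumps(tools)


# Tool schema taken from the example in this section.
get_candidate_status = {
    "name": "get_candidate_status",
    "description": "Retrieves the current status of a candidate in the recruitment process",
    "parameters": {
        "type": "object",
        "properties": {
            "candidate_id": {
                "type": "string",
                "description": "Unique identifier for the candidate",
            }
        },
        "required": ["candidate_id"],
    },
}

print(tools_system_prompt([get_candidate_status]))
```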

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
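
Steps 2 and 3 require the application to extract the Pythonic call from the assistant text and run it. The following is one possible sketch, not the official parser: `parse_tool_calls`, `run_tool_calls`, `REGISTRY`, and the stub `get_candidate_status` are all hypothetical names. It parses the call with Python's `ast` module and evaluates arguments with `ast.literal_eval`, so model-written text is never executed as code. Only keyword arguments with literal values are handled.

```python
import ast
import json
import re

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(assistant_text: str) -> list:
    """Extract (function_name, kwargs) pairs from Pythonic tool calls."""
    calls = []
    for block in TOOL_CALL_RE.findall(assistant_text):
        tree = ast.parse(block.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected plain function calls")
            if node.args:
                raise ValueError("positional arguments are not handled in this sketch")
            # literal_eval accepts only literals, so nothing the model wrote is executed.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id: str) -> dict:
    """Hypothetical stub standing in for a real backend lookup."""
    return {"candidate_id": candidate_id, "status": "Interview Scheduled"}


# Only functions listed here can be invoked by the model.
REGISTRY = {"get_candidate_status": get_candidate_status}


def run_tool_calls(assistant_text: str) -> str:
    """Execute the parsed calls and return a JSON payload for the tool-role message."""
    results = [REGISTRY[name](**kwargs) for name, kwargs in parse_tool_calls(assistant_text)]
    return json.dumps(results)


reply = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>'
    "Checking the current status of candidate ID 12345."
)
print(parse_tool_calls(reply))
print(run_tool_calls(reply))
```

The string returned by `run_tool_calls` would then be sent back as the content of a message with the tool role, after which the model produces its final answer (step 4).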

🏃 Inference

LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.

| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | |
| vLLM | High-throughput production deployments with GPU. | Link | |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | |
| LM Studio | Desktop application for running LLMs locally. | Link | |
| SGLang | High-throughput production deployments with GPU. | Link | - |

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer from the Hugging Face Hub.
model_id = "LiquidAI/LFM2.5-230M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Stream generated text to stdout, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# Format the conversation with the chat template and tokenize it.
# return_dict=True makes the ["input_ids"] lookup below explicit.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
)["input_ids"].to(model.device)

# Generate with the recommended sampling parameters.
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)
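
For intuition about what the recommended `temperature`, `top_k`, and `repetition_penalty` values do, the following pure-Python sketch applies a common formulation of these three steps to a toy logit vector. It is not the code path of any particular framework, which may differ in details; `next_token_probs` and the toy logits are hypothetical.

```python
import math
from typing import Dict, Sequence


def next_token_probs(
    logits: Sequence[float],
    previous_ids: Sequence[int],
    temperature: float = 0.1,
    top_k: int = 50,
    repetition_penalty: float = 1.05,
) -> Dict[int, float]:
    """Return sampling probabilities per token id after the three adjustments."""
    if temperature <= 0 or top_k < 1:
        raise ValueError("temperature must be > 0 and top_k must be >= 1")
    scores = list(logits)
    # Repetition penalty: push tokens already in the sequence toward lower scores.
    for token_id in set(previous_ids):
        if scores[token_id] > 0:
            scores[token_id] /= repetition_penalty
        else:
            scores[token_id] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution around the top tokens.
    scores = [score / temperature for score in scores]
    # Top-k: keep only the k highest-scoring tokens.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the kept tokens (subtract the max for numerical stability).
    peak = max(scores[i] for i in kept)
    weights = {i: math.exp(scores[i] - peak) for i in kept}
    total = sum(weights.values())
    return {i: weight / total for i, weight in weights.items()}


# Toy example: three candidate tokens, keep the best two.
print(next_token_probs([2.0, 1.0, 0.5], previous_ids=[], top_k=2))
```

At a temperature of 0.1, even a small logit gap is amplified tenfold before the softmax, so sampling behaves almost greedily while the mild repetition penalty discourages loops.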

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | |

📊 Performance

Benchmarks

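| Model | GPQA Diamond | MMLU-Pro | IFEval | IFBench | Multi-IF |
|---|---|---|---|---|---|
| LFM2.5-230M | 25.41 | 20.25 | 71.71 | 38.40 | 37.70 |
| LFM2.5-350M | 30.64 | 20.01 | 76.96 | 40.69 | 44.92 |
| LFM2-350M | 27.58 | 19.29 | 64.96 | 18.20 | 32.92 |
| Granite 4.0-H-350M | 22.32 | 13.14 | 61.27 | 17.22 | 28.70 |
| Granite 4.0-350M | 25.91 | 12.84 | 53.48 | 15.98 | 24.21 |
| Qwen3.5-0.8B (Instruct) | 27.41 | 37.42 | 59.94 | 22.87 | 41.68 |
| Gemma 3 1B IT | 23.89 | 14.04 | 63.49 | 20.33 | 44.25 |

| Model | CaseReportBench | BFCLv3 | BFCLv4 | τ²-Bench Telecom | τ²-Bench Retail |
|---|---|---|---|---|---|
| LFM2.5-230M | 22.51 | 43.26 | 21.03 | 5.26 | 13.68 |
| LFM2.5-350M | 32.45 | 44.11 | 21.86 | 18.86 | 17.84 |
| LFM2-350M | 11.67 | 22.95 | 12.29 | 10.82 | 5.56 |
| Granite 4.0-H-350M | 12.44 | 43.07 | 13.28 | 13.74 | 6.14 |
| Granite 4.0-350M | 0.84 | 39.58 | 13.73 | 2.92 | 6.14 |
| Qwen3.5-0.8B (Instruct) | 13.83 | 35.08 | 18.70 | 12.57 | 6.14 |
| Gemma 3 1B IT | 2.28 | 16.61 | 7.17 | 9.36 | 6.43 |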

CPU Inference

[Figure: CPU inference performance]

GPU Inference

[Figure: GPU inference performance]

📬 Contact

Citation

@article{liquidAI2026230M,
  author = {Liquid AI},
  title = {LFM2.5-230M: Built to Run Anywhere},
  journal = {Liquid AI Blog},
  year = {2026},
  note = {www.liquid.ai/blog/lfm2-5-230m},
}
@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
