Interfaze

logo

Beta

pricing

help

docs

blog

sign in

Lift

Lift by datalab-to, a image-text-to-text model with multimodal capabilities. Understand and compare multimodal features, benchmarks, and capabilities.

Comparison

FeatureLiftInterfaze
Input Modalities

text, image, document

image, text, audio, video, document

Native OCRNoYes
Long Document ProcessingNoYes
Language Support

unknown

162+

Native Speech-to-TextNoYes
Native Object DetectionNoYes
Guardrail ControlsNoYes
Context Input Size

unknown

1M

Tool CallingNo

Tool calling supported + built in browser, code execution and web search

Scaling

FeatureLiftInterfaze
Scaling

Self-hosted/Provider-hosted with quantization

Unlimited

View model card on Hugging Face

lift is a structured extraction model from Datalab that pulls structured JSON out of PDFs and images. Pass any JSON schema and lift returns a JSON object matching it, using schema-constrained decoding to guarantee valid, well-typed output.

Try lift in the free playground, or use the hosted API for higher accuracy, per-field verification, and citations.

Features

  • Extract structured data from documents
  • Pass any JSON schema
  • Handles multi-page documents in a single pass, including values that span pages
  • Two inference modes: local (HuggingFace) and remote (vLLM server)
  • CLI for single files, inline schemas, or whole directories
  • Schema Studio: a Streamlit app to build, save, and test schemas against your documents

Quickstart

pip install lift-pdf


lift_vllm
lift_extract input.pdf ./output --schema schema.json


pip install lift-pdf[hf]
lift_extract input.pdf ./output --schema schema.json --method hf

A schema is standard JSON Schema. Keep it simple — string, number, integer, boolean, arrays of those, arrays of objects, and nested objects are all supported. Write a description for any field whose name isn't self-explanatory, and mark a field required only when it must appear; fields genuinely absent from a document come back null.

{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string", "description": "Invoice identifier"},
    "total": {"type": "number", "description": "Total amount due"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "amount": {"type": "number"}
        }
      }
    }
  },
  "required": ["invoice_number", "total"]
}

Usage

from lift import extract
from lift.model import InferenceManager


model = InferenceManager(method="vllm")
result = extract("document.pdf", "schema.json", model=model)
print(result.extraction)

With HuggingFace Transformers

from lift import extract
from lift.model import InferenceManager


model = InferenceManager(method="hf")
result = extract("document.pdf", "schema.json", model=model)
print(result.extraction)

extract accepts the schema as a dict, a path to a .json file, an inline JSON string, or the name of a saved schema. Pass page_range="0-5" to limit PDF pages, and set VLLM_API_BASE to target a remote server.

Benchmarks

Evaluated on a 225-document extraction benchmark (6–64 pages per document, ~11,000 scored fields) with adversarial cases planted throughout: cross-page values, exhaustive lists, fields that must be left null, near-miss distractors, multi-source aggregation. Scoring is deterministic exact-match against ground truth (numeric tolerance, normalized strings).

All models receive the same rendered page images, and extract each document in a single pass.

ModelSizeField accuracyFull-document accuracyMedian latency*Features
Datalab API95.9%44.4%30.8sCitations + Verification
Gemini Flash 3.591.3%40.0%28.1s
lift9B90.2%20.9%9.5s
Azure Content Understanding83.4%22.2%73.7s
NuExtract34B81.5%8.4%8.3s
Qwen3.5-9B9B76.3%24.0%16.8s

* Per document, 8 concurrent requests. Local models (lift, Qwen3.5-9B, NuExtract3) served with vLLM on a single GPU; Gemini, Datalab, and Azure via API. Latency varies with hardware and load — treat as relative, not absolute.

  • Field accuracy — fraction of individual schema fields extracted correctly.
  • Full-document accuracy — fraction of documents where every field is correct.

Hosted models with verification, citations, and confidence scores are available via the Datalab API — test in the playground.

Commercial Usage

Code is Apache 2.0. Model weights use a modified OpenRAIL-M license: free for research, personal use, and startups under $5M funding/revenue. Cannot be used competitively with our API. For broader commercial licensing, see pricing.

Credits

Want more deterministic results?