Document Processing API

Tinfoil’s document processing service extracts structured Markdown from uploaded documents — including PDFs, DOCX, PPTX, XLSX, HTML, CSV, and images. The entire service runs inside a secure enclave, and the VLM used for OCR and visual extraction also runs in its own secure enclave — so your documents are never exposed to any operator.

Born-digital PDFs are parsed using MuPDF inside a sandboxed subprocess with no network access, environment variables, or filesystem; scanned pages and images are sent to the VLM for OCR. You can use document processing in two ways:
  • Call /v1/convert/file directly when you want extracted Markdown (or page images) back from the document service.
  • Send a base64-encoded file through the OpenAI-compatible /v1/responses or /v1/chat/completions APIs. Tinfoil privately converts the attachment and forwards either Markdown (for text-only models) or per-page Markdown plus page images (for vision-capable models) to the model. You can override the default with the optional tinfoil_mode field.
Current scope: OpenAI-compatible file input support currently accepts base64 file_data only. file_id and the /v1/files upload flow are not supported.

1. Convert A Document Directly

The document processing endpoint accepts multipart/form-data requests at /v1/convert/file. Upload one or more files with field name files. You can control extraction behavior with the mode query parameter:
| Mode | Description |
| --- | --- |
| text (default) | Markdown from the text layer. VLM OCR only for scanned pages. |
| vision | Text plus VLM OCR for scanned pages and VLM visual descriptions (tables, charts, diagrams, formulas) for born-digital pages. |
| images | Per-page text plus page images as base64 PNG. No VLM. |
| raw | Text layer only. No VLM, no image rendering. |
| vlm | Full-page VLM OCR on every page. |
import { SecureClient } from 'tinfoil'
import fs from 'fs'

const client = new SecureClient()

const fileBuffer = fs.readFileSync('doc.pdf')
const blob = new Blob([fileBuffer], { type: 'application/pdf' })

const formData = new FormData()
formData.append('files', blob, 'doc.pdf')

// Default mode — fast, no VLM for born-digital PDFs
const response = await client.fetch('/v1/convert/file', {
  method: 'POST',
  body: formData,
})

const result = await response.json()
// result.document.md_content contains the converted Markdown
console.log(result.document.md_content)
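If you switch between extraction modes, a small typed helper keeps the query string consistent. This is a sketch, not part of the SDK; the mode names come from the table above:

```typescript
type ConvertMode = 'text' | 'vision' | 'images' | 'raw' | 'vlm'

// Build the convert endpoint path; omitting the mode leaves the
// server default (text) in effect.
function convertPath(mode?: ConvertMode): string {
  return mode ? `/v1/convert/file?mode=${mode}` : '/v1/convert/file'
}
```

Pass the result straight to client.fetch, e.g. `client.fetch(convertPath('vision'), { method: 'POST', body: formData })`.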
The response includes the extracted Markdown content. When uploading a single file, the result is in document; for multiple files, results are in a documents array:
{
  "document": {
    "md_content": "# Title\n\nExtracted text..."
  },
  "status": "success",
  "processing_time": 1.23
}
In images mode, each page includes its extracted text, a base64-encoded PNG, and a scanned/born-digital flag:
{
  "document": {
    "md_content": "# Title\n\nExtracted text...",
    "pages": [
      { "page": 1, "text": "# Title\n\nExtracted text...", "image": "iVBORw0KGgo...", "is_scanned": false },
      { "page": 2, "text": "",                              "image": "iVBORw0KGgo...", "is_scanned": true },
      { "page": 3, "text": "## Conclusion\n\n...",          "image": "iVBORw0KGgo...", "is_scanned": false }
    ]
  },
  "status": "success",
  "processing_time": 2.45
}
For each page, text mirrors that page’s slice of md_content; pages that are pure scans come back with an empty text field. When uploading multiple files, the response uses documents (an array) instead of document:
{
  "documents": [
    { "md_content": "# First document..." },
    { "md_content": "# Second document..." }
  ],
  "status": "success",
  "processing_time": 3.21
}
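Because the top-level key depends on how many files were uploaded, a small normalizer lets downstream code handle both shapes uniformly. This is a sketch; the field names are taken from the response examples above:

```typescript
type ConvertedDoc = { md_content: string; pages?: unknown[] }
type ConvertResponse = {
  document?: ConvertedDoc
  documents?: ConvertedDoc[]
  status: string
  processing_time: number
}

// Single-file responses use `document`; multi-file responses use
// `documents`. Return one array either way.
function toDocumentArray(resp: ConvertResponse): ConvertedDoc[] {
  if (resp.documents) return resp.documents
  return resp.document ? [resp.document] : []
}
```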

Pairing images Mode With A Vision Model

For a vision-capable model (e.g. kimi-k2-6, gemma4-31b), interleave each page’s text and image and wrap the raw image base64 as a data URI:
const convertResp = await client.fetch('/v1/convert/file?mode=images', {
  method: 'POST',
  body: formData,
})
const { document } = await convertResp.json()

const content = [{ type: 'text', text: '[Attached file: doc.pdf]' }]
for (const p of document.pages) {
  const label = p.is_scanned ? `Page ${p.page} (scanned):` : `Page ${p.page}:`
  content.push({ type: 'text', text: p.text ? `${label}\n${p.text}` : label })
  content.push({
    type: 'image_url',
    image_url: { url: `data:image/png;base64,${p.image}` },
  })
}
content.push({ type: 'text', text: 'What is this PDF about?' })

const visionResp = await client.fetch('/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'kimi-k2-6',
    messages: [{ role: 'user', content }],
  }),
})
This recovers visual elements that text extraction discards — illustrations, diagrams, color-coding, page decorations, and other layout cues — while still giving the model accurate, parser-extracted text. When you instead attach a PDF as base64 file_data on /v1/responses or /v1/chat/completions with a vision-capable model, Tinfoil performs this same per-page interleave automatically.

2. Use File Inputs With The Responses API

If you want OpenAI-compatible file attachments, send a base64-encoded file in an input_file content part on /v1/responses.
import { SecureClient } from 'tinfoil'
import fs from 'fs'

const client = new SecureClient()
const fileData = fs.readFileSync('doc.pdf').toString('base64')

const response = await client.fetch('/v1/responses', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-oss-120b',
    input: [
      {
        role: 'user',
        content: [
          {
            type: 'input_file',
            filename: 'doc.pdf',
            file_data: `data:application/pdf;base64,${fileData}`,
          },
          {
            type: 'input_text',
            text: 'Summarize this document in 3 bullet points.',
          },
        ],
      },
    ],
  }),
})

const result = await response.json()
console.log(result.output_text)
For binary formats such as PDF, DOCX, PPTX, and images, Tinfoil processes the attachment through the private document-processing backend before forwarding it to the model. By default the router picks the best shape per attachment:
| Routed model | Default PDF / image behavior |
| --- | --- |
| Vision-capable | Per-page interleaved Markdown and page images. |
| Text-only | Markdown only, for speed. |
You can check whether a model is vision-capable via the multimodal field on GET /v1/models. DOCX, PPTX, XLSX, HTML, CSV, and plain text attachments are always forwarded as extracted Markdown regardless of the routed model. You can override the default per attachment with tinfoil_mode.
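That capability check can be done client-side before choosing a model. Only the multimodal field is documented above; the surrounding list shape is assumed here to be the OpenAI-compatible { data: [...] } returned by GET /v1/models:

```typescript
type ModelEntry = { id: string; multimodal?: boolean }

// True when the given model advertises vision support via the
// `multimodal` field on the models list.
function isVisionCapable(models: ModelEntry[], modelId: string): boolean {
  return models.some((m) => m.id === modelId && m.multimodal === true)
}
```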

3. Use File Inputs With Chat Completions

The OpenAI-compatible Chat Completions shape uses type: "file" with a nested file object.
import { SecureClient } from 'tinfoil'
import fs from 'fs'

const client = new SecureClient()
const fileData = fs.readFileSync('doc.pdf').toString('base64')

const response = await client.fetch('/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-oss-120b',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'file',
            file: {
              filename: 'doc.pdf',
              file_data: `data:application/pdf;base64,${fileData}`,
            },
          },
          {
            type: 'text',
            text: 'Summarize this document in 3 bullet points.',
          },
        ],
      },
    ],
  }),
})

const result = await response.json()
console.log(result.choices[0].message.content)

4. Override The PDF Processing Mode

Set the optional Tinfoil-specific tinfoil_mode field directly on the file content part to override the auto-default — for example to force VLM full-page OCR on a low-quality scan:
const response = await client.fetch('/v1/responses', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-oss-120b',
    input: [{
      role: 'user',
      content: [
        {
          type: 'input_file',
          filename: 'scan.pdf',
          file_data: `data:application/pdf;base64,${fileData}`,
          tinfoil_mode: 'vlm',
        },
        { type: 'input_text', text: 'Extract every line as plain text.' },
      ],
    }],
  }),
})
The router consumes the field and strips it before the request is forwarded, so the upstream model never sees it.
| Value | Behavior |
| --- | --- |
| auto (default) | images for vision-capable models, text for text-only models. |
| text | Markdown from the text layer; VLM OCR only on scanned pages. |
| vision | Markdown plus VLM visual descriptions for figures, charts, and tables. |
| images | Per-page interleaved Markdown and images. Requires a vision-capable model; returns 400 otherwise. |
| raw | Text layer only. No VLM, no image rendering. |
| vlm | Full-page VLM OCR on every page. Highest quality, slowest. |
tinfoil_mode only affects PDF and image attachments; for DOCX, PPTX, XLSX, HTML, CSV, and plain text the field has no effect.
tinfoil_mode is a Tinfoil-specific extension and is not understood by OpenAI’s API. If your code needs to target both Tinfoil and OpenAI from the same request body, omit the field.
On Chat Completions the field nests inside the file object alongside filename and file_data:
{
  "type": "file",
  "file": {
    "filename": "doc.pdf",
    "file_data": "data:application/pdf;base64,...",
    "tinfoil_mode": "text"
  }
}
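That nesting is easy to get wrong by hand, so a small builder can capture it. A sketch, not part of the SDK; the application/pdf media type is an assumption baked into the helper:

```typescript
type TinfoilMode = 'auto' | 'text' | 'vision' | 'images' | 'raw' | 'vlm'

// Build a Chat Completions file content part. tinfoil_mode nests inside
// the file object alongside filename and file_data; when omitted, the
// router's auto behavior applies.
function pdfFilePart(filename: string, base64: string, mode?: TinfoilMode) {
  return {
    type: 'file' as const,
    file: {
      filename,
      file_data: `data:application/pdf;base64,${base64}`,
      ...(mode ? { tinfoil_mode: mode } : {}),
    },
  }
}
```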

Supported Formats

| Format | Extraction |
| --- | --- |
| PDF (born-digital) | MuPDF text layer to Markdown |
| PDF (scanned) | VLM OCR |
| DOCX, PPTX, HTML, XLSX, CSV | Server-side parsers |
| Images | VLM OCR |
| Markdown, text, JSON, XML | Passthrough |
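When constructing Blobs or data URIs for these formats, the right media type matters. The mapping below is a convenience sketch using standard IANA media type registrations, not values taken from the Tinfoil API itself:

```typescript
// Standard MIME types for the supported formats listed above.
const MIME_TYPES: Record<string, string> = {
  pdf: 'application/pdf',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  html: 'text/html',
  csv: 'text/csv',
  png: 'image/png',
  jpg: 'image/jpeg',
  md: 'text/markdown',
  txt: 'text/plain',
  json: 'application/json',
  xml: 'application/xml',
}

// Look up a MIME type from the filename extension, falling back to a
// generic binary type for anything unrecognized.
function mimeFor(filename: string): string {
  const ext = filename.split('.').pop()?.toLowerCase() ?? ''
  return MIME_TYPES[ext] ?? 'application/octet-stream'
}
```

For example, `new Blob([buffer], { type: mimeFor('doc.pdf') })` produces the application/pdf Blob used in the first example on this page.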

Errors And Limits

Each request accepts up to 10 files of 50 MB each, as multipart/form-data only. All non-2xx responses return {"error": "<message>"}. /health reflects the state of each pipeline component:
{ "status": "ok",       "router": true, "sidecar": true, "vlm": true  }
{ "status": "degraded", "router": true, "sidecar": true, "vlm": false }
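Error handling can be wrapped around the documented error shape. This is a sketch: it extracts the error field from a non-2xx response body, falling back to the HTTP status when the body is missing or malformed:

```typescript
// All non-2xx responses carry { "error": "<message>" }; pull out that
// message, or fall back to the status code when the body is unusable.
function extractError(status: number, body: unknown): string | null {
  if (status >= 200 && status < 300) return null
  const msg = (body as { error?: string } | null)?.error
  return typeof msg === 'string' ? msg : `HTTP ${status}`
}
```

For example, after a fetch you might call `extractError(response.status, await response.json().catch(() => null))` and throw when it returns a non-null message.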

Attestation

The document upload API uses the same attestation mechanism as other Tinfoil services. Use SecureClient (as shown above) to verify attestation automatically.

Try Private Chat

Experience document upload in our private chat interface with real-time privacy verification.

Configuration Repo

View the open-source configuration for Tinfoil’s confidential document processing service.