mistral-ocr-latest
Model Overview
This Optical Character Recognition API from Mistral sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents—media, text, tables, equations—with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts content as ordered, interleaved text and images.
Maximum file size: 50 MB.
Maximum number of pages: 1000.
Note that this OCR does not preserve character formatting: bold, underline, italics, monospace text, etc. However, it preserves footnotes (superscript text).
Set Up Your API Key
If you don’t have an API key for the Apilaplas API yet, feel free to use our Quickstart guide.
How to Make a Call
Copy the code from one of the examples below, depending on whether you want to process an image or a PDF.
Replace <YOUR_LAPLASAPI_KEY> with your LAPLAS API key from your personal account. Replace the URL of the document or image with the one you need.
If you need to use different parameters, refer to the API schema below for valid values and operational logic.
Save the modified code as a Python file and run it in an IDE or via the console.
API Schema
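If the interactive schema does not render here, the sketch below summarizes the request-body fields as inferred from the examples in this guide; it is not an authoritative schema, so consult the full API reference for the complete list of parameters and valid values:

```python
# Request-body fields used in the examples below (a sketch inferred from this
# guide, not the full schema).
payload = {
    "model": "mistral/mistral-ocr-latest",  # model identifier
    "document": {
        # "document_url" for PDFs; for images, use "type": "image_url"
        # together with an "image_url" key instead.
        "type": "document_url",
        "document_url": "https://css4.pub/2015/textbook/somatosensory.pdf",
    },
    "include_image_base64": True,  # optional: return images as base64 strings
    "image_limit": 5,              # optional: cap on the number of returned images
}
print(sorted(payload.keys()))
```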
Example #1: Text Recognition From an Image
We’ve found a photo of a short handwritten text for OCR testing and will be passing it to the model via URL:
import requests

def main():
    response = requests.post(
        "https://api.apilaplas.com/v1/ocr",
        headers={
            "Authorization": "Bearer <YOUR_LAPLASAPI_KEY>",
            "Content-Type": "application/json",
        },
        json={
            "document": {
                "type": "image_url",
                "image_url": "https://i.redd.it/hx0v4fj979k51.jpg"
            },
            "model": "mistral/mistral-ocr-latest",
        },
    )
    response.raise_for_status()
    data = response.json()
    print(data)
    return data

if __name__ == "__main__":
    main()
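If you only need the recognized text, you can pull the markdown out of the parsed response. The snippet below assumes the response shape used later in this guide (a "pages" list whose entries each carry a "markdown" string); the sample dictionary is a stand-in for a real API response:

```python
# Assumed response shape (a "pages" list with a "markdown" field per page);
# the sample data below stands in for the JSON returned by a real call.
data = {
    "pages": [
        {"markdown": "# Page 1"},
        {"markdown": "Page 2 text"},
    ]
}

# Join the per-page markdown into a single text blob.
text = "\n\n".join(page["markdown"] for page in data.get("pages", []))
print(text)
```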
Example #2: Process a PDF File
Let's process a PDF file from the internet using the described model:
import requests

def main():
    response = requests.post(
        "https://api.apilaplas.com/v1/ocr",
        headers={
            "Authorization": "Bearer <YOUR_LAPLASAPI_KEY>",
            "Content-Type": "application/json",
        },
        json={
            "document": {
                "type": "document_url",
                "document_url": "https://css4.pub/2015/textbook/somatosensory.pdf"
            },
            "model": "mistral/mistral-ocr-latest",
        },
    )
    response.raise_for_status()
    data = response.json()
    print(data)

if __name__ == "__main__":
    main()
Example #3: Process a PDF File And Parse the Response
As you can see above, the model returns markdown containing the recognized text with formatting elements preserved (headings, italics, bold text, etc.), along with the locations of images within the text and, if you have enabled the corresponding include_image_base64 option, the images themselves in base64 format. However, the markdown is returned as a string with newline characters and other escape sequences, so you might need to parse the output separately to get clean markdown containing only the formatted text and images. In this example, we’ve written code that does this for us.
import os
import re
import base64
import requests

def ocr_process():
    response = requests.post(
        "https://api.apilaplas.com/v1/ocr",
        headers={
            "Authorization": "Bearer <YOUR_LAPLASAPI_KEY>",
            "Content-Type": "application/json",
        },
        json={
            "document": {
                "type": "document_url",
                "document_url": "https://zovi0.github.io/public_misc/test-PDF-2.pdf"
            },
            "model": "mistral/mistral-ocr-latest",
            "include_image_base64": True,
            "image_limit": 5
        },
    )
    response.raise_for_status()
    data = response.json()
    print(data)
    return data

def parse_ocr_output(ocr_output):
    output_dir = "output_images"
    os.makedirs(output_dir, exist_ok=True)
    all_markdown = []
    for page in ocr_output.get("pages", []):
        md = page["markdown"]
        images = {img["id"]: img["image_base64"] for img in page.get("images", []) if img.get("image_base64")}

        def replace_image(match):
            image_id = match.group(1)
            base64_data = images.get(image_id)
            if not base64_data:
                return match.group(0)  # Leave original markdown if no image data
            # Detect image format
            img_match = re.match(r"data:image/(png|jpeg|jpg);base64,(.*)", base64_data)
            if not img_match:
                return match.group(0)
            img_format, img_b64 = img_match.groups()
            ext = "jpg" if img_format in ["jpeg", "jpg"] else "png"
            filename = f"{image_id}.{ext}"
            filepath = os.path.join(output_dir, filename)
            with open(filepath, "wb") as f:
                f.write(base64.b64decode(img_b64))
            return f"![{image_id}]({filepath})"

        # Replace image links in markdown with local image links
        md = re.sub(r"!\[.*?\]\((img-\d+\.\w+)\)", replace_image, md)
        all_markdown.append(md)

    # Combine pages with spacing
    final_md = "\n\n---\n\n".join(all_markdown)
    with open("output.md", "w", encoding="utf-8") as f:
        f.write(final_md)
    print("Markdown and images saved.")
    return final_md

if __name__ == "__main__":
    ocr_output = ocr_process()
    parse_ocr_output(ocr_output)