Local Image Text Recognition with onnxruntime-web - MicroMatrix

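Background

In modern web applications, running text recognition locally improves the user experience and keeps data private. Because recognition runs directly on the client, image data never has to be sent to a server, which removes the privacy risk of transmitting it. Running locally also reduces dependence on the server and lowers the site's maintenance cost.

After evaluating several options, the two most common approaches are:

Use tesseract.js with a pretrained model;
Use onnxruntime-web with a self-trained or publicly available model.

This article focuses on using onnxruntime-web for image text recognition and shows how to integrate it into a Vue.js project.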

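What is onnxruntime-web?

onnxruntime-web is a JavaScript library for running ONNX models in the browser. It is part of ONNX Runtime, a cross-platform inference engine that supports models exported from a variety of deep learning frameworks such as PyTorch and TensorFlow. With onnxruntime-web, developers can run ONNX model inference directly in a web application without relying on server-side compute.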

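How do you use onnxruntime-web?

The workflow with onnxruntime-web breaks down into the following steps:

User provides an image: the user uploads an image file for text recognition.
Image preprocessing: the uploaded image is preprocessed (resized, reformatted, and so on) so that it matches the input requirements of the ONNX model.
Load the ONNX models: onnxruntime-web loads the pretrained detection (det) and recognition (rec) models.
Text detection (det): the detection model locates the text regions in the image.
Text recognition (rec): the recognition model reads the text in each region.
Decode the output: a dictionary (decodeDic) maps the model output to actual characters.
Return the result: the recognized text is shown to the user.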

Here I use the third-party eSearch-OCR plugin, which wraps the ONNX model inference so that image text recognition takes much less code.
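To make the preprocessing step more concrete, here is a minimal sketch of converting RGBA pixel data (the layout a canvas getImageData call returns) into a normalized, channel-first float array, which is the layout image models commonly expect. The helper name rgbaToCHW and the mean/std defaults of 0.5 are illustrative assumptions, not values taken from eSearch-OCR or PaddleOCR; check the requirements of the model you actually use.

```typescript
// Hypothetical helper: convert interleaved RGBA bytes into a normalized,
// channel-first (CHW) float array. The mean/std defaults are assumptions.
function rgbaToCHW(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number,
  mean: [number, number, number] = [0.5, 0.5, 0.5],
  std: [number, number, number] = [0.5, 0.5, 0.5],
): Float32Array {
  const pixels = width * height;
  if (rgba.length !== pixels * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const out = new Float32Array(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    for (let c = 0; c < 3; c++) {
      // Scale to [0, 1], then normalize; the alpha channel is dropped.
      const v = rgba[i * 4 + c] / 255;
      out[c * pixels + i] = (v - mean[c]) / std[c];
    }
  }
  return out;
}
```

When calling onnxruntime-web directly, an array like this could be wrapped as new ort.Tensor("float32", data, [1, 3, height, width]). With eSearch-OCR this step is handled inside the plugin, so the sketch is only meant to show what happens conceptually.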

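Implementing text recognition in a project

In a real project, onnxruntime-web and the eSearch-OCR plugin work together to perform text recognition. The following sections show how to integrate them in Vue.js.

1. Import `onnxruntime-web` and `eSearch-OCR`

First, import onnxruntime-web and eSearch-OCR. The init function exported by eSearch-OCR is used for initialization in the next step.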

import { init } from "esearch-ocr";
import * as ort from "onnxruntime-web";

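2. Configure the model paths and the dictionary file

Next, configure the paths of the detection and recognition models. Here I use PaddleOCR's pretrained models and pass the recognition dictionary through the decodeDic parameter.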

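import { shallowRef } from "vue";

// `status`, `t` (the i18n translate function) and `dictContent` (the text of
// the dictionary file) are assumed to be defined elsewhere in the component.
const ocrInstance = shallowRef<any>(null);

const initModel = async () => {
  try {
    status.value = t("image.ocr-image.statusLoading");

    // Initialize eSearch-OCR
    ocrInstance.value = await init({
      ort,
      det: {
        // Path to the detection model
        input: "/models/ppocr_v5_mobile_det.onnx",
      },
      rec: {
        // Path to the recognition model
        input: "/models/ppocr_v5_mobile_rec.onnx",
        // Dictionary used to decode the model output
        decodeDic: dictContent,
        optimize: {
          space: false,
        },
      },
    });

    status.value = t("image.ocr-image.statusReady");
  } catch (e) {
    console.error("Failed to load the models:", e);
    status.value = t("image.ocr-image.statusError");
  }
};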

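In the code above, det configures the detection model, rec configures the recognition model, and decodeDic holds the dictionary content. All of them must match the models actually in use.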
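To illustrate what the dictionary is for, the sketch below shows greedy CTC decoding, a scheme commonly used with PaddleOCR-style recognition models: pick the most likely class at each time step, collapse consecutive repeats, and drop the blank class. This is not eSearch-OCR's actual implementation. The names parseDict and ctcGreedyDecode, the one-character-per-line dictionary format, and the convention that class index 0 is the blank are all assumptions made for illustration.

```typescript
// Hypothetical helpers. Assumes one character per line in the dictionary
// and that class index 0 is the CTC blank, so dict[k] maps to class k + 1.
function parseDict(dictContent: string): string[] {
  return dictContent.split(/\r?\n/).filter((ch) => ch.length > 0);
}

function ctcGreedyDecode(
  logits: ArrayLike<number>, // flattened [timeSteps, numClasses] scores
  timeSteps: number,
  numClasses: number,
  dict: string[],
): string {
  let text = "";
  let prev = 0; // class chosen at the previous time step (0 = blank)
  for (let t = 0; t < timeSteps; t++) {
    const row = t * numClasses;
    // Arg-max over the classes at this time step.
    let best = 0;
    for (let c = 1; c < numClasses; c++) {
      if (logits[row + c] > logits[row + best]) best = c;
    }
    // Emit a character only for a non-blank class that differs from the
    // previous step (this collapses consecutive repeats).
    if (best !== 0 && best !== prev) {
      text += dict[best - 1] ?? "";
    }
    prev = best;
  }
  return text;
}
```

A blank between two identical classes is what allows doubled letters to survive the collapse, which is why the blank class exists in the first place.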

3. Handle the uploaded image

After the user uploads an image, we read it and load it onto the canvas. Fabric.js handles displaying and scaling the image.

const handleFile = (e: Event) => {
  const file = (e.target as HTMLInputElement).files?.[0];
  if (!file || !fabricCanvas) return;

  const reader = new FileReader();
  reader.onload = async (f) => {
    const data = f.target?.result as string;
    const img = await FabricImage.fromURL(data);

    // Scale the image to fit the canvas
    const scale = Math.min(800 / img.width!, 600 / img.height!);
    img.scale(scale);

    fabricCanvas?.clear();
    fabricCanvas?.add(img);
    fabricCanvas?.centerObject(img);
    fabricCanvas?.renderAll();

    results.value = [];
    status.value = t("image.ocr-image.imageReady");
  };
  reader.readAsDataURL(file);
};
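The scaling step in the handler above can be isolated into a small pure function, which makes it easier to test and to reuse with other canvas sizes. In this sketch, fitScale and the allowUpscale flag are hypothetical names. Optionally refusing to enlarge small images is a design choice added here for illustration; the original one-liner always scales to fit.

```typescript
// Hypothetical helper: compute a uniform scale factor that fits an image
// inside a maxW x maxH box while preserving its aspect ratio.
function fitScale(
  imgW: number,
  imgH: number,
  maxW: number,
  maxH: number,
  allowUpscale = true,
): number {
  if (imgW <= 0 || imgH <= 0) {
    throw new Error("image dimensions must be positive");
  }
  const scale = Math.min(maxW / imgW, maxH / imgH);
  // When upscaling is disabled, never return a factor greater than 1.
  return allowUpscale ? scale : Math.min(scale, 1);
}
```

In the handler, img.scale(fitScale(img.width, img.height, 800, 600)) would reproduce the original behavior.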

4. Start OCR

When the user clicks the "Start recognition" button, we export the canvas image as a base64 data URL and pass it to eSearch-OCR for processing.

const startOCR = async () => {
  if (!ocrInstance.value || !fabricCanvas) return;

  isLoading.value = true;
  status.value = t("image.ocr-image.recognitionInProgress");

  try {
    // Export the canvas content as a data URL
    const canvasEl = fabricCanvas.getElement();
    const dataUrl = canvasEl.toDataURL("image/png");

    // Run recognition
    const output = await ocrInstance.value.ocr(dataUrl);

    results.value = output.parragraphs.filter((item: any) => item.text !== "");

    status.value = t("image.ocr-image.recognitionComplete", results.value.length);
  } catch (e) {
    console.error("Recognition failed:", e);
    status.value = t("image.ocr-image.recognitionError");
  } finally {
    isLoading.value = false;
  }
};

Here, ocrInstance.value.ocr(dataUrl) calls eSearch-OCR's ocr method, which passes the image data to the models, runs text recognition, and returns the result.
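If the recognized text needs to be copied or exported, the filtered paragraphs can be joined into a single string. This is a minimal sketch, assuming each paragraph object has at least a text field as used in the code above; the OcrParagraph type and the joinParagraphs name are hypothetical and not part of eSearch-OCR.

```typescript
// Hypothetical shape: only the `text` field used in the article is assumed.
interface OcrParagraph {
  text: string;
}

// Join non-empty paragraph texts into one string, e.g. for a copy button.
function joinParagraphs(
  paragraphs: OcrParagraph[],
  separator = "\n",
): string {
  return paragraphs
    .map((p) => p.text.trim())
    .filter((t) => t !== "")
    .join(separator);
}
```

For example, joinParagraphs(results.value) could feed navigator.clipboard.writeText in a "copy all" handler.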

5. Display the results

Finally, the recognition results are displayed on the page so the user can see the text extracted from the image.

<div v-if="results.length === 0" class="text-sm text-muted-foreground text-center py-10">
  {{ t("image.ocr-image.noResults") }}
</div>
<div v-for="(res, idx) in results" :key="idx" class="p-3 bg-slate-100 rounded text-sm text-slate-700 hover:bg-slate-200 transition-colors">
  {{ res.text }}
</div>

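Summary

By integrating onnxruntime-web and eSearch-OCR, we can implement efficient image text recognition in the browser. Processing locally both protects data privacy and improves the application's responsiveness. This article covered model loading, image handling, and text recognition with onnxruntime-web and the eSearch-OCR plugin; I hope it helps when you build something similar.

Required files

Model files and dictionary file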
