Blazing-Fast Audio Transcription on a Mac with MLX Whisper

Preface

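I recently needed to transcribe some audio and happened to come across a tweet from Awni Hannun: the latest MLX Whisper is even faster, transcribing about 12 minutes of audio in 12.3 seconds on an M2 Ultra, which is nearly 60x real-time: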

The latest MLX Whisper is even faster.

Whisper v3 Turbo on an M2 Ultra transcribes ~12 minutes in 12.3 seconds. Nearly 60x real-time.

pip install -U mlx-whisper pic.twitter.com/DcKE0TRcbv
— Awni Hannun (@awnihannun) November 1, 2024
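The "nearly 60x" figure follows directly from the two numbers in the tweet. A quick check, treating "~12 minutes" as exactly 720 seconds (an approximation, since the tweet only gives a rounded duration):

```python
# Figures quoted in the tweet: ~12 minutes of audio, 12.3 seconds of compute.
audio_seconds = 12 * 60        # approximate, the tweet says "~12 minutes"
transcribe_seconds = 12.3

# Real-time factor: seconds of audio processed per second of wall-clock time.
realtime_factor = audio_seconds / transcribe_seconds
print(f"{realtime_factor:.1f}x real-time")  # prints "58.5x real-time"
```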

I happen to have a Mac on hand, so I wanted to try it out. It turned out to be very simple to use, so here are my notes.

Installation

MLX Whisper is a Whisper implementation built on Apple's MLX framework, and it runs very efficiently on Apple Silicon. There are two ways to install it.

The usual way

pip install -U mlx-whisper

Installing with uv

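I am a bit obsessive about keeping my system clean and don't like installing things into the global environment. uv ¹ is a Python package manager whose uv tool install command installs command-line tools into isolated environments, so the system Python is left untouched. If you have similar habits, I recommend this approach: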

uv tool install mlx-whisper

Once installed, the mlx_whisper command is ready to use.
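If the command does not seem to be available, a small sanity check can help narrow things down. The sketch below is only an illustration: the helper names is_apple_silicon and find_cli are made up for this example, and it assumes the Python interpreter runs natively (under Rosetta, platform.machine() reports x86_64 even on an M-series Mac).

```python
import platform
import shutil


def is_apple_silicon(system: str, machine: str) -> bool:
    """True for macOS running on an ARM64 (M-series) CPU."""
    return system == "Darwin" and machine == "arm64"


def find_cli(name: str = "mlx_whisper"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(name)


print("Apple Silicon:", is_apple_silicon(platform.system(), platform.machine()))
print("mlx_whisper on PATH:", find_cli() or "not found")
```

If the second line prints "not found" after a uv tool install, the directory uv installs tools into is probably missing from PATH.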

Usage

Directly from the command line

mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo

A script for batch processing

I wrote a small script that accepts several audio files at once and saves each transcript to a .txt file with the same base name:

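import os
import sys

import mlx_whisper

# Require at least one audio file on the command line.
if len(sys.argv) < 2:
    print("Usage: python whisper.py <audio file> [audio file...]")
    sys.exit(1)

# Hugging Face repo of the model; it is downloaded on first use.
model = "mlx-community/whisper-large-v3-turbo"

for file in sys.argv[1:]:
    print(f"Transcribing: {file}")
    result = mlx_whisper.transcribe(file, path_or_hf_repo=model)

    # Join the text of every segment, separated by blank lines.
    output = "\n\n".join(seg["text"].strip() for seg in result["segments"])

    # Write the transcript next to the audio file, with a .txt extension.
    txt_path = os.path.splitext(file)[0] + ".txt"
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(output + "\n")
    print(f"Saved: {txt_path}")
    print()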

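Usage is simple: just pass in (or drag in) the audio files. Since I manage everything with uv, I also run the script with uv, and --with installs the dependency temporarily on the fly: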

uv run --with mlx-whisper python whisper.py audio1.mp3 audio2.m4a
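The script above keeps only the text of each segment. As a possible variation, the sketch below prefixes every segment with its time range. It assumes each entry of result["segments"] also carries start and end values in seconds, as in the original Whisper output format; the helper names and the sample data are invented for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments) -> str:
    """Turn a list of segments into one '[start - end] text' line each."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


# Made-up sample shaped like result["segments"], for illustration only.
sample = [
    {"start": 0.0, "end": 4.2, "text": " Hello there."},
    {"start": 4.2, "end": 65.0, "text": " This is a test."},
]
print(segments_to_lines(sample))
```

In the batch script, replacing the "\n\n".join(...) line with output = segments_to_lines(result["segments"]) would write the timestamped version instead.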

Where the model is stored

The model is downloaded automatically through Hugging Face Hub and cached by default at:

~/.cache/huggingface/hub/models--mlx-community--whisper-large-v3-turbo/

If you no longer want it, just delete this directory and nothing is left behind.
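The directory name is derived from the repo id by replacing the slash with two dashes and adding a models-- prefix. A small sketch of locating and removing the cache from Python, assuming the default layout shown above; it also honors the HF_HUB_CACHE and HF_HOME environment variables, which can relocate the cache. The helper names are made up for this example.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def hub_cache_root() -> Path:
    """Cache root: HF_HUB_CACHE if set, else $HF_HOME/hub, else the default."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


def model_cache_dir(repo_id: str, root: Optional[Path] = None) -> Path:
    """Map 'org/name' to '<root>/models--org--name'."""
    root = root if root is not None else hub_cache_root()
    return root / ("models--" + repo_id.replace("/", "--"))


def remove_model(repo_id: str, root: Optional[Path] = None) -> bool:
    """Delete the cached model directory; return False if it was not there."""
    target = model_cache_dir(repo_id, root)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Only prints the location; call remove_model(...) explicitly to delete it.
print(model_cache_dir("mlx-community/whisper-large-v3-turbo"))
```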

Summary

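MLX Whisper on Apple Silicon really is fast: you basically drop in a file, wait a few seconds, and get the result. Installation and cleanup are both tidy, so I recommend it to anyone with a Mac.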

¹ uv, a Python package manager made by Astral.
