Calling Baidu Speech from Python: learning Baidu speech recognition and synthesis in Python.

1. Overall Structure

SpeechRecognition (recording) --> Baidu Speech (Speech-to-Text) --> Baidu Speech (Text-to-Speech) --> PyAudio (audio playback)

The whole chain repeats in a loop.
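The loop structure can be sketched independently of microphones and network calls by passing each stage in as a function. This is a minimal sketch; `run_pipeline`, its parameter names and the `max_turns` guard are hypothetical, not part of any library.

```python
from typing import Callable, Optional


def run_pipeline(record: Callable[[], None],
                 recognize: Callable[[], Optional[str]],
                 synthesize: Callable[[str], bool],
                 play: Callable[[], None],
                 max_turns: Optional[int] = None) -> int:
    """Run record -> STT -> TTS -> playback repeatedly (hypothetical helper).

    max_turns=None loops forever, like a plain `while True`.
    Returns the number of turns that ended with audio being played.
    """
    turns = 0
    played = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        record()               # 1. capture audio from the microphone
        text = recognize()     # 2. speech-to-text
        if not text:           # nothing recognized: go back to recording
            continue
        if synthesize(text):   # 3. text-to-speech, True on success
            play()             # 4. play the synthesized audio
            played += 1
    return played
```

The stages here are stand-ins. Plugging in the article's `rec`, `listen`, `speak` and `play` would require `listen` to return None when nothing is recognized and `speak` to return True on success.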

2. SpeechRecognition

SpeechRecognition is a speech recognition library for Python.

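In this project, both recognition and synthesis are handled by Baidu's open service, so we only need SpeechRecognition for recording.
It detects pauses in speech and automatically stops and saves the recording, which is friendlier than using PyAudio directly (and the code is simpler to write).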
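To illustrate how pause-based stopping can work, here is a simplified energy-based end-of-speech detector over 16-bit mono PCM. It is only a sketch of the idea, not SpeechRecognition's actual implementation; `find_end_of_speech`, the 20 ms frame size and the default values are assumptions. In SpeechRecognition itself, the corresponding knobs are the `Recognizer` attributes `energy_threshold` and `pause_threshold`.

```python
import math
import struct
from typing import Optional


def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of 16-bit little-endian mono samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def find_end_of_speech(pcm: bytes, rate: int = 16000, frame_ms: int = 20,
                       energy_threshold: float = 300.0,
                       pause_seconds: float = 0.8) -> Optional[int]:
    """Return the byte offset where recording would stop, or None.

    Recording stops once speech has been heard and is then followed by
    pause_seconds worth of consecutive frames below energy_threshold.
    """
    frame_bytes = rate * frame_ms // 1000 * 2   # 2 bytes per 16-bit sample
    pause_frames_needed = math.ceil(pause_seconds * 1000 / frame_ms)
    heard_speech = False
    quiet_frames = 0
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        loud = frame_rms(pcm[offset:offset + frame_bytes]) > energy_threshold
        if loud:
            heard_speech = True
            quiet_frames = 0      # any loud frame resets the pause counter
        elif heard_speech:
            quiet_frames += 1
            if quiet_frames >= pause_frames_needed:
                return offset + frame_bytes
    return None
```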

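Install SpeechRecognition (its Microphone class also requires PyAudio, whose installation is covered in section 5):

```
pip install SpeechRecognition
```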

Recording code:

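```python
import speech_recognition as sr

def rec(rate=16000):
    """Record one utterance from the default microphone and save it as recording.wav."""
    r = sr.Recognizer()
    # Microphone records 16-bit mono audio at the requested sample rate
    with sr.Microphone(sample_rate=rate) as source:
        print("please say something")
        # listen() returns once it detects a pause after the speech
        audio = r.listen(source)

    # Save the captured audio in WAV format for the recognition step
    with open("recording.wav", "wb") as f:
        f.write(audio.get_wav_data())

rec()
```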

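This captures audio from the system microphone at a sample rate of 16000 Hz (the Baidu speech API appears to support sample rates of at most 16 kHz).
The captured audio is then saved in WAV format as recording.wav in the current directory, for use by the following steps.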

Once recording finishes, you can play back the saved file to check the quality.
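Besides listening to it, you can inspect the file's format with the standard-library `wave` module, since the recognition request declares 16 kHz audio. `describe_wav` and `suitable_for_baidu` are hypothetical helpers; the mono, 16-bit requirement reflects our understanding of Baidu's documentation and should be double-checked.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    channels: int
    sample_width: int   # bytes per sample
    frame_rate: int     # samples per second
    duration: float     # seconds


def describe_wav(path: str) -> WavInfo:
    """Read the header of a WAV file and report its format."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return WavInfo(wf.getnchannels(), wf.getsampwidth(), rate,
                       frames / rate if rate else 0.0)


def suitable_for_baidu(info: WavInfo, rate: int = 16000) -> bool:
    """Assumed requirement: mono, 16-bit samples at the declared rate."""
    return info.channels == 1 and info.sample_width == 2 and info.frame_rate == rate
```

For example, `print(describe_wav("recording.wav"))` right after `rec()` shows whether the recording matches what the request will claim.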

3. Baidu Speech (STT)

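Baidu Speech is a service on Baidu Cloud's AI open platform that provides speech recognition and speech synthesis. After registering you can call its REST API directly, and ordinary users get a free call quota.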
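For reference, the REST API is authenticated with an OAuth 2.0 access token obtained from your API Key and Secret Key; the `baidu-aip` SDK used below handles this for you. The sketch only builds the token request URL and sends nothing. The endpoint and parameter names reflect our understanding of Baidu's AI platform documentation and should be verified; `build_token_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Token endpoint as we understand Baidu's docs (verify before use)
TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key: str, secret_key: str) -> str:
    """Build the client-credentials token request URL (not sent here)."""
    params = {
        "grant_type": "client_credentials",
        "client_id": api_key,         # the application's API Key
        "client_secret": secret_key,  # the application's Secret Key
    }
    return TOKEN_ENDPOINT + "?" + urlencode(params)
```

Requesting this URL should return JSON containing an `access_token`, which the raw speech endpoints then expect with each call.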

Create an application

After registering, open the console of the speech service, create a new application, and note down its AppID, API Key and Secret Key.
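Rather than pasting these three values into the source code, you can read them from environment variables. The variable names `BAIDU_APP_ID`, `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` are made up for this sketch, as is the `load_credentials` helper.

```python
import os
from typing import Mapping, NamedTuple, Optional


class BaiduCredentials(NamedTuple):
    app_id: str
    api_key: str
    secret_key: str


def load_credentials(env: Optional[Mapping[str, str]] = None) -> BaiduCredentials:
    """Read AppID, API Key and Secret Key from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    names = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return BaiduCredentials(*(env[name] for name in names))
```

Usage: `creds = load_credentials()` followed by `client = AipSpeech(creds.app_id, creds.api_key, creds.secret_key)`.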

Install the Baidu API SDK:

```
pip install baidu-aip
```

Speech recognition code (replace the keys with your own):

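```python
from aip import AipSpeech

# Replace with the credentials of your own application
APP_ID = 'Your AppID'
API_KEY = 'Your API Key'
SECRET_KEY = 'Your Secret Key'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

def listen():
    """Send recording.wav to Baidu ASR and return the recognized text (None on failure)."""
    with open('recording.wav', 'rb') as f:
        audio_data = f.read()

    # WAV audio at 16000 Hz; dev_pid 1536 selects Mandarin (with basic English)
    result = client.asr(audio_data, 'wav', 16000, {
        'dev_pid': 1536,
    })

    # On failure the response has a non-zero err_no and no "result" field
    if result.get('err_no') != 0:
        print("recognition failed:", result.get('err_msg'))
        return None

    result_text = result["result"][0]
    print("you said: " + result_text)
    return result_text

listen()
```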

In short, the audio recorded by SpeechRecognition is uploaded to Baidu's speech service, which returns the recognized text; the text is then printed.
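To make the "upload" concrete: as we understand Baidu's REST documentation, the SDK call corresponds to a JSON request to the short-speech recognition endpoint, with the audio base64-encoded and `len` holding the size of the raw audio in bytes. The sketch only builds that payload; the field names, the endpoint in the comment and `build_asr_payload` are assumptions to verify against the current docs, and `cuid` is any client identifier you choose.

```python
import base64
import json

# Assumed endpoint for short speech recognition (verify against Baidu's docs):
# ASR_ENDPOINT = "http://vop.baidu.com/server_api"


def build_asr_payload(audio: bytes, token: str, cuid: str,
                      fmt: str = "wav", rate: int = 16000,
                      dev_pid: int = 1536) -> str:
    """Build the JSON body of a recognition request (hypothetical helper)."""
    body = {
        "format": fmt,       # format of the uploaded audio
        "rate": rate,        # sample rate in Hz, as in client.asr(...)
        "channel": 1,        # mono audio
        "cuid": cuid,        # caller-chosen unique client id
        "token": token,      # OAuth access token
        "dev_pid": dev_pid,  # recognition model, same value as in the SDK call
        "speech": base64.b64encode(audio).decode("ascii"),
        "len": len(audio),   # size of the raw (un-encoded) audio in bytes
    }
    return json.dumps(body)
```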

4. Baidu Speech (TTS)

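Most operating systems actually ship with a built-in TTS (text-to-speech) engine, such as the say command on macOS, but many of them sound too "mechanical" and lack a human touch.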

Baidu's TTS engine sounds quite good; this project uses voice no. 4, Du Yaya (度丫丫).
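The voice and delivery are controlled by the options dictionary passed to `client.synthesis`, and a small helper can validate them before each request. The 0 to 15 range for `spd` (speed), `pit` (pitch) and `vol` (volume), the default of 5, and the basic voice numbers 0, 1, 3 and 4 reflect our reading of Baidu's TTS documentation and may differ for your account or API version; `tts_options` is a hypothetical helper.

```python
from typing import Dict

# Basic voices as we understand Baidu's docs (assumption; check the console)
BASIC_VOICES = {0: "Du Xiaomei", 1: "Du Xiaoyu", 3: "Du Xiaoyao", 4: "Du Yaya"}


def tts_options(spd: int = 5, pit: int = 5, vol: int = 5, per: int = 0) -> Dict[str, int]:
    """Validate and build the options dict for client.synthesis (hypothetical)."""
    for name, value in (("spd", spd), ("pit", pit), ("vol", vol)):
        if not 0 <= value <= 15:
            raise ValueError(f"{name} must be between 0 and 15, got {value}")
    if per not in BASIC_VOICES:
        raise ValueError(f"unknown basic voice per={per}")
    return {"spd": spd, "pit": pit, "vol": vol, "per": per}
```

`tts_options(spd=4, per=4)` matches the settings used below, plus an explicit pitch of 5 (assumed to be the service default).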

Test code:

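```python
from aip import AipSpeech

APP_ID = 'Your AppID'
API_KEY = 'Your API Key'
SECRET_KEY = 'Your Secret Key'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

def speak(text=""):
    """Synthesize text with Baidu TTS and save it as audio.mp3; return True on success."""
    # spd = speed, vol = volume, per = voice (4 = Du Yaya)
    result = client.synthesis(text, 'zh', 1, {
        'spd': 4,
        'vol': 5,
        'per': 4,
    })

    # On success the SDK returns audio bytes; on failure it returns an error dict
    if isinstance(result, dict):
        print("synthesis failed:", result)
        return False

    with open('audio.mp3', 'wb') as f:
        f.write(result)
    return True

speak("你好啊")  # "Hello there"
```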

The code simply uploads the text to be converted and saves the returned audio data locally. The output appears to be MP3 only.
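Each synthesis request can carry only a limited amount of text (as we understand Baidu's docs, under 1024 bytes in GBK encoding), so longer replies need to be split into several requests. A minimal sketch that splits after sentence punctuation and keeps every chunk under the byte limit; the limit value and `split_for_tts` are assumptions.

```python
import re
from typing import List

# Assumed per-request limit; check Baidu's current TTS documentation
MAX_GBK_BYTES = 1024


def gbk_len(s: str) -> int:
    return len(s.encode("gbk", errors="replace"))


def split_for_tts(text: str, max_bytes: int = MAX_GBK_BYTES) -> List[str]:
    """Split text into chunks whose GBK encoding is shorter than max_bytes."""
    # Split after sentence-ending punctuation, keeping the punctuation
    sentences = [s for s in re.split(r"(?<=[。！？；!?;.])", text) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if gbk_len(current + sentence) < max_bytes:
            current += sentence          # greedily pack whole sentences
            continue
        if current:
            chunks.append(current)
        current = ""
        # A sentence that does not fit is packed character by character
        for ch in sentence:
            if current and gbk_len(current + ch) >= max_bytes:
                chunks.append(current)
                current = ""
            current += ch
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go through `speak` and `play` in turn.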

5. Playback with PyAudio

Install PyAudio

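Windows
Download the wheel matching your Python version from
https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
for example PyAudio-0.2.11-cp38-cp38-win32.whl, where cp38 means Python 3.8. Once downloaded, install it:
```
pip install PyAudio-0.2.11-cp38-cp38-win32.whl
```
Recent PyAudio releases also publish Windows wheels on PyPI, so pip install pyaudio may be enough.
Linux
```
sudo apt install python3-pyaudio
```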

I could not find a convenient way to play MP3 directly from Python, so the sox command converts the MP3 to WAV, which PyAudio then plays.

Install sox

SoX is a powerful cross-platform audio processing tool.

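Windows
Download SoX from its official website and add the installation directory to the PATH environment variable.
Linux (libsox-fmt-mp3 adds MP3 support)
```
sudo apt-get install sox libsox-fmt-mp3
```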
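By default sox keeps the input's sample rate and channel count. To force a particular WAV format, put the format options between the input and output file names, because sox applies options to the file that follows them. A sketch that builds such a command as an argument list for `subprocess.run`; `sox_convert_cmd` and `convert` are hypothetical helpers, and the 16 kHz / mono / 16-bit target is just an example.

```python
import shutil
import subprocess
from typing import List


def sox_convert_cmd(src: str, dst: str, rate: int = 16000,
                    channels: int = 1, bits: int = 16) -> List[str]:
    """Build a sox command converting src to dst with the given output format."""
    # Options placed after src and before dst apply to the output file
    return ["sox", src, "-r", str(rate), "-c", str(channels), "-b", str(bits), dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion, failing loudly if sox is missing or returns an error."""
    if shutil.which("sox") is None:
        raise FileNotFoundError("sox not found on PATH")
    subprocess.run(sox_convert_cmd(src, dst), check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.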

Code:

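```python
import pyaudio
import wave
import subprocess
import time

def play():
    """Convert audio.mp3 to WAV with sox, then play it with PyAudio."""
    # PyAudio plays raw PCM, so convert the MP3 first; check=True raises if sox fails
    subprocess.run(['sox', 'audio.mp3', 'audio.wav'], check=True)
    wf = wave.open('audio.wav', 'rb')
    p = pyaudio.PyAudio()

    # Called by PyAudio whenever it needs the next frame_count frames
    def callback(in_data, frame_count, time_info, status):
        data = wf.readframes(frame_count)
        return (data, pyaudio.paContinue)

    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True,
                    stream_callback=callback)

    stream.start_stream()

    # Playback runs in the background; wait until the file has been played out
    while stream.is_active():
        time.sleep(0.1)

    stream.stop_stream()
    stream.close()
    wf.close()

    p.terminate()

play()
```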
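PyAudio can also play audio in blocking mode, writing chunks to the stream in a loop instead of supplying a callback. This is an alternative sketch, not the method used above; `wav_chunks`, `play_blocking` and the 1024-frame chunk size are illustrative choices. PyAudio is imported inside the function so the chunk reader works without an audio device.

```python
import wave
from typing import Iterator


def wav_chunks(wf: wave.Wave_read, frames_per_chunk: int = 1024) -> Iterator[bytes]:
    """Yield successive blocks of raw frames from an open WAV file."""
    while True:
        data = wf.readframes(frames_per_chunk)
        if not data:
            return
        yield data


def play_blocking(path: str = "audio.wav") -> None:
    """Play a WAV file with PyAudio in blocking mode."""
    import pyaudio  # requires PyAudio and an output device

    with wave.open(path, "rb") as wf:
        p = pyaudio.PyAudio()
        try:
            stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                            channels=wf.getnchannels(),
                            rate=wf.getframerate(),
                            output=True)
            for data in wav_chunks(wf):
                stream.write(data)  # blocks until this chunk has been written
            stream.stop_stream()
            stream.close()
        finally:
            p.terminate()
```

Blocking mode keeps the control flow linear; callback mode frees the main thread while audio plays.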

6. Final Code

The code above, combined into one program:

```python
# -*- coding: utf-8 -*-
import subprocess
import time
import wave

import pyaudio
import speech_recognition as sr
from aip import AipSpeech

# Baidu API credentials (replace with your own)
APP_ID = 'Your AppID'
API_KEY = 'Your API Key'
SECRET_KEY = 'Your Secret Key'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

# Recording
def rec(rate=16000):
    r = sr.Recognizer()
    with sr.Microphone(sample_rate=rate) as source:
        print("please say something")
        audio = r.listen(source)

    with open("recording.wav", "wb") as f:
        f.write(audio.get_wav_data())

# Baidu speech recognition: returns the text, or None on failure
def listen():
    with open('recording.wav', 'rb') as f:
        audio_data = f.read()

    result = client.asr(audio_data, 'wav', 16000, {
        'dev_pid': 1536,
    })

    if result.get('err_no') != 0:
        print("recognition failed:", result.get('err_msg'))
        return None

    result_text = result["result"][0]
    print("you said: " + result_text)
    return result_text

# Baidu speech synthesis: writes audio.mp3 and returns True on success
def speak(text=""):
    result = client.synthesis(text, 'zh', 1, {
        'spd': 4,
        'vol': 5,
        'per': 4,
    })

    if isinstance(result, dict):
        print("synthesis failed:", result)
        return False

    with open('audio.mp3', 'wb') as f:
        f.write(result)
    return True

# Audio playback
def play():
    subprocess.run(['sox', 'audio.mp3', 'audio.wav'], check=True)
    wf = wave.open('audio.wav', 'rb')
    p = pyaudio.PyAudio()

    def callback(in_data, frame_count, time_info, status):
        data = wf.readframes(frame_count)
        return (data, pyaudio.paContinue)

    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True,
                    stream_callback=callback)

    stream.start_stream()
    while stream.is_active():
        time.sleep(0.1)

    stream.stop_stream()
    stream.close()
    wf.close()
    p.terminate()

# Main loop: record, recognize, speak the recognized text back, play it
while True:
    rec()
    request = listen()
    if not request:
        continue  # nothing recognized, record again
    if speak(request):
        play()
```
