小狐狸事務所: Converting Speech to Text with the Google Speech Recognition API

Thursday, August 24, 2017

Converting Speech to Text with the Google Speech Recognition API

Today on YouTube I came across a tutorial video from 大數學堂 that shows how to use the Google Speech Recognition Service to convert speech to text. It reportedly can recognize more than 50 languages:

# [Open Jarvis] How can Python automatically transcribe speech into text?

I found that 大數學堂 has many excellent tutorials, for example on web crawling, the R language, image recognition, trading systems, and Open Jarvis. See:

# 大數軟體有限公司 (Youtube)
# http://www.largitdata.com/course_list/14

# https://www.openjarvis.com

# https://devpost.com/software/open-jarvis

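After watching the video above, I was surprised at how simple speech-to-text is with Python and the Google speech recognition API. I was itching to try it and could not wait to verify it after dinner.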

Installing the SpeechRecognition package takes a single command in a Command Prompt window, either pip install SpeechRecognition or pip3 install SpeechRecognition:

D:\Python>pip3 install SpeechRecognition

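If you are behind a firewall (for example on a company network) and cannot install directly, an error message like the following appears: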

" Could not find a version that satisfies the requirement SpeechRecognition (from versions: )
No matching distribution found for SpeechRecognition"

In that case, first download the whl file (about 32 MB) from the PyPI site and install from it:

# https://pypi.python.org/pypi/SpeechRecognition/

D:\>cd python

D:\Python>pip3 install SpeechRecognition-3.7.1-py2.py3-none-any.whl
Processing d:\python\speechrecognition-3.7.1-py2.py3-none-any.whl
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.7.1

Use pip3 list to confirm that the package is really installed:

D:\Python>pip3 list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
adafruit-ampy (1.0.1)
beautifulsoup4 (4.5.3)
click (6.7)
colorama (0.3.9)
cycler (0.10.0)
Django (1.8.18)
matplotlib (2.0.0)
mpfshell (0.8.0)
numpy (1.12.1+mkl)
olefile (0.44)
pandas (0.19.2)
Pillow (4.1.0)
pip (9.0.1)
py2exe (0.9.2.0)
PyAutoGUI (0.9.36)
pyFirmata (1.0.3)
PyMsgBox (1.0.6)
pyparsing (2.2.0)
PyScreeze (0.1.9)
pyserial (3.3)
python-dateutil (2.6.0)
PyTweening (1.0.3)
pytz (2017.2)
pyudev (0.21.0)
requests (2.13.0)
rshell (0.0.9)
scikit-learn (0.18.1)
scipy (0.19.0)
setuptools (28.8.0)
six (1.10.0)
SpeechRecognition (3.7.1)
virtualenv (15.1.0)
websocket-client (0.40.0)
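
The same check can also be made from inside Python. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of the original post. Note that the import names differ from the PyPI names: SpeechRecognition is imported as speech_recognition, and PyAudio as pyaudio.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Import names, not PyPI names: SpeechRecognition -> speech_recognition,
    # PyAudio -> pyaudio.
    for name in ("speech_recognition", "pyaudio"):
        print(name, "installed" if is_installed(name) else "missing")
```

Running this before the microphone test shows right away whether either package still needs to be installed.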

Then I immediately tested it with the sample program from the tutorial video:

>>> import speech_recognition
>>> r=speech_recognition.Recognizer()
>>> with speech_recognition.Microphone() as source:
        audio=r.listen(source)

However, running this raised an error as soon as the Microphone object was created, before listen() was even reached:

Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\speech_recognition\__init__.py", line 108, in get_pyaudio
    import pyaudio
ModuleNotFoundError: No module named 'pyaudio'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    with speech_recognition.Microphone() as source:
  File "C:\Python36\lib\site-packages\speech_recognition\__init__.py", line 79, in __init__
    self.pyaudio_module = self.get_pyaudio()
  File "C:\Python36\lib\site-packages\speech_recognition\__init__.py", line 110, in get_pyaudio
    raise AttributeError("Could not find PyAudio; check installation")
AttributeError: Could not find PyAudio; check installation

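This means the PyAudio module is missing. I went back to PyPI to read the SpeechRecognition package description, and it turns out that PyAudio must be installed if the microphone is used as the audio source: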

"PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations.
If not installed, everything in the library will still work, except attempting to instantiate a Microphone object will raise an AttributeError."

Install PyAudio with the following command:

D:\Python> pip3 install PyAudio

Or download the whl installer from PyPI:

# https://pypi.python.org/pypi/PyAudio/0.2.11#downloads

D:\Python>pip3 install PyAudio-0.2.11-cp36-cp36m-win_amd64.whl
Processing d:\python\pyaudio-0.2.11-cp36-cp36m-win_amd64.whl
Installing collected packages: PyAudio
Successfully installed PyAudio-0.2.11

After installing PyAudio, running the same code again still failed, this time with the error message "No Default Input Device Available":

>>> with speech_recognition.Microphone() as source:
        audio=r.listen(source)

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    with speech_recognition.Microphone() as source:
  File "C:\Python36\lib\site-packages\speech_recognition\__init__.py", line 86, in __init__
    device_info = audio.get_device_info_by_index(device_index) if device_index is not None else audio.get_default_input_device_info()
  File "C:\Python36\lib\site-packages\pyaudio.py", line 949, in get_default_input_device_info
    device_index = pa.get_default_input_device()
OSError: No Default Input Device Available

I searched Google and found the following post, in which someone suggests also installing "PortAudio":

# PyAudio IOError: No Default Input Device Available

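I found the PortAudio site and downloaded the tgz file, but after extracting it I only saw a pile of files with html and sh extensions, and I could not figure out whether it needed to be installed or where it should go. Then it occurred to me: could the cause simply be that I had not plugged in a microphone? I dug out a microphone I had not used for ages, plugged it into the computer's microphone jack, and ran the code again. This time there was no error.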
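
The two failures above can be told apart in code. The sketch below assumes, based only on the tracebacks shown earlier, that creating a Microphone raises AttributeError when PyAudio is missing and OSError when no input device is available. The helper name diagnose_microphone and the messages it returns are hypothetical. The microphone factory is passed in as a parameter so the logic can be exercised without audio hardware.

```python
def diagnose_microphone(make_microphone):
    """Try to create a microphone object and describe what went wrong, if anything.

    make_microphone is any zero-argument callable, for example
    speech_recognition.Microphone.
    """
    try:
        make_microphone()
    except AttributeError:
        # Seen above as "Could not find PyAudio; check installation".
        return "PyAudio is not installed: try pip3 install PyAudio"
    except OSError:
        # Seen above as "No Default Input Device Available".
        return "No input device found: check that a microphone is plugged in"
    return "Microphone is available"


if __name__ == "__main__":
    try:
        import speech_recognition as sr
    except ImportError:
        print("SpeechRecognition is not installed")
    else:
        print(diagnose_microphone(sr.Microphone))
```

Checking for the simple hardware cause first would have saved the detour through the PortAudio download.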

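However, once listen() was running I said "您好嗎" ("How are you?") into the microphone, and the call just sat there for a long time without responding. I later found the article below. It turns out that adjust_for_ambient_noise() should be called first to calibrate the recognizer for the background noise picked up by the microphone: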

# Easy Speech Recognition in Python with PyAudio and Pocketsphinx
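
Why calibration helps can be illustrated with a simplified model. The sketch below is not the library's implementation. It assumes a recognizer that treats audio as speech once its energy rises above a threshold and ends the phrase once the energy drops below it again. All function names, the margin factor, and the sample values are made up for illustration. If the threshold sits below the room's background noise, the phrase never appears to end, which matches the hang described above.

```python
import math


def rms(samples):
    """Root-mean-square energy of one chunk of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_chunks, margin=1.5):
    """Place the threshold above the loudest ambient chunk (margin is an assumed factor)."""
    return max(rms(chunk) for chunk in ambient_chunks) * margin


def phrase_bounds(chunks, threshold):
    """Return (start, end) chunk indexes of the first phrase.

    start is the first chunk louder than the threshold; end is the first
    quieter chunk after that. end is None if the audio never drops below
    the threshold, and start is None if it never rises above it.
    """
    start = None
    for index, chunk in enumerate(chunks):
        loud = rms(chunk) > threshold
        if start is None and loud:
            start = index
        elif start is not None and not loud:
            return (start, index)
    return (start, None)


# Made-up sample data: background noise with amplitude 10, speech with amplitude 1000.
noise = [10, -10, 10, -10]
speech = [1000, -1000, 1000, -1000]
recording = [noise, noise, speech, speech, noise]

too_low = 5  # a threshold below the noise level
calibrated = calibrate_threshold([noise, noise, noise])

print(phrase_bounds(recording, too_low))     # noise counts as speech, so the phrase never ends
print(phrase_bounds(recording, calibrated))  # the phrase starts at chunk 2 and ends at chunk 4
```

The library's listen() also takes optional timeout and phrase_time_limit arguments, which can bound the wait when calibration alone is not enough.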

I modified that article's sample program as follows:

import speech_recognition as sr

# Obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please wait. Calibrating microphone...")
    # Listen for 5 seconds to measure the ambient noise energy level
    r.adjust_for_ambient_noise(source, duration=5)
    print("Say something!")
    audio = r.listen(source)

# Recognize speech using Google Speech Recognition
try:
    print("Google Speech Recognition thinks you said:")
    print(r.recognize_google(audio, language="zh-TW"))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("No response from Google Speech Recognition service: {0}".format(e))

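The program above uses the recognize_google() method of the Recognizer object in the SpeechRecognition module. It sends the audio object captured from the microphone to the Google speech recognition API and gets back text in the specified language: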

r.recognize_google(audio, language='zh-TW')

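Two arguments are passed here: the audio object and the language parameter, which is set to "zh-TW" for Traditional Chinese. English spoken under the Chinese setting can still be recognized. I saved the program as google_sr.py and ran it, and it worked: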

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
你好嗎

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
Google Speech Recognition could not understand audio

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
新年快樂

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
川普是笨蛋

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
金正恩是瘋子

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
good morning

D:\ESP8266\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
Donald trump is stupid

Google really is impressive! Apart from the one attempt it could not understand, every phrase was recognized correctly.
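
To look beyond the single best guess, recognize_google() can be called with show_all=True, which returns the raw result instead of a string. The sketch below assumes that result is a dictionary with an "alternative" list whose entries carry a "transcript" and sometimes a "confidence". That layout reflects how the library appears to parse the response and is not a documented contract. The helper name best_transcript and the sample values are hypothetical.

```python
def best_transcript(result):
    """Pick the most confident transcript from a show_all=True style result.

    Returns None when the result holds no alternatives (for example an
    empty list, which is assumed to mean nothing was recognized).
    """
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative", [])
    if not alternatives:
        return None
    # Entries without a confidence score are treated as 0.0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# Hypothetical result, shaped like the assumed layout described above.
sample = {
    "alternative": [
        {"transcript": "你好嗎", "confidence": 0.92},
        {"transcript": "你好嘛"},
    ],
    "final": True,
}

print(best_transcript(sample))
print(best_transcript([]))
```

With the program above, such a result would come from r.recognize_google(audio, language="zh-TW", show_all=True).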

References:

# SPEECH RECOGNİTİON WİTH PYTHON
# Coding Jarvis in Python in 2016
# Speech Recognition with Python
# Using the Google Speech Recognition service from Python
# https://www.youtube.com/channel/UCFdTiwvDjyc62DBWrlYDtlQ

Update 2017-08-25:

I got to the office early this morning and tried the same thing there, but it did not work, probably because of the firewall:

D:\Python\test>python google_sr.py
Please wait. Calibrating microphone...
Say something!
Google Speech Recognition thinks you said:
No response from Google Speech Recognition service: recognition connection failed: [WinError 10060] 連線嘗試失敗，因為連線對象有一段時間並未正確回應，或是連線建立失敗，因為連線的主機無法回應。
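
WinError 10060 is a connection timeout, so one quick way to tell a blocked network from a recognition problem is to test whether the host can be reached at all before calling the service. This is a minimal standard-library sketch. The helper name can_reach, the host www.google.com, and port 443 are assumptions for illustration and may not match the endpoint the library actually contacts.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    if can_reach("www.google.com"):
        print("Network looks fine; the speech service should be reachable.")
    else:
        print("Cannot reach www.google.com; a firewall or proxy may be blocking it.")
```

A direct connection failing does not rule out access through a company proxy, so this only narrows the problem down.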

21 comments:

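霧丸 said...: Hello, I have read many of your articles and learned a great deal, thank you very much. Could this speech recognition also be implemented with MicroPython?; December 8, 2017, 9:25 AM
小狐狸事務所 said...: I think that would be difficult, because these modules do not seem to have MicroPython versions, and the ESP8266 would also have to drive a microphone and a speaker. I think a Raspberry Pi should be able to do it.; December 8, 2017, 10:01 AM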
Unknown said...: https://drive.google.com/open?id=1mpi6PbWNZuCpLkgoMu1e5CBrSme2VSoj
Sorry to bother you, I ran into this problem during installation!; February 20, 2018, 10:53 PM
小狐狸事務所 said...: Your pip version is probably too old!; February 21, 2018, 12:38 AM
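Unknown said...: Sorry, I am a beginner. I removed my original Python and installed 3.6.4 following http://yhhuang1966.blogspot.tw/2017/04/windows-python.html but cmd cannot find pip (https://drive.google.com/open?id=1TILSy5E8JYP63ksVIJAIar85oiokfd7n). Thank you~; February 21, 2018, 10:23 PM
小狐狸事務所 said...: In the installer, pip must be checked on the screen shown in the second screenshot, and "Add Python to environment variables" on the screen shown in the third screenshot must be checked as well.; February 22, 2018, 7:39 AM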
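Unknown said...: Sorry, after upgrading pip I still get a red error
https://drive.google.com/open?id=17CaXMhHHzG5QoDqYV87fDkTequtC6wNR; February 22, 2018, 10:05 PM
Unknown said...: Does it need anything else to be installed!?; February 22, 2018, 11:29 PM
小狐狸事務所 said...: No! Try pip3. You should also set the system environment variables and work in some other directory, not under Program Files.; February 23, 2018, 10:58 AM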
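Unknown said...: Sorry, I used pip3 in a different directory and still get this error. Which environment variable do I need to set!?
Thank you!!; February 23, 2018, 3:57 PM
小狐狸事務所 said...: In Control Panel / System / Environment Variables, edit the system variable Path and append the Python installation paths to the end, each preceded by a ; character, for example ;C:\Python36\Scripts\;C:\Python36\; so that Python can be run from any directory.
The error message seems to be related to permissions. Try opening the Command Prompt window as administrator and then running Python.
Start > Command Prompt > Run as administrator; February 23, 2018, 4:22 PM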
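小狐狸事務所 said...: Is it OK now?; February 26, 2018, 9:08 AM
Unknown said...: Sorry, I followed the steps in your video and got this
https://drive.google.com/file/d/1UlwiOtI7xfZk9pkkmKPknM5nzvzRSseW/view?usp=sharing; February 28, 2018, 12:25 PM
小狐狸事務所 said...: The package installation probably did not succeed: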

pip3 install SpeechRecognition-3.7.1-py2.py3-none-any.whl

With Anaconda it seems you need to install with conda; I only use the plain command line.; February 28, 2018, 7:20 PM
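Unknown said...: I am using a Pi 3
pip3 list shows SpeechRecognition 3.8.1, so why does this still happen?; February 28, 2018, 10:59 PM
小狐狸事務所 said...: I have only tested this successfully on Windows and have not had time to try it on a Pi 3 yet. I will find some time to play with it.; March 1, 2018, 8:27 AM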
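Unknown said...: I do not have a microphone and would like to use a recorded audio file directly. How should I do that?; March 14, 2018, 4:46 PM
小狐狸事務所 said...: Sorry, I have not yet looked into how to upload an audio file. It seems the file has to be stored in Google Cloud Storage first. See:

https://cloud.google.com/speech/docs/sync-recognize; March 14, 2018, 11:14 PM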
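Unknown said...: Hello, I would like to ask whether this Google speech recognition can do wake-word activation. For example, I just say "OK GOOGLE" and it automatically starts recognizing, and then I speak the content I want recognized. Is there such a feature? Thank you; September 9, 2018, 3:11 AM
小狐狸事務所 said...: I am not sure about that, but I am very interested. It is worth looking into further.; September 10, 2018, 11:28 AM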