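After installing Anaconda these past few days, I found that it already includes the NLTK package, but the corpora have to be downloaded separately. Open the Anaconda shell, import nltk, and call nltk.download() to download them: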
(base) PS C:\Users\user> python
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
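Running nltk.download() connects to GitHub and opens a window listing the available corpora. You can pick the ones you need or download all of them. In my case, however, the connection failed, probably because the company firewall blocks it: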
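It seemed the only option was to install offline, much like installing a wheel file. I found the article below, whose author simply downloads the corpus zip archive: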
Download URL for the NLTK corpora on GitHub:
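After unzipping, rename the folder from nltk_data_gh_pages to nltk_data and move it to the root of drive D:. It can also go in the default directory used by the online installer, C:\Users\YOURNAME\AppData\Roaming\nltk_data. However, importing nltk.book then fails, apparently because NLTK cannot find nltk_data: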
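The unzip, rename and move steps above can be scripted. This is a minimal sketch using only the Python standard library, and it rests on two assumptions of mine that the post does not state: the downloaded zip contains exactly one top-level folder, and the data packages (corpora, tokenizers, ...) sit in a packages subfolder beneath it. Since NLTK looks for paths such as corpora/gutenberg.zip directly under each nltk_data directory (see the "Attempted to load" line in the traceback below), the sketch copies the contents of packages rather than the outer folder. The function name install_nltk_data_zip is hypothetical.

```python
import os
import shutil
import tempfile
import zipfile


def install_nltk_data_zip(zip_path: str, target_dir: str) -> str:
    """Unpack an nltk_data archive so target_dir holds corpora/, tokenizers/, ...

    Assumption: the archive has one top-level folder, and the data packages
    live in a "packages" subfolder inside it. If there is no "packages"
    subfolder, the top-level folder itself is treated as the data root.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)

        # The single folder created by extraction (e.g. the branch archive folder).
        entries = os.listdir(tmp)
        if len(entries) != 1:
            raise ValueError(f"expected one top-level folder, found {entries}")
        top = os.path.join(tmp, entries[0])

        # Assumption: prefer the packages/ level when it exists.
        src = os.path.join(top, "packages")
        if not os.path.isdir(src):
            src = top

        # Copy each category folder (corpora, tokenizers, ...) into target_dir.
        os.makedirs(target_dir, exist_ok=True)
        for name in os.listdir(src):
            s = os.path.join(src, name)
            d = os.path.join(target_dir, name)
            if os.path.isdir(s):
                shutil.copytree(s, d, dirs_exist_ok=True)  # Python 3.8+
            else:
                shutil.copy2(s, d)
    return target_dir


# Hypothetical usage (both paths are only examples):
# install_nltk_data_zip(r"D:\downloads\nltk_data.zip", r"D:\nltk_data")
```

If the data ends up somewhere outside NLTK's default search list, appending the directory to nltk.data.path, or setting the NLTK_DATA environment variable, adds it to the list NLTK searches. D:\nltk_data itself is already on the default list, as the traceback shows.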
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
Traceback (most recent call last):
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 83, in __load
    root = nltk.data.find("{}/{}".format(self.subdir, zip_name))
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\data.py", line 585, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource gutenberg not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('gutenberg')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/gutenberg.zip/gutenberg/
  Searched in:
    - 'C:\\Users\\user/nltk_data'
    - 'C:\\Users\\user\\anaconda3\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\share\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\user\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\book.py", line 27, in <module>
    text1 = Text(gutenberg.words("melville-moby_dick.txt"))
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 120, in __getattr__
    self.__load()
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 85, in __load
    raise e
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 80, in __load
    root = nltk.data.find("{}/{}".format(self.subdir, self.__name))
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\data.py", line 585, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource gutenberg not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('gutenberg')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/gutenberg
  Searched in:
    - 'C:\\Users\\user/nltk_data'
    - 'C:\\Users\\user\\anaconda3\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\share\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\user\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************
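Copying nltk_data to drive E didn't help either, even though E:\nltk_data is right there in the search paths listed in the error message! It's puzzling that NLTK still can't find it.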
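To narrow down why a directory can be on the search list yet still report "not found", a small standard-library check can look in each search directory for the two forms NLTK tried in the traceback: corpora/gutenberg.zip and corpora/gutenberg. It also flags one possible cause that is only my guess, not something the error message confirms: the unpacked archive keeping its data one level deeper, under a packages folder, so the path NLTK expects does not exist. The function locate_resource and its status strings are hypothetical, not part of NLTK.

```python
import os
from typing import Dict, Iterable


def locate_resource(search_paths: Iterable[str],
                    resource: str = "corpora/gutenberg") -> Dict[str, str]:
    """Report, for each search directory, whether and how resource is present.

    Checks the zip form ("<root>/corpora/gutenberg.zip") and the folder form
    ("<root>/corpora/gutenberg/") that appear in the traceback, plus an
    assumed misplacement one level too deep under "<root>/packages/".
    """
    rel = resource.replace("/", os.sep)
    report = {}
    for root in search_paths:
        nested = os.path.join(root, "packages", rel)
        if not os.path.isdir(root):
            report[root] = "missing directory"
        elif os.path.isfile(os.path.join(root, rel + ".zip")):
            report[root] = "found zip"
        elif os.path.isdir(os.path.join(root, rel)):
            report[root] = "found folder"
        elif os.path.isfile(nested + ".zip") or os.path.isdir(nested):
            # Assumed cause: data left under an extra "packages" level.
            report[root] = "nested under packages/"
        else:
            report[root] = "not found"
    return report
```

With NLTK installed, the directories to check are the ones in nltk.data.path (the same list printed under "Searched in"), so calling locate_resource(nltk.data.path) and printing the result shows which directory, if any, holds the file where NLTK expects it.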