Wednesday, December 9, 2020

Anaconda: NLTK corpus path not found after offline installation

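After installing Anaconda a few days ago I found that it already bundles the NLTK package, but the corpora have to be downloaded and installed separately. I opened the Anaconda shell, imported nltk, and called nltk.download() to fetch them: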

(base) PS C:\Users\user> python
Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk  
>>> nltk.download()   

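Running nltk.download() connects to GitHub and opens a window listing the available corpora, where you can pick individual corpora or download everything. In my case the connection failed, probably because the company firewall blocked it: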




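It looks like the only option is to see whether the corpora can be installed offline, the way a wheel file can. I found the following article, whose author downloads the corpus zip archive directly: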


GitHub download URL for the NLTK corpora:





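After unzipping, rename the directory from nltk_data_gh_pages to nltk_data and move it under d:. It can also go in the default directory used by the online installer, C:\\Users\\YOURNAME\\AppData\\Roaming\\nltk_data. However, importing from nltk.book then raised an error, apparently because nltk_data could not be found: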
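The unzip, rename, and move step could be scripted roughly as follows. This is a minimal sketch using only the standard library; the function name `extract_nltk_zip` and any paths passed to it are hypothetical and do not come from the article mentioned above. It assumes the archive contains exactly one top-level folder.

```python
import shutil
import zipfile
from pathlib import Path


def extract_nltk_zip(zip_path, dest_parent, final_name="nltk_data"):
    """Unzip zip_path under dest_parent and rename its top-level folder.

    Hypothetical helper: assumes the archive has exactly one top-level folder.
    """
    dest_parent = Path(dest_parent)
    target = dest_parent / final_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    with zipfile.ZipFile(zip_path) as zf:
        # Zip member names always use forward slashes.
        top_levels = {name.split("/", 1)[0] for name in zf.namelist()}
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(top_levels)}")
        zf.extractall(dest_parent)

    extracted = dest_parent / top_levels.pop()
    if extracted != target:
        # Rename the extracted folder to the name NLTK searches for.
        shutil.move(str(extracted), str(target))
    return target
```

For the layout described here, the call would look something like `extract_nltk_zip("nltk_data.zip", "D:/")`, where the zip file name is a placeholder for whatever was actually downloaded.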

>>> from nltk.book import *    
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
Traceback (most recent call last):
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 83, in __load
    root = nltk.data.find("{}/{}".format(self.subdir, zip_name))
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\data.py", line 585, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource gutenberg not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('gutenberg')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/gutenberg.zip/gutenberg/

  Searched in:
    - 'C:\\Users\\user/nltk_data'
    - 'C:\\Users\\user\\anaconda3\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\share\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\user\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\book.py", line 27, in <module>
    text1 = Text(gutenberg.words("melville-moby_dick.txt"))
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 120, in __getattr__
    self.__load()
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 85, in __load
    raise e
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\corpus\util.py", line 80, in __load
    root = nltk.data.find("{}/{}".format(self.subdir, self.__name))
  File "C:\Users\user\anaconda3\lib\site-packages\nltk\data.py", line 585, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource gutenberg not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('gutenberg')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/gutenberg

  Searched in:
    - 'C:\\Users\\user/nltk_data'
    - 'C:\\Users\\user\\anaconda3\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\share\\nltk_data'
    - 'C:\\Users\\user\\anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\user\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************

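I also copied nltk_data to the E: drive, but that did not help either. The search paths in the error message clearly include E:\\nltk_data, so it is really strange that the corpus still cannot be found.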
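One way to narrow this down is to check what actually sits under each search path. The sketch below is a diagnostic, not a confirmed fix. `find_resource` is a hypothetical helper, and the "nested under packages/" case reflects an unverified guess: that the archive downloaded from GitHub keeps its corpora one level down in a `packages` folder, so `corpora/gutenberg` would not sit directly under nltk_data. When no paths are given, it falls back to `nltk.data.path`, the list NLTK itself searches.

```python
from pathlib import Path


def find_resource(resource="corpora/gutenberg", search_paths=None):
    """Report whether a resource exists under each search path.

    Returns a list of (search_path, status) tuples, where status is
    "found", "nested under packages/", or "missing".
    """
    if search_paths is None:
        # Fall back to NLTK's own search list; requires nltk to be installed.
        import nltk
        search_paths = nltk.data.path

    report = []
    for root in search_paths:
        root = Path(root)
        # NLTK accepts either an unpacked folder or a .zip of the same name.
        direct = [root / resource, root / (resource + ".zip")]
        # Assumption: the GitHub archive may wrap everything in "packages".
        nested = [root / "packages" / resource,
                  root / "packages" / (resource + ".zip")]
        if any(p.exists() for p in direct):
            status = "found"
        elif any(p.exists() for p in nested):
            status = "nested under packages/"
        else:
            status = "missing"
        report.append((str(root), status))
    return report
```

If a path such as E:\\nltk_data reports "nested under packages/", moving the contents of that `packages` folder up one level, or appending the folder to `nltk.data.path` before the import, might let `from nltk.book import *` find gutenberg. I have not verified this on the setup described here.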
