# http://www.nltk.org/book_1ed/
The following three NLTK articles are very clearly organized:
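# NLTK 初學指南(一):簡單易上手的自然語言工具箱-探索篇 (NLTK Beginner's Guide, Part 1: An Easy-to-Use Natural Language Toolkit, Exploration)
# NLTK 初學指南(二):由外而內,從語料庫到字詞拆解 — 上手篇 (NLTK Beginner's Guide, Part 2: From the Outside In, from Corpora to Word Tokenization, Getting Started)
# NLTK 初學指南(三):基於 WordNet 的語義關係表示法 — 上下位詞結構篇 (NLTK Beginner's Guide, Part 3: WordNet-Based Representation of Semantic Relations, Hypernym and Hyponym Structure)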
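I first downloaded and tried NLTK ten years ago, when I started learning Python, because I had originally planned to write my master's thesis on corpora within computational linguistics. In the end I chose experimental phonetics instead and missed my chance to work with corpora, but I never stopped thinking about computational linguistics after graduating.
Below I follow the first of those articles to install NLTK and run a few small tests to verify that the installation works. Installing NLTK only takes a pip command: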
D:\Python>pip3 install -U nltk
Collecting nltk
Downloading https://files.pythonhosted.org/packages/6f/ed/9c755d357d33bc1931e157f537721efb5b88d2c583fe593cc09603076cc3/nltk-3.4.zip (1.4MB)
Requirement not upgraded as not directly required: six in c:\python36\lib\site-packages (from nltk) (1.11.0)
Collecting singledispatch (from nltk)
Downloading https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Building wheels for collected packages: nltk
Running setup.py bdist_wheel for nltk ... done
Stored in directory: C:\Users\cht\AppData\Local\pip\Cache\wheels\4b\c8\24\b2343664bcceb7147efeb21c0b23703a05b23fcfeaceaa2a1e
Successfully built nltk
Installing collected packages: singledispatch, nltk
Successfully installed nltk-3.4 singledispatch-3.4.0.3
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
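As a quick check that pip really installed the package into the interpreter in use, the version can be queried from Python itself. This is only a minimal sketch: it assumes Python 3.8 or later (for `importlib.metadata`, which is newer than the Python 3.6 shown in the log), and `installed_version` is a helper name made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "nltk" is the distribution name used in the pip command above.
version = installed_version("nltk")
if version is None:
    print("nltk is not installed in this interpreter")
else:
    print("nltk", version, "is installed")
```

If the script reports that nltk is missing even though pip succeeded, pip and the Python shell are probably tied to different interpreters.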
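The package itself is small (although the corpora are large), so the installation finishes almost immediately. The first thing to do afterwards is to download the corpora with nltk.download():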
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
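Newer versions of NLTK open a downloader window populated from index.xml. Select the corpora you need one by one, then click the Download button at the lower left.
Items that have finished downloading show Installed in the Status column. This time I selected and downloaded every entry except all, but afterwards the status of all was still partial. Does that mean that, to download everything, you should simply select the all entry from the start? I will check this when I download on another computer.
The downloaded corpora are stored under C:\Users\user\AppData\Roaming\nltk_data. AppData is a hidden system folder, so you have to enable the option to show hidden files under "Organize / Folder and search options / View" in Windows Explorer before you can find it.
The corpus data adds up to 3.18 GB! Once the download finishes, choose "File/Exit" to close the downloader window. Back in the Python shell, nltk.download() returns True, which indicates that the download succeeded. Next, import the nltk.book module: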
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
As shown, this collection contains nine texts. Since time is limited, I only test concordance() below:
>>> text3.concordance("lived")
The matches are listed one per line, aligned so that the search word sits in the center of each line.
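To make the centered layout concrete, here is a small keyword-in-context sketch in plain Python. It is not NLTK's implementation: `kwic` and its `width` parameter are names invented for this example, the sample sentence is made up rather than taken from text3, and the case-insensitive match is an assumption of the sketch.

```python
def kwic(tokens, target, width=30):
    """Return keyword-in-context lines with `target` aligned in one column."""
    lines = []
    target_lower = target.lower()
    for i, token in enumerate(tokens):
        if token.lower() != target_lower:
            continue
        # Keep at most `width` characters of context on each side.
        left = " ".join(tokens[:i])[-width:]
        right = " ".join(tokens[i + 1:])[:width]
        # Right-justify the left context so every match starts in the same column.
        lines.append(f"{left:>{width}} {token} {right}")
    return lines


# Made-up sample tokens, not actual NLTK output.
sample = ("And Adam lived an hundred and thirty years "
          "and Seth lived an hundred and five years").split()
for line in kwic(sample, "lived", width=20):
    print(line)
```

Padding the left context to a fixed width is what lines the keyword up; the real concordance() view additionally handles display width and the number of lines shown.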
Testing nltk.corpus.brown:
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
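Since brown.words() behaves like a sequence of tokens, a natural next step is a simple frequency count. The sketch below is only an illustration: `top_words` is a helper name of my own, lower-casing and dropping non-alphabetic tokens are choices made for this example, and the Brown corpus part is skipped if NLTK or the corpus is not available.

```python
from collections import Counter


def top_words(words, n=5):
    """Count case-folded alphabetic tokens and return the n most common."""
    counts = Counter(w.lower() for w in words if w.isalpha())
    return counts.most_common(n)


# Tokens copied from the brown.words() output shown above.
sample = ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said']
print(top_words(sample, n=3))

try:
    from nltk.corpus import brown
    print(top_words(brown.words(), n=10))
except (ImportError, LookupError):
    # NLTK or the Brown corpus is not available in this environment.
    print("Brown corpus not available; only the sample was counted")
```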
I will stop here for now and continue learning how to use NLTK when I have more time.
References:
# https://www.nltk.org/api/nltk.corpus.html
# https://www.lfd.uci.edu/~gohlke/pythonlibs/
# Python如何运行pip和如何安装whl文件(以NLTK为例) (How to run pip and install whl files in Python, using NLTK as an example)
# http://www.pitt.edu/~naraehan/presentation/cmu_dh_workshop_2017.html
# Where can I find a 64-bit version of NLTK to use with 64-bit Python 3.4.2? Should I install 32-bit Python?
2019-06-27 :
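Confirmed: if you select all directly when downloading the NLTK data, the partial status no longer appears. I verified this today by downloading everything again under Python 3.7.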
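The same "download everything" step can also be done without the GUI window by passing a collection id to nltk.download(). The sketch below is hedged: `download_collection` and `per_user_data_dir` are helper names invented here, "all" is assumed to be the id matching the all entry in the downloader, and the per-user folder is simply the location reported earlier in this post, not a guarantee of where NLTK will write on every machine.

```python
import ntpath


def per_user_data_dir(appdata):
    """Join a Windows %APPDATA% path with the nltk_data folder name."""
    return ntpath.join(appdata, "nltk_data")


def download_collection(collection_id="all"):
    """Download an NLTK collection without opening the GUI window.

    Returns True on success, like the interactive call shown in the post.
    """
    import nltk  # imported lazily so the helper above works without NLTK
    return nltk.download(collection_id)


print(per_user_data_dir(r"C:\Users\user\AppData\Roaming"))
# To fetch everything (several GB), call: download_collection("all")
```

The download call is deliberately left commented out, because fetching the whole collection takes a long time and several gigabytes of disk space.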