2018年10月22日 星期一

Python Fintech 學習筆記 : 從 Yahoo Finance 擷取台股成交資料

上周利用 "Python 網頁程式交易 APP 實作" 第 14 章的程式去下載 Google Finance 台股歷史資料, 測試結果失敗, 因為 Google 已經關閉了歷史資料下載功能, 只提供線型瀏覽. 殘念! 不過另一個抓 Yahoo Finance 的範例程式倒是可用, 以下為測試紀錄.

Python Fintech 測試系列此前文章參考 :

Python Fintech 學習筆記 : 安裝技術指標套件 TA-Lib
Python Fintech 學習筆記 : Google Finance 無法下載歷史資料

"Python 網頁程式交易 APP 實作" 範例程式下載連結 :

https://github.com/letylin/pyptbook

我下載的是第二版範例 :

# 二版課本範例程式.zip

Yahoo 版抓取程式為其中的 E_14_1.py, 但在我的電腦直接執行時卻出現缺少 pymysql, xlrd, fix_yahoo_finance 與 openpyxl 等模組之錯誤訊息, 須用 pip3 安裝 :

D:\Python\E_14_1>pip3 install pymysql
Collecting pymysql
  Downloading https://files.pythonhosted.org/packages/a7/7d/682c4a7da195a678047c8f1c51bb7682aaedee1dca7547883c3993ca9282/PyMySQL-0.9.2-py2.py3-none-any.whl (47kB)
Collecting cryptography (from pymysql)
  Downloading https://files.pythonhosted.org/packages/f1/01/a144ec664d3f9ae5837bd72c4d11bdd2d8d403318898e4092457e8af9272/cryptography-2.3.1-cp36-cp36m-win_amd64.whl (1.3MB)
Requirement already satisfied: six>=1.4.1 in c:\python36\lib\site-packages (from cryptography->pymysql) (1.11.0)
Requirement already satisfied: idna>=2.1 in c:\python36\lib\site-packages (from cryptography->pymysql) (2.6)
Collecting asn1crypto>=0.21.0 (from cryptography->pymysql)
  Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
Collecting cffi!=1.11.3,>=1.7 (from cryptography->pymysql)
  Downloading https://files.pythonhosted.org/packages/2f/85/a9184548ad4261916d08a50d9e272bf6f93c54f3735878fbfc9335efd94b/cffi-1.11.5-cp36-cp36m-win_amd64.whl (166kB)
Collecting pycparser (from cffi!=1.11.3,>=1.7->cryptography->pymysql)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
Building wheels for collected packages: pycparser
  Running setup.py bdist_wheel for pycparser: started
  Running setup.py bdist_wheel for pycparser: finished with status 'done'
  Stored in directory: C:\Users\Tony\AppData\Local\pip\Cache\wheels\f2\9a\90\de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Successfully built pycparser
Installing collected packages: asn1crypto, pycparser, cffi, cryptography, pymysql
Successfully installed asn1crypto-0.24.0 cffi-1.11.5 cryptography-2.3.1 pycparser-2.19 pymysql-0.9.2

D:\Python\E_14_1>pip3 install xlrd
Collecting xlrd
  Downloading https://files.pythonhosted.org/packages/07/e6/e95c4eec6221bfd8528bcc4ea252a850bffcc4be88ebc367e23a1a84b0bb/xlrd-1.1.0-py2.py3-none-any.whl (108kB)
Installing collected packages: xlrd
Successfully installed xlrd-1.1.0

D:\Python\E_14_1>pip3 install fix_yahoo_finance
Collecting fix_yahoo_finance
  Downloading https://files.pythonhosted.org/packages/0a/96/d44330e427f5368cb8abd25997b72956a31b52073d285c4d5cd56e5fdc17/fix-yahoo-finance-0.0.22.tar.gz
Requirement already satisfied: pandas in c:\python36\lib\site-packages (from fix_yahoo_finance) (0.22.0)
Requirement already satisfied: numpy in c:\python36\lib\site-packages (from fix_yahoo_finance) (1.14.1)
Requirement already satisfied: requests in c:\python36\lib\site-packages (from fix_yahoo_finance) (2.18.4)
Collecting multitasking (from fix_yahoo_finance)
  Downloading https://files.pythonhosted.org/packages/ac/1a/0750416c5e3683d170757e423f097fdf78ceb9ccdc65658b24341664e53e/multitasking-0.0.7.tar.gz
Requirement already satisfied: python-dateutil>=2 in c:\python36\lib\site-packages (from pandas->fix_yahoo_finance) (2.6.1)
Requirement already satisfied: pytz>=2011k in c:\python36\lib\site-packages (from pandas->fix_yahoo_finance) (2018.3)
Requirement already satisfied: chardet<3 .1.0="">=3.0.2 in c:\python36\lib\site-packages (from requests->fix_yahoo_finance) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\python36\lib\site-packages (from requests->fix_yahoo_finance) (2018.4.16)
Requirement already satisfied: urllib3<1 .23="">=1.21.1 in c:\python36\lib\site-packages (from requests->fix_yahoo_finance) (1.22)
Requirement already satisfied: idna<2 .7="">=2.5 in c:\python36\lib\site-packages (from requests->fix_yahoo_finance) (2.6)
Requirement already satisfied: six>=1.5 in c:\python36\lib\site-packages (from python-dateutil>=2->pandas->fix_yahoo_finance) (1.11.0)
Building wheels for collected packages: fix-yahoo-finance, multitasking
  Running setup.py bdist_wheel for fix-yahoo-finance: started
  Running setup.py bdist_wheel for fix-yahoo-finance: finished with status 'done'
  Stored in directory: C:\Users\Tony\AppData\Local\pip\Cache\wheels\2c\ca\ce\218a19aaecf63fd74c75d6a6772b1a799fa05826d8762bfd83
  Running setup.py bdist_wheel for multitasking: started
  Running setup.py bdist_wheel for multitasking: finished with status 'done'
  Stored in directory: C:\Users\Tony\AppData\Local\pip\Cache\wheels\41\e4\48\af808a1c57f43f104042abdaf80fa623ab213ca0268ba4189c
Successfully built fix-yahoo-finance multitasking
Installing collected packages: multitasking, fix-yahoo-finance
Successfully installed fix-yahoo-finance-0.0.22 multitasking-0.0.7

D:\Python\E_14_1>pip3 install openpyxl 
Collecting openpyxl
  Downloading https://files.pythonhosted.org/packages/e5/0a/e0a095149a23cedd9c8db6cdde2af7f82105e219e14edea0c31a19aeff9e/openpyxl-2.5.8.tar.gz (1.9MB)
Collecting jdcal (from openpyxl)
  Downloading https://files.pythonhosted.org/packages/a0/38/dcf83532480f25284f3ef13f8ed63e03c58a65c9d3ba2a6a894ed9497207/jdcal-1.4-py2.py3-none-any.whl
Collecting et_xmlfile (from openpyxl)
  Downloading https://files.pythonhosted.org/packages/22/28/a99c42aea746e18382ad9fb36f64c1c1f04216f41797f2f0fa567da11388/et_xmlfile-1.0.1.tar.gz
Building wheels for collected packages: openpyxl, et-xmlfile
  Running setup.py bdist_wheel for openpyxl ... done
  Stored in directory: C:\Users\Tony Huang\AppData\Local\pip\Cache\wheels\3f\37\28\5ab3dffb7ff261e6fa21455ec9d157f95958e818c6b89f024c
  Running setup.py bdist_wheel for et-xmlfile ... done
  Stored in directory: C:\Users\Tony Huang\AppData\Local\pip\Cache\wheels\2a\77\35\0da0965a057698121fc7d8c5a7a9955cdbfb3cc4e2423cad39
Successfully built openpyxl et-xmlfile
Installing collected packages: jdcal, et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.0.1 jdcal-1.4 openpyxl-2.5.8

另外還需要 pandas 模組處理 DataFrame 物件以儲存所擷取之資料, 需用 pip3 安裝 pandas, 但我電腦之前已安裝過, 因此不會有找不到 pandas 問題.

書中範例 E_14_1.py 中我只改了要抓取的起迄日期 :

    start = datetime.datetime(2017,10,1)
    end = datetime.datetime(2018,10,19)

執行結果如下 :

D:\Python\E_14_1>python E_14_1.py 

[*********************100%***********************]  1 of 1 downloaded
           Date       Open       High        Low      Close    Volume
0    2007-10-01  45.558601  46.860199  45.233200  45.558601  16490774
1    2007-10-02  45.558601  46.372101  44.907700  45.151798  10906577
2    2007-10-03  45.151798  45.883999  43.524700  44.419601  17748233
3    2007-10-04  43.931499  43.931499  41.409500  41.979000  22471999
4    2007-10-05  41.816200  43.036598  41.490799  42.792500  18974962
5    2007-10-08  43.117901  43.768799  42.711201  43.524700   9772038
6    2007-10-09  43.117901  43.199299  41.653500  43.199299   8877189
7    2007-10-11  43.199299  46.209400  43.199299  46.209400  38372764
8    2007-10-12  46.209400  47.917900  45.151798  45.151798  25902660
9    2007-10-15  45.558601  46.778900  44.500900  45.151798  12476249
10   2007-10-16  44.419601  44.826401  43.850101  44.175499  10468987
11   2007-10-17  43.931499  44.338200  42.873901  43.850101  10294442
12   2007-10-18  43.850101  44.744999  43.117901  44.338200   7405852
13   2007-10-19  44.338200  44.744999  43.850101  44.012798   4603306
14   2007-10-22  42.304401  42.467098  41.490799  42.223000  11138894
15   2007-10-23  42.223000  42.711201  41.979000  42.385700   8129844
16   2007-10-24  42.711201  43.850101  41.897598  41.897598  10767679
17   2007-10-25  42.385700  43.036598  41.897598  41.897598   7598835
18   2007-10-26  42.304401  43.524700  42.223000  43.524700  11097101
19   2007-10-29  43.768799  44.744999  43.117901  44.744999  11185603
20   2007-10-30  44.582298  44.582298  43.605999  43.687401   5092522
21   2007-10-31  43.687401  43.768799  42.060299  42.629799   8389202
22   2007-11-01  43.280602  43.443298  40.514599  40.921299  12767567
23   2007-11-02  40.270500  40.270500  38.968800  39.050201  16188394
24   2007-11-05  38.684101  38.684101  38.236599  38.562099  15356235
25   2007-11-06  39.457001  39.863701  38.765499  39.457001  14665432
26   2007-11-07  40.270500  40.473900  39.619701  39.945099  11753487
27   2007-11-08  39.212898  39.212898  38.318001  38.846802   7270642
28   2007-11-09  38.846802  39.538300  37.870602  39.538300   9424178
29   2007-11-12  38.236599  38.562099  36.772301  37.016300  10525529
...         ...        ...        ...        ...        ...       ...
2442 2017-08-21  27.200001  27.350000  27.049999  27.250000   1832333
2443 2017-08-22  27.299999  27.400000  27.250000  27.299999   1246889
2444 2017-08-23  27.400000  27.400000  27.100000  27.250000   1961836
2445 2017-08-24  27.200001  27.450001  27.150000  27.450001   1893695
2446 2017-08-25  27.500000  27.549999  27.350000  27.350000   2735921
2447 2017-08-28  27.450001  27.450001  27.350000  27.400000   2451080
2448 2017-08-29  27.400000  27.400000  27.250000  27.400000   3203620
2449 2017-08-30  26.600000  27.000000  26.600000  26.900000   3838522
2450 2017-08-31  27.000000  27.350000  27.000000  27.350000   3695972
2451 2017-09-01  27.400000  27.600000  27.299999  27.450001   3451475
2452 2017-09-04  27.299999  27.450001  27.150000  27.450001   1711265
2453 2017-09-05  27.450001  27.600000  27.299999  27.450001   2508250
2454 2017-09-06  27.350000  27.350000  27.100000  27.200001   2100535
2455 2017-09-07  27.200001  27.200001  26.950001  27.100000   2700381
2456 2017-09-08  27.100000  27.400000  27.000000  27.299999   2028684
2457 2017-09-11  27.299999  27.350000  27.100000  27.200001   1863056
2458 2017-09-12  27.250000  27.250000  27.049999  27.150000   1988872
2459 2017-09-13  27.100000  27.100000  26.950001  26.950001   1732912
2460 2017-09-14  26.799999  27.250000  26.799999  27.250000   1714303
2461 2017-09-15  27.250000  27.500000  27.000000  27.500000   3562428
2462 2017-09-18  27.500000  27.650000  27.299999  27.650000   3323904
2463 2017-09-19  27.700001  27.700001  27.450001  27.600000   2012329
2464 2017-09-20  27.299999  27.600000  27.250000  27.549999   1858452
2465 2017-09-21  27.500000  27.500000  27.200001  27.250000   1967469
2466 2017-09-22  27.200001  27.200001  26.950001  26.950001   2382746
2467 2017-09-25  26.950001  27.450001  26.850000  27.350000   2749546
2468 2017-09-26  27.400000  27.400000  27.049999  27.250000   1039137
2469 2017-09-27  27.049999  27.200001  27.000000  27.049999   1616491
2470 2017-09-28  27.049999  27.100000  26.650000  26.700001   4079153
2471 2017-09-29  26.700001  26.950001  26.650000  26.750000   1903258

[2472 rows x 6 columns]
C:\Python36\lib\site-packages\pandas\io\excel.py:784: DeprecationWarning: Call to deprecated function remove_sheet (Use wb.remove(worksheet) or del wb[sheetname]).
  self.book.remove_sheet(self.book.worksheets[0])
程式執行時間 = 5秒

如果擷取失敗會出現下列訊息 :

D:\Python\test\E_14_1>python E_14_1.py
[*********************100%***********************]  1 of 1 downloaded
zero-size array to reduction operation maximum which has no identity
Traceback (most recent call last):
  File "E_14_1.py", line 33, in <module>
    df = main(stkno,start,end) #呼叫主程式
  File "E_14_1.py", line 21, in main
    RowData =gf.getstock(True)
  File "D:\Python\test\E_14_1\ProgramTrade\class_GetYahooFinance.py", line 54, in getstock
    if  len(Data)   != 0:        #有取得資料才進行轉換
UnboundLocalError: local variable 'Data' referenced before assignment


範例程式 E_14_1.py 將擷取網頁與儲存等功能寫成 GetYahooFinance.py 類別, 然後在
E_14_1.py 中建立物件來抓取網頁資料, 這使得程式閱讀起來很彆扭.  事實上此程式是利用第三方模組 fix_yahoo_finance 與 pandas_datareader 來抓資料, 垂回 Pandas 的 DataFrame 物件.

我將其中的核心部分改寫成函數 get_yahoo_twstock(), 如下列範例所示 :

import datetime
import pandas as pd
import fix_yahoo_finance as yf
from pandas_datareader import data as pdr

def get_yahoo_twstock(stock_no, start_date, end_date):
    start=datetime.datetime.strptime(start_date,'%Y%m%d')
    end=datetime.datetime.strptime(end_date,'%Y%m%d')
    yf.pdr_override()
    df=pdr.get_data_yahoo(stock_no + '.tw', start, end)
    del df['Adj Close']
    df=df.reset_index()
    return df

stock_no='0050'
start_date='20170101'
end_date='20181022'
data=get_yahoo_twstock(stock_no, start_date, end_date)
print(data)
print(type(data))

注意, get_yahoo_twstock() 的三個參數都是字串, 起訖日期在函數裡面要用 datetime.datetime.strptime() 將字串轉成 datetime.datetime 物件. 

執行結果如下 :

>>> %Run test.py

[                       0%                       ]
[*********************100%***********************]  1 of 1 downloaded
        Date       Open       High        Low      Close    Volume
0 2018-10-01  86.900002  87.400002  86.900002  87.349998   5912234
1 2018-10-02  87.250000  87.250000  86.000000  86.050003   5434332
2 2018-10-03  86.050003  86.300003  85.800003  85.949997   4369690
3 2018-10-04  85.699997  85.699997  84.800003  84.900002   9736968
4 2018-10-05  84.500000  84.500000  83.000000  83.449997  26744096
5 2018-10-08  82.949997  83.400002  82.500000  83.000000  46512204
6 2018-10-09  83.050003  83.400002  82.800003  83.250000  20530897
7 2018-10-11  80.150002  80.150002  77.349998  77.400002  68551624
<class 'pandas.core.frame.DataFrame'>

可見它沒辦法抓到今天為止的歷史資料, 只能到 11 天前左右, 而且有時候會執行失敗, 但這樣總比 Google Finance 完全不能下載歷史資料好吧! 就是沒辦法抓到今天為止的歷史資料有點可惜, 只能用在回測.

沒有留言 :