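Over the past two days I re-examined the word-set filtering rules and revised rule 5 so that rule 6 is no longer a dead letter. To verify that the rules are correct, I tested them against a small set of 14 permutations; the test record follows.
Earlier articles in this series: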
The 14 test permutations are listed below (file permutation_test_in.csv):
ability,give,about,echo,ugly,idea,fake,grow,close,brick,merge,nature
ability,give,about,echo,ugly,idea,fake,grow,close,obey,merge,nature
sugar,frown,pole,million,hair,close,silent,apology,engage,dish,harvest,license
abandon,ability,bag,balance,key,scan,select,shock,radar,radio,ugly,umbrella
abandon,baby,cabbage,dad,eager,fabric,gadget,habit,ice,jacket,kangaroo,lab
sugar,frown,pole,million,hair,close,silent,apology,engage,dish,harvest,license
cinnamon,merge,more,memory,grow,anchor,auto,major,push,desk,mass,swallow
knife,brick,quote,interest,kind,jealous,afraid,jar,job,much,hat,umbrella
cost,brick,gate,interest,depart,jealous,afraid,diamond,merit,much,hat,umbrella
cost,brick,gate,interest,depart,face,afraid,diamond,merit,much,hat,umbrella
give,fade,pole,million,gift,close,silent,apology,engage,face,ginger,license
give,fade,pole,million,date,close,silent,apology,engage,face,ginger,license
cheap,fade,pole,check,cheese,close,silent,apology,engage,this,thing,write
cheap,fade,pole,mail,cheese,close,silent,apology,engage,this,thing,write
The test program is as follows (file words_permutation_test.py):
import re
import time

start = time.time()
# Preprocessing: read the CSV file and turn each line into a list of words
with open('permutation_test_in.csv', 'r', encoding='utf8') as fr:
    with open('permutation_test_out.csv', 'w', encoding='utf8') as fw:
        lines = fr.readlines()
        i = 1  # permutation counter
        for line in lines:
            words = line.replace('\n', '')
            words = words.split(',')
            print(i, end=':')  # print the permutation number
            print(words)       # print the permutation (a list of 12 words)
            i = i + 1          # increment the permutation counter
            # rule 1: at most 5 words may start with a vowel (a, e, i, o, u)
            ptn = re.compile('^[aeiou].*')  # starts with a vowel
            first = [w[0] for w in words if re.match(ptn, w)]
            if len(first) > 5:  # more than 5 vowel-initial words
                print(" : rule 1 excluded")
                continue
            # rule 2: groups of words sharing the same initial letter, stated as
            # at least 1 and at most 4 (vowels and consonants counted together);
            # note that the check below actually requires at least 2 groups
            first = [w[0] for w in words]   # initial letter of every word
            first_diff = list(set(first))   # distinct initial letters
            fc = [first.count(fd) for fd in first_diff]
            fc1 = [c > 1 for c in fc]       # True for each shared initial
            if fc1.count(True) < 2 or fc1.count(True) > 4:
                print(" : rule 2 excluded")
                continue
            # rule 3: words starting with the same letter: at least 2, at most 4
            first = [w[0] for w in words]
            first_diff = list(set(first))
            fc1 = [first.count(fd) > 1 for fd in first_diff]
            fc2 = [first.count(fd) > 4 for fd in first_diff]
            if fc1.count(True) < 1 or fc2.count(True) > 0:
                print(" : rule 3 excluded")
                continue
            # rule 4: at most 1 word may start with j, k, q, y or z
            ptn = re.compile('^[jkqyz].*')
            first = [w[0] for w in words if re.match(ptn, w)]
            if len(first) >= 2:  # same total as summing the per-letter counts
                print(" : rule 4 excluded")
                continue
            # rule 5: at most 2 words may share the same first 2 letters
            # (words starting with sh, ch, th, wr, un are exempt)
            ptn = re.compile('^(?!(sh|ch|th|wr|un))[a-zA-Z]*')
            first2 = [w[0:2] for w in words if re.match(ptn, w)]
            first2_diff = list(set(first2))
            f2c = [first2.count(fd) > 2 for fd in first2_diff]
            if f2c.count(True) > 0:
                print(" : rule 5 excluded")
                continue
            # rule 6: among words starting with sh, ch, th, wr, un,
            # at most 2 may share the same first 3 letters
            ptn = re.compile('^(sh|ch|th|wr|un)[a-zA-Z]*')
            first3 = [w[0:3] for w in words if re.match(ptn, w)]
            first3_diff = list(set(first3))
            f3c = [first3.count(fd) > 2 for fd in first3_diff]
            if f3c.count(True) > 0:
                print(" : rule 6 excluded")
                continue
            # Passed all 6 filters: write the permutation to the output file
            phrase = ' '.join(words)  # renamed from str to avoid shadowing the built-in
            print(phrase)
            fw.write(phrase + '\n')
end = time.time()
print(f'time elapsed : {end-start}')
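A side note on why rule 5 carries the sh/ch/th/wr/un exemption: any three words that share their first three letters also share their first two, so if rule 5 counted every word, a trio such as cheap/check/cheese would already be rejected by rule 5 and rule 6 would never fire. The sketch below is only an illustration of that interplay, not part of the test program; the helper names `two_letter_counts` and `three_letter_counts` are hypothetical, and the word list is permutation 13 from the corpus.

```python
import re
from collections import Counter

# Same patterns as rules 5 and 6 in the test program.
NOT_EXEMPT = re.compile(r'^(?!(sh|ch|th|wr|un))[a-zA-Z]*')
EXEMPT = re.compile(r'^(sh|ch|th|wr|un)[a-zA-Z]*')

def two_letter_counts(words, skip_exempt=True):
    """Count 2-letter prefixes, optionally skipping sh/ch/th/wr/un words (hypothetical helper)."""
    if skip_exempt:
        words = [w for w in words if NOT_EXEMPT.match(w)]
    return Counter(w[:2] for w in words)

def three_letter_counts(words):
    """Count 3-letter prefixes of the sh/ch/th/wr/un words only (hypothetical helper)."""
    return Counter(w[:3] for w in words if EXEMPT.match(w))

# Permutation 13 from the test corpus.
words = ['cheap', 'fade', 'pole', 'check', 'cheese', 'close', 'silent',
         'apology', 'engage', 'this', 'thing', 'write']

without_exception = two_letter_counts(words, skip_exempt=False)
with_exception = two_letter_counts(words)
rule6_counts = three_letter_counts(words)

print(without_exception['ch'])  # 3: rule 5 would fire before rule 6 is reached
print(with_exception['ch'])     # 0: exempt words are left to rule 6
print(rule6_counts['che'])      # 3: rule 6 rejects this permutation
```

With the negative lookahead in place, the "ch" words never reach the rule 5 count, which is what lets permutation 13 serve as a test case for rule 6.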
Running words_permutation_test.py produces the following output:
>>> %Run words_permutation_test.py
1:['ability', 'give', 'about', 'echo', 'ugly', 'idea', 'fake', 'grow', 'close', 'brick', 'merge', 'nature']
ability give about echo ugly idea fake grow close brick merge nature
2:['ability', 'give', 'about', 'echo', 'ugly', 'idea', 'fake', 'grow', 'close', 'obey', 'merge', 'nature']
: rule 1 excluded
3:['sugar', 'frown', 'pole', 'million', 'hair', 'close', 'silent', 'apology', 'engage', 'dish', 'harvest', 'license']
sugar frown pole million hair close silent apology engage dish harvest license
4:['abandon', 'ability', 'bag', 'balance', 'key', 'scan', 'select', 'shock', 'radar', 'radio', 'ugly', 'umbrella']
: rule 2 excluded
5:['abandon', 'baby', 'cabbage', 'dad', 'eager', 'fabric', 'gadget', 'habit', 'ice', 'jacket', 'kangaroo', 'lab']
: rule 2 excluded
6:['sugar', 'frown', 'pole', 'million', 'hair', 'close', 'silent', 'apology', 'engage', 'dish', 'harvest', 'license']
sugar frown pole million hair close silent apology engage dish harvest license
7:['cinnamon', 'merge', 'more', 'memory', 'grow', 'anchor', 'auto', 'major', 'push', 'desk', 'mass', 'swallow']
: rule 3 excluded
8:['knife', 'brick', 'quote', 'interest', 'kind', 'jealous', 'afraid', 'jar', 'job', 'much', 'hat', 'umbrella']
: rule 4 excluded
9:['cost', 'brick', 'gate', 'interest', 'depart', 'jealous', 'afraid', 'diamond', 'merit', 'much', 'hat', 'umbrella']
cost brick gate interest depart jealous afraid diamond merit much hat umbrella
10:['cost', 'brick', 'gate', 'interest', 'depart', 'face', 'afraid', 'diamond', 'merit', 'much', 'hat', 'umbrella']
cost brick gate interest depart face afraid diamond merit much hat umbrella
11:['give', 'fade', 'pole', 'million', 'gift', 'close', 'silent', 'apology', 'engage', 'face', 'ginger', 'license']
: rule 5 excluded
12:['give', 'fade', 'pole', 'million', 'date', 'close', 'silent', 'apology', 'engage', 'face', 'ginger', 'license']
give fade pole million date close silent apology engage face ginger license
13:['cheap', 'fade', 'pole', 'check', 'cheese', 'close', 'silent', 'apology', 'engage', 'this', 'thing', 'write']
: rule 6 excluded
14:['cheap', 'fade', 'pole', 'mail', 'cheese', 'close', 'silent', 'apology', 'engage', 'this', 'thing', 'write']
cheap fade pole mail cheese close silent apology engage this thing write
time elapsed : 0.11959671974182129
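The output shows that rules 1 to 6 each filter as intended (7 permutations are excluded). The output file permutation_test_out.csv contains: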
ability give about echo ugly idea fake grow close brick merge nature
sugar frown pole million hair close silent apology engage dish harvest license
sugar frown pole million hair close silent apology engage dish harvest license
cost brick gate interest depart jealous afraid diamond merit much hat umbrella
cost brick gate interest depart face afraid diamond merit much hat umbrella
give fade pole million date close silent apology engage face ginger license
cheap fade pole mail cheese close silent apology engage this thing write
Of the 14 permutations, 7 are filtered out by rules 1 to 6, leaving 7.
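These tallies can also be checked mechanically. The following is a minimal sketch rather than the program used above: it assumes a hypothetical helper `first_failed_rule` that re-expresses the six checks with `collections.Counter`, mirroring the conditions of the test program as written (including rule 2 rejecting fewer than 2 groups), and it embeds the 14 permutations so it runs without the CSV file.

```python
import re
from collections import Counter

EXEMPT = re.compile(r'^(sh|ch|th|wr|un)')

def first_failed_rule(words):
    """Return the number (1-6) of the first rule a permutation fails, or None if it passes."""
    initials = Counter(w[0] for w in words)
    # rule 1: at most 5 words start with a vowel
    if sum(n for c, n in initials.items() if c in 'aeiou') > 5:
        return 1
    # rule 2: between 2 and 4 initial letters are shared by two or more words
    groups = sum(1 for n in initials.values() if n > 1)
    if groups < 2 or groups > 4:
        return 2
    # rule 3: some initial is shared, and no initial is used by more than 4 words
    if groups < 1 or max(initials.values()) > 4:
        return 3
    # rule 4: at most 1 word starts with j, k, q, y or z
    if sum(n for c, n in initials.items() if c in 'jkqyz') >= 2:
        return 4
    # rule 5: at most 2 non-exempt words share the same first 2 letters
    two = Counter(w[:2] for w in words if not EXEMPT.match(w))
    if any(n > 2 for n in two.values()):
        return 5
    # rule 6: at most 2 exempt words share the same first 3 letters
    three = Counter(w[:3] for w in words if EXEMPT.match(w))
    if any(n > 2 for n in three.values()):
        return 6
    return None

# The 14 permutations from permutation_test_in.csv.
CORPUS = """\
ability,give,about,echo,ugly,idea,fake,grow,close,brick,merge,nature
ability,give,about,echo,ugly,idea,fake,grow,close,obey,merge,nature
sugar,frown,pole,million,hair,close,silent,apology,engage,dish,harvest,license
abandon,ability,bag,balance,key,scan,select,shock,radar,radio,ugly,umbrella
abandon,baby,cabbage,dad,eager,fabric,gadget,habit,ice,jacket,kangaroo,lab
sugar,frown,pole,million,hair,close,silent,apology,engage,dish,harvest,license
cinnamon,merge,more,memory,grow,anchor,auto,major,push,desk,mass,swallow
knife,brick,quote,interest,kind,jealous,afraid,jar,job,much,hat,umbrella
cost,brick,gate,interest,depart,jealous,afraid,diamond,merit,much,hat,umbrella
cost,brick,gate,interest,depart,face,afraid,diamond,merit,much,hat,umbrella
give,fade,pole,million,gift,close,silent,apology,engage,face,ginger,license
give,fade,pole,million,date,close,silent,apology,engage,face,ginger,license
cheap,fade,pole,check,cheese,close,silent,apology,engage,this,thing,write
cheap,fade,pole,mail,cheese,close,silent,apology,engage,this,thing,write
"""

results = [first_failed_rule(line.split(',')) for line in CORPUS.splitlines()]
kept = results.count(None)

print(results)  # [None, 1, None, 2, 2, None, 3, 4, None, None, 5, None, 6, None]
print(kept, len(results) - kept)  # 7 7
```

Returning the rule number instead of printing it makes the expected outcome for each permutation easy to compare against the run log, and each of the six rules is triggered at least once by the corpus.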