**Step 1. Learning rate and number of estimators**

Either way, we first fix the learning rate at a fairly high value, here `learning_rate = 0.1`, and then pick the estimator type via `boosting/boost/boosting_type`; the default choice is `gbdt`, and that is what we use.

To determine the number of estimators, i.e., the number of boosting iterations (equivalently, the number of residual trees), we use the parameter `n_estimators/num_iterations/num_round/num_boost_round`. We can set it to a large value first and then read the optimal number of iterations off the cross-validation results, as shown in the code below.

Before doing that, we have to give the other important parameters initial values. The initial values matter little in themselves; they simply make it easier to tune the remaining parameters. The initial values are as follows.

The following parameters are determined by the requirements of the specific project:

```
'boosting_type'/'boosting': 'gbdt'
'objective': 'regression'
'metric': 'rmse'
```

For the following parameters these are the initial values I chose; pick your own according to your situation:

```
'max_depth': 6     ###   depends on the problem; my dataset is not large, so I chose a moderate value, and anything from 4 to 10 works
'num_leaves': 50  ###   LightGBM grows leaf-wise, and the official advice is to keep this below 2^max_depth
'subsample'/'bagging_fraction': 0.8           ###  row (data) subsampling
'colsample_bytree'/'feature_fraction': 0.8  ###  feature subsampling
```

Below I demonstrate with LightGBM's cv function:

```
import lightgbm as lgb

params = {
    'boosting_type': 'gbdt', 
    'objective': 'regression', 

    'learning_rate': 0.1, 
    'num_leaves': 50, 
    'max_depth': 6,

    'subsample': 0.8, 
    'colsample_bytree': 0.8, 
    }
data_train = lgb.Dataset(df_train, y_train, silent=True)
cv_results = lgb.cv(
    params, data_train, num_boost_round=1000, nfold=5, stratified=False, shuffle=True, metrics='rmse',
    early_stopping_rounds=50, verbose_eval=50, show_stdv=True, seed=0)

print('best n_estimators:', len(cv_results['rmse-mean']))
print('best cv score:', cv_results['rmse-mean'][-1])
[50]    cv_agg's rmse: 1.38497 + 0.0202823
best n_estimators: 43
best cv score: 1.3838664241
```

Since my dataset is not large, the optimal number of iterations at a learning rate of 0.1 is only 43. We now carry (0.1, 43) into the tuning of the other parameters. Still, if your hardware allows it, a smaller learning rate is generally better.

**Step 2. max_depth and num_leaves**

These are the most important parameters for improving accuracy.

`max_depth`: sets the tree depth; the deeper the tree, the more likely it is to overfit.

`num_leaves`: because LightGBM grows trees leaf-wise, tree complexity is controlled with num_leaves rather than max_depth. The rough correspondence is num_leaves = 2^(max_depth), but the value you actually set should be smaller than 2^(max_depth), otherwise it may cause overfitting.
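As a quick, purely illustrative check of that rule (not part of the original walkthrough), the bound implied by a given max_depth can be printed directly:

```
# a tree of depth max_depth can have at most 2**max_depth leaves,
# so num_leaves is normally set below that value
for max_depth in (3, 5, 7):
    print(max_depth, 2 ** max_depth)   # 3 -> 8, 5 -> 32, 7 -> 128
```

This also explains why, in the first grid below, the score at max_depth 3 and 5 does not change with num_leaves: with at most 8 or 32 leaves allowed by the depth limit, every tested num_leaves value (50 and up) is already out of reach.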

We can also tune these two parameters together; for both of them we first do a coarse search and then a fine one.

Here we bring in `GridSearchCV()` from `sklearn` for the search. For whatever reason, this function is extremely memory-hungry, time-consuming, and tiring to babysit.

```
from sklearn.model_selection import GridSearchCV
### build the sklearn wrapper for lgb, using the (learning rate, number of estimators) chosen above
model_lgb = lgb.LGBMRegressor(objective='regression',num_leaves=50,
                              learning_rate=0.1, n_estimators=43, max_depth=6,
                              metric='rmse', bagging_fraction = 0.8,feature_fraction = 0.8)

params_test1={
    'max_depth': range(3,8,2),
    'num_leaves':range(50, 170, 30)
}
gsearch1 = GridSearchCV(estimator=model_lgb, param_grid=params_test1, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch1.fit(df_train, y_train)
gsearch1.grid_scores_, gsearch1.best_params_, gsearch1.best_score_   # note: grid_scores_ exists only in older scikit-learn; newer versions expose cv_results_ instead
Fitting 5 folds for each of 12 candidates, totalling 60 fits


[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  2.0min
[Parallel(n_jobs=4)]: Done  60 out of  60 | elapsed:  3.1min finished


([mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 50},
  mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 80},
  mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 110},
  mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 140},
  mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 50},
  mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 80},
  mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 110},
  mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 140},
  mean: -1.89254, std: 0.10904, params: {'max_depth': 7, 'num_leaves': 50},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 80},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 110},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 140}],
 {'max_depth': 7, 'num_leaves': 80},
 -1.8602436718814157)
```

Here we ran 12 parameter combinations; the best result is max_depth = 7 and num_leaves = 80, with a score of -1.860.

One thing has to be said here: the scoring parameter in sklearn's model evaluation always follows the convention that **higher return values are better than lower return values**.

However, the metric I use is root mean squared error (rmse), where lower is better, so sklearn provides the `neg_mean_squared_error` scorer, which returns the negated mean squared error; for this metric, then, a larger (less negative) value is better.

So, as you can see, the best score is -1.860, which converts back to an RMSE of np.sqrt(-(-1.860)) = 1.3639, clearly much better than the Step 1 score.
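The same conversion can be read straight off the fitted search object; a minimal sketch using the `gsearch1` from above (numpy imported here for the square root):

```
import numpy as np

# neg_mean_squared_error is the negated MSE, so negate it back and take the square root to get RMSE
best_rmse = np.sqrt(-gsearch1.best_score_)
print(best_rmse)   # ~1.3639 for best_score_ = -1.8602
```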

With that, we carry this step's best values into the next step. Note that I only did a coarse search here; to get better results, try several more values of max_depth around 7 and of num_leaves around 80. Don't shy away from the extra work, even though it really is tedious.

```
params_test2={
    'max_depth': [6,7,8],
    'num_leaves':[68,74,80,86,92]
}

gsearch2 = GridSearchCV(estimator=model_lgb, param_grid=params_test2, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch2.fit(df_train, y_train)
gsearch2.grid_scores_, gsearch2.best_params_, gsearch2.best_score_
Fitting 5 folds for each of 15 candidates, totalling 75 fits


[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  2.8min
[Parallel(n_jobs=4)]: Done  75 out of  75 | elapsed:  5.1min finished


([mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 68},
  mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 74},
  mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 80},
  mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 86},
  mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 92},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 68},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 74},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 80},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 86},
  mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 92},
  mean: -1.88197, std: 0.11295, params: {'max_depth': 8, 'num_leaves': 68},
  mean: -1.89117, std: 0.12686, params: {'max_depth': 8, 'num_leaves': 74},
  mean: -1.86390, std: 0.12259, params: {'max_depth': 8, 'num_leaves': 80},
  mean: -1.86733, std: 0.12159, params: {'max_depth': 8, 'num_leaves': 86},
  mean: -1.86665, std: 0.12174, params: {'max_depth': 8, 'num_leaves': 92}],
 {'max_depth': 7, 'num_leaves': 68},
 -1.8602436718814157)
```

So max_depth = 7 is fine, but looking at the details, at max_depth = 7 the number of leaves has no effect on the score.

**Step 3: min_data_in_leaf and min_sum_hessian_in_leaf**

At this point it is time to reduce overfitting.

`min_data_in_leaf` is a very important parameter, also called min_child_samples. Its value depends on the number of training samples and on num_leaves. Setting it larger prevents growing an overly deep tree, but may lead to underfitting.

`min_sum_hessian_in_leaf`: also called min_child_weight, it is the minimum sum of hessians a leaf must have to allow a split; quite a mouthful (Minimum sum of hessians in one leaf to allow a split. Higher values potentially decrease overfitting).

We proceed with the same method as above:

```
params_test3={
    'min_child_samples': [18, 19, 20, 21, 22],
    'min_child_weight':[0.001, 0.002]
}
model_lgb = lgb.LGBMRegressor(objective='regression',num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7, 
                              metric='rmse', bagging_fraction = 0.8, feature_fraction = 0.8)
gsearch3 = GridSearchCV(estimator=model_lgb, param_grid=params_test3, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch3.fit(df_train, y_train)
gsearch3.grid_scores_, gsearch3.best_params_, gsearch3.best_score_
Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  2.9min
[Parallel(n_jobs=4)]: Done  50 out of  50 | elapsed:  3.3min finished


([mean: -1.88057, std: 0.13948, params: {'min_child_samples': 18, 'min_child_weight': 0.001},
  mean: -1.88057, std: 0.13948, params: {'min_child_samples': 18, 'min_child_weight': 0.002},
  mean: -1.88365, std: 0.13650, params: {'min_child_samples': 19, 'min_child_weight': 0.001},
  mean: -1.88365, std: 0.13650, params: {'min_child_samples': 19, 'min_child_weight': 0.002},
  mean: -1.86024, std: 0.11364, params: {'min_child_samples': 20, 'min_child_weight': 0.001},
  mean: -1.86024, std: 0.11364, params: {'min_child_samples': 20, 'min_child_weight': 0.002},
  mean: -1.86980, std: 0.14251, params: {'min_child_samples': 21, 'min_child_weight': 0.001},
  mean: -1.86980, std: 0.14251, params: {'min_child_samples': 21, 'min_child_weight': 0.002},
  mean: -1.86750, std: 0.13898, params: {'min_child_samples': 22, 'min_child_weight': 0.001},
  mean: -1.86750, std: 0.13898, params: {'min_child_samples': 22, 'min_child_weight': 0.002}],
 {'min_child_samples': 20, 'min_child_weight': 0.001},
 -1.8602436718814157)
```

This is the result of a fine search following a coarse one. As you can see, the optimal value of min_data_in_leaf is 20, and min_sum_hessian_in_leaf has almost no effect on the final score. Also, the score did not improve after this tuning, which indicates that the previous defaults were already 20 and 0.001.

**Step 4: feature_fraction and bagging_fraction**

Both of these parameters exist to reduce overfitting.

feature_fraction subsamples the features. It can be used both to prevent overfitting and to speed up training.

bagging_fraction and bagging_freq must be set together. bagging_fraction is the counterpart of subsample (row sampling); it makes bagging run faster and also reduces overfitting. bagging_freq defaults to 0 and controls the bagging frequency: 0 means bagging is disabled, and k means bagging is performed every k iterations.
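As a minimal illustration of that pairing (the numbers here are placeholders, not the tuned values), the two parameters would sit together in a native-API params dict like this:

```
# bagging only takes effect when both keys are set:
# sample 80% of the rows, and redo the sampling every 5 boosting iterations
params_bagging = {
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
}
```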

Different parameters, same method.

```
params_test4={
    'feature_fraction': [0.5, 0.6, 0.7, 0.8, 0.9],
    'bagging_fraction': [0.6, 0.7, 0.8, 0.9, 1.0]
}
model_lgb = lgb.LGBMRegressor(objective='regression',num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7, 
                              metric='rmse', bagging_freq = 5,  min_child_samples=20)
gsearch4 = GridSearchCV(estimator=model_lgb, param_grid=params_test4, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch4.fit(df_train, y_train)
gsearch4.grid_scores_, gsearch4.best_params_, gsearch4.best_score_
Fitting 5 folds for each of 25 candidates, totalling 125 fits


[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  2.6min
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:  7.1min finished


([mean: -1.90447, std: 0.15841, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.5},
  mean: -1.90846, std: 0.13925, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.6},
  mean: -1.91695, std: 0.14121, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.7},
  mean: -1.90115, std: 0.12625, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.8},
  mean: -1.92586, std: 0.15220, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.9},
  mean: -1.88031, std: 0.17157, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.5},
  mean: -1.89513, std: 0.13718, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.6},
  mean: -1.88845, std: 0.13864, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.7},
  mean: -1.89297, std: 0.12374, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.8},
  mean: -1.89432, std: 0.14353, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.9},
  mean: -1.88088, std: 0.14247, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.5},
  mean: -1.90080, std: 0.13174, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.6},
  mean: -1.88364, std: 0.14732, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.7},
  mean: -1.88987, std: 0.13344, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.8},
  mean: -1.87752, std: 0.14802, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.9},
  mean: -1.88348, std: 0.13925, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.5},
  mean: -1.87472, std: 0.13301, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.6},
  mean: -1.88656, std: 0.12241, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.7},
  mean: -1.89029, std: 0.10776, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.8},
  mean: -1.88719, std: 0.11915, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.9},
  mean: -1.86170, std: 0.12544, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.5},
  mean: -1.87334, std: 0.13099, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.6},
  mean: -1.85412, std: 0.12698, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.7},
  mean: -1.86024, std: 0.11364, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.8},
  mean: -1.87266, std: 0.12271, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.9}],
 {'bagging_fraction': 1.0, 'feature_fraction': 0.7},
 -1.8541224387666373)
```

From this we can see that the ideal values of bagging_fraction and feature_fraction are 1.0 and 0.7 respectively. One important reason is that my sample count is fairly small (4000+) while the feature count is large (1000+). So let's take a smaller step size and search feature_fraction more finely.

```
params_test5={
    'feature_fraction': [0.62, 0.65, 0.68, 0.7, 0.72, 0.75, 0.78 ]
}
model_lgb = lgb.LGBMRegressor(objective='regression',num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7, 
                              metric='rmse',  min_child_samples=20)
gsearch5 = GridSearchCV(estimator=model_lgb, param_grid=params_test5, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch5.fit(df_train, y_train)
gsearch5.grid_scores_, gsearch5.best_params_, gsearch5.best_score_
Fitting 5 folds for each of 7 candidates, totalling 35 fits


[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:  2.3min finished


([mean: -1.86696, std: 0.12658, params: {'feature_fraction': 0.62},
  mean: -1.88337, std: 0.13215, params: {'feature_fraction': 0.65},
  mean: -1.87282, std: 0.13193, params: {'feature_fraction': 0.68},
  mean: -1.85412, std: 0.12698, params: {'feature_fraction': 0.7},
  mean: -1.88235, std: 0.12682, params: {'feature_fraction': 0.72},
  mean: -1.86329, std: 0.12757, params: {'feature_fraction': 0.75},
  mean: -1.87943, std: 0.12107, params: {'feature_fraction': 0.78}],
 {'feature_fraction': 0.7},
 -1.8541224387666373)
```

Alright, feature_fraction it is: 0.7.

**Step 5: Regularization parameters**

The regularization parameters lambda_l1 (reg_alpha) and lambda_l2 (reg_lambda) are, without question, there to reduce overfitting; they correspond to L1 and L2 regularization respectively. Let's try these two parameters as well.

```
params_test6={
    'reg_alpha': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5],
    'reg_lambda': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5]
}
model_lgb = lgb.LGBMRegressor(objective='regression',num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7, 
                              metric='rmse',  min_child_samples=20, feature_fraction=0.7)
gsearch6 = GridSearchCV(estimator=model_lgb, param_grid=params_test6, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch6.fit(df_train, y_train)
gsearch6.grid_scores_, gsearch6.best_params_, gsearch6.best_score_
Fitting 5 folds for each of 49 candidates, totalling 245 fits


[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  2.8min
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed: 10.6min
[Parallel(n_jobs=4)]: Done 245 out of 245 | elapsed: 13.3min finished


([mean: -1.85412, std: 0.12698, params: {'reg_alpha': 0, 'reg_lambda': 0},
  mean: -1.85990, std: 0.13296, params: {'reg_alpha': 0, 'reg_lambda': 0.001},
  mean: -1.86367, std: 0.13634, params: {'reg_alpha': 0, 'reg_lambda': 0.01},
  mean: -1.86787, std: 0.13881, params: {'reg_alpha': 0, 'reg_lambda': 0.03},
  mean: -1.87099, std: 0.12476, params: {'reg_alpha': 0, 'reg_lambda': 0.08},
  mean: -1.87670, std: 0.11849, params: {'reg_alpha': 0, 'reg_lambda': 0.3},
  mean: -1.88278, std: 0.13064, params: {'reg_alpha': 0, 'reg_lambda': 0.5},
  mean: -1.86190, std: 0.13613, params: {'reg_alpha': 0.001, 'reg_lambda': 0},
  mean: -1.86190, std: 0.13613, params: {'reg_alpha': 0.001, 'reg_lambda': 0.001},
  mean: -1.86515, std: 0.14116, params: {'reg_alpha': 0.001, 'reg_lambda': 0.01},
  mean: -1.86908, std: 0.13668, params: {'reg_alpha': 0.001, 'reg_lambda': 0.03},
  mean: -1.86852, std: 0.12289, params: {'reg_alpha': 0.001, 'reg_lambda': 0.08},
  mean: -1.88076, std: 0.11710, params: {'reg_alpha': 0.001, 'reg_lambda': 0.3},
  mean: -1.88278, std: 0.13064, params: {'reg_alpha': 0.001, 'reg_lambda': 0.5},
  mean: -1.87480, std: 0.13889, params: {'reg_alpha': 0.01, 'reg_lambda': 0},
  mean: -1.87284, std: 0.14138, params: {'reg_alpha': 0.01, 'reg_lambda': 0.001},
  mean: -1.86030, std: 0.13332, params: {'reg_alpha': 0.01, 'reg_lambda': 0.01},
  mean: -1.86695, std: 0.12587, params: {'reg_alpha': 0.01, 'reg_lambda': 0.03},
  mean: -1.87415, std: 0.13100, params: {'reg_alpha': 0.01, 'reg_lambda': 0.08},
  mean: -1.88543, std: 0.13195, params: {'reg_alpha': 0.01, 'reg_lambda': 0.3},
  mean: -1.88076, std: 0.13502, params: {'reg_alpha': 0.01, 'reg_lambda': 0.5},
  mean: -1.87729, std: 0.12533, params: {'reg_alpha': 0.03, 'reg_lambda': 0},
  mean: -1.87435, std: 0.12034, params: {'reg_alpha': 0.03, 'reg_lambda': 0.001},
  mean: -1.87513, std: 0.12579, params: {'reg_alpha': 0.03, 'reg_lambda': 0.01},
  mean: -1.88116, std: 0.12218, params: {'reg_alpha': 0.03, 'reg_lambda': 0.03},
  mean: -1.88052, std: 0.13585, params: {'reg_alpha': 0.03, 'reg_lambda': 0.08},
  mean: -1.87565, std: 0.12200, params: {'reg_alpha': 0.03, 'reg_lambda': 0.3},
  mean: -1.87935, std: 0.13817, params: {'reg_alpha': 0.03, 'reg_lambda': 0.5},
  mean: -1.87774, std: 0.12477, params: {'reg_alpha': 0.08, 'reg_lambda': 0},
  mean: -1.87774, std: 0.12477, params: {'reg_alpha': 0.08, 'reg_lambda': 0.001},
  mean: -1.87911, std: 0.12027, params: {'reg_alpha': 0.08, 'reg_lambda': 0.01},
  mean: -1.86978, std: 0.12478, params: {'reg_alpha': 0.08, 'reg_lambda': 0.03},
  mean: -1.87217, std: 0.12159, params: {'reg_alpha': 0.08, 'reg_lambda': 0.08},
  mean: -1.87573, std: 0.14137, params: {'reg_alpha': 0.08, 'reg_lambda': 0.3},
  mean: -1.85969, std: 0.13109, params: {'reg_alpha': 0.08, 'reg_lambda': 0.5},
  mean: -1.87632, std: 0.12398, params: {'reg_alpha': 0.3, 'reg_lambda': 0},
  mean: -1.86995, std: 0.12651, params: {'reg_alpha': 0.3, 'reg_lambda': 0.001},
  mean: -1.86380, std: 0.12793, params: {'reg_alpha': 0.3, 'reg_lambda': 0.01},
  mean: -1.87577, std: 0.13002, params: {'reg_alpha': 0.3, 'reg_lambda': 0.03},
  mean: -1.87402, std: 0.13496, params: {'reg_alpha': 0.3, 'reg_lambda': 0.08},
  mean: -1.87032, std: 0.12504, params: {'reg_alpha': 0.3, 'reg_lambda': 0.3},
  mean: -1.88329, std: 0.13237, params: {'reg_alpha': 0.3, 'reg_lambda': 0.5},
  mean: -1.87196, std: 0.13099, params: {'reg_alpha': 0.5, 'reg_lambda': 0},
  mean: -1.87196, std: 0.13099, params: {'reg_alpha': 0.5, 'reg_lambda': 0.001},
  mean: -1.88222, std: 0.14735, params: {'reg_alpha': 0.5, 'reg_lambda': 0.01},
  mean: -1.86618, std: 0.14006, params: {'reg_alpha': 0.5, 'reg_lambda': 0.03},
  mean: -1.88579, std: 0.12398, params: {'reg_alpha': 0.5, 'reg_lambda': 0.08},
  mean: -1.88297, std: 0.12307, params: {'reg_alpha': 0.5, 'reg_lambda': 0.3},
  mean: -1.88148, std: 0.12622, params: {'reg_alpha': 0.5, 'reg_lambda': 0.5}],
 {'reg_alpha': 0, 'reg_lambda': 0},
 -1.8541224387666373)
```

Ha, it looks like that was unnecessary: the best result keeps both reg_alpha and reg_lambda at 0.

**Step 6: Lower the learning_rate**

We used a fairly high learning rate earlier because it makes convergence faster, but the accuracy is certainly not as good as a slow-and-steady approach. So finally, we use a lower learning rate together with more trees (n_estimators) to train on the data, and see whether the score can be pushed further.

We can go back to LightGBM's cv function here, plugging in the parameters tuned above.

```
params = {
    'boosting_type': 'gbdt', 
    'objective': 'regression', 

    'learning_rate': 0.005, 
    'num_leaves': 80, 
    'max_depth': 7,
    'min_data_in_leaf': 20,

    'subsample': 1, 
    'colsample_bytree': 0.7, 
    }

data_train = lgb.Dataset(df_train, y_train, silent=True)
cv_results = lgb.cv(
    params, data_train, num_boost_round=10000, nfold=5, stratified=False, shuffle=True, metrics='rmse',
    early_stopping_rounds=50, verbose_eval=100, show_stdv=True)

print('best n_estimators:', len(cv_results['rmse-mean']))
print('best cv score:', cv_results['rmse-mean'][-1])
[100]   cv_agg's rmse: 1.52939 + 0.0261756
[200]   cv_agg's rmse: 1.43535 + 0.0187243
[300]   cv_agg's rmse: 1.39584 + 0.0157521
[400]   cv_agg's rmse: 1.37935 + 0.0157429
[500]   cv_agg's rmse: 1.37313 + 0.0164503
[600]   cv_agg's rmse: 1.37081 + 0.0172752
[700]   cv_agg's rmse: 1.36942 + 0.0177888
[800]   cv_agg's rmse: 1.36854 + 0.0180575
[900]   cv_agg's rmse: 1.36817 + 0.0188776
[1000]  cv_agg's rmse: 1.36796 + 0.0190279
[1100]  cv_agg's rmse: 1.36783 + 0.0195969
best n_estimators: 1079
best cv score: 1.36772351783
```

That is roughly the whole process. There are certainly more advanced methods, but this basic recipe for tuning GBM models is still worth knowing. Questions and corrections are welcome.
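As a wrap-up (my own sketch, not part of the original walkthrough), the tuned values can be collected into a single sklearn-style model. `df_train` and `y_train` are the frames used throughout; `X_test` is a hypothetical hold-out set with the same features:

```
# final model using the values tuned in Steps 1-6
final_lgb = lgb.LGBMRegressor(
    objective='regression',
    learning_rate=0.005,
    n_estimators=1079,        # best iteration found by lgb.cv in Step 6
    max_depth=7,
    num_leaves=80,
    min_child_samples=20,     # min_data_in_leaf
    subsample=1.0,            # bagging_fraction
    colsample_bytree=0.7,     # feature_fraction
)
final_lgb.fit(df_train, y_train)
# y_pred = final_lgb.predict(X_test)   # X_test is a hypothetical hold-out frame
```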