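By 2020, there is no shortage of fun machine-learning tutorials. This article starts from one of the most popular models, the random forest, and walks you step by step through building one, showing what the complete workflow actually looks like.
As data scientists, we have many ways to build classification models. One of the most popular is the random forest, whose hyperparameters we can tune to optimize the model's performance.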
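It is also common to try principal component analysis (PCA) before fitting the model. But why add this step? Isn't one of the benefits of a random forest that it helps us understand feature importance more easily?
It is true that PCA makes a random forest's feature importances harder to interpret, because each "feature" the forest sees is now a component that mixes all of the original variables. However, PCA reduces dimensionality, which lowers the number of features the random forest has to process, so it can help speed up training.
Note that high computational cost is one of the biggest drawbacks of random forests (a model can take a long time to run). PCA becomes especially valuable when you work with hundreds or even thousands of predictive features. So if you simply want the best-performing model and are willing to give up interpretable feature importances, PCA can be very useful.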
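A minimal sketch of this trade-off, assuming scikit-learn's bundled breast cancer data and the 10-component cut-off chosen later in this walkthrough. It only checks how many columns reach the forest and what its importances then refer to. Actual speed-ups depend on the machine and the data size, so no timings are claimed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Forest on all 30 standardized features
rf_raw = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
rf_raw.fit(X, y)

# Same forest, but fed only the first 10 principal components
rf_pca = make_pipeline(StandardScaler(), PCA(n_components=10),
                       RandomForestClassifier(random_state=0))
rf_pca.fit(X, y)

n_raw = rf_raw[-1].n_features_in_   # columns reaching the forest without PCA
n_pca = rf_pca[-1].n_features_in_   # columns reaching the forest after PCA
print(n_raw, n_pca)

# These importances describe PCA components, not named measurements such as "mean radius"
print(rf_pca[-1].feature_importances_.round(3))
```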
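Now let's work through an example. We will use scikit-learn's breast cancer dataset and build three models, comparing their performance:
1. Random forest
2. Random forest with PCA dimensionality reduction
3. Random forest with PCA dimensionality reduction and hyperparameter tuning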
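First, we load the data and create a DataFrame. This is a pre-cleaned "toy" dataset from scikit-learn, so we can move on to modeling quickly. As a best practice, however, we should still:
- preview the first rows with data.head();
- check column types and non-null counts with data.info();
- confirm there are no missing values with data.isna().sum();
- review summary statistics (count, mean, min, max, and so on) for each column with data.describe().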
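The column named "cancer" is the target variable we want the model to predict. Be careful with its encoding: in scikit-learn's version of this dataset, dataset['target_names'] is ['malignant', 'benign'], so 0 means malignant (cancerous) and 1 means benign. Class 1 is therefore the benign class, not the "cancer" class.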
```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Same names as dataset['feature_names']
columns = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
           'mean smoothness', 'mean compactness', 'mean concavity',
           'mean concave points', 'mean symmetry', 'mean fractal dimension',
           'radius error', 'texture error', 'perimeter error', 'area error',
           'smoothness error', 'compactness error', 'concavity error',
           'concave points error', 'symmetry error', 'fractal dimension error',
           'worst radius', 'worst texture', 'worst perimeter', 'worst area',
           'worst smoothness', 'worst compactness', 'worst concavity',
           'worst concave points', 'worst symmetry', 'worst fractal dimension']

dataset = load_breast_cancer()
data = pd.DataFrame(dataset['data'], columns=columns)
data['cancer'] = dataset['target']  # scikit-learn encoding: 0 = malignant, 1 = benign

# Basic sanity checks (display() is available in Jupyter/IPython notebooks)
display(data.head())
data.info()  # prints dtypes and non-null counts itself
display(data.isna().sum())
display(data.describe())
```
The output above shows part of the breast cancer DataFrame. Each row holds the observations for one patient, and the last column, "cancer", is the target variable we want to predict.
Next, we split the data with scikit-learn's train_test_split function. We want the model to have as much training data as possible, but we also need enough data left over to test it. In general, the more rows a dataset has, the larger the share we can give to the training set.
For example, with millions of rows we could use 90% for training and 10% for testing. Our dataset, however, has only 569 rows, which is not much data. To suit this small dataset, we split it 50% for training and 50% for testing. We set stratify=y so that the training and test sets keep the same proportion of 0s and 1s as the original dataset.
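```python
from sklearn.model_selection import train_test_split

X = data.drop('cancer', axis=1)  # the 30 predictor columns
y = data['cancer']               # the target column

# 50/50 split; stratify=y keeps the 0/1 proportions of y in both halves
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=2020, stratify=y)
```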
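The effect of stratify=y can be checked directly. This sketch reloads the data with load_breast_cancer(return_X_y=True) instead of using the DataFrame above, keeps the same split settings, and prints the class counts and proportions for the full data and each half.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same settings as above: 50/50 split, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=2020, stratify=y)

for name, labels in [('full', y), ('train', y_train), ('test', y_test)]:
    counts = np.bincount(labels)  # [count of class 0, count of class 1]
    print(name, counts, (counts / counts.sum()).round(3))
```

Both halves keep the class proportions of the full dataset up to rounding, which is what stratification is meant to guarantee.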
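Before modeling, we center and standardize the data so that the different variables are measured on the same scale, letting the predictive features "compete fairly" with one another. This matters most for PCA, which is driven by variance; the tree splits themselves are unaffected by rescaling a column. We also convert y_train from a pandas Series to a NumPy array for the model to consume later.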
```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)  # learn mean/std from the training set only
X_test_scaled = ss.transform(X_test)        # apply the same scaling to the test set
y_train = np.array(y_train)
```
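The key detail in the scaling step is that the scaler's statistics come from the training set only, so no information leaks from the test set. This self-contained sketch reloads the data as NumPy arrays rather than reusing the DataFrame above. It shows that the scaled training columns have mean 0 and standard deviation 1, while the test columns are only approximately centred because they reuse the training statistics.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=2020, stratify=y)

ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)  # learn per-column mean/std from training rows
X_test_scaled = ss.transform(X_test)        # reuse those statistics on the test rows

# Training columns: mean 0, standard deviation 1 (population std, as StandardScaler uses)
print(X_train_scaled.mean(axis=0).round(6)[:3], X_train_scaled.std(axis=0).round(6)[:3])
# Test columns: close to, but not exactly, 0 and 1
print(X_test_scaled.mean(axis=0).round(3)[:3], X_test_scaled.std(axis=0).round(3)[:3])
```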
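Now we create a "baseline" random forest model. It uses all of the predictive features and the default settings documented for scikit-learn's RandomForestClassifier. We instantiate the model, fit it on the standardized data, and measure its accuracy on the training data. Keep in mind that training accuracy is optimistic: a forest of fully grown trees typically scores 1.0 on data it has already seen, so the test-set evaluation at the end is what really counts.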
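```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

# Baseline model: default hyperparameters, all 30 standardized features
rfc_1 = RandomForestClassifier()
rfc_1.fit(X_train_scaled, y_train)
display(rfc_1.score(X_train_scaled, y_train))  # accuracy on the training data
# 1.0
```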
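Because a score of 1.0 on the training data says little about generalization, a cross-validated estimate on the training half is a more honest check before touching the test set. A sketch, assuming the same 50/50 split. It skips standardization because tree splits are unaffected by rescaling a column, and it fixes random_state=0 (not used in the original code) so the result is repeatable.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=2020, stratify=y)

rfc_1 = RandomForestClassifier(random_state=0)
rfc_1.fit(X_train, y_train)
train_acc = rfc_1.score(X_train, y_train)  # scored on rows the forest has already seen

# 3-fold CV: each fold is scored by a forest that was not trained on it
cv_acc = cross_val_score(RandomForestClassifier(random_state=0),
                         X_train, y_train, cv=3).mean()
print(round(train_acc, 3), round(cv_acc, 3))
```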
To see which features matter most to the random forest when predicting breast cancer, we can visualize and quantify them using the fitted model's feature_importances_ attribute:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Map each feature name to its impurity-based (Gini) importance in the baseline model
feats = dict(zip(X.columns, rfc_1.feature_importances_))
importances = (pd.DataFrame.from_dict(feats, orient='index')
               .rename(columns={0: 'Gini-Importance'})
               .sort_values(by='Gini-Importance', ascending=False)
               .reset_index()
               .rename(columns={'index': 'Features'}))

# Horizontal bar chart, most important feature at the top
sns.set(style="whitegrid", color_codes=True, font_scale=1.7)
fig, ax = plt.subplots()
fig.set_size_inches(30, 15)
sns.barplot(x='Gini-Importance', y='Features', data=importances, color='skyblue', ax=ax)
plt.xlabel('Importance', fontsize=25, weight='bold')
plt.ylabel('Features', fontsize=25, weight='bold')
plt.title('Feature Importance', fontsize=25, weight='bold')
plt.show()
display(importances)
```
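So how can we improve on the baseline? With dimensionality reduction, we can represent the original dataset with fewer variables while lowering the computational cost of running the model. With PCA, we can examine the cumulative explained variance ratio of the components to see how many of them are needed to capture most of the variance in the data.
We instantiate PCA and set the number of components to consider. Here we set it to 30 (the total number of features) so that we can see the variance explained by every component and decide where to cut off. Then we fit PCA on the scaled X_train data.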
```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA

# Fit PCA with all 30 components to inspect the full variance spectrum
pca_test = PCA(n_components=30)
pca_test.fit(X_train_scaled)

evr = pca_test.explained_variance_ratio_  # share of total variance per component
cvr = np.cumsum(evr)                      # running total

sns.set(style='whitegrid')
plt.plot(range(1, 31), cvr)  # x = number of components kept
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')
plt.axvline(linewidth=4, color='r', linestyle='--', x=10, ymin=0, ymax=1)
plt.show()

pca_df = pd.DataFrame()
pca_df['Cumulative Variance Ratio'] = cvr
pca_df['Explained Variance Ratio'] = evr
display(pca_df.head(10))
```
The plot shows that beyond 10 components we gain little additional explained variance. The DataFrame shows the cumulative variance ratio (how much of the data's total variance is explained so far) and the explained variance ratio (how much of the total variance each PCA component explains).
As the DataFrame shows, when PCA reduces the 30 predictors to 10 components, we can still explain more than 95% of the variance. The remaining 20 components explain less than 5% of the variance, so we can drop them. Following this logic, we use PCA to reduce X_train and X_test from 30 components to 10 and assign the new, reduced datasets to X_train_scaled_pca and X_test_scaled_pca.
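```python
# Keep the first 10 components: fit on the training data only,
# then apply the same projection to the test data
pca = PCA(n_components=10)
pca.fit(X_train_scaled)
X_train_scaled_pca = pca.transform(X_train_scaled)
X_test_scaled_pca = pca.transform(X_test_scaled)
```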
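Rather than reading the cut-off off the plot, the number of components can be computed from the cumulative explained variance ratio. A sketch using the same split and scaling as above, with the 0.95 threshold mirroring the "95%" figure in the text. scikit-learn's PCA also accepts a float n_components between 0 and 1 and then keeps the smallest number of components whose cumulative ratio exceeds it, which should agree with the manual count.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=2020, stratify=y)
X_train_scaled = StandardScaler().fit_transform(X_train)

# Cumulative explained variance over all 30 components
cvr = np.cumsum(PCA(n_components=30).fit(X_train_scaled).explained_variance_ratio_)

# First 0-based index where cvr is strictly greater than 0.95, plus 1 = number of components
k = int(np.searchsorted(cvr, 0.95, side='right')) + 1

# Let PCA pick the count itself from a variance fraction
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X_train_scaled)
print(k, pca_95.n_components_, cvr[k - 1].round(4))
```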
Each component is a linear combination of the original variables, each multiplied by a corresponding "weight" (loading). By building a DataFrame, we can see the weights of each PCA component.
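```python
# One row per component, one column per original feature
pca_dims = ['PCA Component {}'.format(x) for x in range(len(pca_df))]
pca_test_df = pd.DataFrame(pca_test.components_, columns=columns, index=pca_dims)
pca_test_df.head(10).T  # features as rows, first 10 components as columns
```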
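To make "linear combination with weights" concrete: with the default whiten=False, PCA.transform subtracts the fitted mean_ and multiplies by the transposed components_ matrix, so the component scores can be reproduced by hand. A sketch on the full standardized dataset (not the training split used above), which also lists the original features with the largest absolute weights in the first component.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data['data'])

pca = PCA(n_components=10).fit(X_scaled)
scores = pca.transform(X_scaled)

# Each score = weighted sum of the centred inputs, weights taken from components_
manual = (X_scaled - pca.mean_) @ pca.components_.T

# Weights of the first component, labelled by original feature name
weights = pd.Series(pca.components_[0], index=data['feature_names'])
print(weights.abs().sort_values(ascending=False).head(5))
```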
Now we can fit X_train_scaled_pca and y_train to another "baseline" random forest model to test whether its predictions improve.
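```python
# Baseline forest trained on the 10 PCA components
rfc_2 = RandomForestClassifier()
rfc_2.fit(X_train_scaled_pca, y_train)
display(rfc_2.score(X_train_scaled_pca, y_train))  # accuracy on the training data
# 1.0
```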
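After applying PCA, we can also tune the random forest's hyperparameters for better predictions. Hyperparameters can be thought of as the model's "settings". The ideal settings for one dataset differ from those for another, which is why we need to "tune" the model.
We can start with RandomizedSearchCV to explore a wide range of hyperparameter values. All of the random forest's hyperparameters are listed in the scikit-learn RandomForestClassifier documentation.
We build a param_dist dictionary containing a range of values for each hyperparameter. We then instantiate RandomizedSearchCV, passing the random forest model first, followed by param_dist, the number of iterations to try (n_iter), and the number of cross-validation folds (cv).
The n_jobs argument determines how many processor cores run the search. Setting n_jobs=-1 uses all available cores, which is usually the fastest option.
We will tune these hyperparameters: n_estimators, max_features, max_depth, min_samples_split, min_samples_leaf, and bootstrap.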
```python
from sklearn.model_selection import RandomizedSearchCV

# Candidate values for each hyperparameter
n_estimators = [int(x) for x in np.linspace(start=100, stop=1000, num=10)]   # number of trees
max_features = ['log2', 'sqrt']                                              # features tried per split
max_depth = [int(x) for x in np.linspace(start=1, stop=15, num=15)]          # maximum tree depth
min_samples_split = [int(x) for x in np.linspace(start=2, stop=50, num=10)]  # min samples to split a node
min_samples_leaf = [int(x) for x in np.linspace(start=2, stop=50, num=10)]   # min samples per leaf
bootstrap = [True, False]                                                    # sample rows with replacement?

param_dist = {'n_estimators': n_estimators,
              'max_features': max_features,
              'max_depth': max_depth,
              'min_samples_split': min_samples_split,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

# 100 random combinations x 3 folds = 300 fits, spread across all CPU cores
rs = RandomizedSearchCV(rfc_2, param_dist, n_iter=100, cv=3, verbose=1, n_jobs=-1, random_state=0)
rs.fit(X_train_scaled_pca, y_train)
rs.best_params_
# {'n_estimators': 700,
#  'min_samples_split': 2,
#  'min_samples_leaf': 2,
#  'max_features': 'log2',
#  'max_depth': 11,
#  'bootstrap': True}
```
With n_iter=100 and cv=3, we fit 300 random forest models (100 randomly sampled combinations of the hyperparameters above, times 3 folds). We can read best_params_ to get the parameters of the best-performing model, as shown at the bottom of the code block above.
However, best_params_ at this stage may not be the most informative guide for choosing the parameter ranges of the next tuning round. To look across the whole range we tried, we can easily turn the RandomizedSearchCV results into a DataFrame.
```python
# One row per sampled combination, best first
rs_df = pd.DataFrame(rs.cv_results_).sort_values('rank_test_score').reset_index(drop=True)
# Drop timing and per-fold columns; keep the param_* columns and mean_test_score
rs_df = rs_df.drop(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time',
                    'params', 'split0_test_score', 'split1_test_score', 'split2_test_score',
                    'std_test_score'], axis=1)
rs_df.head(10)
```
Now let's create a bar chart for each hyperparameter, with its values on the x-axis and the models' mean score for each value on the y-axis, to see which values perform best on average:
```python
# Set the style before creating the figure so it applies to these axes
sns.set(style="whitegrid", color_codes=True, font_scale=2)
fig, axs = plt.subplots(ncols=3, nrows=2)
fig.set_size_inches(30, 25)

# (hyperparameter, subplot position, bar colour, y-axis limits)
panels = [('n_estimators', (0, 0), 'lightgrey', [.83, .93]),
          ('min_samples_split', (0, 1), 'coral', [.85, .93]),
          ('min_samples_leaf', (0, 2), 'lightgreen', [.80, .93]),
          ('max_features', (1, 0), 'wheat', [.88, .92]),
          ('max_depth', (1, 1), 'lightpink', [.80, .93]),
          ('bootstrap', (1, 2), 'skyblue', [.88, .92])]

for name, pos, color, ylim in panels:
    # Each bar is the mean of mean_test_score over all sampled models using that value
    sns.barplot(x='param_' + name, y='mean_test_score', data=rs_df, ax=axs[pos], color=color)
    axs[pos].set_ylim(ylim)
    axs[pos].set_title(label=name, size=30, weight='bold')

plt.show()
```
From the charts above, we can see how each hyperparameter value performs on average:
- n_estimators: 300, 500, and 700 have nearly the highest average scores;
- min_samples_split: smaller values such as 2 and 7 score higher, and 23 also scores well, so we can try a few values just above 2 and values around 23;
- min_samples_leaf: smaller values appear to score higher, so we can try values between 2 and 7;
- max_features: "sqrt" has the highest average score;
- max_depth: there is no clear pattern, but 2, 3, 7, 11, and 15 perform well;
- bootstrap: False has the highest average score.
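The bar charts average mean_test_score over every sampled model that used a given value. The same summary can be produced numerically with a pandas groupby on the param_* columns of cv_results_. The sketch below runs a deliberately small, hypothetical search (fewer trees and values than the article's, on raw features) so that it finishes quickly; only the summarizing step mirrors the text.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical, deliberately tiny search space so the sketch runs quickly
param_dist = {'n_estimators': [20, 50],
              'max_features': ['log2', 'sqrt'],
              'max_depth': [2, 3, 7],
              'bootstrap': [True, False]}
rs = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_dist,
                        n_iter=10, cv=3, random_state=0)
rs.fit(X, y)

rs_df = pd.DataFrame(rs.cv_results_)
# Average mean_test_score for each value of each hyperparameter (one bar per value in the plots)
summary = {name: rs_df.groupby('param_' + name)['mean_test_score'].mean()
           for name in param_dist}
for name, means in summary.items():
    print(name, means.round(4).to_dict())
```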
We can now use these findings in a second round of hyperparameter tuning to narrow the choices further.
ʹÓà RandomSearchCV Ö®ºó£¬ÎÒÃÇ¿ÉÒÔʹÓà GridSearchCV ¶ÔÄ¿Ç°×î¼Ñ³¬²ÎÊýÖ´Ðиü¾«Ï¸µÄËÑË÷¡£³¬²ÎÊýÊÇÏàͬµÄ£¬µ«ÊÇÏÖÔÚÎÒÃÇʹÓà GridSearchCV Ö´Ðиü¡¸Ï꾡¡¹µÄËÑË÷¡£
ÔÚ GridSearchCV ÖУ¬ÎÒÃdz¢ÊÔÿ¸ö³¬²ÎÊýµÄµ¥¶À×éºÏ£¬Õâ±È RandomSearchCV ËùÐèµÄ¼ÆËãÁ¦Òª¶àµÃ¶à£¬ÔÚÕâÀïÎÒÃÇ¿ÉÒÔÖ±½Ó¿ØÖÆÒª³¢ÊԵĵü´ú´ÎÊý¡£ÀýÈ磬½ö¶Ô 6 ¸ö²ÎÊýËÑË÷ 10 ¸ö²»Í¬µÄ²ÎÊýÖµ£¬¾ßÓÐ 3 ÕÛ½»²æÑéÖ¤£¬ÔòÐèÒªÄâºÏÄ£ÐÍ 3,000,000 ´Î£¡Õâ¾ÍÊÇΪʲôÎÒÃÇÔÚʹÓà RandomSearchCV Ö®ºóÖ´ÐÐ GridSearchCV£¬ÕâÄÜ°ïÖúÎÒÃÇÊ×ÏÈËõСËÑË÷·¶Î§¡£
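The arithmetic behind this comparison can be sketched with scikit-learn's ParameterGrid, whose len() counts combinations without enumerating them. The six parameter names below are placeholders. Both counts leave out the one extra refit on the whole training set that refit=True performs after the search.

```python
from sklearn.model_selection import ParameterGrid

cv = 3

# Hypothetical exhaustive grid: 6 placeholder hyperparameters, 10 candidate values each
big_grid = ParameterGrid({f'param_{i}': list(range(10)) for i in range(6)})
grid_fits = len(big_grid) * cv  # every combination is fitted once per fold

# RandomizedSearchCV fits a fixed number of sampled combinations instead
n_iter = 100
random_fits = n_iter * cv

print(f'{len(big_grid):,} combinations -> {grid_fits:,} fits')
print(f'{n_iter} sampled combinations -> {random_fits:,} fits')
```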
So, using what we learned from RandomizedSearchCV, we plug in the best-performing range for each hyperparameter:
```python
from sklearn.model_selection import GridSearchCV

# Narrowed ranges informed by the RandomizedSearchCV bar charts
n_estimators = [300, 500, 700]
max_features = ['sqrt']
max_depth = [2, 3, 7, 11, 15]
min_samples_split = [2, 3, 4, 22, 23, 24]
min_samples_leaf = [2, 3, 4, 5, 6, 7]
bootstrap = [False]

param_grid = {'n_estimators': n_estimators,
              'max_features': max_features,
              'max_depth': max_depth,
              'min_samples_split': min_samples_split,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

# Every combination (540) x 3 folds = 1,620 fits
gs = GridSearchCV(rfc_2, param_grid, cv=3, verbose=1, n_jobs=-1)
gs.fit(X_train_scaled_pca, y_train)
rfc_3 = gs.best_estimator_  # best model, refitted on the full training set
gs.best_params_
# {'bootstrap': False,
#  'max_depth': 7,
#  'max_features': 'sqrt',
#  'min_samples_leaf': 3,
#  'min_samples_split': 2,
#  'n_estimators': 500}
```
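Here we run 3-fold cross-validation on 3 × 1 × 5 × 6 × 6 × 1 = 540 candidate models, for a total of 1,620 model fits! Now that RandomizedSearchCV and GridSearchCV have run, we can read best_params_ to see the winning settings (shown at the bottom of the code block above) and use best_estimator_ to get the corresponding refitted model for predicting our data.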
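best_params_ is a plain dictionary of settings, whereas best_estimator_ is the model refitted on the full training set with those settings (because refit=True by default), and it is the object that makes predictions. A sketch with a small hypothetical grid on raw features, not the article's grid or its PCA data, so that it runs in seconds.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=2020, stratify=y)

# Hypothetical small grid: 2 * 2 * 2 = 8 candidates
param_grid = {'n_estimators': [50, 100], 'max_depth': [3, 7], 'min_samples_leaf': [2, 3]}
print(len(ParameterGrid(param_grid)) * 3, 'fits')  # candidates x 3 folds

gs = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
gs.fit(X_train, y_train)

print(gs.best_params_)           # a dict of the winning settings
best_model = gs.best_estimator_  # a fitted forest using those settings
y_pred_gs = best_model.predict(X_test)
```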
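Now we can evaluate the models we built on the test data. We will test 3 models: the baseline random forest, the baseline random forest with PCA, and the hyperparameter-tuned random forest with PCA.
Let's generate predictions for each model: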
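```python
y_pred = rfc_1.predict(X_test_scaled)                      # baseline, 30 scaled features
y_pred_pca = rfc_2.predict(X_test_scaled_pca)              # baseline, 10 PCA components
y_pred_gs = gs.best_estimator_.predict(X_test_scaled_pca)  # tuned, 10 PCA components
```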
Then we create a confusion matrix for each model to see how well each one predicts breast cancer:
```python
from sklearn.metrics import confusion_matrix

# Rows = actual class, columns = predicted class
conf_matrix_baseline = pd.DataFrame(confusion_matrix(y_test, y_pred),
                                    index=['actual 0', 'actual 1'],
                                    columns=['predicted 0', 'predicted 1'])
conf_matrix_baseline_pca = pd.DataFrame(confusion_matrix(y_test, y_pred_pca),
                                        index=['actual 0', 'actual 1'],
                                        columns=['predicted 0', 'predicted 1'])
conf_matrix_tuned_pca = pd.DataFrame(confusion_matrix(y_test, y_pred_gs),
                                     index=['actual 0', 'actual 1'],
                                     columns=['predicted 0', 'predicted 1'])

# recall_score uses pos_label=1 by default
display(conf_matrix_baseline)
display('Baseline Random Forest recall score', recall_score(y_test, y_pred))
display(conf_matrix_baseline_pca)
display('Baseline Random Forest With PCA recall score', recall_score(y_test, y_pred_pca))
display(conf_matrix_tuned_pca)
display('Hyperparameter Tuned Random Forest With PCA Reduced Dimensionality recall score',
        recall_score(y_test, y_pred_gs))
```
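Here are the results.
We use recall as the performance metric: because we are dealing with cancer diagnosis, we care most about minimizing false-negative predictions. Note that recall_score measures recall for pos_label=1 by default, which in scikit-learn's encoding of this dataset is the benign class; to measure how many malignant cases are caught, pass pos_label=0.
With that in mind, the baseline random forest appears to perform best, with a recall score of 94.97%: on our test set, it correctly predicts 170 of the 179 class-1 patients.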
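This case study highlights an important caveat: sometimes, even after PCA and extensive hyperparameter tuning, the tuned model may perform worse than a plain "vanilla" model. But experimenting matters: if you don't try, you'll never know which model works best. When it comes to predicting cancer, the better the model, the more lives can be saved.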