
The first step in machine learning: a hands-on introduction to random forests

µ½ÁË 2020 Ä꣬ÎÒÃÇÒѾ­ÄÜÕÒµ½ºÜ¶àºÃÍæµÄ»úÆ÷ѧϰ½Ì³Ì¡£±¾ÎÄÔò´Ó×îÁ÷ÐеÄËæ»úÉ­ÁÖ³ö·¢£¬ÊÖ°ÑÊÖ½ÌÄã¹¹½¨Ò»¸öÄ£ÐÍ£¬ËüµÄÍêÕûÁ÷³Ìµ½µ×ÊÇʲôÑùµÄ¡£

As data scientists, we can build classification models in many ways. One of the most popular is the random forest, and we can tune its hyperparameters to optimize the model's performance.

It is also common practice to try principal component analysis (PCA) before fitting the model. But why add this extra step? Isn't one of the selling points of random forests that they make feature importance easy to understand?

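When we analyze a random forest's "feature importances", PCA makes each "feature" harder to interpret, because every principal component is a blend of the original features. On the other hand, PCA performs dimensionality reduction, which cuts the number of features the random forest has to process, so PCA may help speed up training.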

Çë×¢Ò⣬¼ÆËã³É±¾¸ßÊÇËæ»úÉ­ÁÖµÄ×î´óȱµãÖ®Ò»£¨ÔËÐÐÄ£ÐÍ¿ÉÄÜÐèÒªºÜ³¤Ê±¼ä£©¡£ÓÈÆäÊǵ±ÄãʹÓÃÊý°ÙÉõÖÁÉÏǧ¸öÔ¤²âÌØÕ÷ʱ£¬PCA ¾Í±äµÃ·Ç³£ÖØÒª¡£Òò´Ë£¬Èç¹ûÖ»Ïë¼òµ¥µØÓµÓÐ×î¼ÑÐÔÄܵÄÄ£ÐÍ£¬²¢ÇÒ¿ÉÒÔÎþÉü½âÊÍÌØÕ÷µÄÖØÒªÐÔ£¬ÄÇô PCA ¿ÉÄÜ»áºÜÓÐÓá£

Let's work through an example. We will use scikit-learn's breast cancer dataset, build three models and compare their performance:

1. Random forest

2. Random forest with PCA dimensionality reduction

3. Random forest with PCA dimensionality reduction and hyperparameter tuning

Importing the data

First, we load the data and create a DataFrame. This is a pre-cleaned "toy" dataset from scikit-learn, so we can move on to modelling quickly. As a best practice, however, we should still do the following:

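  • Use df.head() to inspect the new DataFrame and make sure it looks as expected.
  • Use df.info() to see the data type and the number of non-null values in each column. Convert data types if necessary.
  • Use df.isna() (for example df.isna().sum()) to check that there are no NaN values. Handle missing values or drop rows if necessary.
  • Use df.describe() to see each column's count, mean, standard deviation, minimum, quartiles (the 50% quartile is the median) and maximum.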

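The column named "cancer" is the target variable we want the model to predict. Note that scikit-learn encodes this target as 0 = malignant and 1 = benign (see dataset['target_names']), so despite the column name, the label 1 marks a benign tumour rather than cancer.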

```python
import pandas as pd
from IPython.display import display  # built into Jupyter; import it when running elsewhere
from sklearn.datasets import load_breast_cancer

columns = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
           'mean smoothness', 'mean compactness', 'mean concavity',
           'mean concave points', 'mean symmetry', 'mean fractal dimension',
           'radius error', 'texture error', 'perimeter error', 'area error',
           'smoothness error', 'compactness error', 'concavity error',
           'concave points error', 'symmetry error', 'fractal dimension error',
           'worst radius', 'worst texture', 'worst perimeter', 'worst area',
           'worst smoothness', 'worst compactness', 'worst concavity',
           'worst concave points', 'worst symmetry', 'worst fractal dimension']

dataset = load_breast_cancer()
data = pd.DataFrame(dataset['data'], columns=columns)
data['cancer'] = dataset['target']

display(data.head())
data.info()  # prints its summary directly and returns None
display(data.isna().sum())
display(data.describe())
```

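The output of data.head() shows part of the breast cancer DataFrame. Each row holds the observations for one patient, and the last column, "cancer", is the target variable we want to predict. In scikit-learn's encoding, 0 means malignant and 1 means benign.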
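Because the meaning of 0 and 1 matters when we read recall scores later, it is worth checking the encoding directly rather than assuming it. A minimal sketch using the target_names attribute of the object returned by load_breast_cancer (the variable names here are illustrative and chosen so they do not overwrite the walkthrough's variables):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

bc = load_breast_cancer()

# target_names[i] is the class name for integer label i
class_names = [str(name) for name in bc.target_names]
# Number of samples carrying each integer label (0, then 1)
class_counts = np.bincount(bc.target).tolist()

for label, (name, count) in enumerate(zip(class_names, class_counts)):
    print(label, name, count)
# 0 malignant 212
# 1 benign 357
```

If the class you care about is label 0, scikit-learn metrics such as recall_score need pos_label=0 to score it.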

Train/test split

Now we split the data with scikit-learn's train_test_split function. We want to give the model as much training data as possible, while still keeping enough data to test it. In general, the more rows a dataset has, the larger the share we can afford to give to the training set.

ÀýÈ磬Èç¹ûÎÒÃÇÓÐÊý°ÙÍòÐУ¬ÄÇôÎÒÃÇ¿ÉÒÔ½«ÆäÖÐµÄ 90£¥ÓÃ×÷ѵÁ·£¬10£¥ÓÃ×÷²âÊÔ¡£µ«ÊÇ£¬ÎÒÃǵÄÊý¾Ý¼¯Ö»ÓÐ 569 ÐУ¬Êý¾ÝÁ¿²¢²»´ó¡£Òò´Ë£¬ÎªÁËÆ¥ÅäÕâÖÖСÐÍÊý¾Ý¼¯£¬ÎÒÃǻὫÊý¾Ý·ÖΪ 50£¥µÄѵÁ·ºÍ 50£¥µÄ²âÊÔ¡£ÎÒÃÇÉèÖà stratify = y ÒÔÈ·±£ÑµÁ·¼¯ºÍ²âÊÔ¼¯ÓëԭʼÊý¾Ý¼¯µÄ 0 ºÍ 1 µÄ±ÈÀýÒ»Ö¡£

```python
from sklearn.model_selection import train_test_split

X = data.drop('cancer', axis=1)
y = data['cancer']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=2020, stratify=y)
```
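A quick sketch to confirm what stratify=y does, assuming the same 50/50 split and random_state=2020 as above. It loads the data as plain NumPy arrays (via return_X_y=True) and uses its own variable names so it does not overwrite X_train and the other walkthrough variables:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X_arr, y_arr = load_breast_cancer(return_X_y=True)

# Same settings as above: a 50/50 split, stratified on the label
X_tr, X_te, y_tr, y_te = train_test_split(
    X_arr, y_arr, test_size=0.50, random_state=2020, stratify=y_arr)

# Share of label 1 in each subset; stratification keeps these nearly equal
for name, labels in [('full', y_arr), ('train', y_tr), ('test', y_te)]:
    print(f'{name}: n={len(labels)}, share of label 1 = {labels.mean():.3f}')
```

With 569 rows and test_size=0.50, scikit-learn rounds the test set up, so the test set has 285 rows and the training set 284.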

Standardizing the data

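Before modelling, we need to "center" and "standardize" the data so that different variables are measured on the same scale. Scaling lets the features we use as predictors "compete fairly" with one another; this matters most for PCA, which would otherwise be dominated by the features with the largest variances. The scaler is fitted on the training set only and then applied to the test set. We also convert y_train from a pandas Series to a NumPy array, which the model will receive as training data later.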

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
X_test_scaled = ss.transform(X_test)
y_train = np.array(y_train)
```
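To make "centering and scaling" concrete: StandardScaler subtracts each column's mean and divides by its standard deviation, using statistics learned from the training set only. A sketch on hypothetical toy data (two features on very different scales) that reproduces this by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical toy data: feature 0 around 10, feature 1 around 1000
toy_train = rng.normal(loc=[10.0, 1000.0], scale=[1.0, 250.0], size=(200, 2))
toy_test = rng.normal(loc=[10.0, 1000.0], scale=[1.0, 250.0], size=(50, 2))

scaler = StandardScaler()
toy_train_scaled = scaler.fit_transform(toy_train)  # learns mean_ and scale_ from training data
toy_test_scaled = scaler.transform(toy_test)        # reuses the training statistics

# z = (x - mean) / std, with the population standard deviation (ddof=0)
manual_train = (toy_train - toy_train.mean(axis=0)) / toy_train.std(axis=0)
manual_test = (toy_test - scaler.mean_) / scaler.scale_

print(np.allclose(manual_train, toy_train_scaled))  # True
print(np.allclose(manual_test, toy_test_scaled))    # True
```

After scaling, every training column has mean 0 and standard deviation 1, so neither feature dominates purely because of its units.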

Fitting a "baseline" random forest model

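Now we create a "baseline" random forest model. It uses all of the predictive features and the default settings defined in scikit-learn's RandomForestClassifier documentation. We first instantiate the model and fit it to the scaled data. We can then measure the model's accuracy on the training data; a score of 1.0 only shows that the forest fits its own training data perfectly, not how well it generalizes.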

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

# Baseline model: all 30 scaled features, default hyperparameters
rfc_1 = RandomForestClassifier()
rfc_1.fit(X_train_scaled, y_train)
display(rfc_1.score(X_train_scaled, y_train))
# 1.0
```
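A training-set score of 1.0 mostly says the forest can reproduce the data it was fitted on. As a rough check, and not a step of the original walkthrough, one could compare it with a 5-fold cross-validated score on the same training split. The random_state=0 on the forest is an assumption added only to make the sketch repeatable, and the variable names are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

X_arr, y_arr = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_arr, y_arr, test_size=0.50, random_state=2020, stratify=y_arr)
X_tr_scaled = StandardScaler().fit_transform(X_tr)

forest = RandomForestClassifier(random_state=0).fit(X_tr_scaled, y_tr)
train_acc = forest.score(X_tr_scaled, y_tr)

# Each fold is scored by a forest that did not see that fold during fitting
cv_acc = cross_val_score(RandomForestClassifier(random_state=0),
                         X_tr_scaled, y_tr, cv=5).mean()
print(f'training accuracy: {train_acc:.3f}, cross-validated accuracy: {cv_acc:.3f}')
```

The cross-validated figure is typically lower than the training score and is the more honest estimate of how the model will do on unseen patients.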

If we want to know which features matter most to the random forest when it predicts breast cancer, we can visualize and quantify them using the model's feature_importances_ attribute:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Map each predictor name to its Gini importance in the baseline model
feats = {}
for feature, importance in zip(X.columns, rfc_1.feature_importances_):
    feats[feature] = importance

importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-Importance'})
importances = importances.sort_values(by='Gini-Importance', ascending=False)
importances = importances.reset_index()
importances = importances.rename(columns={'index': 'Features'})

sns.set(style="whitegrid", color_codes=True, font_scale=1.7)
fig, ax = plt.subplots()
fig.set_size_inches(30, 15)
sns.barplot(x='Gini-Importance', y='Features', data=importances, color='skyblue')
plt.xlabel('Importance', fontsize=25, weight='bold')
plt.ylabel('Features', fontsize=25, weight='bold')
plt.title('Feature Importance', fontsize=25, weight='bold')
plt.show()

display(importances)
```
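The same ranked table can be built more compactly with a pandas Series indexed by feature name. A sketch, assuming the forest is fitted on the unscaled features for brevity (standardizing is a monotonic per-feature rescaling, so the trees' candidate splits are essentially unchanged), with random_state=0 added only for repeatability:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

bc = load_breast_cancer()
X_df = pd.DataFrame(bc['data'], columns=bc['feature_names'])

forest = RandomForestClassifier(random_state=0).fit(X_df, bc['target'])

# One importance per input column, in the same order as X_df.columns
importance_series = (pd.Series(forest.feature_importances_, index=X_df.columns,
                               name='Gini-Importance')
                     .sort_values(ascending=False))
print(importance_series.head(5))
```

The importances are normalized to sum to 1, so each value can be read as that feature's share of the forest's total impurity reduction.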

Principal component analysis (PCA)

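How can we improve on the baseline model? With dimensionality reduction, we can represent the original dataset with fewer variables while lowering the computational cost of running the model. With PCA, we can study the cumulative explained variance ratio of the principal components to see which components capture the most variance in the data.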

We instantiate PCA and set the number of components (features) to consider. Here we set it to 30 so we can see the variance explained by every generated component and decide where to cut off. We then fit PCA to the scaled X_train data.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA

pca_test = PCA(n_components=30)
pca_test.fit(X_train_scaled)

sns.set(style='whitegrid')
plt.plot(np.cumsum(pca_test.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')
plt.axvline(linewidth=4, color='r', linestyle='--', x=10, ymin=0, ymax=1)
plt.show()

evr = pca_test.explained_variance_ratio_
cvr = np.cumsum(pca_test.explained_variance_ratio_)

pca_df = pd.DataFrame()
pca_df['Cumulative Variance Ratio'] = cvr
pca_df['Explained Variance Ratio'] = evr
display(pca_df.head(10))
```

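The plot shows that beyond 10 components we gain little additional explained variance. The DataFrame shows the cumulative variance ratio (how much of the data's total variance the first n components explain together) and the explained variance ratio (how much of the total variance each PCA component explains on its own).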
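Where do these ratios come from? For standardized data, the explained variance ratios are the eigenvalues of the covariance matrix, sorted from largest to smallest and divided by their sum. A sketch that checks this with NumPy, fitting on the full dataset rather than the training split (an assumption for brevity), and that also shows how PCA can choose a variance cut-off itself when n_components is a float between 0 and 1:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_all_scaled = StandardScaler().fit_transform(load_breast_cancer()['data'])

pca_full = PCA(n_components=30).fit(X_all_scaled)
ratios = pca_full.explained_variance_ratio_
cum_ratios = np.cumsum(ratios)

# Covariance eigenvalues (eigvalsh returns them ascending, so reverse them)
eigvals = np.linalg.eigvalsh(np.cov(X_all_scaled, rowvar=False))[::-1]
print(np.allclose(ratios, eigvals / eigvals.sum()))  # True

# Smallest number of components whose cumulative ratio exceeds 0.95
k = int(np.argmax(cum_ratios > 0.95)) + 1
# A float n_components asks PCA to pick that cut-off itself
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X_all_scaled)
print(k, pca_95.n_components_)
```

Using PCA(n_components=0.95) is an alternative to reading the cut-off off the plot; the article picks 10 components by inspection.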

The DataFrame above shows that when PCA reduces the 30 predictor variables to 10 components, we can still explain more than 95% of the variance. The other 20 components explain less than 5% of the variance, so we can drop them without losing much information. Following this logic, we use PCA to reduce the number of components in X_train and X_test from 30 to 10, and assign these newly created reduced-dimension datasets to X_train_scaled_pca and X_test_scaled_pca.

```python
pca = PCA(n_components=10)
pca.fit(X_train_scaled)

X_train_scaled_pca = pca.transform(X_train_scaled)
X_test_scaled_pca = pca.transform(X_test_scaled)
```

Each component is a linear combination of the original variables, each multiplied by a corresponding "weight". By creating a DataFrame, we can see the weights of each PCA component.

```python
pca_dims = []
for x in range(0, len(pca_df)):
    pca_dims.append('PCA Component {}'.format(x))

pca_test_df = pd.DataFrame(pca_test.components_, columns=columns, index=pca_dims)
pca_test_df.head(10).T
```
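To see that each component really is a weighted sum of the original variables, we can rebuild PCA's output by hand: subtract the per-feature means (mean_) and multiply by the weight matrix (components_). A minimal sketch, again fitting on the full scaled dataset for brevity:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_all_scaled = StandardScaler().fit_transform(load_breast_cancer()['data'])
pca_10 = PCA(n_components=10).fit(X_all_scaled)

# Each row of components_ holds one component's weights over the 30 features
print(pca_10.components_.shape)  # (10, 30)

# Component score = weighted sum of the centred original variables
manual_scores = (X_all_scaled - pca_10.mean_) @ pca_10.components_.T
print(np.allclose(manual_scores, pca_10.transform(X_all_scaled)))  # True
```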

Fitting a "baseline" random forest model after PCA

Now we can fit X_train_scaled_pca and y_train to another "baseline" random forest model and test whether its predictions improve.

```python
# Second baseline model, trained on the 10 PCA components instead of the 30 features
rfc_2 = RandomForestClassifier()
rfc_2.fit(X_train_scaled_pca, y_train)
display(rfc_2.score(X_train_scaled_pca, y_train))
# 1.0
```
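The scaler, PCA and forest can also be chained in a scikit-learn Pipeline, so that everything is fitted on the training data only and the same fitted transformations are applied automatically at prediction time. This is an alternative packaging of the same steps, not what the article does; the step names 'scale', 'pca' and 'rf' and the random_state are arbitrary choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_arr, y_arr = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_arr, y_arr, test_size=0.50, random_state=2020, stratify=y_arr)

# Scaler, PCA and forest are fitted together on the training rows only
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=10)),
    ('rf', RandomForestClassifier(random_state=0)),
])
pipe.fit(X_tr, y_tr)

# predict() applies the already-fitted scaler and PCA to the raw test rows
pipe_pred = pipe.predict(X_te)
print(pipe.named_steps['pca'].n_components_, pipe_pred.shape)
```

When a pipeline like this is passed to RandomizedSearchCV or GridSearchCV, hyperparameter names take the step name as a prefix (for example 'rf__max_depth'), and the scaler and PCA are refitted inside each cross-validation fold.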

Round 1 of hyperparameter tuning: RandomizedSearchCV

After applying PCA, we can also tune the random forest's hyperparameters to get better predictions. Hyperparameters can be thought of as the model's "settings". The ideal settings differ from one dataset to another, so we have to "tune" the model.

First, we can use RandomizedSearchCV to explore a wide range of hyperparameter values. All of the random forest's hyperparameters are listed in scikit-learn's RandomForestClassifier documentation.

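We build a dictionary, param_dist, holding a range of values for each hyperparameter. We then instantiate RandomizedSearchCV, passing in our random forest model first, followed by param_dist, the number of iterations to try (n_iter) and the number of cross-validation folds (cv).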

The n_jobs parameter sets how many processor cores are used to run the search. Setting n_jobs=-1 uses all available cores, which is usually the fastest option.

We will tune these hyperparameters:

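  • n_estimators: the number of trees in the random forest.
  • max_features: the number of features considered when looking for the best split at each node.
  • max_depth: the maximum depth of each tree, i.e. the maximum number of levels of splits from the root to a leaf.
  • min_samples_split: the minimum number of observations a node must contain before it can be split.
  • min_samples_leaf: the minimum number of observations required in each leaf node at the end of a tree.
  • bootstrap: whether to use bootstrapping to supply data to each tree in the forest. (Bootstrapping is random sampling from the dataset with replacement.)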
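To illustrate the bootstrap option, here is a sketch of drawing one bootstrap sample of row indices with NumPy. The 10,000-row size and the seed are hypothetical; with bootstrap=True, scikit-learn's forest draws a sample like this (by default, as many rows as the training set has) for each tree:

```python
import numpy as np

rng = np.random.default_rng(2020)
n_rows = 10_000  # hypothetical number of training rows

# One bootstrap sample: n_rows row indices drawn uniformly with replacement
boot_idx = rng.integers(0, n_rows, size=n_rows)

# Some rows are drawn several times and others not at all; on average about
# 1 - 1/e (roughly 63.2%) of the distinct rows appear in a given sample
unique_fraction = np.unique(boot_idx).size / n_rows
print(f'{unique_fraction:.3f} of rows drawn, expected about {1 - np.exp(-1):.3f}')
```

Because each tree sees a slightly different sample, the trees disagree in useful ways, which is part of why averaging them works.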
```python
from sklearn.model_selection import RandomizedSearchCV

n_estimators = [int(x) for x in np.linspace(start=100, stop=1000, num=10)]
max_features = ['log2', 'sqrt']
max_depth = [int(x) for x in np.linspace(start=1, stop=15, num=15)]
min_samples_split = [int(x) for x in np.linspace(start=2, stop=50, num=10)]
min_samples_leaf = [int(x) for x in np.linspace(start=2, stop=50, num=10)]
bootstrap = [True, False]

param_dist = {'n_estimators': n_estimators,
              'max_features': max_features,
              'max_depth': max_depth,
              'min_samples_split': min_samples_split,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

rs = RandomizedSearchCV(rfc_2, param_dist, n_iter=100, cv=3, verbose=1, n_jobs=-1, random_state=0)
rs.fit(X_train_scaled_pca, y_train)
rs.best_params_
# {'n_estimators': 700,
#  'min_samples_split': 2,
#  'min_samples_leaf': 2,
#  'max_features': 'log2',
#  'max_depth': 11,
#  'bootstrap': True}
```

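With n_iter = 100 and cv = 3, we fit 300 random forest models: 100 randomly sampled combinations of the hyperparameters above, each evaluated on 3 folds. We can read best_params_ to get the parameters of the best-performing model (shown at the bottom of the code block above).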
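The 300 figure is simply n_iter × cv. A sketch using scikit-learn's ParameterSampler, the helper RandomizedSearchCV relies on to draw its candidate settings, applied to the same search space as param_dist above (renamed search_space here):

```python
import numpy as np
from sklearn.model_selection import ParameterSampler

search_space = {
    'n_estimators': [int(x) for x in np.linspace(start=100, stop=1000, num=10)],
    'max_features': ['log2', 'sqrt'],
    'max_depth': [int(x) for x in np.linspace(start=1, stop=15, num=15)],
    'min_samples_split': [int(x) for x in np.linspace(start=2, stop=50, num=10)],
    'min_samples_leaf': [int(x) for x in np.linspace(start=2, stop=50, num=10)],
    'bootstrap': [True, False],
}

# 100 candidate settings, each a dict with one value per hyperparameter
candidates = list(ParameterSampler(search_space, n_iter=100, random_state=0))
n_folds = 3
print(len(candidates) * n_folds)  # 300 cross-validation fits
print(candidates[0])
```

By default, the search also refits the best candidate once on the whole training set after cross-validation.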

At this stage, however, best_params_ alone may not give us the most useful guidance on which ranges to search in the next round of tuning. To see how each value performed across the whole search, we can easily turn the RandomizedSearchCV results into a DataFrame.

```python
rs_df = pd.DataFrame(rs.cv_results_).sort_values('rank_test_score').reset_index(drop=True)
rs_df = rs_df.drop(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time',
                    'params', 'split0_test_score', 'split1_test_score', 'split2_test_score',
                    'std_test_score'], axis=1)
rs_df.head(10)
```

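Now let's draw a bar chart for each hyperparameter, with its values on the x-axis and the mean score of the models that used each value on the y-axis, to see which values perform best on average: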

```python
sns.set(style="whitegrid", color_codes=True, font_scale=2)
fig, axs = plt.subplots(ncols=3, nrows=2)
fig.set_size_inches(30, 25)

# (hyperparameter, subplot position, bar colour, y-axis limits) for each panel
panels = [('n_estimators', (0, 0), 'lightgrey', [.83, .93]),
          ('min_samples_split', (0, 1), 'coral', [.85, .93]),
          ('min_samples_leaf', (0, 2), 'lightgreen', [.80, .93]),
          ('max_features', (1, 0), 'wheat', [.88, .92]),
          ('max_depth', (1, 1), 'lightpink', [.80, .93]),
          ('bootstrap', (1, 2), 'skyblue', [.88, .92])]

for param, pos, color, ylim in panels:
    ax = axs[pos]
    sns.barplot(x='param_' + param, y='mean_test_score', data=rs_df, ax=ax, color=color)
    ax.set_ylim(ylim)
    ax.set_title(label=param, size=30, weight='bold')

plt.show()
```
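By default, seaborn's barplot draws the mean of y for each x value, so each bar above is an average of mean_test_score across all sampled models that used that value. A small sketch, with a hypothetical and much smaller search space so it runs quickly, that computes the same kind of summary as a table with pandas groupby:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X_arr, y_arr = load_breast_cancer(return_X_y=True)

# Deliberately tiny, hypothetical search space
small_space = {'n_estimators': [20, 50],
               'max_features': ['log2', 'sqrt'],
               'max_depth': [2, 5, 10]}
small_search = RandomizedSearchCV(RandomForestClassifier(random_state=0), small_space,
                                  n_iter=8, cv=3, random_state=0)
small_search.fit(X_arr, y_arr)
small_df = pd.DataFrame(small_search.cv_results_)

# Average cross-validated score for each max_depth value (the bar heights)
summary = small_df.groupby('param_max_depth')['mean_test_score'].mean()
print(summary)
```

Replacing 'param_max_depth' with another param_ column gives the summary for that hyperparameter.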

These plots show how each hyperparameter value performs on average:

n_estimators: 300, 500 and 700 have roughly the highest average scores;

min_samples_split: small values such as 2 and 7 score higher, and 23 also scores well. We can try a few values just above 2, plus some values around 23;

min_samples_leaf: smaller values seem to score higher, so we can try values between 2 and 7;

max_features: "sqrt" has the highest average score;

max_depth: there is no clear pattern, but 2, 3, 7, 11 and 15 perform well;

bootstrap: "False" has the highest average score.

We can now use these findings to move on to a second round of hyperparameter tuning and narrow the ranges further.

Round 2 of hyperparameter tuning: GridSearchCV

After RandomizedSearchCV, we can use GridSearchCV to run a finer-grained search around the best hyperparameter values found so far. The hyperparameters are the same, but GridSearchCV performs a more "exhaustive" search.

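GridSearchCV tries every combination of the hyperparameter values, which takes far more computation than RandomizedSearchCV, where we directly control the number of iterations to try. For example, searching just 10 different values for each of 6 hyperparameters with 3-fold cross-validation means 10^6 = 1,000,000 combinations × 3 folds = 3,000,000 model fits! That is why we run GridSearchCV after RandomizedSearchCV: the randomized search helps us narrow the search space first.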
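The cost difference comes down to simple arithmetic: a grid search fits every combination on every fold, while a randomized search fits only n_iter sampled combinations. A sketch that counts cross-validation fits only (both searches also refit the best model once by default); grid_search_fits and random_search_fits are hypothetical helper names:

```python
from math import prod

def grid_search_fits(values_per_param, cv):
    # Exhaustive search: every combination is fitted once per fold
    return prod(values_per_param) * cv

def random_search_fits(n_iter, cv):
    # Randomized search: n_iter sampled combinations, each fitted once per fold
    return n_iter * cv

# The hypothetical case from the text: 6 hyperparameters with 10 values each, 3 folds
print(f'grid search:   {grid_search_fits([10] * 6, cv=3):,} fits')       # 3,000,000
print(f'random search: {random_search_fits(n_iter=100, cv=3):,} fits')   # 300
```

The randomized search's cost stays fixed no matter how many values each list holds, which is what makes it a cheap first pass.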

So, using what we learned from RandomizedSearchCV, we plug in the range of values that performed best on average for each hyperparameter:

```python
from sklearn.model_selection import GridSearchCV

n_estimators = [300, 500, 700]
max_features = ['sqrt']
max_depth = [2, 3, 7, 11, 15]
min_samples_split = [2, 3, 4, 22, 23, 24]
min_samples_leaf = [2, 3, 4, 5, 6, 7]
bootstrap = [False]

param_grid = {'n_estimators': n_estimators,
              'max_features': max_features,
              'max_depth': max_depth,
              'min_samples_split': min_samples_split,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

gs = GridSearchCV(rfc_2, param_grid, cv=3, verbose=1, n_jobs=-1)
gs.fit(X_train_scaled_pca, y_train)
rfc_3 = gs.best_estimator_
gs.best_params_
# {'bootstrap': False,
#  'max_depth': 7,
#  'max_features': 'sqrt',
#  'min_samples_leaf': 3,
#  'min_samples_split': 2,
#  'n_estimators': 500}
```

Here we run 3-fold cross-validation on 3 × 1 × 5 × 6 × 6 × 1 = 540 models, for a total of 1,620 model fits! Now that we have run both RandomizedSearchCV and GridSearchCV, we can read best_params_ (shown at the bottom of the code block above) and use best_estimator_, the model refit with those parameters, to predict on our data.
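scikit-learn can confirm the 540 figure: ParameterGrid enumerates exactly the combinations a grid search over the same dictionary tries, and its length is the product of the list lengths. A sketch using the round-2 grid from the code above (renamed round2_grid here):

```python
from sklearn.model_selection import ParameterGrid

round2_grid = {'n_estimators': [300, 500, 700],
               'max_features': ['sqrt'],
               'max_depth': [2, 3, 7, 11, 15],
               'min_samples_split': [2, 3, 4, 22, 23, 24],
               'min_samples_leaf': [2, 3, 4, 5, 6, 7],
               'bootstrap': [False]}

grid = ParameterGrid(round2_grid)
n_candidates = len(grid)  # 3 * 1 * 5 * 6 * 6 * 1
print(n_candidates, 'combinations x 3 folds =', n_candidates * 3, 'fits')
# 540 combinations x 3 folds = 1620 fits
```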

Evaluating model performance on the test data

Now we can evaluate the models we built on the test data. We will test 3 models:

  • Baseline random forest
  • Baseline random forest with PCA dimensionality reduction
  • Random forest with PCA dimensionality reduction and hyperparameter tuning

Let's generate predictions for each model:

```python
# rfc_1 was trained on the scaled features; rfc_2 and the tuned model on the PCA components
y_pred = rfc_1.predict(X_test_scaled)
y_pred_pca = rfc_2.predict(X_test_scaled_pca)
y_pred_gs = gs.best_estimator_.predict(X_test_scaled_pca)
```

Next, we create a confusion matrix for each model to see how well each one predicts breast cancer:

```python
from sklearn.metrics import confusion_matrix

conf_matrix_baseline = pd.DataFrame(confusion_matrix(y_test, y_pred),
                                    index=['actual 0', 'actual 1'],
                                    columns=['predicted 0', 'predicted 1'])
conf_matrix_baseline_pca = pd.DataFrame(confusion_matrix(y_test, y_pred_pca),
                                        index=['actual 0', 'actual 1'],
                                        columns=['predicted 0', 'predicted 1'])
conf_matrix_tuned_pca = pd.DataFrame(confusion_matrix(y_test, y_pred_gs),
                                     index=['actual 0', 'actual 1'],
                                     columns=['predicted 0', 'predicted 1'])

display(conf_matrix_baseline)
display('Baseline Random Forest recall score', recall_score(y_test, y_pred))
display(conf_matrix_baseline_pca)
display('Baseline Random Forest With PCA recall score', recall_score(y_test, y_pred_pca))
display(conf_matrix_tuned_pca)
display('Hyperparameter Tuned Random Forest With PCA Reduced Dimensionality recall score',
        recall_score(y_test, y_pred_gs))
```

Here are the prediction results:

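We use recall as the performance metric because we are dealing with cancer diagnosis, where we care most about minimizing false negatives (cases the model misses). Note that recall_score scores label 1 by default (pos_label=1), which in scikit-learn's encoding of this dataset is the benign class; to measure how many malignant tumours are caught, pass pos_label=0.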

¿¼Âǵ½ÕâÒ»µã£¬¿´ÆðÀ´ÎÒÃǵĻùÏßËæ»úÉ­ÁÖÄ£ÐͱíÏÖ×îºÃ£¬Õٻص÷ÖΪ 94.97£¥¡£¸ù¾ÝÎÒÃǵIJâÊÔÊý¾Ý¼¯£¬»ùÏßÄ£ÐÍ¿ÉÒÔÕýÈ·Ô¤²â 179 Ãû°©Ö¢»¼ÕßÖÐµÄ 170 Ãû¡£

This case study highlights an important caveat: sometimes, even after PCA and extensive hyperparameter tuning, a tuned model can perform worse than a plain "vanilla" model. But it is important to try: if you don't, you will never know which model works best. In cancer prediction, a better model can mean more lives saved.

Original article: http://news.51cto.com/art/202002/610736.htm