Feature Selection Methods in R: Best Subset Regression and Stepwise Regression (with Code and Data)

Original article
Author: 拓端
Published: 2023-02-20 22:22:32
Column: 拓端tecdat

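Original link: http://tecdat.cn/?p=5453

We were recently asked by a client to write a research report on feature selection methods, including some graphs and statistical output.

Variable selection methods

All possible regressions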

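library(olsrr)  # provides ols_all_subset(); newer olsrr releases call it ols_step_all_possible()
# Fit the full model, then evaluate every subset of its predictors
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_all_subset(model)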

## # A tibble: 15 x 6
##    Index     N      Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
##  1     1     1              wt    0.75283         0.74459      12.48094
##  2     2     1            disp    0.71834         0.70895      18.12961
##  3     3     1              hp    0.60244         0.58919      37.11264
##  4     4     1            qsec    0.17530         0.14781     107.06962
##  5     5     2           hp wt    0.82679         0.81484       2.36900
##  6     6     2         wt qsec    0.82642         0.81444       2.42949
##  7     7     2         disp wt    0.78093         0.76582       9.87910
##  8     8     2         disp hp    0.74824         0.73088      15.23312
##  9     9     2       disp qsec    0.72156         0.70236      19.60281
## 10    10     2         hp qsec    0.63688         0.61183      33.47215
## 11    11     3      hp wt qsec    0.83477         0.81706       3.06167
## 12    12     3      disp hp wt    0.82684         0.80828       4.36070
## 13    13     3    disp wt qsec    0.82642         0.80782       4.42934
## 14    14     3    disp hp qsec    0.75420         0.72786      16.25779
## 15    15     4 disp hp wt qsec    0.83514         0.81072       5.00000
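
To make clear what the table above contains, here is a minimal base R sketch that enumerates the same 15 models by hand. It is not how olsrr is implemented. The names `predictors`, `subsets` and `all_subsets` are illustrative, and Mallows' Cp is computed with the usual textbook formula SSE_p / MSE_full - n + 2p, which is assumed here to match the package's definition.

```r
# Enumerate every non-empty subset of the four candidate predictors.
predictors <- c("disp", "hp", "wt", "qsec")
full_fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
n <- nrow(mtcars)

# Residual mean square of the full model: the variance estimate used in Cp.
sigma2_full <- sum(residuals(full_fit)^2) / df.residual(full_fit)

# A flat list of character vectors, one per subset (4 + 6 + 4 + 1 = 15).
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

all_subsets <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
  s <- summary(fit)
  sse <- sum(residuals(fit)^2)
  p <- length(vars) + 1  # number of coefficients, intercept included
  data.frame(
    n_pred = length(vars),
    predictors = paste(vars, collapse = " "),
    r_square = s$r.squared,
    adj_r_square = s$adj.r.squared,
    mallows_cp = sse / sigma2_full - n + 2 * p
  )
}))

# Sort by model size, then by decreasing R-squared, like the table above.
all_subsets <- all_subsets[order(all_subsets$n_pred, -all_subsets$r_square), ]
rownames(all_subsets) <- NULL
print(all_subsets, digits = 5)
```

With this definition the full model always has Cp equal to its number of coefficients (here 5), which is why the last row of the table shows 5.00000.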

The plot method displays the fit of every candidate model from the all possible regressions procedure.

library(olsrr)
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Store the result so that it can be passed to plot()
k <- ols_all_subset(model)
plot(k)

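Best subset regression

Select the subset of predictors that performs best on a well-defined objective criterion, such as the largest R2 value or the smallest MSE, Cp or AIC.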

library(olsrr)  # provides ols_best_subset(); newer olsrr releases call it ols_step_best_subset()
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_best_subset(model)

##    Best Subsets Regression
## ------------------------------
## Model Index    Predictors
## ------------------------------
##      1         wt
##      2         hp wt
##      3         hp wt qsec
##      4         disp hp wt qsec
## ------------------------------
##
##                                                   Subsets Regression Summary
## -------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred
## Model    R-Square    R-Square    R-Square     C(p)        AIC        SBIC        SBC        MSEP      FPE       HSP       APC
## -------------------------------------------------------------------------------------------------------------------------------
##   1        0.7528      0.7446      0.7087    12.4809    166.0294    74.2916    170.4266    9.8972    9.8572    0.3199    0.2801
##   2        0.8268      0.8148      0.7811     2.3690    156.6523    66.5755    162.5153    7.4314    7.3563    0.2402    0.2091
##   3        0.8348      0.8171       0.782     3.0617    157.1426    67.7238    164.4713    7.6140    7.4756    0.2461    0.2124
##   4        0.8351      0.8107       0.771     5.0000    159.0696    70.0408    167.8640    8.1810    7.9497    0.2644    0.2259
## -------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
##  SBIC: Sawa's Bayesian Information Criteria
##  SBC: Schwarz Bayesian Criteria
##  MSEP: Estimated error of prediction, assuming multivariate normality
##  FPE: Final Prediction Error
##  HSP: Hocking's Sp
##  APC: Amemiya Prediction Criteria
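
The idea behind the summary above can be sketched in base R: for each model size keep the subset with the largest R-squared, then compare the winners across sizes with a penalized criterion. This is only an illustration under stated assumptions, not the olsrr implementation. The names `best_by_size` and `best_subsets` are hypothetical, and AIC is taken from `stats::AIC()`.

```r
predictors <- c("disp", "hp", "wt", "qsec")

# For each model size k, fit every k-predictor model and keep the one
# with the largest R-squared.
best_by_size <- lapply(seq_along(predictors), function(k) {
  candidates <- combn(predictors, k, simplify = FALSE)
  fits <- lapply(candidates, function(vars) {
    lm(reformulate(vars, response = "mpg"), data = mtcars)
  })
  r2 <- vapply(fits, function(fit) summary(fit)$r.squared, numeric(1))
  best <- which.max(r2)
  list(vars = candidates[[best]], fit = fits[[best]])
})

best_subsets <- data.frame(
  model = seq_along(best_by_size),
  predictors = vapply(best_by_size,
                      function(b) paste(b$vars, collapse = " "), character(1)),
  r_square = vapply(best_by_size,
                    function(b) summary(b$fit)$r.squared, numeric(1)),
  adj_r_square = vapply(best_by_size,
                        function(b) summary(b$fit)$adj.r.squared, numeric(1)),
  aic = vapply(best_by_size, function(b) AIC(b$fit), numeric(1))
)
print(best_subsets, digits = 5)

# R-squared never decreases when a predictor is added, so models of
# different sizes are compared with a penalized criterion (smallest AIC here).
best_overall <- best_subsets$predictors[which.min(best_subsets$aic)]
print(best_overall)
```

Which criterion to use for the final choice is a judgment call; adjusted R-squared, Cp and AIC do not always agree, as the summary table shows for models 2 and 3.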

Plot

library(olsrr)
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Store the result so that it can be passed to plot()
k <- ols_best_subset(model)
plot(k)

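Stepwise forward regression

Build a regression model from a set of candidate predictors by entering predictors one at a time, based on their p values, until no remaining variable qualifies for entry. The model passed to the function should include all the candidate predictors. If details is set to TRUE, each step is displayed.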
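
The procedure described above can be sketched in a few lines of base R. This is a simplified illustration, not the olsrr code: the function `forward_select()` and its arguments are hypothetical, it assumes numeric predictors (so that each coefficient row is named after its variable), and the entry threshold `penter = 0.3` is taken from the package output shown in this article. The `mtcars` data is used so that the example runs without extra packages.

```r
# Forward selection by p value: at each step, add the candidate whose
# coefficient has the smallest p value, as long as it does not exceed penter.
forward_select <- function(data, response, candidates, penter = 0.3) {
  selected <- character(0)
  remaining <- candidates
  while (length(remaining) > 0) {
    # p value of each remaining candidate when added to the current model
    p_values <- vapply(remaining, function(cand) {
      fit <- lm(reformulate(c(selected, cand), response = response),
                data = data)
      summary(fit)$coefficients[cand, "Pr(>|t|)"]
    }, numeric(1))
    best <- names(which.min(p_values))
    if (p_values[[best]] > penter) break  # nothing left qualifies for entry
    selected <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  selected
}

selected <- forward_select(mtcars, "mpg", c("disp", "hp", "wt", "qsec"))
print(selected)

# Refit the final model with the selected predictors.
final_fit <- lm(reformulate(selected, response = "mpg"), data = mtcars)
summary(final_fit)
```

A looser threshold such as 0.3 lets weak predictors enter; a stricter one (for example 0.05) would stop earlier, so the threshold is a modelling decision and not a fixed rule.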


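Variable selection

# Forward stepwise regression
library(olsrr)  # provides ols_step_forward() and the surgical data set
# In newer olsrr releases the p-value-based version is ols_step_forward_p()
model <- lm(y ~ ., data = surgical)
ols_step_forward(model)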

## We are selecting variables based on p value...

## 1 variable(s) added....

## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...

## No more variables satisfy the condition of penter: 0.3

## Forward Selection Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## ------------------------------------------------------------------------------
##                               Selection Summary
## ------------------------------------------------------------------------------
##         Variable                     Adj.
## Step      Entered      R-Square    R-Square     C(p)        AIC         RMSE
## ------------------------------------------------------------------------------
##    1    liver_test       0.4545      0.4440    62.5119    771.8753    296.2992
##    2    alc_heavy        0.5667      0.5498    41.3681    761.4394    266.6484
##    3    enzyme_test      0.6590      0.6385    24.3379    750.5089    238.9145
##    4    pindex           0.7501      0.7297     7.5373    735.7146    206.5835
##    5    bcs              0.7809      0.7581     3.1925    730.6204    195.4544
## ------------------------------------------------------------------------------

library(olsrr)
model <- lm(y ~ ., data = surgical)
# Store the result so that it can be passed to plot()
k <- ols_step_forward(model)

## We are selecting variables based on p value...

## 1 variable(s) added....

## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...

## No more variables satisfy the condition of penter: 0.3

plot(k)

This article is excerpted from "R语言特征选择——逐步回归" (Feature Selection in R: Stepwise Regression).



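Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.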
