ML-FE: Automated feature generation / feature derivation with featuretools on a single-CSV dataset (automatically split into two DataFrame tables)
Contents
Automated feature generation / feature derivation with featuretools on a single-CSV dataset (automatically split into two DataFrame tables)
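Recommended articles
Py-featuretools: A detailed guide to the featuretools library: introduction, installation, and usage
ML-FE: Automated feature generation / feature derivation with featuretools on a single-CSV dataset (automatically split into two DataFrame tables)
ML-FE: Automated feature generation / feature derivation with featuretools on a single-CSV dataset (automatically split into two DataFrame tables): implementation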
import numpy as np
import pandas as pd

# Each list holds one value per person, in the order Bob, LiSa, Mary, Alan.
contents = {
    "name":   ['Bob',   'LiSa',                     'Mary',                     'Alan'],
    "ID":     [1,       2,                          3,                          4],
    "age":    [np.nan,  28,                         38,                         ''],     # np.nan prints as NaN; '' is an empty-string placeholder
    "born":   [pd.NaT,  pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ''],     # pd.NaT prints as NaT
    "sex":    ['男',    '女',                       '女',                       '男'],
    "hobbey": ['打篮球', '打羽毛球',                 '打乒乓球',                 ''],     # '' prints as a blank cell
    "money":  [200.0,   240.0,                      290.0,                      300.0],
    "weight": [140.5,   120.8,                      169.4,                      155.6],
}
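The post does not show how the dictionary becomes a typed table. A minimal sketch, assuming the empty-string placeholders are meant to be missing values and are coerced with pandas (the cleaning calls below are an assumption, not the author's code):

```python
import numpy as np
import pandas as pd

contents = {
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "age": [np.nan, 28, 38, ""],
    "born": [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ""],
    "sex": ["男", "女", "女", "男"],
    "hobbey": ["打篮球", "打羽毛球", "打乒乓球", ""],
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
}
df = pd.DataFrame(contents)

# Assumed cleaning step: turn the '' placeholders into real missing values.
df["age"] = pd.to_numeric(df["age"], errors="coerce")     # '' becomes NaN, column becomes float
df["born"] = pd.to_datetime(df["born"], errors="coerce")  # '' becomes NaT, column becomes datetime
df["hobbey"] = df["hobbey"].replace("", np.nan)           # '' becomes NaN

print(df.dtypes)
```

After this step the numeric and date columns have proper dtypes, which is what any dtype-based split or type inference downstream relies on.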
   name  ID  age        born sex    hobbey  money  weight
0   Bob   1  NaN         NaT  男    打篮球  200.0   140.5
1  LiSa   2   28  1990-01-01  女  打羽毛球  240.0   120.8
2  Mary   3   38  1980-01-01  女  打乒乓球  290.0   169.4
3  Alan   4              NaT  男            300.0   155.6
-------------------------------------------
nums_df:----------------------------------
name ID age money weight
0 Bob 1 NaN 200.0 140.5
1 LiSa 2 28.0 240.0 120.8
2 Mary 3 38.0 290.0 169.4
3 Alan 4 NaN 300.0 155.6
cats_df:----------------------------------
ID hobbey sex born
0 4 NaN 男 NaN
1 1 打篮球 男 NaN
2 2 打羽毛球 女 1990-01-01
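The splitting code itself is not included in the post. One way to get two tables of this shape is to separate columns by dtype and keep ID in both as the join key. The sketch below assumes that approach; the helper names `key`, `row_label`, `num_cols` and `cat_cols` are hypothetical. Note that the printed cats_df holds only three of the four IDs, in a different order, which suggests a sampling or filtering step that is not shown; the sketch keeps all rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "age": [np.nan, 28.0, 38.0, np.nan],
    "born": pd.to_datetime([None, "1990-01-01", "1980-01-01", None]),
    "sex": ["男", "女", "女", "男"],
    "hobbey": ["打篮球", "打羽毛球", "打乒乓球", np.nan],
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
})

key, row_label = "ID", "name"  # hypothetical names for the shared key and the row label

# Numeric columns (except the key) go to nums_df; everything else goes to cats_df.
num_cols = [c for c in df.select_dtypes(include="number").columns if c != key]
cat_cols = [c for c in df.columns if c not in num_cols + [key, row_label]]

nums_df = df[[row_label, key] + num_cols]
cats_df = df[[key] + cat_cols].drop_duplicates(subset=key)  # one row per ID

print(nums_df)
print(cats_df)
```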
---------------------------------DFS design:-----------------------------------
feature_matrix_nums
ID age money weight cats.hobbey cats.sex cats.COUNT(nums) \
name
Bob 1 NaN 200.0 140.5 打篮球 男 1.0
LiSa 2 28.0 240.0 120.8 打羽毛球 女 1.0
Mary 3 38.0 290.0 169.4 NaN NaN NaN
cats.MAX(nums.age) cats.MAX(nums.money) cats.MAX(nums.weight) \
name
Bob NaN 200.0 140.5
LiSa 28.0 240.0 120.8
Mary NaN NaN NaN
cats.MEAN(nums.age) cats.MEAN(nums.money) cats.MEAN(nums.weight) \
name
Bob NaN 200.0 140.5
LiSa 28.0 240.0 120.8
Mary NaN NaN NaN
cats.MIN(nums.age) cats.MIN(nums.money) cats.MIN(nums.weight) \
name
Bob NaN 200.0 140.5
LiSa 28.0 240.0 120.8
Mary NaN NaN NaN
cats.SKEW(nums.age) cats.SKEW(nums.money) cats.SKEW(nums.weight) \
name
Bob NaN NaN NaN
LiSa NaN NaN NaN
Mary NaN NaN NaN
cats.STD(nums.age) cats.STD(nums.money) cats.STD(nums.weight) \
name
Bob NaN NaN NaN
LiSa NaN NaN NaN
Mary NaN NaN NaN
cats.SUM(nums.age) cats.SUM(nums.money) cats.SUM(nums.weight) \
name
Bob 0.0 200.0 140.5
LiSa 28.0 240.0 120.8
Mary NaN NaN NaN
cats.DAY(born) cats.MONTH(born) cats.WEEKDAY(born) cats.YEAR(born)
name
Bob NaN NaN NaN NaN
LiSa 1.0 1.0 0.0 1990.0
Mary NaN NaN NaN NaN
features_defs_nums: 29 [<Feature: ID>, <Feature: age>, <Feature: money>, <Feature: weight>, <Feature: cats.hobbey>, <Feature: cats.sex>, <Feature: cats.COUNT(nums)>, <Feature: cats.MAX(nums.age)>, <Feature: cats.MAX(nums.money)>, <Feature: cats.MAX(nums.weight)>, <Feature: cats.MEAN(nums.age)>, <Feature: cats.MEAN(nums.money)>, <Feature: cats.MEAN(nums.weight)>, <Feature: cats.MIN(nums.age)>, <Feature: cats.MIN(nums.money)>, <Feature: cats.MIN(nums.weight)>, <Feature: cats.SKEW(nums.age)>, <Feature: cats.SKEW(nums.money)>, <Feature: cats.SKEW(nums.weight)>, <Feature: cats.STD(nums.age)>, <Feature: cats.STD(nums.money)>, <Feature: cats.STD(nums.weight)>, <Feature: cats.SUM(nums.age)>, <Feature: cats.SUM(nums.money)>, <Feature: cats.SUM(nums.weight)>, <Feature: cats.DAY(born)>, <Feature: cats.MONTH(born)>, <Feature: cats.WEEKDAY(born)>, <Feature: cats.YEAR(born)>]
feature_matrix_cats_df
hobbey sex COUNT(nums) MAX(nums.age) MAX(nums.money) MAX(nums.weight) \
ID
4 NaN 男 1 NaN 300.0 155.6
1 打篮球 男 1 NaN 200.0 140.5
2 打羽毛球 女 1 28.0 240.0 120.8
MEAN(nums.age) MEAN(nums.money) MEAN(nums.weight) MIN(nums.age) \
ID
4 NaN 300.0 155.6 NaN
1 NaN 200.0 140.5 NaN
2 28.0 240.0 120.8 28.0
MIN(nums.money) MIN(nums.weight) SKEW(nums.age) SKEW(nums.money) \
ID
4 300.0 155.6 NaN NaN
1 200.0 140.5 NaN NaN
2 240.0 120.8 NaN NaN
SKEW(nums.weight) STD(nums.age) STD(nums.money) STD(nums.weight) \
ID
4 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
SUM(nums.age) SUM(nums.money) SUM(nums.weight) DAY(born) MONTH(born) \
ID
4 0.0 300.0 155.6 NaN NaN
1 0.0 200.0 140.5 NaN NaN
2 28.0 240.0 120.8 1.0 1.0
WEEKDAY(born) YEAR(born)
ID
4 NaN NaN
1 NaN NaN
2 0.0 1990.0
features_defs_cats_df: 25 [<Feature: hobbey>, <Feature: sex>, <Feature: COUNT(nums)>, <Feature: MAX(nums.age)>, <Feature: MAX(nums.money)>, <Feature: MAX(nums.weight)>, <Feature: MEAN(nums.age)>, <Feature: MEAN(nums.money)>, <Feature: MEAN(nums.weight)>, <Feature: MIN(nums.age)>, <Feature: MIN(nums.money)>, <Feature: MIN(nums.weight)>, <Feature: SKEW(nums.age)>, <Feature: SKEW(nums.money)>, <Feature: SKEW(nums.weight)>, <Feature: STD(nums.age)>, <Feature: STD(nums.money)>, <Feature: STD(nums.weight)>, <Feature: SUM(nums.age)>, <Feature: SUM(nums.money)>, <Feature: SUM(nums.weight)>, <Feature: DAY(born)>, <Feature: MONTH(born)>, <Feature: WEEKDAY(born)>, <Feature: YEAR(born)>]
<Feature: SUM(nums.age)>
The sum of the "age" of all instances of "nums" for each "ID" in "cats".
features_defs_cats_df: 25
ID | hobbey | sex | COUNT(nums) | MAX(nums.age) | MAX(nums.money) | MAX(nums.weight) | MEAN(nums.age) | MEAN(nums.money) | MEAN(nums.weight) | MIN(nums.age) | MIN(nums.money) | MIN(nums.weight) | SKEW(nums.age) | SKEW(nums.money) | SKEW(nums.weight) | STD(nums.age) | STD(nums.money) | STD(nums.weight) | SUM(nums.age) | SUM(nums.money) | SUM(nums.weight) | DAY(born) | MONTH(born) | WEEKDAY(born) | YEAR(born) |
4 | NaN | 男 | 1 | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 300 | 155.6 | NaN | NaN | NaN | NaN |
1 | 打篮球 | 男 | 1 | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 200 | 140.5 | NaN | NaN | NaN | NaN |
2 | 打羽毛球 | 女 | 1 | 28 | 240 | 120.8 | 28 | 240 | 120.8 | 28 | 240 | 120.8 | NaN | NaN | NaN | NaN | NaN | NaN | 28 | 240 | 120.8 | 1 | 1 | 0 | 1990 |
ID | hobbey | sex | COUNT(nums) |
4 | NaN | 男 | 1 |
1 | 打篮球 | 男 | 1 |
2 | 打羽毛球 | 女 | 1 |

ID | MAX(nums.age) | MAX(nums.money) | MAX(nums.weight) | MEAN(nums.age) | MEAN(nums.money) | MEAN(nums.weight) | MIN(nums.age) | MIN(nums.money) | MIN(nums.weight) |
4 | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | 300 | 155.6 |
1 | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | 200 | 140.5 |
2 | 28 | 240 | 120.8 | 28 | 240 | 120.8 | 28 | 240 | 120.8 |

ID | SKEW(nums.age) | SKEW(nums.money) | SKEW(nums.weight) | STD(nums.age) | STD(nums.money) | STD(nums.weight) | SUM(nums.age) | SUM(nums.money) | SUM(nums.weight) |
4 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 300 | 155.6 |
1 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 200 | 140.5 |
2 | NaN | NaN | NaN | NaN | NaN | NaN | 28 | 240 | 120.8 |

ID | DAY(born) | MONTH(born) | WEEKDAY(born) | YEAR(born) |
4 | NaN | NaN | NaN | NaN |
1 | NaN | NaN | NaN | NaN |
2 | 1 | 1 | 0 | 1990 |
Field explanations:
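The aggregation columns can be read as ordinary group-by statistics of nums, computed per ID. The following is a pandas-only sketch of that reading, not featuretools' own implementation. Because every ID matches exactly one nums row in this dataset, STD and SKEW have too few values and come out as NaN, and a sum over only-missing ages comes out as 0, which is consistent with the table above.

```python
import numpy as np
import pandas as pd

nums_df = pd.DataFrame({
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "age": [np.nan, 28.0, 38.0, np.nan],
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
})

# One group per ID; each group holds a single row in this dataset.
grouped = nums_df.groupby("ID")[["age", "money", "weight"]]
agg = grouped.agg(["max", "mean", "min", "skew", "std", "sum"])

# Flatten (column, function) pairs into featuretools-style names.
agg.columns = [f"{fn.upper()}(nums.{col})" for col, fn in agg.columns]

# COUNT(nums) counts rows per ID, regardless of missing values.
agg["COUNT(nums)"] = nums_df.groupby("ID").size()

print(agg.loc[1])  # the row for ID 1 (Bob)
```

With more than one nums row per ID, the same columns would carry real spread and skewness values instead of NaN.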
features_defs_nums: 29
name | ID | age | money | weight | cats.hobbey | cats.sex | cats.COUNT(nums) | cats.MAX(nums.age) | cats.MAX(nums.money) | cats.MAX(nums.weight) | cats.MEAN(nums.age) | cats.MEAN(nums.money) | cats.MEAN(nums.weight) | cats.MIN(nums.age) | cats.MIN(nums.money) | cats.MIN(nums.weight) | cats.SKEW(nums.age) | cats.SKEW(nums.money) | cats.SKEW(nums.weight) | cats.STD(nums.age) | cats.STD(nums.money) | cats.STD(nums.weight) | cats.SUM(nums.age) | cats.SUM(nums.money) | cats.SUM(nums.weight) | cats.DAY(born) | cats.MONTH(born) | cats.WEEKDAY(born) | cats.YEAR(born) |
Bob | 1 | NaN | 200 | 140.5 | 打篮球 | 男 | 1 | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 200 | 140.5 | NaN | NaN | NaN | NaN |
LiSa | 2 | 28 | 240 | 120.8 | 打羽毛球 | 女 | 1 | 28 | 240 | 120.8 | 28 | 240 | 120.8 | 28 | 240 | 120.8 | NaN | NaN | NaN | NaN | NaN | NaN | 28 | 240 | 120.8 | 1 | 1 | 0 | 1990 |
Mary | 3 | 38 | 290 | 169.4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Alan | 4 | NaN | 300 | 155.6 | NaN | 男 | 1 | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 300 | 155.6 | NaN | NaN | NaN | NaN |
name | ID | age | money | weight |
Bob | 1 | NaN | 200 | 140.5 |
LiSa | 2 | 28 | 240 | 120.8 |
Mary | 3 | 38 | 290 | 169.4 |
Alan | 4 | NaN | 300 | 155.6 |

name | cats.hobbey | cats.sex | cats.COUNT(nums) |
Bob | 打篮球 | 男 | 1 |
LiSa | 打羽毛球 | 女 | 1 |
Mary | NaN | NaN | NaN |
Alan | NaN | 男 | 1 |

name | cats.MAX(nums.age) | cats.MAX(nums.money) | cats.MAX(nums.weight) | cats.MEAN(nums.age) | cats.MEAN(nums.money) | cats.MEAN(nums.weight) | cats.MIN(nums.age) | cats.MIN(nums.money) | cats.MIN(nums.weight) |
Bob | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | 200 | 140.5 |
LiSa | 28 | 240 | 120.8 | 28 | 240 | 120.8 | 28 | 240 | 120.8 |
Mary | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Alan | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | 300 | 155.6 |

name | cats.SKEW(nums.age) | cats.SKEW(nums.money) | cats.SKEW(nums.weight) | cats.STD(nums.age) | cats.STD(nums.money) | cats.STD(nums.weight) | cats.SUM(nums.age) | cats.SUM(nums.money) | cats.SUM(nums.weight) |
Bob | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 200 | 140.5 |
LiSa | NaN | NaN | NaN | NaN | NaN | NaN | 28 | 240 | 120.8 |
Mary | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Alan | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 300 | 155.6 |

name | cats.DAY(born) | cats.MONTH(born) | cats.WEEKDAY(born) | cats.YEAR(born) |
Bob | NaN | NaN | NaN | NaN |
LiSa | 1 | 1 | 0 | 1990 |
Mary | NaN | NaN | NaN | NaN |
Alan | NaN | NaN | NaN | NaN |
Field explanations:
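Columns prefixed with cats. are features of the parent table cats that are carried onto each nums row through the shared ID, and the DAY, MONTH, WEEKDAY and YEAR columns are calendar parts of born. A pandas-only sketch of that reading follows; it emulates the result with a left join and is not featuretools code. Mary (ID 3) has no matching row in cats, so all of her cats. columns are NaN. WEEKDAY counts from Monday = 0, and 1990-01-01 was a Monday.

```python
import numpy as np
import pandas as pd

nums_df = pd.DataFrame({
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "money": [200.0, 240.0, 290.0, 300.0],
})
cats_df = pd.DataFrame({
    "ID": [4, 1, 2],
    "sex": ["男", "男", "女"],
    "born": pd.to_datetime([None, None, "1990-01-01"]),
})

# Build the parent-side features: calendar parts of 'born'.
parent = cats_df.set_index("ID")
born = parent.pop("born")
parent["DAY(born)"] = born.dt.day
parent["MONTH(born)"] = born.dt.month
parent["WEEKDAY(born)"] = born.dt.weekday  # Monday = 0
parent["YEAR(born)"] = born.dt.year

# Carry every parent feature onto the child rows via a left join on ID.
parent = parent.add_prefix("cats.")
joined = nums_df.join(parent, on="ID").set_index("name")

print(joined)
```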