Lecture 10: Mutual information

  • Mutual Information (MI)
    • Can detect non-linear relationships and operates on discrete features
    • Preprocessing: continuous features are first discretised into bins (by domain knowledge, equal width, or equal frequency)
    • Entropy is a measure used to assess the amount of uncertainty in an outcome
    • The formula for calculating MI is on the formula sheet
    • Mutual information indicates the amount of information about X we gain from knowing Y
    • Normalised mutual information: NMI can be used to provide a more interpretable measure of correlation than MI
    • Differences from Pearson correlation (see the sketch after this list):
      • MI can measure non-linear relationships
      • MI is very effective for use with discrete features
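A minimal sketch of that difference, assuming NumPy, SciPy and scikit-learn are available (the data, bin counts and variable names are purely illustrative): Pearson correlation is near zero for a quadratic relationship, while MI, computed after discretising both features, is clearly positive.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)
y = x ** 2 + rng.normal(0, 0.05, 5000)   # non-linear dependence on x

# Pearson correlation is near zero because the relation is not linear.
r, _ = pearsonr(x, y)

# For MI, continuous features must first be discretised into bins
# (here: equal-width bins via np.digitize).
x_bins = np.digitize(x, np.linspace(-1, 1, 11))
y_bins = np.digitize(y, np.linspace(y.min(), y.max(), 11))
mi = mutual_info_score(x_bins, y_bins)

print(f"Pearson r = {r:.3f}")   # close to 0
print(f"MI        = {mi:.3f}")  # clearly greater than 0
```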

-understand the meaning of the variables in the (normalised) mutual information and how they can be calculated. Be able to compute this measure on a pair of features. The formula for (normalised) mutual information will be provided on the exam.

  • Normalised Mutual Information (NMI)
    • Range [0,1]
    • Large = high correlation
    • Small = low correlation

-understand the role of data discretization in computing (normalised) mutual information

  • Variable discretisation (a sketch of both strategies follows this list)
    • Domain knowledge: assign thresholds manually
    • Equal-width binning
      • Divide the range of the continuous feature into k intervals of equal length
      • Width = (max - min) / k
    • Equal-frequency binning
      • Divide the range of the continuous feature into intervals that each contain the same number of values
      • Sort the values and split them so that each bin holds the same number of objects
      • Values per bin = N / k (for N values and k bins)
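A minimal NumPy-only sketch of the two binning strategies above; the values and the choice of k = 4 bins are illustrative.

```python
import numpy as np

values = np.array([1.0, 2.5, 3.0, 4.2, 5.5, 7.1, 8.0, 9.9])
k = 4

# Equal-width bins: each interval has width (max - min) / k.
width = (values.max() - values.min()) / k
edges_width = values.min() + width * np.arange(1, k)   # interior cut points
equal_width_bins = np.digitize(values, edges_width)

# Equal-frequency bins: each bin holds (approximately) N / k values.
edges_freq = np.quantile(values, [0.25, 0.5, 0.75])    # interior cut points
equal_freq_bins = np.digitize(values, edges_freq)

print(equal_width_bins)   # bin index (0..3) for each value
print(equal_freq_bins)    # here every bin ends up with 2 values
```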

-understand the meaning of the entropy of a random variable and how to interpret an entropy value. Understand its extension to conditional entropy

  • Entropy
    • A measure used to assess the amount of uncertainty in an outcome
      • Quantifies the degree of uncertainty
      • The amount of randomness
    • Low entropy = more certain
    • High entropy = less certain
    • H(X) = - Σ_i p_i log2(p_i)
      • X = feature
      • p_i = proportion of points in the i-th bin
    • H(X) >= 0
    • Entropy is maximised for a uniform distribution
  • Conditional entropy H(Y|X)
    • Measures how much information is needed to describe outcome Y, given that outcome X is known
    • H(Y|X) = H(X,Y) - H(X)
  • Mutual information MI(X,Y)
    • The amount of information shared between two variables X and Y
      • The amount of information about X gained by knowing Y
      • The amount of information about Y gained by knowing X
    • MI(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
    • Large -> highly correlated (more dependent)
    • Small -> low correlation (more independent)
    • MI(X,Y) >= 0
    • 0 <= MI(X,Y) <= min( H(X), H(Y) )
    • A worked sketch of these quantities follows this list
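A minimal NumPy-only sketch of entropy, conditional entropy and mutual information, computed in base 2 from a made-up contingency table of two discretised features.

```python
import numpy as np

# Joint counts: rows = bins of X, columns = bins of Y (illustrative data).
counts = np.array([[30, 10],
                   [ 5, 55]], dtype=float)
p_xy = counts / counts.sum()          # joint distribution p(x, y)
p_x = p_xy.sum(axis=1)                # marginal p(x)
p_y = p_xy.sum(axis=0)                # marginal p(y)

def entropy(p):
    """H(P) = -sum_i p_i log2 p_i, ignoring zero-probability bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_x, H_y = entropy(p_x), entropy(p_y)
H_xy = entropy(p_xy.ravel())          # joint entropy H(X, Y)

# MI(X, Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)
mi = H_x + H_y - H_xy
H_y_given_x = H_xy - H_x              # conditional entropy H(Y|X)

assert 0 <= mi <= min(H_x, H_y) + 1e-12   # the bound stated above
print(f"H(X)={H_x:.3f}  H(Y)={H_y:.3f}  H(Y|X)={H_y_given_x:.3f}  MI={mi:.3f}")
```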

-be able to interpret the meaning of the (normalised) mutual information between two variables

  • MI (mutual information) is a measure of correlation between two features X and Y (columns) in a dataset
  • It is the amount of information about X we gain by knowing Y, or the amount of information about Y we gain by knowing X
  • MI(X,Y) is always at least zero, and may be larger than 1
  • A correlation measure that can detect non-linear relationships
  • Operates with discrete features
  • Pre-processing: continuous features are discretised into bins
  • In fact, one can show that
    • 0 ≤ MI(X,Y) ≤ min(H(X),H(Y)) (where min(a,b) denotes the minimum of a and b)
    • Thus, if we want a measure in the interval [0,1], we can define normalised mutual information (NMI)
  • NMI(X,Y) = MI(X,Y) / min(H(X),H(Y))
  • NMI(X,Y) (a sketch of this computation follows this list)
    • Large: X and Y are highly correlated (more dependent)
    • Small: X and Y have low correlation (more independent)
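A minimal NumPy-only sketch of NMI exactly as defined above, applied to made-up discrete label vectors; the helper and variable names are illustrative.

```python
import numpy as np

def entropy_from_probs(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def nmi(x, y):
    """NMI(X, Y) = MI(X, Y) / min(H(X), H(Y)) for two discrete sequences."""
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(counts, (x_idx, y_idx), 1)          # contingency table
    p_xy = counts / counts.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mi = (entropy_from_probs(p_x) + entropy_from_probs(p_y)
          - entropy_from_probs(p_xy.ravel()))
    return mi / min(entropy_from_probs(p_x), entropy_from_probs(p_y))

x = [0, 0, 1, 1, 2, 2]
print(nmi(x, x))                               # 1.0: identical features, maximal correlation
print(round(nmi(x, [0, 1, 2, 0, 1, 2]), 3))    # lower value: weaker correlation
```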

-understand the use of (normalised) mutual information for computing correlation of some feature with a class feature and why this is useful. Understand how this provides a ranking of features, according to their predictiveness of the class

  • NMI is a good measure for determining the quality of clustering.
  • It is an external measure because we need the class labels of the instances to determine the NMI.
  • Since it is normalised, we can measure and compare the NMI of different clusterings that have different numbers of clusters (see the sketch below).
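A minimal sketch of NMI as an external clustering-quality measure, using scikit-learn on synthetic blob data; passing average_method="min" makes the normalisation match the min-entropy definition above (scikit-learn's default normalisation uses a mean of the entropies instead). The data and cluster counts are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# Synthetic data with 3 known classes (the "class labels" of the instances).
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)

for k in (2, 3, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = normalized_mutual_info_score(y_true, labels, average_method="min")
    # NMI is typically highest when k matches the true number of classes.
    print(f"k={k}: NMI={score:.3f}")
```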

-understand that normalised mutual information can be used to provide a more interpretable measure of correlation than mutual information. The formula for normalised mutual information will be provided on the exam

-understand the advantages and disadvantages of using (normalised) mutual information for computing correlation between a pair of features. Understand the main differences between this and Pearson correlation.

  • Advantages
    • Can detect both linear and non-linear dependencies
    • Applicable to, and very effective for, discrete features
  • Disadvantages
    • If a feature is continuous, it must first be discretised to compute mutual information
    • This requires making choices about which bins to use
    • Different bin choices will lead to different estimates of mutual information (see the sketch after this list)
  • Differences from Pearson correlation:
    • MI can measure non-linear relationships
    • MI is very effective for use with discrete features
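A minimal sketch of the binning sensitivity noted above: the same non-linearly related data gives noticeably different MI estimates under different equal-width bin counts (the data and bin counts are illustrative).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = np.sin(3 * x) + rng.normal(0, 0.3, size=2000)   # non-linear dependence on x

for n_bins in (3, 10, 50):
    # Interior edges for equal-width bins over each feature's range.
    x_d = np.digitize(x, np.linspace(x.min(), x.max(), n_bins + 1)[1:-1])
    y_d = np.digitize(y, np.linspace(y.min(), y.max(), n_bins + 1)[1:-1])
    # The MI estimate changes with the number of bins chosen.
    print(n_bins, round(mutual_info_score(x_d, y_d), 3))
```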
