Lecture 10: Mutual information

  • Mutual Information (MI)
    • Can detect non-linear relationships and operates on discrete features
    • Preprocessing: continuous features are first discretised into bins (by domain knowledge, equal width, or equal frequency)
    • Entropy is a measure used to assess the amount of uncertainty in an outcome
    • The formula for calculating MI is on the formula sheet
    • Mutual information indicates the amount of information about X we gain from knowing Y
    • Normalised mutual information: NMI can be used to provide a more interpretable measure of correlation than MI
    • Differences from Pearson correlation (see the sketch after this list):
      • MI can measure non-linear relationships
      • MI is very effective for use with discrete features
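A minimal sketch of that difference, assuming NumPy, SciPy and scikit-learn are available (the data, bin counts and variable names are purely illustrative): Pearson correlation is near zero for a quadratic relationship, while MI, computed after discretising both features, is clearly positive.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)
y = x ** 2 + rng.normal(0, 0.05, 5000)   # non-linear dependence on x

# Pearson correlation is near zero because the relation is not linear.
r, _ = pearsonr(x, y)

# For MI, continuous features must first be discretised into bins
# (here: equal-width bins via np.digitize).
x_bins = np.digitize(x, np.linspace(-1, 1, 11))
y_bins = np.digitize(y, np.linspace(y.min(), y.max(), 11))
mi = mutual_info_score(x_bins, y_bins)

print(f"Pearson r = {r:.3f}")   # close to 0
print(f"MI        = {mi:.3f}")  # clearly greater than 0
```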

-understand the meaning of the variables in the (normalised) mutual information and how they can be calculated. Be able to compute this measure on a pair of features. The formula for (normalised) mutual information will be provided on the exam.

  • Normalised Mutual Information (NMI)
    • Range [0,1]
    • Large = high correlation
    • Small = low correlation

-understand the role of data discretization in computing (normalised) mutual information

  • Variable discretisation (a sketch of both strategies follows this list)
    • Domain knowledge: assign thresholds manually
    • Equal-width binning
      • Divide the range of the continuous feature into k intervals of equal length
      • Width = (max - min) / k
    • Equal-frequency binning
      • Divide the range of the continuous feature into intervals that each contain the same number of values
      • Sort the values and split them so that each bin holds the same number of objects
      • Values per bin = N / k (for N values and k bins)
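A minimal NumPy-only sketch of the two binning strategies above; the values and the choice of k = 4 bins are illustrative.

```python
import numpy as np

values = np.array([1.0, 2.5, 3.0, 4.2, 5.5, 7.1, 8.0, 9.9])
k = 4

# Equal-width bins: each interval has width (max - min) / k.
width = (values.max() - values.min()) / k
edges_width = values.min() + width * np.arange(1, k)   # interior cut points
equal_width_bins = np.digitize(values, edges_width)

# Equal-frequency bins: each bin holds (approximately) N / k values.
edges_freq = np.quantile(values, [0.25, 0.5, 0.75])    # interior cut points
equal_freq_bins = np.digitize(values, edges_freq)

print(equal_width_bins)   # bin index (0..3) for each value
print(equal_freq_bins)    # here every bin ends up with 2 values
```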

-understand the meaning of the entropy of a random variable and how to interpret an entropy value. Understand its extension to conditional entropy

  • Entropy
    • A measure used to assess the amount of uncertainty in an outcome
      • Quantifies the degree of uncertainty
      • The amount of randomness
    • Low entropy = more certain
    • High entropy = less certain
    • H(X) = - Σ_i p_i log2(p_i)
      • X = feature
      • p_i = proportion of points in the i-th bin
    • H(X) >= 0
    • Entropy is maximised for a uniform distribution
  • Conditional entropy H(Y|X)
    • Measures how much information is needed to describe outcome Y, given that outcome X is known
    • H(Y|X) = H(X,Y) - H(X)
  • Mutual information MI(X,Y)
    • The amount of information shared between two variables X and Y
      • The amount of information about X gained by knowing Y
      • The amount of information about Y gained by knowing X
    • MI(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
    • Large -> highly correlated (more dependent)
    • Small -> low correlation (more independent)
    • MI(X,Y) >= 0
    • 0 <= MI(X,Y) <= min( H(X), H(Y) )
    • A worked sketch of these quantities follows this list
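A minimal NumPy-only sketch of entropy, conditional entropy and mutual information, computed in base 2 from a made-up contingency table of two discretised features.

```python
import numpy as np

# Joint counts: rows = bins of X, columns = bins of Y (illustrative data).
counts = np.array([[30, 10],
                   [ 5, 55]], dtype=float)
p_xy = counts / counts.sum()          # joint distribution p(x, y)
p_x = p_xy.sum(axis=1)                # marginal p(x)
p_y = p_xy.sum(axis=0)                # marginal p(y)

def entropy(p):
    """H(P) = -sum_i p_i log2 p_i, ignoring zero-probability bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_x, H_y = entropy(p_x), entropy(p_y)
H_xy = entropy(p_xy.ravel())          # joint entropy H(X, Y)

# MI(X, Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)
mi = H_x + H_y - H_xy
H_y_given_x = H_xy - H_x              # conditional entropy H(Y|X)

assert 0 <= mi <= min(H_x, H_y) + 1e-12   # the bound stated above
print(f"H(X)={H_x:.3f}  H(Y)={H_y:.3f}  H(Y|X)={H_y_given_x:.3f}  MI={mi:.3f}")
```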

-be able to interpret the meaning of the (normalised) mutual information between two variables

  • MI (mutual information) is a measure of correlation between two features X and Y (columns) in a dataset
  • It is the amount of information about X we gain by knowing Y, or the amount of information about Y we gain by knowing X
  • MI(X,Y) is always at least zero, and may be larger than 1
  • A correlation measure that can detect non-linear relationships
  • Operates with discrete features
  • Pre-processing: continuous features are discretised into bins
  • In fact, one can show that
    • 0 ≤ MI(X,Y) ≤ min(H(X),H(Y)) (where min(a,b) denotes the minimum of a and b)
    • Thus, if we want a measure in the interval [0,1], we can define normalised mutual information (NMI)
  • NMI(X,Y) = MI(X,Y) / min(H(X),H(Y))
  • NMI(X,Y) (a sketch of this computation follows this list)
    • Large: X and Y are highly correlated (more dependent)
    • Small: X and Y have low correlation (more independent)
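A minimal NumPy-only sketch of NMI exactly as defined above, applied to made-up discrete label vectors; the helper and variable names are illustrative.

```python
import numpy as np

def entropy_from_probs(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def nmi(x, y):
    """NMI(X, Y) = MI(X, Y) / min(H(X), H(Y)) for two discrete sequences."""
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(counts, (x_idx, y_idx), 1)          # contingency table
    p_xy = counts / counts.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mi = (entropy_from_probs(p_x) + entropy_from_probs(p_y)
          - entropy_from_probs(p_xy.ravel()))
    return mi / min(entropy_from_probs(p_x), entropy_from_probs(p_y))

x = [0, 0, 1, 1, 2, 2]
print(nmi(x, x))                               # 1.0: identical features, maximal correlation
print(round(nmi(x, [0, 1, 2, 0, 1, 2]), 3))    # lower value: weaker correlation
```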

-understand the use of (normalised) mutual information for computing correlation of some feature with a class feature and why this is useful. Understand how this provides a ranking of features, according to their predictiveness of the class

  • NMI is a good measure for determining the quality of clustering.
  • It is an external measure because we need the class labels of the instances to determine the NMI.
  • Since it is normalised, we can measure and compare the NMI of different clusterings that have different numbers of clusters (see the sketch below).
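A minimal sketch of NMI as an external clustering-quality measure, using scikit-learn on synthetic blob data; passing average_method="min" makes the normalisation match the min-entropy definition above (scikit-learn's default normalisation uses a mean of the entropies instead). The data and cluster counts are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# Synthetic data with 3 known classes (the "class labels" of the instances).
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)

for k in (2, 3, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = normalized_mutual_info_score(y_true, labels, average_method="min")
    # NMI is typically highest when k matches the true number of classes.
    print(f"k={k}: NMI={score:.3f}")
```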

-understand that normalised mutual information can be used to provide a more interpretable measure of correlation than mutual information. The formula for normalised mutual information will be provided on the exam

-understand the advantages and disadvantages of using (normalised) mutual information for computing correlation between a pair of features. Understand the main differences between this and Pearson correlation.

  • Advantages
    • Can detect both linear and non-linear dependencies
    • Applicable to, and very effective for, discrete features
  • Disadvantages
    • If a feature is continuous, it must first be discretised to compute mutual information
    • This requires making choices about which bins to use
    • Different bin choices will lead to different estimates of mutual information (see the sketch after this list)
  • Differences from Pearson correlation:
    • MI can measure non-linear relationships
    • MI is very effective for use with discrete features
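A minimal sketch of the binning sensitivity noted above: the same non-linearly related data gives noticeably different MI estimates under different equal-width bin counts (the data and bin counts are illustrative).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = np.sin(3 * x) + rng.normal(0, 0.3, size=2000)   # non-linear dependence on x

for n_bins in (3, 10, 50):
    # Interior edges for equal-width bins over each feature's range.
    x_d = np.digitize(x, np.linspace(x.min(), x.max(), n_bins + 1)[1:-1])
    y_d = np.digitize(y, np.linspace(y.min(), y.max(), n_bins + 1)[1:-1])
    # The MI estimate changes with the number of bins chosen.
    print(n_bins, round(mutual_info_score(x_d, y_d), 3))
```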
