统计学习方法-第4章-朴素贝叶斯法

基本方法

输入空间:$\mathcal{X} \subseteq \mathbf{R}^{n}$

输出空间:$\mathcal{Y} = { c_{1},c_{2},\cdots, c_{K} }$

输入:$x \in \mathcal{X}$

输出:$y \in \mathcal{Y}$

$X$ 是定义在输入空间 $\mathcal{X}$ 上的随机变量

$Y$ 是定义在输入空间 $\mathcal{Y}$ 上的随机变量

训练数据集:$T = { \left( x_{1},y_{1} \right),\left( x_{2},y_{2} \right),\cdots,\left( x_{N},y_{N} \right) }$

先验概率分布:

\[P \left( Y = c_{k} \right),\quad k = 1,2,\cdots,K\]

条件概率分布:

\[P \left( X = x \mid Y = c_{k}\right) = P \left( X^{\left( 1 \right)} = x^{\left( 1 \right)},\cdots,X^{\left( n \right)} = x^{\left( n \right)}\mid Y = c_{k} \right), \quad k = 1,2,\cdots,K\]

条件独立性假设:

$$ \begin{align*} P \left( X = x \mid Y = c_{k}\right) & = P \left( X^{\left( 1 \right)} = x^{\left( 1 \right)},\cdots, X^{\left( n \right)} = x^{\left( n \right)}\mid Y = c_{k}\right) \\ & = \prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k}\right) \end{align*} $$

后验概率分布(贝叶斯定理):

$$ \begin{align*} P \left( Y=c_{k} \mid X=x \right) & = \frac{P \left( X=x \mid Y=c_{k} \right) P \left( Y=c_{k} \right)}{ P \left( X=x \right)} \\ & = \frac{P \left( X=x \mid Y=c_{k} \right) P \left( Y=c_{k} \right)}{\sum_{k} P \left( X=x \mid Y=c_{k} \right) P \left( Y=c_{k} \right)} \\ & = \frac{P \left( Y=c_{k} \right) \prod_{j} P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k}\right)}{\sum_{k} P \left( Y=c_{k} \right) \prod_{j} P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k}\right)}, \quad k = 1,2,\cdots,K \end{align*} $$

朴素贝叶斯分类的基本公式:后验概率最大的类作为 $x$ 的类输出

\[y = f\left( x \right) = \arg \max_{c_{k}} \frac{P \left( Y=c_{k} \right) \prod_{j} P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k}\right)}{\sum_{k} P \left( Y=c_{k} \right) \prod_{j} P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k}\right)}, \quad k = 1,2,\cdots,K\]

上式中分母对于所有 $c_{k}$ 都是相同的,所以,

\[y = \arg \max_{c_{k}} P \left( Y=c_{k} \right) \prod_{j} P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k}\right) \quad k = 1,2,\cdots,K\]

后验概率最大化的含义——朴素贝叶斯法原理

假设选择 0-1 损失函数:

$$ L \left( Y,f \left( X \right) \right) = \begin{cases} 1 & Y \neq f \left( X \right) \\ b & Y = f \left( X \right) \end{cases} \qquad f \left( X \right) \text{为决策函数} $$

期望风险函数:

$$ \begin{align*} R_{exp} \left( f \right) & = E_{\left( Y, X \right)} \left[ L \left( Y,f \left( X \right) \right) \right] = \sum_{\left( Y, X \right)} \left[L \left( Y,f \left( X \right) \right) P \left( Y, X \right) \right] \\ & = \sum_{\left( Y, X \right)} \{ \left[ L \left( Y,f \left( X \right) \right) \right] P \left( Y \mid X \right) P \left( X \right) \} \\ & = E_{X} \{ \sum_{ y \in Y } \left[ L \left( y,f \left( X \right) \right) \right] P \left( y \mid X \right) \} \\ & = E_{X} \{ \sum_{k=1}^{K} \left[ L \left( c_{k},f \left( X \right) \right) \right] P \left( c_{k} \mid X \right) \} \end{align*} $$

为了使期望风险最小化,只需对 $X =x$ 逐个极小化

$$ \begin{align*} f \left( x \right) & = \arg \min_{y \in \mathcal{Y}} \sum_{k=1}^{K} L \left( c_{k},y \right) P \left( c_{k} \mid X = x \right) \\ & = \arg \min_{y \in \mathcal{Y}} \sum_{k=1}^{K} P \left( y \neq c_{k} \mid X = x \right) \\ & = \arg \min_{y \in \mathcal{Y}} \left( 1 - P \left( y = c_{k} \mid X = x \right) \right) \\ & = \arg \max_{y \in \mathcal{Y}} P \left( y = c_{k} \mid X = x \right) \end{align*} $$

因此,根据期望风险最小化准则得到了后验概率最大化准则。


参数估计

朴素贝叶斯法中,学习意味着估计先验概率 $\color{blue}{P \left( Y = c_{k} \right)}$条件概率 $\color{green}{P \left( X^{\left( j \right)} = x^{\left( j \right)} \mid Y = c_{k} \right)}$

极大似然估计

先验概率的极大似然估计

\[P \left( Y = c_{k} \right) = \frac{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right)}{N}, \qquad k = 1,2,\cdots,K\]

条件概率的极大似然估计

\[P \left( X^{\left( j \right)} = a_{jl} \mid Y = c_{k}\right) = \frac{\sum_{i=1}^{N} I \left( x^{\left( j \right)}_{i} = a_{jl},y_{i} = c_{k} \right)}{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right)},\] \[\quad j = 1,2,\cdots,n ; \quad l = 1,2,\cdots,S_{j} ; \quad k = 1,2,\cdots,K\]

贝叶斯估计

对于 $x$ 的某个特征的取值没有在先验中出现的情况 ,如果用极大似然估计,这种情况的可能性就是 0。出现这种情况的原因通常是因为数据集不能全覆盖样本空间,出现未知的情况处理的策略就是做平滑

先验概率的贝叶斯估计

\[P_{\lambda} \left( Y = c_{k} \right) = \frac{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right) + \lambda}{N + K \lambda}, \qquad \lambda \geq 0 ; \quad k = 1,2,\cdots,K\]

条件概率的贝叶斯估计

\[P_{\lambda} \left( X^{\left( j \right)} = a_{jl} \mid Y = c_{k}\right) = \frac{\sum_{i=1}^{N} I \left( x^{\left( j \right)}_{i} = a_{jl},y_{i} = c_{k} \right) + \lambda}{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right) + S_{j} \lambda},\] \[\lambda \geq 0 ; \quad j = 1,2,\cdots,n ; \quad l = 1,2,\cdots,S_{j} ; \quad k = 1,2,\cdots,K\]
Table of Contents