统计学习方法-第7章-支持向量机

线性可分支持向量机

学习算法

输入:线性可分训练集 $T = { \left(x_{1},y_{1}\right),\left(x_{2},y_{2}\right), \cdots ,\left(x_{N},y_{N}\right) }$,其中 $x_{i} \in \mathcal{X} = \mathbf{R}^{n},y_{i} \in \mathcal{Y} = { -1,+1 },i = 1,2,\cdots,N$;

输出:分离超平面和分类决策函数。

  1. 构造并求解约束最优化问题

    $$ \begin{align*} \min_{\alpha} \quad & \frac{1}{2} \sum_{i = 1}^N \sum_{j = 1}^N \alpha_{i} \alpha_{j} y_{i} y_{j} \left(x_{i} \cdot x_{j}\right) - \sum_{i=1}^m \alpha_{i} \\ \text{s.t.} \quad & \sum_{i = 1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha \geq 0,\quad i = 1,2,\cdots,N \end{align*} $$

    求得最优解 $\alpha^{\ast} = \left(\alpha^{\ast}_{1},\alpha_{2}^{\ast},\cdots,\alpha^{\ast}_{N} \right)^{T}$。

  2. 计算

    \[w^{\ast} = \sum_{i = 1}^{N} \alpha_{i}^{\ast} y_{i} x_{i}\]

    并选择 $\alpha^{\ast}$ 的一个正分量 $\alpha_{j}^{\ast} > 0$,计算

    \[b^{\ast} = y_{j} - \sum_{i = 1}^{N} \alpha_{i}^{\ast} y^{\ast} \left( x_{i} \cdot x_{j} \right)\]
  3. 求得分离超平面

    \[w^{\ast} \cdot x + b ^{\ast} = 0\]

    分类决策函数:

    \[f \left( x \right) = sign \left( w^{*} \cdot x + b ^{\ast} \right)\]

算法推导

定义函数间隔

$$ \begin{align*} \text{样本点:} & \qquad \hat{\gamma_{i}} = y_{j} \left( w \cdot x_{i} + b\right)\\ \text{数据集:} & \qquad \hat{\gamma} = \min_i \hat{\gamma_{i}} \end{align*} $$

几何间隔

$$ \begin{align*} \text{样本点:} & \qquad \gamma_{i} = y_{j} \left( \frac{w}{\parallel w \parallel} \cdot x_{i} + \frac{b}{\parallel w \parallel} \right) \\ \text{数据集:} & \qquad \gamma = \min_i \gamma_{i} \end{align*} $$

函数间隔与几何间隔的关系

$$ \begin{align*} \gamma_{i} & = \frac{\hat{\gamma_{i}}}{\parallel w \parallel} \\ \gamma & = \frac{\hat{\gamma}}{\parallel w \parallel} \end{align*} $$

间隔最大化——几何间隔最大化

$$ \begin{align*} { \max_{w,b} } \quad & {\gamma} \\ { s.t. } \quad & { y_{i} \left( \frac{w}{\parallel w \parallel} \cdot x_{i} + \frac{b}{\parallel w \parallel} \right) \geq \gamma} \end{align*} $$

根据集合间隔与函数间隔的关系,得

$$ \begin{align*} { \max_{w,b} } \quad & {\frac{\hat{\gamma}}{\parallel w \parallel}}\\ { s.t. } \quad & { y_{i} \left(w \cdot x_{i} + b \right) \geq \hat{\gamma}} \end{align*} $$

令 $\hat{\gamma} = 1$,得

$$ \begin{align*} { \max_{w,b} } \quad & {\frac{1}{\parallel w \parallel}}\\ { s.t. } \quad & { y_{i} \left( w \cdot x_{i} + b \right) \geq 1} \end{align*} $$

最大化 $\frac{1}{\parallel w \parallel}$ 等价于最小化 $\frac{1}{2} {\parallel w \parallel}^{2}$。转为凸优化问题,$\frac{1}{2} {\parallel w \parallel}^{2}$ 为凸函数。

$$ \begin{align*} { \min_{w,b} } \quad & {\frac{1}{2} {\parallel w \parallel}^{2}} \tag{7.1}\\ { s.t. } \quad & 1 - { y_{i} \left( w \cdot x_{i} + b \right) \leq 0} \tag{7.2} \end{align*} $$

引入拉格朗日乘子 $\alpha_{i} \geq 0, i = 1,2,\cdots,N$,构造拉格朗日函数

\[L \left( w,b,\alpha \right) = \frac{1}{2} {\parallel w \parallel}^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i}\]

原问题

\[\min_{w,b} \max_{\alpha} L \left( w,b,\alpha \right)\]

对偶问题

\[\max_{\alpha} \min_{w,b} L \left( w,b,\alpha \right)\]
  1. 求 $\min_{w,b} L \left( w,b,\alpha \right)$

    将 $L \left( w,b,\alpha \right)$ 分别对 $w,b$ 求偏导数并令其等于 $0$。

    $$ \begin{align*} \triangledown_{w} L \left( w,b,\alpha \right) = 0 & \Rightarrow w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}\\ \triangledown_{b} L \left( w,b,\alpha \right) = 0 & \Rightarrow \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{align*} $$

    带入拉格朗日函数

    $$ \begin{align*} L \left( w,b,\alpha \right) &= \frac{1}{2} {\parallel w \parallel}^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \\ & = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \\ & = - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*} $$

    \[\min_{w,b} L \left( w,b,\alpha \right) = - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i}\]
  2. 求 $\max_{\alpha} \min_{w,b} L \left( w,b,\alpha \right)$,即对偶问题

    $$ \begin{align*} \max_{\alpha} \quad & - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i}\\ {s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i = 1,2,\cdots,N \end{align*} $$

    等价于

    $$ \begin{align*} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \tag{7.3} \\ {s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \tag{7.4}\\ & \alpha_{i} \geq 0, \quad i = 1,2,\cdots,N \tag{7.5} \end{align*} $$

设 $\alpha^{\ast} = \left( \alpha_{1}^{\ast}, \alpha_{2}^{\ast}, \cdots \alpha_{l}^{\ast} \right)$ 是对偶最优化问题 (7.3)~(7.5) 的解,则存在下标 $j$,使得 $\alpha_{j}^{\ast} \gt 0$,则原始最优化问题 (7.1)~(7.2) 的解 $w^{\ast},b^{\ast}$:

\[w^{\ast} = \sum_{i=1}^{N} \alpha_{i}^{\ast} y_{i} x_{i}\\ b^{\ast} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{\ast} y_{i} \left( x_{i} \cdot x_{j} \right)\]

因此可得

$$ \begin{align*} \text{分离超平面:} \qquad & w^{\ast} \cdot x + b ^{\ast} = 0\\ \text{分类决策函数:} \qquad & f \left( x \right) = sign \left( w^{*} \cdot x + b ^{\ast} \right) \end{align*} $$
Table of Contents