统计学习方法-第7章-支持向量机
- 适用问题:二分类
- 模型特点:分离超平面,核技巧
- 模型类型:判别模型
- 学习策略:极小化正则化合页损失,软间隔最大化
- 损失函数:合页损失
- 学习算法:序列最小最优化算法(SMO)
线性可分支持向量机
学习算法
输入:线性可分训练集 $T = { \left(x_{1},y_{1}\right),\left(x_{2},y_{2}\right), \cdots ,\left(x_{N},y_{N}\right) }$,其中 $x_{i} \in \mathcal{X} = \mathbf{R}^{n},y_{i} \in \mathcal{Y} = { -1,+1 },i = 1,2,\cdots,N$;
输出:分离超平面和分类决策函数。
-
构造并求解约束最优化问题
$$ \begin{align*} \min_{\alpha} \quad & \frac{1}{2} \sum_{i = 1}^N \sum_{j = 1}^N \alpha_{i} \alpha_{j} y_{i} y_{j} \left(x_{i} \cdot x_{j}\right) - \sum_{i=1}^m \alpha_{i} \\ \text{s.t.} \quad & \sum_{i = 1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha \geq 0,\quad i = 1,2,\cdots,N \end{align*} $$求得最优解 $\alpha^{\ast} = \left(\alpha^{\ast}_{1},\alpha_{2}^{\ast},\cdots,\alpha^{\ast}_{N} \right)^{T}$。
-
计算
\[w^{\ast} = \sum_{i = 1}^{N} \alpha_{i}^{\ast} y_{i} x_{i}\]并选择 $\alpha^{\ast}$ 的一个正分量 $\alpha_{j}^{\ast} > 0$,计算
\[b^{\ast} = y_{j} - \sum_{i = 1}^{N} \alpha_{i}^{\ast} y^{\ast} \left( x_{i} \cdot x_{j} \right)\] -
求得分离超平面
\[w^{\ast} \cdot x + b ^{\ast} = 0\]分类决策函数:
\[f \left( x \right) = sign \left( w^{*} \cdot x + b ^{\ast} \right)\]
算法推导
定义函数间隔
- $w,b$ 等比例缩放,函数间隔会随之改变
几何间隔
函数间隔与几何间隔的关系
间隔最大化——几何间隔最大化
根据集合间隔与函数间隔的关系,得
令 $\hat{\gamma} = 1$,得
最大化 $\frac{1}{\parallel w \parallel}$ 等价于最小化 $\frac{1}{2} {\parallel w \parallel}^{2}$。转为凸优化问题,$\frac{1}{2} {\parallel w \parallel}^{2}$ 为凸函数。
引入拉格朗日乘子 $\alpha_{i} \geq 0, i = 1,2,\cdots,N$,构造拉格朗日函数
\[L \left( w,b,\alpha \right) = \frac{1}{2} {\parallel w \parallel}^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i}\]原问题
\[\min_{w,b} \max_{\alpha} L \left( w,b,\alpha \right)\]对偶问题
\[\max_{\alpha} \min_{w,b} L \left( w,b,\alpha \right)\]-
求 $\min_{w,b} L \left( w,b,\alpha \right)$
将 $L \left( w,b,\alpha \right)$ 分别对 $w,b$ 求偏导数并令其等于 $0$。
$$ \begin{align*} \triangledown_{w} L \left( w,b,\alpha \right) = 0 & \Rightarrow w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}\\ \triangledown_{b} L \left( w,b,\alpha \right) = 0 & \Rightarrow \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{align*} $$带入拉格朗日函数
$$ \begin{align*} L \left( w,b,\alpha \right) &= \frac{1}{2} {\parallel w \parallel}^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \\ & = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \\ & = - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*} $$即
\[\min_{w,b} L \left( w,b,\alpha \right) = - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i}\] -
求 $\max_{\alpha} \min_{w,b} L \left( w,b,\alpha \right)$,即对偶问题
$$ \begin{align*} \max_{\alpha} \quad & - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i}\\ {s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i = 1,2,\cdots,N \end{align*} $$等价于
$$ \begin{align*} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \tag{7.3} \\ {s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \tag{7.4}\\ & \alpha_{i} \geq 0, \quad i = 1,2,\cdots,N \tag{7.5} \end{align*} $$
设 $\alpha^{\ast} = \left( \alpha_{1}^{\ast}, \alpha_{2}^{\ast}, \cdots \alpha_{l}^{\ast} \right)$ 是对偶最优化问题 (7.3)~(7.5) 的解,则存在下标 $j$,使得 $\alpha_{j}^{\ast} \gt 0$,则原始最优化问题 (7.1)~(7.2) 的解 $w^{\ast},b^{\ast}$:
\[w^{\ast} = \sum_{i=1}^{N} \alpha_{i}^{\ast} y_{i} x_{i}\\ b^{\ast} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{\ast} y_{i} \left( x_{i} \cdot x_{j} \right)\]因此可得