📅 发布时间：2026/6/21 15:43:33

凸函数小记

定义

仿射集：\(\forall x_{1}, x_{2} \in S\)，\(x = \theta x_{1} + (1 - \theta)x_{2} \in S\)，其中 \(\theta \in \R\)。

凸集：\(\forall x_{1}, x_{2} \in S\)，\(x = \theta x_{1} + (1 - \theta)x_{2} \in S\)，其中 \(\theta \in [0, 1]\)。

凸包：\(S = \{ x_{1}, x_{2}, \dots, x_{n} \}\) 的凸包可以表示为 \(\{ \sum\limits_{i = 1}^{n}{\theta_{i}x_{i}} | \sum\limits_{i = 1}^{n}\theta_{i} = 1, \theta_{i} \ge 0 \}\)。

凸函数：设函数 \(f : D \to \R\)，其中 \(D \subset \R^{n}\) 是一个凸集。若 \(\forall x, y \in D, \theta \in [0, 1]\)，都有：

\[f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 - \theta)f(y) \]

则称 \(f\) 为凸函数。对于 \(x \neq y, \theta \in (0, 1)\)，当 \(\le\) 替换为 \(<\) 时，称 \(f\) 为严格凸函数。

联合凸函数：设函数 \(F : D \to \R\)，其中 \(D \subset \R^{n} \times \R^{m}\) 是一个凸集。若 \(\forall (x_{1}, y_{1}), (x_{2}, y_{2}) \in D, \theta \in [0, 1]\)，都有：

\[\begin{align} F(\theta x_{1} + (1 - \theta)x_{2}, \theta y_{1} + (1 - \theta)y_{2}) \le \theta F(x_{1}, y_{1}) + (1 - \theta)F(x_{2}, y_{2}) \\ \end{align} \]

则 \(F\) 是 \((x, y)\) 这个整体变量的联合凸函数。

非负加权求和

若 \(f_{1}, f_{2}\) 是 \(D\) 上的凸函数，\(w_{1}, w_{2} \ge 0\)，则函数 \(g(x) = w_{1}f_{1}(x) + w_{2}f_{2}(x)\) 也是 \(D\) 上的凸函数。

证明：任取 \(x, y \in D, \theta \in [0, 1]\)。

\[\begin{align} g(\theta x + (1 - \theta)y) &= w_{1}f_{1}(\theta x + (1 - \theta)y) + w_{2}f_{2}(\theta x + (1 - \theta)y) \\ & \le w_{1}\theta f_{1}(x) + w_{1}(1 - \theta)f_{1}(y) + w_{2}\theta f_{2}(x) + w_{2}(1 - \theta)f_{2}(y) \\ &= \theta(w_{1}f_{1}(x) + w_{2}f_{2}(x)) + (1 - \theta)(w_{1}f_{1}(y) + w_{2}f_{2}(y)) \\ &= \theta g(x) + (1 - \theta)g(y) \\ \end{align} \]

故 \(g(x)\) 是凸函数。类似地可知多个凸函数的非负加权和也是凸函数。

仿射变换

凸集 \(S \in \R^{n}\)，\(A \in \R^{m \times n}\)，\(b \in \R^{m}\)，则 \(f(S) = \{ Ax + b | x \in S \}\) 是凸集。

证明：任取 \(y_{1}, y_{2} \in f(S)\)，存在 \(x_{1}, x_{2} \in S\)，使得 \(y_{1} = Ax_{1} + b, y_{2} = Ax_{2} + b\)。\(\forall \theta \in [0, 1]\)，有：

\[\begin{align} \theta y_{1} + (1 - \theta)y_{2}&= \theta(Ax_{1} + b) + (1 - \theta)(Ax_{2} + b) \\\ &= A(\theta x_{1} + (1 - \theta)x_{2}) + b \\ \end{align} \]

由于 \(S\) 是凸集，\(x_{1}, x_{2} \in S\)，所以 \(\theta x_{1} + (1 - \theta)x_{2} \in S\)，所以 \(\theta y_{1} + (1 - \theta)y_{2} \in f(S)\)，所以 \(f(S)\) 是凸集。

凸集 \(S \in \R^{m}\)，\(A \in \R^{m \times n}\)，\(b \in \R^{m}\)，则 \(f^{-1}(S) = \{ x | Ax + b \in S \}\) 是凸集。

证明：任取 \(x_{1}, x_{2} \in f^{-1}(S)\)，存在 \(y_{1}, y_{2} \in S\)，使得 \(y_{1} = Ax_{1} + b, y_{2} = Ax_{2} + b\)。\(\forall \theta \in [0, 1]\)，有：

\[\begin{align} \theta y_{1} + (1 - \theta)y_{2}&= \theta(Ax_{1} + b) + (1 - \theta)(Ax_{2} + b) \\\ &= A(\theta x_{1} + (1 - \theta)x_{2}) + b \in S \\ \end{align} \]

若 \(\theta x_{1} + (1 - \theta)x_{2} \not\in f^{-1}(S)\)，则 \(A(\theta x_{1} + (1 - \theta)x_{2}) + b \not\in S\)，不符。所以 \(\theta x_{1} + (1 - \theta)x_{2} \in f^{-1}(S)\)，所以 \(f^{-1}(S)\) 是凸集。

设 \(f : \R^{n} \to \R\) 是凸函数。定义 \(g(x) = f(Ax + b)\)，其中 \(A \in R^{m \times n}\)，\(b \in \R^{m}\)，则 \(g : \R^{m} \to \R\) 是凸函数。

证明：

\(\text{dom}{g} = \{ x | Ax + b \in \text{dom}{f} \}\)，因此 \(\text{dom}{g}\) 是 \(\text{dom}{f}\) 的逆仿射变换。又 \(f\) 是凸函数，所以 \(\text{dom}{f}\) 是凸集，所以 \(\text{dom}{g}\) 是凸集。

任取 \(x, y \in \text{dom}{g}, \theta \in [0, 1]\)，有 \(Ax + b, Ay + b \in \text{dom}{f}\)，且：

\[\begin{align} g(\theta x + (1 - \theta)y) &= f(A(\theta x + (1 - \theta)y) + b) \\ &= f(\theta(Ax + b) + (1 - \theta)(Ay + b)) \\ &\le \theta f(Ax + b) + (1 - \theta)f(Ay + b) \\ &= \theta g(x) + (1 - \theta)g(y) \\ \end{align} \]

因此 \(g(x)\) 是凸函数。

逐点取最大值

设有一族凸函数 \(f_{1}, f_{2}, \dots, f_{m}\)，每个 \(f_{i} : D \to \R\)，\(D \subset \R^{n}\)，则由这些函数定义的逐点取最大值函数 \(f : \R^{n} \to \R\)：\(f(x) = \max\limits_{i = 1}^{m}{f_{i}(x)}\) 也是一个凸函数。

证明：任取 \(x, y \in D, \theta \in [0, 1]\)，令 \(z = \theta x + (1 - \theta)y\)。假设 \(f(z) = \max\limits_{i = 1}^{m}f_{i}(z)\) 的最大值在 \(i = k\) 时取得。

即：\(f(z) = f_{k}(z) = f_{k}(\theta x + (1 - \theta)y) \le \theta f_{k}(x) + (1 - \theta)f_{k}(y)\)。

又因为 \(f_{k}(x) \le \max\limits_{i = 1}^{m}f_{i}(x) = f(x)\)，\(f_{k}(y) \le \max\limits_{i = 1}^{m}f_{i}(y) = f(y)\)，

故 \(f(z) \le \theta f_{k}(x) + (1 - \theta)f_{k}(y) \le \theta f(x) + (1 - \theta)f(y)\)，所以 \(f\) 为凸函数。

凸集的笛卡尔积

设 \(C_{1} \subset \R^{n}\) 和 \(C_{2} \subset \R^{m}\) 是两个凸集，它们的笛卡尔积定义为 \(C_{1} \times C_{2} = \{ (x, y) | x \in C_{1}, y \in C_{2} \}\)，则 \(C_{1} \times C_{2}\) 是凸集。

证明：取任意 \((x_{1}, y_{1}), (x_{2}, y_{2}) \in C_{1} \times C_{2}, \theta \in [0, 1]\)，则：

\[\begin{align} \theta(x_{1}, y_{1}) + (1 - \theta)(x_{2}, y_{2}) = (\theta x_{1} + (1 - \theta)x_{2}, \theta y_{1} + (1 - \theta)y_{2}) \\ \end{align} \]

由于 \(C_{1}, C_{2}\) 都是凸集，因此 \(\theta x_{1} + (1 - \theta)x_{2} \in C_{1}\) 且 \(\theta y_{1} + (1 - \theta)y_{2} \in C_{2}\)，因此 \(\theta(x_{1}, y_{1}) + (1 - \theta)(x_{2}, y_{2}) \in C_{1} \times C_{2}\)，根据凸集的定义可知 \(C_{1} \times C_{2}\) 是一个凸集。

凸集的投影性质

设 \(S \subset \R^{n} \times \R^{m}\)，定义：

\(S\) 到 \(\R^{n}\) 的投影：\(S_{x} = \{ x \in \R^{n} | \exists y \in \R^{m}, (x, y) \in S \}\)。
\(S\) 到 \(\R^{m}\) 的投影：\(S_{y} = \{ y \in \R^{m} | \exists x \in \R^{n}, (x, y) \in S \}\)。

若 \(S \subset \R^{n} \times \R^{m}\) 是凸集，则 \(S_{x} \subset \R^{n}, S_{y} \subset \R^{m}\) 是凸集。

证明：这里只证 \(S_{x}\) 是凸集，\(S_{y}\) 同理。

任取 \(x_{1}, x_{2} \in S_{x}, \theta \in [0, 1]\)，由定义知，\(\exists y_{1}, y_{2} \in \R^{m}\) 使得 \((x_{1}, y_{1}), (x_{2}, y_{2}) \in S\)。

又由 \(S\) 是凸集，故 \(\theta (x_{1}, y_{1}) + (1 - \theta)(x_{2}, y_{2}) = (\theta x_{1} + (1 - \theta)x_{2}, \theta y_{2} + (1 - \theta)y_{2}) \in S\)。

由定义知 \(\theta x_{1} + (1 - \theta)x_{2} \in S_{x}\)。故 \(S_{x}\) 是凸集。

部分最小化

设 \(F : \R^{n} \times \R^{m} \to \R\) 是一个联合凸函数，则 \(f(x) = \inf\limits_{y}F(x, y)\) 是凸函数。

证明：

由凸集的投影性质知 \(\text{dom}{f}\) 是凸集。

取任意 \(x_{1}, x_{2} \in \text{dom}{f}, \theta \in [0, 1]\)，令 \(x_{\theta} = \theta x_{1} + (1 - \theta)x_{2}\)。\(\forall \epsilon > 0\)：

由下确界定义知：\(\exists y_{1}, y_{2} \in \R^{m}\)，使得 \(F(x_{1}, y_{1}) \le f(x_{1}) + \epsilon, F(x_{2}, y_{2}) \le f(x_{2}) + \epsilon\)。令 \(y_{\theta} = \theta y_{1} + (1 - \theta)y_{2}\)。

由于 \(F\) 是联合凸的，因此有：

\[\begin{align} F(x_{\theta},y_{\theta}) &= F(\theta x_{1} + (1 - \theta)x_{2}, \theta y_{1} + (1 - \theta)y_{2}) \\ &\le \theta F(x_{1}, y_{1}) + (1 - \theta)F(x_{2}, y_{2}) \\ &\le \theta f(x_{1}) + (1 - \theta)f(x_{2}) + \epsilon \end{align} \]

由下确界定义知：\(f(x_{\theta}) \le F(x_{\theta}, y_{\theta}) \le \theta f(x_{1}) + (1 - \theta)f(x_{2}) + \epsilon\)。

假设 \(f(x_{\theta}) > \theta f(x_{1}) + (1 - \theta)f(x_{2})\)，令 \(\delta = \theta f(x_{1}) + (1 - \theta)f(x_{2}) - f(x_{\theta})\)，则取任意 \(\epsilon < \delta\)，上式不成立。

故 \(f(x_{\theta}) \le \theta f(x_{1}) + (1 - \theta)f(x_{2})\)，即 \(f(x)\) 是凸函数。

一些说明

为什么凸函数要求定义域为凸集？

第一点，凸函数的性质需要 \(x, y \in D\) 时 \(\theta x + (1 - \theta)y \in D\)。

第二点，当定义域为凸集时，能够保证凸函数具有局部最优=全局最优的良好性质。

比如考察一个山谷的最低陆地位置，如果中间有一个湖，则你站在湖边的某个位置可能是附近最低的，但湖的对面可能有更低的位置。但如果没有湖，山谷中全是陆地，则你站在一个局部最低的陆地位置，一定是整个山谷中最低的陆地位置。

这样才能保证凸函数的良好最优性。
凸集的投影性质：这个很好理解，比如你拿一个球从各个方向上观察，投影出来的二维图形一定是凸的。
两个变量 \(x, y\) 的定义域 \(D_{x}, D_{y}\) 均为凸集不能说明它们联合凸函数的定义域 \(D\) 为凸集。

具体地说，\(x, y\) 之间可能存在定义域的制约关系，我们只能说 \(D \subset D_{x} \times D_{y}\)。

但反过来，联合凸函数的两个变量的定义域一定是凸集，这一点基于凸集的投影性质。

Convex