Classic Vision I - Images, Convolution, and Canny Edge Detection
#data-science #Image Processing Basics and Classic CV #Intro to CV 4

These are notes from the Spring 2025 Introduction to Computer Vision course.

Todo:

  • The content is disorganized and rough; I may polish the language later to make it easier to read.

Images as Functions

Image as a function

We treat an image as a function f from \mathrm{R}^2 to \mathrm{R}^M, where f(x, y) gives the intensity of the pixel at that position (M = 1 for grayscale, M = 3 for RGB).

Image contains discrete pixels

The elements of a concrete image:

  • Color space: grayscale: [0, 255], or RGB: Vector3;
  • Resolution: w \times h => matrix
  • Pixel space: w \times h \times 3 => tensor, with the top-left corner at (0, 0, Color)

Gradient

In this sense, we can take the gradient of an image, \displaystyle \nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]. In practice, use the central difference \displaystyle \left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)} = \frac{f(x_0+1, y_0) - f(x_0-1, y_0)}{2}, and likewise for y. Define the gradient magnitude \displaystyle ||\nabla f|| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}. The gradient points in the direction perpendicular to the edge.
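A minimal sketch of the central-difference gradient above, in plain Python. The row-major layout `img[y][x]` and the helper names `gradient` and `magnitude` are assumptions made for this illustration, not part of the course material.

```python
import math

def gradient(img, x, y):
    """Central-difference gradient at an interior pixel; img is row-major (img[y][x])."""
    fx = (img[y][x + 1] - img[y][x - 1]) / 2
    fy = (img[y + 1][x] - img[y - 1][x]) / 2
    return fx, fy

def magnitude(fx, fy):
    """Gradient magnitude sqrt(fx^2 + fy^2)."""
    return math.hypot(fx, fy)

# A vertical step edge: dark on the left, bright on the right.
img = [[0, 0, 100, 100] for _ in range(3)]
fx, fy = gradient(img, 1, 1)
print(fx, fy, magnitude(fx, fy))
```

Here the intensity changes only along x, so the gradient has no y component: it is perpendicular to the vertical edge.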

Filter

A filter forms a new image from the original pixels, either to extract useful information or to modify properties of the image.

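  1. 1D filter: linear, used for noise suppression. h = G(f), h[n] = G(f)[n]. One example is the moving average \displaystyle h[n] = \frac{1}{5}\sum_{i=-2}^{2} f[n+i]

  2. Convolution (from Signals and Systems): \displaystyle h[n] = (f*g)[n] = \sum_{m = -\infty}^{+\infty} f[m]g[n-m]. One interpretation: flip g, slide it along f, and sum the pointwise products at each shift.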
    Pasted image 20250226161429.png

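  3. The most important result is the convolution theorem: for the Fourier transform F, F(f*g) = F(f)F(g). Convolution in the time domain equals multiplication in the frequency domain.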

  4. Pasted image 20250226161926.png
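    The Fourier transform F(g)[m] of the averaging kernel g is concentrated around frequency 0; this is its main feature.
    So g acts as a low-pass filter.
    The convolution theorem explains why convolution extracts features: it suppresses high frequencies and keeps low frequencies, which removes noise =>
    Pasted image 20250226162612.png
    The filtering operation G is linear.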

  5. If we take the loss to be the frequency purity of the result, we can find the best weights for g by learning.
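The 1D items above can be sketched in plain Python: a direct implementation of the convolution sum, the 5-tap moving average as a kernel, and a numerical check of the convolution theorem with a naive DFT. The signal `f` and the helper names `convolve` and `dft` are made up for this illustration; zero-padding both signals to the output length is an assumption needed so that the discrete (circular) theorem matches the linear convolution.

```python
import cmath

def convolve(f, g):
    """Full discrete convolution: h[n] = sum_m f[m] * g[n - m]."""
    h = [0.0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for k, gk in enumerate(g):
            h[m + k] += fm * gk
    return h

def dft(x, size):
    """Naive DFT of x, implicitly zero-padded to length `size`."""
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / size) for n, v in enumerate(x))
            for k in range(size)]

f = [0, 0, 10, 0, 0, 10, 10, 10, 0, 0]  # an isolated spike and a short plateau
g = [1 / 5] * 5                          # 5-tap moving average (a low-pass kernel)
h = convolve(f, g)                       # output is shifted by 2 relative to the centered average

# Convolution theorem: DFT(f * g) equals DFT(f) * DFT(g) element-wise.
N = len(h)
lhs = dft(h, N)
rhs = [a * b for a, b in zip(dft(f, N), dft(g, N))]
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_err)  # tiny; only floating-point rounding remains
```

Because the kernel sums to 1, the total intensity is preserved while the spike is spread out and flattened.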

2D Discrete Filter

\displaystyle (f*g)[m, n] = \sum_{k, l} f[k, l]g[m-k, n-l], where g is called the kernel or filter.

  1. Pasted image 20250226163828.png
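    An averaging kernel removes the high-frequency signal of the 90 values, but it also blurs edges that should not be blurred, since edges are necessarily high-frequency.
  2. Then binarize: define a threshold \tau, \displaystyle h[m, n] = \left\{ \begin{aligned} & 1, f[m, n] > \tau \\ & 0, \text{otherwise} \end{aligned} \right.. This is a non-linear system.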
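A minimal sketch of the two steps above: a 2D convolution with a 3x3 averaging kernel followed by thresholding. The test image, the zero padding at the borders, the threshold value 45, and the function names are assumptions chosen for this illustration.

```python
def convolve2d_same(f, g):
    """(f*g)[m, n] = sum_{k,l} f[k, l] g[m-k, n-l], zero-padded, same size as f.
    g must have odd height and width; its center is treated as index (0, 0)."""
    H, W = len(f), len(f[0])
    gh, gw = len(g), len(g[0])
    ch, cw = gh // 2, gw // 2
    out = [[0.0] * W for _ in range(H)]
    for m in range(H):
        for n in range(W):
            acc = 0.0
            for i in range(gh):
                for j in range(gw):
                    k, l = m - (i - ch), n - (j - cw)  # f index paired with g[i-ch, j-cw]
                    if 0 <= k < H and 0 <= l < W:
                        acc += f[k][l] * g[i][j]
            out[m][n] = acc
    return out

def threshold(f, tau):
    """Binarize: 1 where f[m][n] > tau, else 0 (a non-linear operation)."""
    return [[1 if v > tau else 0 for v in row] for row in f]

box = [[1 / 9] * 3 for _ in range(3)]
img = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        img[r][c] = 90      # a 3x3 bright square
img[0][6] = 90              # an isolated bright pixel (noise)

smooth = convolve2d_same(img, box)
binary = threshold(smooth, 45)
for row in binary:
    print(row)
```

The isolated pixel is averaged down to about 10 and disappears after thresholding, but the corners of the square are averaged down to about 40 and disappear too: the filter cannot tell noise from a legitimate edge.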

Edge Detection

  1. Define edge: formulation (research paradigm: definition first, MathOP rather than DOP)
    An edge is a region that has a significant intensity change along one direction but little change along the orthogonal direction.
  2. Evaluate (evaluation metrics)
    Consider the failure cases: Pasted image 20250226165224.png
    Example: in the low-precision case, \displaystyle Precision = \frac{1}{3}, Recall = 1.
    To evaluate alignment: \displaystyle localization = \max \, distance(T \to P) < \varepsilon, i.e. the maximum distance from T to P must stay below \varepsilon.
    Goal: good precision, good recall, low localization error;
  3. Smooth:
    1. Problem: detect edges using the derivative. Pasted image 20250226171730.png
      But noise, which is non-smooth everywhere, easily corrupts the gradients => smooth first.
      • Use a Gaussian filter as g to smooth the noise: \displaystyle g = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(- \frac{x^2}{2\sigma^2}\right). Conveniently, its Fourier transform is again a Gaussian: \displaystyle F(g) = \exp\left(-\frac{\sigma^2 \omega^2}{2}\right).
      • The larger \sigma is, the narrower F(g) is, so it is a stronger low-pass filter.
      • The smaller \sigma is, the weaker the filter.
        Pasted image 20250226172552.png
    2. Optimization: by the theorem \displaystyle \frac{d}{dx} (f*g) = f* \frac{d}{dx} g, smoothing and differentiation merge into a single convolution with the derivative of the Gaussian.
    3. 2D convolution: the Gaussian filter is \displaystyle g = \frac{1}{2\pi \sigma^2} \exp\left(-\frac{x^2+y^2}{2 \sigma^2}\right). Apply the same optimization again.
    4. Binarization
      Result: Pasted image 20250226173233.png
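The smoothing step can be sketched in 1D as a single convolution with the sampled derivative of a Gaussian, which is the one-pass form suggested by the theorem d/dx (f*g) = f * dg/dx. The test signal, sigma = 2, the kernel radius of 3 sigma, the replicated borders, and all function names are assumptions made for this illustration.

```python
import math

def gaussian(x, sigma):
    """1D Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def gaussian_deriv(x, sigma):
    """Analytic derivative of the Gaussian: -x / sigma^2 * g(x)."""
    return -x / sigma ** 2 * gaussian(x, sigma)

def convolve_same(f, kernel):
    """h[n] = sum_i f[n - i] * kernel[i] for i in [-r, r]; borders replicate the edge value."""
    r = len(kernel) // 2
    out = []
    for n in range(len(f)):
        acc = 0.0
        for i in range(-r, r + 1):
            idx = min(max(n - i, 0), len(f) - 1)
            acc += f[idx] * kernel[i + r]
        out.append(acc)
    return out

sigma = 2.0
radius = int(3 * sigma)
dog = [gaussian_deriv(i, sigma) for i in range(-radius, radius + 1)]

# A step from 0 to 100 between indices 19 and 20, plus alternating +-5 "noise".
f = [(0 if n < 20 else 100) + (5 if n % 2 else -5) for n in range(40)]

response = convolve_same(f, dog)  # smoothing and differentiation in one pass
edge = max(range(len(f)), key=lambda n: abs(response[n]))
print(edge, round(response[edge], 2))
```

The alternating noise cancels against the odd-symmetric kernel, so the response is close to zero in flat regions and peaks at the step.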
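  4. NMS (non-maximum suppression): the problem is that detected edges can be more than 1 pixel wide. We remove the non-maximal responses and keep only the best one.
    1. Strategy
      1. Take a candidate pixel q and its gradient g(q).

      2. Find its two neighbors along the gradient direction, one unit step away: \displaystyle r = q + \frac{g(q)}{||g(q)||}, p = q - \frac{g(q)}{||g(q)||}.

        Problem: r and p are usually not grid points, so the image has no sampled value there. The fix is bilinear interpolation.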

        Pasted image 20250226174310.png
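        Conveniently, the interpolated value does not depend on which axis is interpolated first. Bilinear interpolation is a well-behaved extension of linear interpolation to 2D.

        Another approach: snap r and p directly to the nearest grid points, which is very efficient.

      3. Get the gradient magnitudes at p and r.

      4. If ||g(p)|| < ||g(q)|| > ||g(r)||, then q is a local maximum and is kept.

    2. Result: Pasted image 20250226174902.png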
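A minimal sketch of the NMS strategy above using bilinear interpolation. It assumes `mag` holds the gradient magnitude and `gx`, `gy` hold the gradient components, all as row-major lists (`img[y][x]`); the function names, the untouched one-pixel border, and the toy input are made up for this illustration.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolate img (row-major, img[y][x]) at a real-valued point inside the grid."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    tx, ty = x - x0, y - y0
    top = (1 - tx) * img[y0][x0] + tx * img[y0][x1]
    bot = (1 - tx) * img[y1][x0] + tx * img[y1][x1]
    return (1 - ty) * top + ty * bot

def nms(mag, gx, gy):
    """Keep mag[y][x] only if it is strictly larger than both neighbors along the gradient."""
    H, W = len(mag), len(mag[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(1, H - 1):            # the one-pixel border is left at zero
        for x in range(1, W - 1):
            m = mag[y][x]
            if m == 0:
                continue
            dx, dy = gx[y][x] / m, gy[y][x] / m   # unit step along the gradient
            r = bilinear(mag, x + dx, y + dy)
            p = bilinear(mag, x - dx, y - dy)
            if m > r and m > p:
                out[y][x] = m
    return out

# A blurred vertical edge: the response is 5 pixels wide and peaks in the middle column.
row = [0, 1, 3, 5, 3, 1, 0]
mag = [row[:] for _ in range(5)]
gx = [row[:] for _ in range(5)]          # gradient points along +x everywhere
gy = [[0] * 7 for _ in range(5)]
thin = nms(mag, gx, gy)
print(thin[2])
```

In this toy input the gradient is axis-aligned, so the neighbors land on grid points; for oblique gradients `bilinear` blends the four surrounding pixels instead.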
  5. Edge linking: hysteresis thresholding (a heuristic)
    1. gradient > maxVal => strong edge; an edge may begin here
    2. gradient < minVal => remove
    3. gradient between minVal and maxVal => weak edge; it can extend a connected edge but cannot begin one
Canny Edge Detector

Use:

  • first derivative
  • Gaussian kernel
  • optimizes the signal-to-noise ratio to maximize precision and recall

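Hyperparameter \sigma:

  • larger \sigma: stronger smoothing, so higher precision (fewer noise edges) but lower recall
  • smaller \sigma: weaker smoothing, so higher recall (finer edges are detected) but lower precision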
Author
酱紫瑞