Leftover Hash Lemma and Noise Smudging
The material here is drawn from David Wu's notes, Purdue's cryptography course notes, an MIT note, and the Stanford CS355 notes.
This note focuses on the Leftover Hash Lemma and on noise smudging in homomorphic encryption.
```mermaid
graph LR;
  subgraph Measure of Randomness
    GP(Guess Probability) --> ME(Min Entropy)
    CP(Collision Probability) --> RE(Renyi Entropy)
  end
  ME --> LHL(Leftover Hash Lemma)
  RE --> LHL
  subgraph Hash Function
    KI(k-independence) --> UHF(Universal Hash Function)
    CLBD(Collision Lower Bound) --> UHF
  end
  UHF --> LHL
  LHL --> SLHL(Simplified LHL versions)
  LHL -->|?| NS(Noise Smudging)
  SS(Subset Sum) -->|?| NS
```
Hash Functions
Let $\mathcal{H}$ be a family of hash functions $X \to Y$. We say $\mathcal{H}$ is $k$-wise independent if, for all distinct $x_1,\dots, x_k \in X$ and all $y_1 ,\dots, y_k \in Y$,
$$
\Pr\lbrack h \overset{R}{\leftarrow} \mathcal{H} : h(x_1) = y_1,\dots, h(x_k) = y_k\rbrack = \frac{1}{|Y|^k}
$$
In particular, for any $i \in \lbrack k\rbrack$, the output $h(x_i)$ on the $i$-th input is uniform over $Y$, and the $k$ outputs are mutually independent.
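For intuition, here is a minimal Python sketch of the classic construction (a standard example under assumed parameters, not taken from the notes above): a uniformly random polynomial of degree less than $k$ over a prime field is a $k$-wise independent family, since any $k$ point-value constraints determine the polynomial by interpolation. The prime `P` and the helper names are illustrative.

```python
import random

P = 2**61 - 1  # a Mersenne prime; domain and range are Z_P

def sample_kwise(k):
    # h(x) = a_0 + a_1*x + ... + a_{k-1}*x^{k-1} mod P,
    # with coefficients drawn uniformly from Z_P
    coeffs = [random.randrange(P) for _ in range(k)]
    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner evaluation of the polynomial
            acc = (acc * x + a) % P
        return acc
    return h

h = sample_kwise(4)  # one 4-wise independent hash function Z_P -> Z_P
print(h(12345))
```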
Let $\mathcal{H}$ be a family of $X \to Y$ hash functions. $\mathcal{H}$ is an $\epsilon$-universal hash function family ($\epsilon$-UHF) if $\forall x, x' \in X, x \ne x'$,
$$
\Pr\lbrack h \overset{R}{\leftarrow} \mathcal{H}: h(x) = h(x')\rbrack \le \epsilon
$$
A 2-wise independent hash family is $\frac{1}{|Y|}$-universal: for $x \ne x'$, $\Pr\lbrack h(x) = h(x')\rbrack = \sum_{y \in Y}\Pr\lbrack h(x) = y \wedge h(x') = y\rbrack = |Y| \cdot \frac{1}{|Y|^2} = \frac{1}{|Y|}$.
But a universal hash family is not necessarily 2-wise independent; see the construction in the referenced notes for a separating example.
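As a quick sanity check (a standard example; the prime `p` and the test points below are arbitrary choices), the affine family $h_{a,b}(x) = ax + b \bmod p$ on $X = Y = \mathbb{Z}_p$ is 2-wise independent, and its empirical collision rate matches $\frac{1}{|Y|} = \frac{1}{p}$:

```python
import random

p = 101  # small prime; X = Y = Z_p

def sample_h():
    # h_{a,b}(x) = a*x + b mod p: for x != x' the map (a, b) -> (h(x), h(x'))
    # is a bijection, so the family is 2-wise independent
    a, b = random.randrange(p), random.randrange(p)
    return lambda x: (a * x + b) % p

x1, x2, trials = 3, 77, 200_000
coll = 0
for _ in range(trials):
    h = sample_h()
    coll += h(x1) == h(x2)
print(f"collision rate {coll / trials:.5f}  vs  1/|Y| = {1 / p:.5f}")
```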
Collision Lower Bound
Let $\mathcal{H}$ be a family of hash functions $X \to Y$ with $|X| > |Y|$. Then there exist distinct $x_1^\ast, x_2^\ast \in X$ such that
$$
\Pr\lbrack h \overset{R}{\leftarrow} \mathcal{H} : h(x_1^\ast) = h(x_2^\ast)\rbrack \geq \dfrac{\frac{|X|}{|Y|} - 1}{|X| - 1}
$$
Fix a hash function $h \in \mathcal{H}$ and write the range as $Y = \lbrace y_1,\dots,y_m\rbrace$. The preimage sets $\lbrace x \in X: h(x) = y_i\rbrace$ partition $X$; denote their sizes by $n_i$.
Let $\sharp \text{col}_h$ count the pairs $\lbrace x_1, x_2 \rbrace$ with $x_1, x_2 \in X$, $x_1 \neq x_2$ and $h(x_1) = h(x_2)$, so $\sharp \text{col}_h = \sum\limits^m_{i = 1}C^2_{n_i}$.
Since $\sum_i n_i = |X|$, Cauchy-Schwarz gives $\sharp\text{col}_h = \sum\limits^m_{i = 1} \dfrac{n_i^2}{2} - \dfrac{|X|}{2} \ge \dfrac{1}{2} (\dfrac{|X|^2}{|Y|} - |X|)$, with equality when every $n_i = \dfrac{|X|}{|Y|}$.
Summing over the family, $\sharp\text{col}_{\mathcal{H}} := \sum_{h \in \mathcal{H}} \sharp\text{col}_h \ge \dfrac{|\mathcal{H}|}{2} (\dfrac{|X|^2}{|Y|} - |X|)$. Now run the experiment that samples a uniformly random pair of distinct elements $(x_1,x_2)\overset{R}{\leftarrow} X$ and $h\overset{R}{\leftarrow} \mathcal{H}$, and outputs 1 if $h(x_1) = h(x_2)$:
$$
\Pr\lbrack h(x_1) = h(x_2) \mid (x_1, x_2)\overset{R}{\leftarrow} X,h\overset{R}{\leftarrow} \mathcal{H}\rbrack = \dfrac{\sharp\text{col}_{\mathcal{H}}}{|\mathcal{H}| \cdot C^2_{|X|}}\geq \dfrac{\frac{|X|}{|Y|} - 1}{|X| - 1}
$$
Since the average collision probability over pairs is at least this bound, some fixed pair $(x_1^\ast, x_2^\ast)$ attains it.
Fact: $\Pr\lbrack h \overset{R}{\leftarrow} \mathcal{H} : h(x_1^\ast) = h(x_2^\ast)\rbrack > \dfrac{1}{|Y|} - \dfrac{1}{|X|}$, since $\dfrac{\frac{|X|}{|Y|} - 1}{|X| - 1} > \dfrac{\frac{|X|}{|Y|} - 1}{|X|} = \dfrac{1}{|Y|} - \dfrac{1}{|X|}$.
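The bound can be checked exactly on a toy example (all sizes here are arbitrary choices): taking $\mathcal{H}$ to be the family of all functions from a 6-element $X$ to a 3-element $Y$, every pair of distinct inputs collides with probability exactly $\frac{1}{|Y|}$, which sits above the lower bound $\frac{6/3 - 1}{6 - 1} = 0.2$.

```python
from itertools import combinations, product

X, Y = range(6), range(3)
H = list(product(Y, repeat=len(X)))  # every function X -> Y, as a tuple of outputs
best = max(
    sum(h[x1] == h[x2] for h in H) / len(H)  # Pr over h of a collision on (x1, x2)
    for x1, x2 in combinations(X, 2)
)
bound = (len(X) / len(Y) - 1) / (len(X) - 1)
print(f"best pair collision prob = {best:.4f} >= lower bound = {bound:.4f}")
```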
Measure of Randomness
Suppose $R$ is a random variable over a set $\Omega$ with $|\Omega|= n$. Let $U$ denote the uniform distribution over $\Omega$.
Statistical Distance
The statistical distance between two random variables $X,Y$ over $\Omega$ is defined as $\Delta(X,Y) = \dfrac{1}{2} \sum\limits_{u \in \Omega} |\Pr\lbrack X = u \rbrack - \Pr \lbrack Y = u \rbrack|$. We say two distributions are $\epsilon$-close when
$$
\Delta(X ,Y) \le \epsilon \Leftrightarrow X \approx_\epsilon Y
$$
Fact: $\Delta(X,Y) = \max\limits_{T \subseteq \Omega}(\Pr\lbrack X \in T\rbrack - \Pr \lbrack Y \in T\rbrack)$.
Fact: $\forall T \subseteq \Omega, \Pr\lbrack Y \in T\rbrack \le \Pr \lbrack X \in T \rbrack + \Delta (X,Y)$.
For every randomized function $F$, $\Delta(F(X), F(Y)) \le \Delta(X,Y)$. Moreover, equality holds when $F$ is a fixed one-to-one function (for a genuinely randomized $F$, mixing over realizations can shrink the distance further).
To see this, let $F$ be a random variable over functions $f: \Omega \to V$, independent of $X$ and $Y$:
$$
\begin{aligned}
\Delta(F(X),F(Y)) &= \dfrac{1}{2}\sum_{u \in V} | \Pr \lbrack F(X) = u\rbrack -\Pr \lbrack F(Y) = u\rbrack |\newline
&\le\sum_{f \in F} \Pr\lbrack F = f\rbrack \left(\dfrac{1}{2}\sum_{u \in V} | \Pr \lbrack F(X) = u \mid F = f\rbrack -\Pr \lbrack F(Y) = u\mid F = f \rbrack |\right) \quad \text{(triangle inequality)}\newline
&= \sum_{f \in F} \Pr\lbrack F = f\rbrack \Delta(f(X), f(Y))\newline
&\le \sum_{f \in F} \Pr\lbrack F = f\rbrack \Delta(X, Y)\newline
&= \Delta(X,Y)
\end{aligned}
$$
$R$ is $\delta$-uniform if $\delta = \Delta(U,R) = \dfrac{1}{2} \sum\limits_{x \in \Omega} | \Pr\lbrack R = x \rbrack - \dfrac{1}{n} |$.
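The definitions above are easy to compute directly; here is a small Python sketch (the distributions and the merging map `f` are arbitrary illustrations) that evaluates $\Delta$, measures $\delta$-uniformity, and sanity-checks the data-processing inequality:

```python
def stat_dist(p, q):
    """Delta(X, Y) = (1/2) * sum_u |Pr[X = u] - Pr[Y = u]| for probability dicts."""
    return sum(abs(p.get(u, 0) - q.get(u, 0)) for u in set(p) | set(q)) / 2

X = {0: 0.5, 1: 0.3, 2: 0.2}
Y = {0: 0.25, 1: 0.25, 2: 0.5}
U = {u: 1 / 3 for u in range(3)}             # uniform distribution on Omega
print("delta-uniformity of X:", stat_dist(U, X))

f = lambda u: u % 2                          # a non-injective (lossy) function
fX = {0: X[0] + X[2], 1: X[1]}               # distribution of f(X)
fY = {0: Y[0] + Y[2], 1: Y[1]}               # distribution of f(Y)
assert stat_dist(fX, fY) <= stat_dist(X, Y)  # data-processing inequality
```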
Randomness Extractor
The guessing probability $\gamma(R)$ is defined by $\gamma(R) = \max\limits_{x \in \Omega}\lbrace \Pr\lbrack R = x\rbrack \rbrace$. The min-entropy $H_\infty(R) = -\log \gamma(R) = -\log \max\limits_{x \in \Omega}\lbrace \Pr\lbrack R = x\rbrack \rbrace$.
We say that $R$ is a $k$-source if $H_\infty(R) \geq k$.
Let the seed $U_d$ be uniformly distributed over $\lbrace0,1\rbrace^d$. We say that a function $f_e: \lbrace 0,1 \rbrace^n \times \lbrace 0,1 \rbrace^d \to \lbrace 0,1 \rbrace^m$ is a $(k,\epsilon)$-extractor if, for every $k$-source $X$ on $\lbrace 0,1 \rbrace^n$ independent of $U_d$,
$$
(f_e(X, U_d), U_d) \approx_\epsilon (U_m, U_d)
$$
The collision probability $\kappa(R) = \sum\limits_{x \in \Omega} \Pr \lbrack R = x \rbrack^2$. The Renyi entropy $H_2(R) = -\log \kappa(R) = -\log \sum\limits_{x \in \Omega} \Pr \lbrack R = x \rbrack^2$.
Define $p_R \in \lbrack 0,1\rbrack^{n}$ to be the vector of probabilities $(\Pr\lbrack R = x\rbrack)_{x \in \Omega}$.
Then the guessing probability is the $l_\infty$-norm, $\gamma(R) = \lVert p_R \rVert_\infty$, and the collision probability is the squared $l_2$-norm, $\kappa(R) = \lVert p_R \rVert_2^2$.
Fact: for the uniform distribution $U$ over $\Omega$, $\gamma(U) = \kappa(U) = \dfrac{1}{n}$.
Fact: for $\delta$-uniform $R$ on $\Omega$, then $\dfrac{1}{n} + \dfrac{4\delta^2}{n} \le \kappa(R) \le \gamma(R) \le \dfrac{1}{n} + \delta$.
If $R$ is $\delta$-uniform, then $\sum\limits_{x \in \Omega} |\Pr \lbrack R = x \rbrack - \dfrac{1}{n}| = 2\delta$ by definition.
For $\gamma(R) \le \dfrac{1}{n} + \delta$: suppose toward a contradiction that $\gamma(R) > \dfrac{1}{n} + \delta$, and let $x^\ast = \arg\max_{x \in \Omega} \Pr\lbrack R = x\rbrack$; then
$$
\begin{aligned}
\sum\limits_{x \in \Omega} |\Pr \lbrack R = x \rbrack - \dfrac{1}{n}| &= \sum_{x \in \Omega, x \ne x^\ast} |\Pr \lbrack R = x \rbrack - \dfrac{1}{n}| + |\Pr \lbrack R = x^\ast \rbrack - \dfrac{1}{n}|\newline
&\ge \Big| \sum_{x \in \Omega, x \ne x^\ast} (\Pr \lbrack R = x \rbrack - \dfrac{1}{n})\Big| + |\gamma(R) - \dfrac{1}{n}|\newline
&= \Big| \dfrac{1}{n} - \gamma(R)\Big| + |\gamma(R) - \dfrac{1}{n}| = 2\,|\gamma(R) - \dfrac{1}{n}| > 2 \delta
\end{aligned}
$$
This contradicts $\delta$-uniformity, so $\gamma(R) \le \dfrac{1}{n} + \delta$. For $\kappa(R) \le \gamma(R)$, we have
$$
\begin{aligned}
\sum_{x \in \Omega} \Pr \lbrack R = x \rbrack^2 &\le \max_{x \in \Omega} \Pr \lbrack R = x\rbrack (\sum_{x \in \Omega} \Pr \lbrack R = x \rbrack)\newline
&= \max_{x \in \Omega} \Pr \lbrack R = x\rbrack
\end{aligned}
$$
For $\dfrac{1}{n} + \dfrac{4\delta^2}{n} \le \kappa(R)$: write $\Pr\lbrack R = x \rbrack = \dfrac{1}{n} + \delta_x$, and note that $\sum_{x \in \Omega} \delta_x = 0$. Then
$$
\begin{aligned}
\kappa(R) &= \sum\limits_{x \in \Omega} \Pr \lbrack R = x \rbrack^2 = \sum_{x \in \Omega} (\dfrac{1}{n} + \delta_x)^2\newline
&=\sum_{x \in \Omega} (\dfrac{1}{n^2} + \dfrac{2\delta_x}{n} + \delta_x^2) = \dfrac{1}{n} + \sum_{x \in \Omega} \delta_x^2\newline
&\ge\dfrac{1}{n} + \dfrac{1}{n}(\sum_{x \in \Omega}|\delta_x|)^2 = \dfrac{1}{n} + \dfrac{4\delta^2}{n}
\end{aligned}
$$
where the last inequality is Cauchy-Schwarz together with $\sum\limits_{x \in \Omega}|\delta_x| = 2\delta$.
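These inequalities are easy to sanity-check numerically; a sketch with a randomly generated distribution (the size `n` is arbitrary):

```python
import math, random

n = 16
w = [random.random() for _ in range(n)]
p = [v / sum(w) for v in w]                 # a random distribution on n points

gamma = max(p)                              # guessing probability (l_inf norm)
kappa = sum(v * v for v in p)               # collision probability (squared l_2 norm)
delta = sum(abs(v - 1 / n) for v in p) / 2  # distance from uniform

assert 1 / n + 4 * delta**2 / n <= kappa <= gamma <= 1 / n + delta
print(f"H_inf = {-math.log2(gamma):.3f} <= H_2 = {-math.log2(kappa):.3f}")
```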
Leftover Hash Lemma
Let $\mathcal{H} = \lbrace h : X \to Y \rbrace$ be an $\epsilon$-universal hash family, let $R$ be a random variable over $X$, and sample $h \overset{R}{\leftarrow} \mathcal{H}$ independently of $R$. Then $(h, h(R))$ is $\delta$-uniform over $\mathcal{H}\times Y$ with
$$
\delta \le \dfrac{1}{2}\sqrt{|Y| \cdot (\kappa(R) + \epsilon) - 1}
$$
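In a tiny regime the lemma can be verified exactly (the parameters and the choice of family are illustrative): take $\mathcal{H}$ to be all GF(2)-linear maps $h_A(x) = Ax$ with $A \in \lbrace0,1\rbrace^{m \times n}$, which form a $\frac{1}{|Y|}$-universal family since $\Pr_A\lbrack A(x \oplus x') = 0\rbrack = 2^{-m}$ for $x \ne x'$, and let $R$ be uniform over a random 16-element subset of $\lbrace0,1\rbrace^n$, so $\kappa(R) = \frac{1}{16}$:

```python
import itertools, random

n, m = 6, 2
Y = list(itertools.product([0, 1], repeat=m))
support = random.sample(list(itertools.product([0, 1], repeat=n)), 16)

def h(A, x):  # h_A(x) = A x over GF(2)
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)

matrices = list(itertools.product([0, 1], repeat=n * m))
delta = 0.0
for flat in matrices:
    A = [flat[i * n:(i + 1) * n] for i in range(m)]
    for y in Y:
        p_hy = sum(h(A, x) == y for x in support) / len(support)  # Pr[h_A(R) = y]
        # atom (A, y) of (h, h(R)) versus the uniform distribution on H x Y
        delta += abs(p_hy / len(matrices) - 1 / (len(matrices) * len(Y))) / 2

kappa, eps = 1 / len(support), 1 / len(Y)
bound = 0.5 * (len(Y) * (kappa + eps) - 1) ** 0.5
print(f"exact delta = {delta:.4f} <= LHL bound = {bound:.4f}")
```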
Some Simplified Versions
The CS355 course notes give a simplified version of the LHL. Let $\mathcal{H} = \lbrace H(\text{pk}, \cdot) : \mathcal{X} \to \mathcal{Y}\rbrace_{\text{pk} \in \mathcal{K}}$ be a universal family keyed by $\text{pk}$, and let $\mathcal{D_X}$ be some distribution on $\mathcal{X}$ with guessing probability $\gamma(\mathcal{D_X}) = \gamma$. Then for every (even unbounded) adversary $\mathcal{A}$,
$$
\left| \Pr\lbrack \mathcal{A}(\text{pk}, H(\text{pk}, x)) = 1 \mid \text{pk} \overset{R}{\leftarrow} \mathcal{K}, x\leftarrow \mathcal{D_X} \rbrack - \Pr\lbrack \mathcal{A}(\text{pk}, y) = 1 \mid \text{pk}\overset{R}{\leftarrow}\mathcal{K}, y \overset{R}{\leftarrow}\mathcal{Y} \rbrack \right| \le \dfrac{1}{2}\sqrt{\gamma\cdot |\mathcal{Y}|}
$$
This follows from the LHL above by taking $\epsilon = \frac{1}{|\mathcal{Y}|}$ and $\kappa(\mathcal{D_X}) \le \gamma(\mathcal{D_X}) = \gamma$.
Then, for a prime modulus $q$, if we sample $A \overset{R}{\leftarrow} \mathbb{Z}_q^{n \times m}, x \overset{R}{\leftarrow} \lbrace 0,1 \rbrace^m, y \overset{R}{\leftarrow} \mathbb{Z}_q^n$, the map $x \mapsto A\cdot x$ is a universal family (for $x \ne x'$, $\Pr_A\lbrack Ax = Ax'\rbrack = q^{-n}$), so once $m$ is sufficiently larger than $n\log_2 q$ we have $(A, A\cdot x) \approx_{\text{stat}} (A, y)$.
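Plugging in concrete numbers shows how $m$ must scale (the parameters below are arbitrary examples): with $|\mathcal{Y}| = q^n$ and $\gamma = 2^{-m}$, the bound is $\frac{1}{2}\sqrt{q^n \cdot 2^{-m}}$, i.e. about $2^{(n\log_2 q - m)/2 - 1}$.

```python
import math

def lhl_log2_bound(n, q, m):
    # log2 of (1/2) * sqrt(q**n * 2**-m): the statistical distance bound
    return 0.5 * (n * math.log2(q) - m) - 1

# e.g. taking m = n*log2(q) + 256 pushes the distance below 2^-129
print(lhl_log2_bound(n=512, q=2**15, m=512 * 15 + 256))
```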
Noise Smudging
This technique appears in Gentry's thesis, where it is used to achieve (statistical) circuit privacy.
(Statistical) Circuit Privacy: let $\mathcal{E}$ be an encryption scheme with permitted circuits $C_{\mathcal{E}}$. $\mathcal{E}$ is circuit private if, for any key pair $(\text{sk},\text{pk})$ output by $\text{KeyGen}_{\mathcal{E}}(\lambda)$, any circuit $C\in C_{\mathcal{E}}$, and any fixed ciphertexts $\Psi = \langle \psi_1,\dots,\psi_t\rangle$ that are in the image of $\text{Encrypt}_{\mathcal{E}}$ on plaintexts $\pi_1,\dots,\pi_t$, we have
$$
\text{Encrypt}_{\mathcal{E}}(\text{pk},C(\pi_1,\dots,\pi_t)) \approx_{\text{stat}} \text{Evaluate}_{\mathcal{E}}(\text{pk},C,\Psi)
$$
meaning the distributions are statistically indistinguishable.
The main idea: every ciphertext $\psi_i \in \Psi$ carries some noise of magnitude at most $\rho$, and the ciphertext obtained by homomorphically evaluating $C$ carries larger noise $\rho_f$. We add one more, significantly larger noise $\rho^\ast \gg \rho_f > \rho$ to the evaluated ciphertext; this drowns out ("smudges") the circuit-dependent noise, so the result is distributed essentially like a fresh encryption of $C(\pi_1,\dots,\pi_t)$ with the same large noise added. Quantitatively (the standard smudging lemma): if $|e| \le B$ and $e^\ast$ is uniform over the integers in $\lbrack -B', B' \rbrack$, then $\Delta(e^\ast, e + e^\ast) \le \frac{B}{B'}$, so choosing the ratio $B'/B$ superpolynomial in $\lambda$ makes the two distributions statistically indistinguishable.
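The smudging distance can be computed directly (the noise bounds below are arbitrary illustrations): for a fixed noise $e$ and $e^\ast$ uniform over the integers in $\lbrack -B', B'\rbrack$, the distributions of $e^\ast$ and $e + e^\ast$ differ only on the $2|e|$ boundary points, so the distance shrinks as $B'$ grows.

```python
def smudge_dist(e, Bp):
    """Exact Delta(e*, e + e*) for e* uniform over the integers in [-Bp, Bp]."""
    N = 2 * Bp + 1
    s1 = set(range(-Bp, Bp + 1))         # support of e*
    s2 = set(range(e - Bp, e + Bp + 1))  # support of e + e*
    # each point in the symmetric difference contributes |1/N - 0| to the sum
    return len(s1 ^ s2) / (2 * N)

for Bp in (10**2, 10**4, 10**6):
    print(f"B' = {Bp:>9}: Delta = {smudge_dist(5, Bp):.2e}  (bound |e|/B' = {5 / Bp:.2e})")
```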