CS5234: Algorithms at Scale

Lecture 1 mainly covered two toy problems that can be solved by sampling.

Array all 0's?

Consider the following algorithm:

    Repeat s times:
        Choose random i in [1,n]
        if A[i] = 1 then return False
    return True

We would like the algorithm to guarantee the following:

  • if all 0’s: return true
  • if $\ge \epsilon n$ 1’s: return false with probability $\ge 2/3$ (the probability can be raised by increasing the number of samples $s$)
  • otherwise: return true or false

Proof:

If there are at least $\epsilon n$ 1’s, then each sample satisfies $Pr(A[i]=1) \ge \epsilon$, so
$$
\begin{aligned}
Pr(\text{all samples are 0}) &\le (1-\epsilon)^s \\
&\le (1-\epsilon)^{2/\epsilon} \quad (\text{let } s = 2/\epsilon) \\
&\le e^{-2} \quad (\text{since } 1-\epsilon \le e^{-\epsilon}) \\
&\le 1/3
\end{aligned}
$$
So the error probability is at most 1/3, and the algorithm returns false with probability $\ge 2/3$, as desired.
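
The sampling test and the choice $s = 2/\epsilon$ from the proof can be sketched in Python. This is a minimal illustration, not the lecture's reference code: the function name `looks_all_zero` and the `rng` parameter are assumptions made for the example, and indices are 0-based rather than in $[1,n]$.

```python
import math
import random

def looks_all_zero(A, eps, rng=random):
    """Return False if any sampled entry is 1, otherwise True.

    Hypothetical helper: uses s = ceil(2 / eps) samples, as in the proof.
    """
    s = math.ceil(2 / eps)       # number of samples
    n = len(A)
    for _ in range(s):
        i = rng.randrange(n)     # uniform index in [0, n)
        if A[i] == 1:
            return False         # found a witness: the array is not all 0's
    return True                  # no 1 was seen among the samples
```

An all-zero array is always accepted, since the test only returns False after seeing a 1. An array with at least $\epsilon n$ 1's should be rejected in at least 2/3 of independent runs.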

A useful Lemma:
$$
\begin{aligned}
e^{-x} &= 1 - x + x^2/2 - \dots \qquad \text{for } 0<x<1 \qquad (1) \\
1/e^2 &\le (1-1/x)^x \le 1/e \qquad \text{for } x>2 \qquad (2)
\end{aligned}
$$
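
As a quick sanity check, the inequalities can be evaluated numerically. The sketch below checks (2) at a few points, and also checks $1-x \le e^{-x}$, which follows from (1) for $0<x<1$ and is the form used in the proof. The function names are made up for this example.

```python
import math

def lemma_holds(x):
    """Check 1/e^2 <= (1 - 1/x)^x <= 1/e for a given x > 2."""
    middle = (1 - 1 / x) ** x
    return math.exp(-2) <= middle <= math.exp(-1)

def one_minus_x_below_exp(x):
    """Check 1 - x <= e^{-x} for a given x in (0, 1)."""
    return 1 - x <= math.exp(-x)

if __name__ == "__main__":
    print(all(lemma_holds(x) for x in [2.5, 3, 10, 1000]))
    print(all(one_minus_x_below_exp(k / 100) for k in range(1, 100)))
```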

Follow up

What if we want the algorithm to be correct with probability ≥ 1 – δ?

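  • Set the right-hand side of the error bound to $\delta$: choose $s$ so that $(1-\epsilon)^s \le e^{-\epsilon s} \le \delta$, i.e. $s \ge \ln(1/\delta)/\epsilon$.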
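
Requiring the error bound $(1-\epsilon)^s \le e^{-\epsilon s}$ to be at most $\delta$ gives $s \ge \ln(1/\delta)/\epsilon$. A small sketch of that calculation follows; the helper name `samples_for_confidence` is an assumption for illustration.

```python
import math

def samples_for_confidence(eps, delta):
    """Smallest integer s with exp(-eps * s) <= delta (hypothetical helper)."""
    return math.ceil(math.log(1 / delta) / eps)

if __name__ == "__main__":
    # Number of samples for eps = 0.1 and failure probability 1%.
    print(samples_for_confidence(0.1, 0.01))
```

The dependence on $\delta$ is only logarithmic, so a much smaller failure probability costs relatively few extra samples.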

Fraction of 1's?

Definition of $\epsilon$-close: the returned fraction is within $\pm\epsilon$ of the true fraction of 1’s. As above, we give the algorithm:

    sum = 0
    Repeat s times:
        Choose random i in [1,n]
        sum = sum + A[i]
    Return sum/s

Then, as in the previous problem, we want to guarantee that:

  • if all 0’s: return 0
  • if $\ge \epsilon n$ 1’s: return a fraction that is $\epsilon$-close to the true fraction with probability $\ge 2/3$ (the probability can be raised by increasing the number of samples $s$)
  • otherwise: return an arbitrary fraction

To give a proof, we introduce the Hoeffding bound. It applies to independent, bounded random variables $X_1, \dots, X_s$ (in our case each $X_i$ is a sampled entry of $A$, so $X_i \in \{0,1\}$):
$$
\begin{aligned}
&\text{Let } Z = X_1+X_2+X_3+\dots+X_s \\
&Pr(|Z-E[Z]|\ge\delta)\le 2e^{-\delta^2/s}
\end{aligned}
$$
In our problem, let $V = \text{sum}/s$ be the returned value and $f$ the true fraction of 1’s, so $E[V] = f$. Then
$$
\begin{aligned}
Pr(|V-E[V]|\ge\epsilon) &= Pr(|sV-sE[V]|\ge\epsilon s) \\
&\quad (\text{we rescale because } sV \text{ is the sum of the sampled values}) \\
&= Pr(|\text{sum} - sf|\ge \epsilon s) \\
&\le 2e^{-(\epsilon s)^2/s} = 2e^{-\epsilon^2 s} \\
&= 2e^{-2} \quad (\text{let } s = 2/\epsilon^2) \\
&\le 1/3
\end{aligned}
$$
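
The estimator can be sketched in Python as follows. This is an illustration under stated assumptions rather than the lecture's code: it takes the bound $2e^{-\epsilon^2 s}$ from the derivation and picks $s$ as the smallest integer with $2e^{-\epsilon^2 s} \le \delta$, i.e. $s = \lceil \ln(2/\delta)/\epsilon^2 \rceil$. The name `estimate_fraction` and the `delta` and `rng` parameters are hypothetical, and indices are 0-based.

```python
import math
import random

def estimate_fraction(A, eps, delta=1 / 3, rng=random):
    """Estimate the fraction of 1's in A by uniform sampling.

    Hypothetical helper: s is chosen so that 2 * exp(-eps^2 * s) <= delta.
    """
    s = math.ceil(math.log(2 / delta) / eps ** 2)   # number of samples
    n = len(A)
    total = 0
    for _ in range(s):
        total += A[rng.randrange(n)]                # add the sampled entry
    return total / s                                # sample mean
```

Note that the number of samples depends only on $\epsilon$ and $\delta$, not on the array length $n$.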