\Lecture{Jayalal Sarma}{Sep 9, 2020}{01}{Pigeon Hole Principle and Basic Applications}{Jayalal Sarma}{$\gamma$}{JS}
We start the course with the simplest but surprisingly powerful tool in combinatorial arguments: the pigeon hole principle. Using this principle as an example, we will also quickly review the methods of proof.
\section{Quick Recap on Proof Techniques}
A formal mathematical proof system in our context has axioms about the various mathematical objects that we are using, like numbers or graphs, which describe them through their properties. Then there are rules of inference, such as modus ponens, modus tollens, resolution, syllogisms etc., which help us derive new statements from these axioms.
The peculiarity of these rules of inference is that they "conduct truth" and form the building blocks for huge "truth conducting" structures called mathematical proofs. That is, if for any object\footnote{a little more formally, the assignment in propositional logic, and the model in general first order logic}, the premises of a rule of inference are true, then the conclusion is also true for it. Hence, suppose we derive a statement $\phi$ starting with the axioms and applying the rules of inference in various combinations. Since the individual rules of inference "conduct truth", the resulting structure also conducts truth and is called a mathematical proof of the statement $\phi$ from the axioms. Note that the truth of the statement $\phi$ for the object under consideration can only be stated relative to the truth of the axioms that we used. However, this is not a concern, since we intend to use mathematical proof systems to derive statements about objects which we know satisfy the axioms (in fact, we wrote down the axioms as properties of those objects).
\begin{curiosity}
It is an amusing question to ask whether there are other objects, which we did not intend, that also satisfy the axioms we wrote down, by accident. Say, for example, we wrote the axioms for graphs, but "strings" also satisfy them. If so, the theorems that we prove for graphs using only those axioms will also be true for strings, automatically !!. Quite interestingly, this is true for the natural numbers. The mathematical theory of natural numbers is axiomatized by what are called the Peano axioms. There are numbers one can define which are different from the natural numbers, but for which any theorem that we prove for the natural numbers also holds (because they satisfy the Peano axioms). Then one might ask: are we not trying to represent exactly the natural numbers? Should we not augment the Peano axioms with more properties of the natural numbers so that such {\em unwanted} parallel models no longer satisfy the axioms we write? Even more interestingly, one can argue that this is not even possible. No matter what extra formulas we write, the existence of such "parallel models" is inevitable. In fact, there will not be just one "parallel model"; there will be infinitely many of them. You should read about the {\em L\"owenheim–Skolem theorem}.
\end{curiosity}
Writing down mathematical proofs explicitly by using rules of inference may seem to be a mechanical way of proving statements. While it avoids any chance of mistakes because of the mathematical precision and rigor, it affects quick readability and communication of ideas. Hence, one would like to have more "human readable" ways of representing these proofs, by writing some of the steps in English while ensuring that we do not lose mathematical rigor. This brings in some subjectivity about how "formal" a proof is, that is, how close it is to the formal mathematical framework of rules of inference in terms of notation, presentation etc. Sometimes, very rigorous proofs tend to hide the intuitive idea behind the proof, which one tends to (and sometimes needs to) describe separately for easy communication. The more formal your proof is, the less chance there is of making a logical error in the proof. It is a good idea to start writing proofs with the mindset of a "rigor extremist", and once you are comfortable and can see through the mathematically rigorous steps of a statement, you can rely more on English sentences. This course will mostly do the latter, while ensuring that mathematical rigor is kept intact. The beauty of combinatorial proofs lies in their elegance and in the combinatorial insight and intuition. Balancing intuition with rigor in presentations and descriptions is part of the art of presentation.
Suppose that we have to prove a statement $\gamma$ of the form $p \rightarrow q$. We quickly recall the different ways of proving a statement of this form.
\begin{description}
\item{\bf Direct Proof:} Assume $p$ and then derive $q$ using the assumption and the axioms by applying the rules of inference. This is considered a proof of the statement $p \implies q$ since it can be associated with a valid argument form by itself.
\item{\bf Indirect Proof:} Assume $\lnot q$ and then derive $\lnot p$. Again, this is also considered as a proof of the statement $p \implies q$ since it can be associated with a valid argument form by itself. This is also called proof by {\em contrapositive}.
\item{\bf Proof by Contradiction:}
A proof by contradiction assumes the negation of the statement to be proven (that is, $\lnot \gamma$), then defines a statement $r$ (this forms a part of the creativity of the proof), and then derives $r \land (\lnot r)$ from the assumption and the axioms using the rules of inference. By an associated valid argument form, this shows that $\gamma$ must be true.
\end{description}
In addition, while proving quantified statements, there are a few additional ideas that are used which we quickly review below:
\begin{description}
\item{\bf Proof by Exhaustive Cases:} Suppose we want to derive a statement $\Gamma$ of the form $\forall \alpha P(\alpha)$ where $\alpha$ comes from a domain of discourse $\calD$ (say, for example, $\alpha$ is a natural number, that is, $\calD = \N$). We can partition $\calD = \calD_1 \cup \calD_2 \cup \ldots \cup \calD_k$ into several subdomains and prove the statement $\forall \alpha \in \calD_i,~P(\alpha)$ separately. Each part of the proof $\forall \alpha \in \calD_i~P(\alpha)$ is said to be a "case" of the proof. The fact that $\calD = \calD_1 \cup \calD_2 \cup \ldots \cup \calD_k$ is what is meant by the statement that the case analysis is {\em exhaustive}.
\item{\bf Proof by "Counter Example":} Suppose we want to disprove statements of the form $\forall \alpha P(\alpha)$. That is, we want to derive $\lnot\left(\forall \alpha P(\alpha)\right)$ which is logically equivalent to $\exists \alpha \lnot P(\alpha)$. Hence it suffices to demonstrate an $\alpha$ in the domain for which we can show $P(\alpha)$ is false.
\item{\bf Proof by Mathematical Induction:} This is a technique to prove statements of the form $\forall \alpha P(\alpha)$ where the domain $\calD$ is countably infinite. That is, the domain $\calD$ can be put in bijection with the set of natural numbers. The technique forms part of the Peano's axioms that define the natural numbers and hence is a valid proof technique. If $\phi : \N \to \calD$ is a bijection, in order to prove $\forall \alpha P(\alpha)$, we can equivalently prove $\forall n \in \N, P(\phi(n))$. In particular, it takes the following form:
\textit{If we can prove $P(\phi(0))$ and the implication $\left[ \forall n \in \mathbb{N}, P(\phi(n)) \implies P(\phi(n+1)) \right]$ then we can conclude $\forall n~ P(\phi(n)) $}.
There are versions of this proof technique, such as strong induction, structural induction, spiral induction, double induction etc., which are adaptations of the above basic idea.
\end{description}
Most of the proofs that we do in this course will follow one of the above frameworks. We will not do separate examples of these techniques since they are already covered in the basic discrete mathematics course.
\section{The Pigeon Hole Principle (PHP)}
With the quick recap done in the previous part, we now plunge into the actual business in this lecture. We first prove the following basic version of the Pigeon hole principle.
\begin{theorem}
Let $n,k \in \N$, such that $n > k$. Suppose we place $n$ identical balls in $k$ identical bins, then there is a bin that has at least two balls in it.
\end{theorem}
\begin{proof}
Let $n, k \in \N$ and $n > k$. Assume for the sake of contradiction that when we placed the balls into the bins as indicated in the theorem, there was no bin with at least two balls in it.
Although the bins are identical, let us number them from $1$ to $k$. Using this notation, let us define $b_i$ to be the number of balls that went into bin number $i$. Clearly $\forall i, b_i \ge 0$. Since we distributed all the balls into the bins, we have:
$$\calR : \sum_{i=1}^k b_i = n $$
Using the assumption, we have that: $\forall i, 0 \le b_i \le 1$. Summing up for $i$:
$\sum_{i=1}^k b_i \le \sum_{i=1}^k 1 = k < n$. Hence we have derived the statement :
$$\lnot \calR : \sum_{i=1}^k b_i \ne n$$
Hence we have derived $\calR \land \lnot \calR$. This is a contradiction, and hence the original assumption that we started out with must be false; therefore there has to exist a bin which has at least two balls in it.
\end{proof}
\begin{curiosity}
The formal proof of PHP, as simple as it sounds, is still a subject of substantial research in an area called \textit{proof complexity}. To demonstrate this, let us write the principle itself in more rigorous notation. Let $n > k$, and let $\{ x_{ij} \mid i \in [n], j \in [k] \}$ be propositional variables (which can be called, say, {\em pigeon hole variables}). Following our original notation, where there are $n$ pigeons and $k$ holes, the basic Pigeon Hole Principle is the following disjunctive normal form formula:
$$\textrm{\sc PHP}_k^n \defn \left( \bigvee_{i \in [n]} \bigwedge_{j \in [k]} \bar{x_{ij}} \right) \lor \left( \bigvee_{j \in [k]} \bigvee_{r \ne s \in [n]} (x_{rj} \land x_{sj}) \right) $$
To prove this, one possibility is to derive the contradiction from the negation of
$\textrm{\sc PHP}_k^n$. This is an expression in conjunctive normal form, with clauses:
$$ \textrm{For $i \in [n]$ the clauses : } Q_i \defn \bigvee_{j=1}^k x_{ij} $$
$$\textrm{ and for $s \ne t \in [n], j \in [k]$ the clauses } Q_{s,t,j} \defn \bar{x_{sj}} \lor \bar{x_{tj}}$$
Intuitively, these say that there is a function from $[n] \to [k]$ (represented by setting $x_{ij}=1$ to mean that the function takes $i$ to $j$) which is well defined (for every $i$, there exists a $j$ such that $x_{ij} = 1$) and also injective (for two different $s$ and $t$, it is not the case that both $x_{sj}$ and $x_{tj}$ are $1$ for some hole $j$).
Since $n > k$, there cannot be such an injection; hence the conjunction of these clauses is unsatisfiable, and its negation, which is $\textrm{\sc PHP}_k^n$, must be a tautology.
Suppose we ask: starting from these clauses as axioms, and applying rules of inference (say, the resolution principle) alone, how many steps of proof does one need to derive the contradiction ($r \land \lnot r$ for some $r$)?\footnote{Notice that this sounds exactly like computation: how many steps of computation are required in order to perform certain tasks, in terms of the input parameters.} We measure this in terms of $n$ and $k$, which determine the number of variables in the system. The area which studies the complexity of proofs in the above sense is called {\em proof complexity theory}. It turns out that the basic PHP itself is one of the tautologies which require exponentially long proofs if we restrict ourselves to resolution. What if we relax this restriction? The area has several interesting open questions related to this, and they have close connections to computational complexity theory too.
\end{curiosity}
\subsection{A Quick Example}
We will now demonstrate the application of the principle itself by a quick example. This is meant to be a revision of the topic from the previous courses and hence it is very much possible that you have seen the application earlier.
\begin{theorem}
If we place any five points inside the unit square, then there must necessarily exist two points that are at most $0.75$ units away from each other.
\end{theorem}
\begin{proof}
Firstly, to make it sound less magical, let us comment that the theorem is actually true with $0.75$ replaced by $\frac{1}{\sqrt{2}} \approx 0.707$. The application of PHP goes as follows. Divide the unit square into four small squares of side $\frac{1}{2}$ each (each having the centre of the unit square as one of its corners). These small squares form the bins and the five points that we place form the balls. By applying PHP, we conclude that there must be two points which fall into the same small square. Now the argument can be completed by the fact that the maximum distance between any two points in the same small square is at most $\frac{1}{\sqrt{2}}$, since the sides of the small square are $\frac{1}{2}$ each.
\end{proof}
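To complement the proof, here is a small illustrative Python sketch (our own sanity check, not part of the lecture; the function name and Monte Carlo setup are ours) that samples random placements of five points in the unit square and checks that some pair is always within $\frac{1}{\sqrt{2}}$ of each other.
\begin{verbatim}
# Sanity check (not a proof): among any five random points in the unit
# square, some pair should be within 1/sqrt(2) of each other.
import itertools
import math
import random

def closest_pair_distance(points):
    # minimum pairwise Euclidean distance among the given points
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

random.seed(0)
bound = 1 / math.sqrt(2)
for _ in range(100000):
    pts = [(random.random(), random.random()) for _ in range(5)]
    assert closest_pair_distance(pts) <= bound + 1e-12
print("No violation found in 100000 random trials.")
\end{verbatim}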
\begin{remark}[{\bf Tightness}]
Is the above theorem tight? Can it be improved? Improvement can be in terms of two parameters. Firstly, can we make the same claim for 4 points? Secondly, even for 5 points, can we make an improved claim about the minimum distance being, say, 0.7 units? The answer to both these questions is no. For the first, we can demonstrate 4 points in which every pair is at least distance one apart: the four corners themselves serve as a counter example. For the second question, we can demonstrate 5 points (for instance, the four corners and the centre) which are pairwise at least $\frac{1}{\sqrt{2}}$ apart.
\end{remark}
\begin{remark}[{\bf Glimpse of Extremals in Combinatorics}]
The above example theorem, while a classical application of the Pigeon Hole Principle, also demonstrates a curious phenomenon. In spirit it says that \textit{if there is a large number of objects in a collection, then there must be some structure}. The questions are: how large? And what structure? The answers to these vary and form the foundations of this area. We will see more of this when we see Ramsey Theory.
\end{remark}
\section{Numbers and Remainders}
It is customary to do an example of PHP involving numbers and remainders under division. We will do a slightly unusual example.
\begin{theorem}
Consider the infinite sequence $7, 77, 777, \ldots ,7777777, \ldots $ - there must necessarily exist a number in this sequence that is divisible by 2003.
\end{theorem}
\begin{proof}
As weird as it sounds, one might wonder how PHP plays a role here. There does not seem to be any place to apply PHP directly in the statement of the problem. Indeed, the infinitude of the sequence seems to indicate that we are allowed to take large numbers in the sequence. A usual trick is division, and then considering the remainders.
As a start, consider the first 2003 numbers in the sequence. Denote them by $n_1, n_2, \ldots, n_{2003}$, where $n_k$ consists of $k$ sevens. Divide them by 2003 and collect the remainders. Denote them by $a_1, a_2, \ldots, a_{2003}$. If any of the $a_i$s is 0, then we are done, since that $n_i$ is divisible by 2003. So assume that none of them is 0; that is, $1 \le a_i \le 2002$. Now the pigeons and the holes are visible. The numbers $n_i$ are the pigeons and the remainders are the holes. There are only 2002 holes but there are 2003 pigeons, and hence by PHP there must exist $1 \le i < j \le 2003$ such that $a_i = a_j$. This gives:
\begin{eqnarray}
n_i \bmod 2003 & = & n_j \bmod 2003 \\
(n_j - n_i) \bmod 2003 & = & 0 \\
2003 & \textrm{ divides } & (n_j - n_i)
\end{eqnarray}
That is good progress. We managed to show that 2003 divides $(n_j - n_i)$. However, $n_j - n_i$, unfortunately, will not be in the sequence at all. What does this number look like? By the structure of the numbers, this difference is a block of $7$s followed by a block of zeros. More precisely, computing this number, we have that:
$$ n_j - n_i = n_{j-i} \cdot 10^{i} $$
So we have that $2003$ divides the product of $n_{j-i}$ and $10^{i}$. However, 2003, being an odd number which is not a multiple of $5$, has no common factor with any power of 10. Hence 2003 must necessarily divide $n_{j-i}$, which is a member of the sequence. This completes the proof.
\end{proof}
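The proof shows existence without telling us which member of the sequence works. A short illustrative Python sketch (the function name is ours) can locate the first such number by tracking remainders modulo 2003, mirroring the argument above.
\begin{verbatim}
# Find the first number of the form 77...7 that is divisible by 2003 by
# tracking remainders modulo 2003 instead of the (huge) numbers themselves.
def first_sevens_multiple(m=2003):
    remainder = 0
    for length in range(1, m + 1):    # the argument guarantees success within m terms
        remainder = (remainder * 10 + 7) % m
        if remainder == 0:
            return length             # the number with `length` sevens is divisible by m
    return None

length = first_sevens_multiple()
print("The number consisting of", length, "sevens is divisible by 2003.")
assert int("7" * length) % 2003 == 0
\end{verbatim}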
\section{Graphs}
Our third application is related to problems that can be modelled as graphs.
\begin{theorem}
In any chess tournament with $n \ge 2$ participants, at any point of time there must be two participants who have finished the same number of games in the tournament.
\end{theorem}
It is natural to model this situation as a graph with $n$ vertices where each vertex represents a participant, and we put an edge between the vertices representing players $i$ and $j$ if they have played a game with each other. The number of games played by a player is exactly the degree of the corresponding vertex in this graph. Rewriting the above theorem in the new language now:
\begin{theorem}
In any simple undirected graph $G$ on at least two vertices, there must be two vertices having the same degree.
\end{theorem}
\begin{proof}
The proof is by an exhaustive case analysis. Let $n \ge 2$ be the number of vertices of $G$. We need to argue the above for all graphs. We divide this domain into two cases based on whether there is an isolated vertex or not.
\begin{description}
\item{\bf Case 1 : $G$ has an isolated vertex} - In this case, there is a vertex of degree $0$, and hence there cannot be a vertex of degree $n-1$. Thus we have $n$ vertices, and only $n-1$ possible degree values $\{0,1,2,\ldots, n-2\}$. By the PHP, we must see two vertices which have the same degree.
\item{\bf Case 2: $G$ does not have an isolated vertex} - In this case, there is no vertex of degree $0$, and hence the degree values of vertices can only be in the set $\{1, 2, \ldots n-1\}$. Again we have $n$ vertices whose degrees take only $n-1$ possible values. Again, by PHP, we must see two vertices having the same degree.
\end{description}
\end{proof}
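As a quick illustrative check (ours, not from the lecture), the following Python sketch samples random simple graphs and confirms that two vertices always share a degree; this is only a sanity check, not a proof.
\begin{verbatim}
# Sanity check: every simple undirected graph on n >= 2 vertices has two
# vertices of the same degree.  We test random graphs by brute force.
import random

def has_repeated_degree(n, p):
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:     # include the edge {u, v}
                degree[u] += 1
                degree[v] += 1
    return len(set(degree)) < n         # some degree value repeats

random.seed(1)
assert all(has_repeated_degree(random.randint(2, 12), random.random())
           for _ in range(10000))
print("Two equal degrees found in every sampled graph.")
\end{verbatim}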
\begin{exercise-prob}[See Problem Set 1~(Problem \ref{mutual-friends})]
\begin{show-ps1}{mutual-friends}
A social network is said to be symmetric if the relation between users that is maintained as a part of the network is symmetric. Consider a symmetric social network and let the symmetric relation maintained be that of ``users $A$ and $B$ are {\em friends}" (like in the case of facebook). A user $C$ is said to be a \textit{mutual friend} of users $A$ and $B$ if $C$ is a friend of both $A$ and $B$. Prove that for any user $A$ of the network who has at least two friends, there must exist two friends of $A$ who have the same number of mutual friends with $A$.
Comment on whether symmetry is critical for your argument. Take the example of {\em instagram} where the symmetric relation of {\em friends} is replaced by {\em followers}. Generalize the definition of mutual friends to {\em mutual followers}. Comment on whether a similar statement for followers can be established in this case.
\end{show-ps1}
\end{exercise-prob}
\section{Discussion Session}
Just to get started, we considered the following question: \textit{how many people do we need to choose so that we can be assured that two among the set of people we have chosen will have their birthday on the same day?} The answer to this question is given by the Pigeon Hole Principle immediately, by considering the people to be pigeons and the day of the year on which their birthday falls to be the pigeonhole. Hence, to be guaranteed that out of the 366 holes at least one contains two pigeons (people), and hence two people having the same birthday, we need to choose 367 people. Just to test our understanding, we asked "is the theorem tight?" in terms of the number of people. It indeed is, since there is a set of 366 people whom you can choose all of whom have different birthdays. That is, 367 is the smallest number for which the above statement holds. Hence the theorem is tight.
The first discussion point that was raised was a comparison with the birthday paradox. If we do not choose 367 people, we are not given the guarantee that there are two people in the set with the same birthday. What if we don't want this guarantee with certainty, but instead only need a probabilistic guarantee? To formalize this, one has to imagine an experiment where the people are chosen uniformly at random. More rigorously, the property of the distribution is that for every date of the year, the probability that a chosen person has a birthday on that day is $\frac{1}{365}$ (let us say we are talking about a non-leap year). The question is then of the form: {\em what is the minimum number of people we need to choose, as per the above experiment, such that we are guaranteed at least a 99.9\% chance of getting two people with the same birthday in the set?} A natural guess is 364, which is slightly less than 365. But what is the minimum? The answer beats our usual intuition and is surprisingly low: we need to choose only 70 people to achieve this !!. The surprise goes even further: if we ask only for a 50\% chance, then the number is just 23 !! Hence this is called \textit{the birthday paradox}.
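To see where the numbers 23 and 70 come from, here is a small Python computation (an illustrative sketch written for these notes, assuming a 365-day year and uniformly random birthdays).
\begin{verbatim}
# Probability that among n uniformly random birthdays (365-day year)
# at least two coincide: 1 - prod_{i=0}^{n-1} (1 - i/365).
def collision_probability(n, days=365):
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (days - i) / days
    return 1 - p_no_collision

print(collision_probability(23))   # about 0.507, already above 50%
print(collision_probability(70))   # about 0.999, above 99.9%
\end{verbatim}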
\subsection{Impossibility of Perfect Lossless Compression}
PHP has a variety of applications. A first application outside a discrete mathematics course that we usually encounter is in automata theory, where we use it to prove the pumping lemma. In fact, this one principle is pivotal in showing that there {\em cannot be} a finite automaton accepting certain languages.
We then turned to a practically motivated application of PHP, in the context of file compression. We all have used file compression programs, say zip or tar. They compress our files into smaller sizes and they usually report a compression ratio too. Here is a question out of curiosity: can we design a compression algorithm that is guaranteed to compress every file? This seems a natural requirement. Interestingly, the actual situation is worse: not only can no compression algorithm reduce the size of all files, but any such algorithm also has to increase the size of some file. The argument uses PHP.
Compression algorithms are nothing but programs which translate files (which are interpreted as strings) to strings. That is, they are functions of the form $C : \Sigma^* \to \Sigma^*$. A compression algorithm is said to be {\em lossless} if this function is injective. That is, given a compressed string (an element of the range), there is a unique file that we can decompress it to. Indeed, compression algorithms that are not lossless are practically useless, since there cannot exist a decompression algorithm which recovers the original file.
\begin{theorem}
For any lossless data compression algorithm that makes at least one file smaller, there will be at least one file that it makes larger.
\end{theorem}
\begin{proof}
Let $C$ be the compression function from $\Sigma^* \to \Sigma^*$. Let us fix $\Sigma = \{0,1\}$ without loss of generality.
Suppose $C$ makes at least one file smaller than its original size. In addition, for the sake of contradiction, suppose that the algorithm does not make any file larger than its original size. Let $w \in \Sigma^*$ be a shortest string (say, $|w| = \ell$) which the algorithm makes smaller. Then every $w' \in \Sigma^*$ with $|w'| < \ell$ is neither made smaller (since $w$ is a shortest string that is made smaller) nor made larger (by the assumption), so $|C(w')| = |w'|$. Now consider the following set:
$$ \Gamma = \left\{ x \in \Sigma^* \mid | C(x) | < \ell \right\} $$
From the above, $\Gamma$ contains all $2^\ell - 1$ strings of length less than $\ell$, and also $w$ itself; hence $|\Gamma| \ge 2^\ell$. On the other hand, $C$ maps $\Gamma$ into the set of strings of length less than $\ell$, which has only $2^\ell - 1$ elements. Hence, by the Pigeon Hole Principle, $\exists x, x' \in \Gamma$ with $x \ne x'$ such that $C(x) = C(x')$. This contradicts the lossless property.
\end{proof}
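The counting at the heart of the proof can be made concrete with a minimal Python sketch (illustrative only; the names are ours): there are only $2^\ell - 1$ binary strings of length less than $\ell$, so no injective map can send $2^\ell$ strings into that set.
\begin{verbatim}
# The counting behind the impossibility proof: there are 2^l - 1 binary
# strings of length strictly less than l, so no injective map can send
# 2^l (or more) strings into that set without a collision.
from itertools import product

def binary_strings_up_to(length):
    # all binary strings of length 0, 1, ..., length
    return ["".join(bits) for n in range(length + 1)
            for bits in product("01", repeat=n)]

l = 4
shorter = binary_strings_up_to(l - 1)     # candidate "compressed" outputs
inputs_needing_them = 2 ** l              # strings that would have to map there
print(len(shorter), "possible outputs for", inputs_needing_them, "inputs")
assert len(shorter) == 2 ** l - 1 < inputs_needing_them  # pigeonhole forces a collision
\end{verbatim}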
%
%\subsection*{Computation of $M$ - from discussion on Sep 28th}
%
%\paragraph{Harmonic Number Bound:}
%
%\begin{eqnarray*}
%H_n & = & 1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4} \ldots +\frac{1}{n} \\
%& = & \bigsum_{k=0}^{\lfloor \log n \rfloor} \sum_{j=0}^{2^k -1} \left( \frac{1}{2^k+j} \right) \\
%& \le & \bigsum_{k=0}^{\lfloor \log n \rfloor} \sum_{j=0}^{2^k -1} \left( \frac{1}{2^k} \right) \textrm{ since the denominator decreased} \\
%& = & \sum_{k=0}^{\lfloor \log n \rfloor} (1) \le \log n
%\end{eqnarray*}
%We wanted to compute :
%
%\[ M = \sum_{d=1}^\infty \frac{\mu(d)}{d^2} \]
%
%By using prime decomposition theorem, if $p_1,p_2, \ldots $ are list of primes:
%
%\[ M = \left( 1- \frac{1}{p_1^2} \right) \left( 1- \frac{1}{p_2^2} \right) \left( 1- \frac{1}{p_3^2} \right) \ldots \]
%
%The next idea is to compute $\frac{1}{M}$ as follows: since we have written above as the product -
%
%\begin{eqnarray*}
%\frac{1}{M} & = & \frac{1}{\left( 1- \frac{1}{p_1^2} \right) } \frac{1}{\left( 1- \frac{1}{p_2^2} \right) } \frac{1}{\left( 1- \frac{1}{p_3^2} \right) } \ldots \\
%\end{eqnarray*}
%Using the infinite series expansion ... $\left(1-\frac{1}{x^2}\right() = 1+x^2+x^4+\ldots$
%\begin{eqnarray*}
%\frac{1}{M} & = & \left( 1+\frac{1}{p_1^2} + \frac{1}{p_1^4} \ldots \right)
%\left( 1+\frac{1}{p_2^2} + \frac{1}{p_2^4} \ldots \right)
%\left( 1+\frac{1}{p_3^2} + \frac{1}{p_3^4} \ldots \right) \\
%& = & 1+\frac{1}{2^2}+\frac{1}{3^2}+ \ldots \\
%& = & \frac{\pi^2}{6} \textrm{ {\tt By Euler sum that Gautam mentioned in the session}}
%\end{eqnarray*}
\Lecture{Jayalal Sarma}{Sep 9, 2020}{02}{More on PHP}{Jayalal Sarma}{$\gamma$}{JS}
\section{Warm-up and Generalizations of PHP}
We start with a usual application of PHP to numbers to warm up in the lecture.
\begin{theorem}
In any set of $n+1$ positive integers, each at most $2n$, there must exist two numbers such that one divides the other.
\end{theorem}
\begin{proof}
Let $S = \{a_1, a_2, \ldots , a_{n+1}\}$ be the set of $n+1$ positive integers such that $a_i \le 2n$ for every $i$. Each number can be written in the form $a_i = 2^{k_i}q_i$ where $2^{k_i}$ is the largest power of $2$ that divides $a_i$, and hence $q_i$ is an odd number.
Now consider the numbers $q_1, q_2, \ldots, q_{n+1}$. Can they all be distinct? They are all odd numbers in the range $1 \le q_i \le 2n$, and there are only $n$ odd numbers in this range. By an application of PHP, there must exist $i, j$ with $1 \le i \ne j \le n+1$ such that $q_i = q_j$. Since the elements of $S$ are distinct and $q_i = q_j$, we must have $k_i \ne k_j$. This gives the two exhaustive cases:
\begin{description}
\item{{\bf Case 1:} $k_i > k_j$:} Then $2^{k_j}$ divides $2^{k_i}$, and since $q_i = q_j$, it follows that $a_j = q_j2^{k_j}$ divides $q_i2^{k_i} = a_i$. Hence $a_j$ divides $a_i$.
\item{{\bf Case 2:} $k_j > k_i$:} Same as previous case, just swapping the role of $i$ and $j$.
\end{description}
In either case, there exist two numbers in the set such that one divides the other. This concludes the proof.
\end{proof}
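A brute-force Python check of the statement for small $n$ (an illustrative sketch, not part of the proof) is given below.
\begin{verbatim}
# Brute-force check for small n: every (n+1)-element subset of {1, ..., 2n}
# contains two elements such that one divides the other.
from itertools import combinations

def has_dividing_pair(subset):
    return any(b % a == 0 for a, b in combinations(sorted(subset), 2))

for n in range(1, 9):
    assert all(has_dividing_pair(s)
               for s in combinations(range(1, 2 * n + 1), n + 1))
print("Verified for n = 1, ..., 8.")
\end{verbatim}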
We now state a usual generalization of PHP as a recap.
\begin{theorem}[{\bf Generalized PHP}]
Let $n,m ,r$ be positive integers and let $n > mr$. If we distribute $n$ balls into $m$ bins, then there must be a bin which has at least $r+1$ balls.
\end{theorem}
Indeed, the generalization comes in handy when the combinatorial statement that we want to establish is not about a single "collision" but about multiple elements landing in the same bin. A simple recap example to demonstrate this is the following question: {\em how many students need to be in the course so that, at the end of the semester, no matter how the students perform, at least five students get the same letter grade (out of the six grades $S,A,B,C,D,E$)?} This is also an extremal question. Applying the generalized PHP with $r+1 = 5$ and $m=6$, it is sufficient to have 25 students in the class. And with 24 we cannot guarantee this, since there is a way to distribute 4 students to each grade so that no grade is received by 5 students.
\section{Example 2 : Erd\"os-Szekeres Theorem}
This is about a pattern that appears in any sequence of distinct numbers, first proved by Erd\"os and Szekeres in 1935. The theorem also has a geometric interpretation. The theorem itself is a creative use of the Pigeon Hole Principle and is a case of extremal combinatorics.
\begin{theorem}
In any sequence of $n^2+1$ distinct real numbers, there must necessarily exist either a strictly increasing subsequence of $n+1$ numbers or a strictly decreasing subsequence of $n+1$ numbers.
\end{theorem}
Before we begin to prove this, let us play around with an example.
$8, 11, 9, 1, 4, 6, 12, 10, 5, 7$. Here $n=3$ and there are 10 numbers in the sequence. There must be at least one strictly increasing subsequence of length 4. Indeed, there is: the subsequence $1,4,6,12$. In fact, there are more, such as $1,4,6,10$. But in any case there is at least one. In this case, it so happens that there is also a strictly decreasing subsequence of length $4$, namely $11, 9, 6, 5$. There are more such subsequences.
\begin{proof}
The proof is an elegant and intuitive one, and a perfect example of how such proofs are discovered. Suppose $a_1, a_2, \ldots, a_{n^2+1}$ is the given sequence of numbers.
Suppose we checked for an increasing subsequence of length $n+1$ and for a decreasing subsequence of length $n+1$, but did not find either in the above sequence. How do we formally represent this data? Here is an idea: with each index $k$, let us associate a pair of numbers $(i_k,d_k)$, defined as the length of the longest increasing (for $i_k$, and respectively decreasing for $d_k$) subsequence starting from the number $a_k$ in the given sequence. The fact that we failed to find such subsequences means that these pairs satisfy, for every $k$, $1 \le i_k \le n$ and $1 \le d_k \le n$.
Thus we have $n^2+1$ tuples in hand where the value of each component can only be between $1$ and $n$. Hence there are only $n^2$ distinct such pairs possible. But now we have a scenario for PHP, which gives that there must exist $s,t \in [n^2+1]$ with $s \ne t$ (say, without loss of generality, $s < t$) such that the tuples for both these indices are the same. That is, $i_s = i_t = i$ (say) and $d_s = d_t = d$ (say).
We know that the numbers are distinct in the sequence. Hence $a_s \ne a_t$. Thus we have the following two exhaustive cases:
\begin{description}
\item{{\sf Case 1:} $a_s > a_t$ :} Let $(t_1, t_2, \ldots, t_d)$ be a decreasing subsequence of length $d$ that starts from $a_t$, with $t_1 = a_t$. But then $(a_s, t_1, t_2, \ldots, t_d)$ is also decreasing (since $s < t$ and $a_s > a_t$), starts with $a_s$, and is of length $d+1$. This contradicts the fact that $d_s = d$.
\item{{\sf Case 2:} $a_s < a_t$ :} Same as the above case, with decreasing replaced by increasing; the final contradiction is with the fact that $i_s=i$, because we can demonstrate an increasing subsequence of length $i+1$ starting at $a_s$.
\end{description}
Hence the proof.
\end{proof}
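The pairs $(i_k,d_k)$ from the proof are easy to compute; the following illustrative Python sketch (function name ours) does so for the example sequence above by dynamic programming and confirms that a monotone subsequence of length $4$ must exist.
\begin{verbatim}
# For each index k, compute the length of the longest strictly increasing
# (i_k) and strictly decreasing (d_k) subsequence starting at position k.
def start_lengths(seq):
    n = len(seq)
    inc = [1] * n
    dec = [1] * n
    for k in range(n - 1, -1, -1):          # dynamic programming from the right
        for j in range(k + 1, n):
            if seq[j] > seq[k]:
                inc[k] = max(inc[k], 1 + inc[j])
            elif seq[j] < seq[k]:
                dec[k] = max(dec[k], 1 + dec[j])
    return list(zip(inc, dec))

seq = [8, 11, 9, 1, 4, 6, 12, 10, 5, 7]     # n^2 + 1 terms with n = 3
pairs = start_lengths(seq)
print(pairs)
# With 10 distinct terms and n = 3, some i_k or d_k must exceed 3,
# i.e. a monotone subsequence of length 4 exists.
assert any(i >= 4 or d >= 4 for i, d in pairs)
\end{verbatim}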
\begin{remark}
The above theorem can also be generalized, and in fact is the original form of the Erd\"os-Szekeres theorem. For given natural numbers $r$, $s$ they showed that any sequence of distinct real numbers with length at least $(r-1)(s-1)+1$ contains a monotonically increasing subsequence of length $r$ or a monotonically decreasing subsequence of length $s$.
\end{remark}
\section{Example 3: People at Party}
If $6$ people are invited to a party, something interesting happens. Let us say some pairs of them are friends and some pairs of them are strangers to each other. There will always be some set of three people who are pairwise strangers to each other, or there will be a set of three people who are pairwise friends with each other. This phenomenon can easily be mistaken for a sociological or behavioural-psychological fact that humans seem to behave this way. However, it turns out that it can be seen as a simple result of combinatorics and is a nice application of PHP in disguise. We demonstrate this now.
\begin{theorem}
If $6$ people are invited to a party, then there must exist three of them who are pairwise strangers to each other, or there must exist three of them who are pairwise friends with each other.
\end{theorem}
There are many equivalent ways of formulating this. One can talk about graphs to model the facts stated above. We defer these to a later point when we get to Ramsey numbers. We now get to the proof of the above in the same language as we discussed above.
\begin{proof}
Let $P$ be the set of people who joined the party. Let $\alpha \in P$ be one of the attendees. We do the following case analysis based on how many people $\alpha$ is friends with at the party. We want to demonstrate a set $\Gamma \subseteq P$ such that $|\Gamma| = 3$ and the members of $\Gamma$ are either pairwise friends or pairwise strangers.
\begin{description}
\item{{\bf Case 1:} $\alpha$ has at least three friends in $P$:} Let $\beta, \gamma, \delta$ be three such friends. We ask the question: are $\beta, \gamma, \delta$ friends amongst themselves? The answer to this gives the following exhaustive subcases.
\begin{description}
\item{{\bf Case 1a:} $\beta, \gamma, \delta$ are pairwise strangers among each other:} In this case we can simply set $\Gamma = \{ \beta, \gamma, \delta \}$ which has the required property for $\Gamma$ as desired.
\item{{\bf Case 1b:} there is a pair among $\beta, \gamma,$ and $\delta$ who are friends :} Without loss of generality let us say $\beta$ and $\gamma$ are friends (the other cases are similar). In this case, define $\Gamma = \{\alpha, \beta,\gamma\}$ and it has the desired properties.
\end{description}
\item{{\bf Case 2:} $\alpha$ has at most two friends in $P$}. In this case, there are at least three strangers to $\alpha$ in $P$; let us name three of them $\beta, \gamma,$ and $\delta$. We ask the question: are $\beta, \gamma, \delta$ friends amongst themselves? The answer to this gives the following exhaustive subcases.
\begin{description}
\item{{\bf Case 2a:} $\beta, \gamma, \delta$ are pairwise friends among each other:} In this case we can simply set $\Gamma = \{ \beta, \gamma, \delta \}$ which has the required property for $\Gamma$ as desired.
\item{{\bf Case 2b:} there is a pair among $\beta, \gamma,$ and $\delta$ who are strangers :} Without loss of generality let us say $\beta$ and $\gamma$ are the strangers (the other cases are similar). In this case, define $\Gamma = \{\alpha, \beta,\gamma\}$ and it has the desired properties.
\end{description}
\end{description}
Since we have argued both cases, this completes the proof of the theorem.
\end{proof}
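Since there are only $\binom{6}{2} = 15$ pairs, the theorem can also be verified exhaustively by a computer over all $2^{15}$ friends/strangers labellings. An illustrative Python sketch (ours) follows.
\begin{verbatim}
# Exhaustive verification: every friends/strangers labelling of the 15 pairs
# among 6 people contains 3 mutual friends or 3 mutual strangers.
from itertools import combinations, product

people = range(6)
pairs = list(combinations(people, 2))       # the 15 unordered pairs

def has_monochromatic_triangle(colour):
    return any(colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
               for a, b, c in combinations(people, 3))

assert all(has_monochromatic_triangle(dict(zip(pairs, labels)))
           for labels in product([0, 1], repeat=len(pairs)))
print("All 2^15 labellings contain a monochromatic triangle.")
\end{verbatim}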
\begin{remark}
A more elegant way to handle case 2 is to reduce it to case 1 itself. Consider the complement of the friends/stranger relation. Note that the result required for the theorem does not change since we just need $\Gamma$ to be either pairwise strangers or pairwise friends and they just get complemented. Now, if $\alpha$ has at most two friends in $P$, it has at least three friends in $P$ in the complementary relation and hence we can reuse Case 1 in this case.
\end{remark}
\begin{exercise}
Is the above theorem tight? Indeed, one can construct 5 people going to a party and associate a friends/stranger relation among them such that there does not exist three people who are friends with each other and there does not exist three people who are strangers with each other. The exercise is to explicitly write down this counter example relation.
\end{exercise}
\begin{exercise-prob}[See Problem Set 1~(Problem \ref{php-square})]
\begin{show-ps1}{php-square}
The set $M$ consists of nine positive integers, none of which has a
prime divisor larger than six. Prove that $M$ has two elements whose
product is the square of an integer. Is the bound $9$ in the above statement tight?
\end{show-ps1}
\end{exercise-prob}
\Lecture{Jayalal Sarma}{Sep 10, 2020}{03}{PHP for Dirichlet's Approximation Principle}{Jayalal Sarma}{$\gamma$}{JS}
We now discuss a very old and different application of PHP, one which predates the name PHP itself. It is used to show an approximation principle for irrational numbers. This application earned the principle the name {\em Dirichlet's Box Principle}.
\section{Approximation of irrationals by rationals}
The task we have at hand is about approximating irrational numbers using rationals to a given accuracy. As a concrete example, suppose we want to approximate $\sqrt{2}$ by $\frac{p}{q}$ up to a given accuracy $\epsilon$ such that
$$ \left| \sqrt{2} - \frac{p}{q} \right| \le \epsilon $$
The driving question for us is: how large should $p$ and $q$ be? In fact they are related, and hence we only need to ask how large the denominator $q$ should be. The larger $q$ is, the finer the granularity of the representation, but also the larger the storage cost for the number to be represented. Hence, for a fixed irrational number and a given $\epsilon$, we want the value of $q$ to be as small as possible.
Indeed, if we want to do this for an arbitrary irrational number, here is a simple idea: let $q \in \N$ (which we will choose later). Divide the number line into intervals of length $\frac{1}{q}$ each. Consider the number $\alpha$ and see which interval it belongs to. Choose the nearest endpoint, which is a multiple of $\frac{1}{q}$, as the approximation $\frac{p}{q}$ of $\alpha$. A quick thought will convince you that the error introduced by this method is at most half of the interval size, which is $\frac{1}{2q}$. That is, for any $q$ that we choose, if we choose $p$ accordingly as above,
$$ \left| \alpha - \frac{p}{q} \right| \le \frac{1}{2q} $$
Thus, for a given $\epsilon$, we should choose $q$ such that $\frac{1}{2q} < \epsilon$ to get the required accuracy. In other words, $q$ grows linearly with $\frac{1}{\epsilon}$. Just to get a sense of this growth, if $\epsilon$ is given to be $0.0001$, then we should choose $q$ to be roughly 5000.
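A minimal Python sketch of this naive rounding scheme (illustrative; the choice $\alpha = \sqrt{2}$ and the variable names are ours):
\begin{verbatim}
# Naive scheme: for a chosen q, take p = round(q * alpha); the error
# |alpha - p/q| is then at most 1/(2q).
import math

alpha = math.sqrt(2)          # the irrational number to approximate
epsilon = 0.0001
q = 5001                      # any q > 1/(2*epsilon) = 5000 suffices
p = round(q * alpha)
error = abs(alpha - p / q)
print(p, q, error)            # error is below 1/(2q), hence below epsilon
assert error <= 1 / (2 * q) < epsilon
\end{verbatim}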
\section{Dirichlet's Approximation Principle}
Indeed, we would probably have preferred a smaller $q$, due to the above mentioned representation cost. Dirichlet's approximation principle improves exactly this, and it is a nice application of the pigeon hole principle.
\begin{theorem}[{\bf Dirichlet's Approximation Principle}]
For every irrational number $\alpha$, there exist integers $p$ and $q$ with $q \ge 1$ such that:
$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}$$
\end{theorem}
A few remarks about the improvement are in order. Now, to achieve accuracy $\epsilon$ it suffices to have $\frac{1}{q^2} \le \epsilon$, that is, $q$ grows only like $\sqrt{1/\epsilon}$. To check the numbers, suppose $\epsilon$ is given to be $0.0001$; then we can afford to choose $q$ to be just 100, as opposed to 5000. We will now prove the above theorem.
\begin{proof}
Let $\alpha$ be the irrational number that we are interested in approximating.
The first observation is that it is sufficient to prove that there exist $p, q \in \Z$ with $q \ge 1$ such that
$$\left| q\alpha - p \right| < \frac{1}{q}$$
In other words, we need to understand the nearest integer to the quantity $q\alpha$. Intuitively, the fractional part of $q\alpha$ plays a role in this, which we will study now.
Fix a positive integer $N$ (we will choose this later) and consider the numbers $0, \alpha, 2\alpha, \ldots, N\alpha$. We will eventually choose $q$ to be at most $N$, so that one of these numbers is $q\alpha$. Since we are interested in the fractional parts, let us distribute them into intervals as we did in the naive case.
Consider the interval from $[0,1)$ divided into subintervals of the form:
$$ \left[\left.0,\frac{1}{N}\right.\right),\left[\left.\frac{1}{N},\frac{2}{N}\right.\right) \ldots ,\left[\left.\frac{N-1}{N},1\right.\right) $$
There are $N$ intervals in this list. If we distribute the fractional parts of the $N+1$ numbers $0, \alpha, 2\alpha, \ldots, N\alpha$ into these intervals, by PHP there must be two of the multiples of $\alpha$ which fall within the same interval. In other words, there exist $a,b \in \{0,1, \ldots, N\}$ with $a > b$ such that:
$$ \left| \{a\alpha\} - \{b \alpha\} \right| < \frac{1}{N}$$
Just to fast forward, the idea is to demonstrate that the choice of $q = a-b$ actually works for our purpose. To do this, we will show that the nearest integer to $a \alpha - b \alpha$ is less than $\frac{1}{a-b}$ away from it, and that integer will be our $p$. Since $a-b$ is at most $N$, it is sufficient to show that there is an integer which is at most $|\{a \alpha \} - \{b \alpha\}|$ away from $a \alpha - b \alpha$. By the above, this quantity is less than $\frac{1}{N}$, which in turn is at most $\frac{1}{a-b}$. This is in fact a general statement which we can prove as follows:
\begin{lemma}
\label{lem:a-b}
Let $A$ and $B$ be two real numbers. There is an integer $p$ close to $|A-B|$ such that:
$$d(p,|A-B|) \le \left|\{A\} - \{B\}\right| $$
where $d(s,t)$ denotes $|s-t|$.
\end{lemma}
\begin{proof}
The idea is very simple, let us write:
$A = A_1+A_2 \textrm{ and } B = B_1+B_2$
where $A_1, B_1$ are the integral parts and $A_2, B_2$ the fractional parts, respectively. Then $A - B = (A_1 - B_1) + (A_2 - B_2)$, and hence $|A-B|$ is at distance at most $|A_2-B_2| = |\{A\} - \{B\}|$ from the integer $|A_1 - B_1|$. Taking $p = |A_1 - B_1|$ proves the claim.
\end{proof}
Applying Lemma~\ref{lem:a-b} to the case when $A = a\alpha$ and $B = b \alpha$, gives us that there is an integer $p$ such that
$$d(p,|a\alpha - b\alpha|) \le |\{a\alpha\} - \{b\alpha\}| < \frac{1}{N} \le \frac{1}{a-b} = \frac{1}{q}$$
Thus, there is $p$ (as claimed by Lemma~\ref{lem:a-b}) and $q$ (which is equal to $a-b$ which exists as per PHP application), such that:
$$|q\alpha - p| < \frac{1}{q}$$
This completes the proof of the theorem.
\end{proof}
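Once $N$ is fixed, the proof is effective; the following illustrative Python sketch (ours) carries it out for $\alpha = \sqrt{2}$ by bucketing the fractional parts, exactly as in the PHP step.
\begin{verbatim}
# Follow the pigeonhole argument for alpha = sqrt(2): among the fractional
# parts of 0, alpha, ..., N*alpha two fall in the same interval of width 1/N;
# their difference yields q with |alpha - p/q| < 1/q^2.
import math

def dirichlet_approximation(alpha, N):
    buckets = {}
    for a in range(N + 1):
        b = int(((a * alpha) % 1.0) * N)    # index of the subinterval [b/N, (b+1)/N)
        if b in buckets:                    # pigeonhole collision
            q = a - buckets[b]
            p = round(q * alpha)            # nearest integer to q*alpha
            return p, q
        buckets[b] = a

p, q = dirichlet_approximation(math.sqrt(2), N=1000)
print(p, q, abs(math.sqrt(2) - p / q), 1 / q ** 2)
assert abs(math.sqrt(2) - p / q) < 1 / q ** 2
\end{verbatim}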
\begin{exercise-prob}[See Problem Set 1~(Problem \ref{simultaneous-approx})]
\begin{show-ps1}{simultaneous-approx}
Let $\alpha_1, \alpha_2, \ldots, \alpha_k$ be $k$ real numbers. Generalizing the Dirichlet approximation principle argument that we did in class, using PHP again, prove that there must exist integers $p_1, p_2, \ldots, p_k$ and $q$ such that:
$$\forall i,~\left| \alpha_i - \frac{p_i}{q} \right| < \frac{1}{q^{1+\frac{1}{k}}}$$
\end{show-ps1}
\end{exercise-prob}
\begin{remark}
The proof of the theorem proves something stronger. That is, it actually gives a way to obtain many $q$'s which achieve the error bound. Notice that the choice of $N$ was free in the proof, and the $q$ that we end up choosing is $a-b$, which is at most $N$. Hence, suppose that we already have a $p$ and $q$ in hand; if we run the proof by choosing $N$ to be large enough such that:
$$\frac{1}{N} < |q\alpha - p|$$
then necessarily the new $p$ and $q$ that the proof gives will have to be different (and the new $q$ needs to be larger). By repeating this, we can produce a new $p$ and $q$, and so on and so forth. This gives us a way to produce infinitely many pairs $p$ and $q$ such that:
$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2} $$
\end{remark}
We were clearly motivated to improve the denominator from $2q$ in the naive attempt to $q^2$ in Dirichlet's approximation principle. Can this be improved further? The following curiosity remark says essentially not.
\begin{curiosity}[{\bf Tightness of Dirichlet's Approximation Principle - Roth's Theorem}]
Let $\alpha$ be any irrational algebraic number (one which can be expressed as a root of a polynomial with coefficients from $\mathbb{Q}$). For every $\epsilon > 0$, the inequality
$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{2+\epsilon}}$$
can hold true only for finitely many co-prime pairs $(p,q)$. This says that the Dirichlet's approximation principle cannot be improved (for infinitely many $p$ and $q$) with a larger order denominator.
\end{curiosity}
The above remark says that we cannot improve in the exponent for Dirichlet's approximation principle. Can we improve by having a large multiplier for the $q^2$ in the denominator? Even this has a limit, and leads to classifying irrational numbers using what is called the \textit{Lagrange measure} of the number.
\begin{curiosity}[{\bf Hurwitz Theorem and Irrationality Measures}]
This is an improvement of the above principle. For every irrational number $\alpha$, there are infinitely many relatively prime integers $p$ and $q$ such that:
$$\left| \alpha - \frac{p}{q} \right| \le \frac{1}{\sqrt{5}q^2}$$
The $\sqrt{5}$ in the denominator is the best possible. If we make it larger than $\sqrt{5}$, then there is a counter example: consider the irrational number $\frac{1+\sqrt{5}}{2}$ (the golden ratio). It can be shown that the above inequality then holds for only finitely many relatively prime integers $p$ and $q$ (this is done through arguments about continued fraction representations). For example, if we exclude the \textit{golden ratio} and some related irrational numbers, then we can improve the denominator to $\sqrt{8}$. If we further exclude the \textit{silver ratio} ($1+\sqrt{2}$) and associated irrational numbers, then we can improve this to $\frac{\sqrt{221}}{5}$. In general, the bound is of the form:
$$\left| \alpha - \frac{p}{q} \right| \le \frac{1}{L_nq^2}$$
where $L_n$ (called the \textit{Lagrange numbers}) steadily increases if some bad irrational numbers are included. These also are viewed as measures of "how much irrational the number is".
\end{curiosity}
\Lecture{Jayalal Sarma}{Sept 14, 2020}{04}{Counting by Bijections and Double Counting Principle}{Jayalal Sarma}{$\gamma$}{JS}
We now quickly review the basic tools from counting. Permutations and combinations form the basics from discrete mathematics that we rely upon. We will stress the aspects that are critical for the rest of the course. The first tool that we will demonstrate in detail is the power of counting by using bijections.
\section{Basic Examples of Counting by Bijections}
The cardinality of two sets is said to be the same if there is a bijection between the two. Indeed, for finite sets the notion of cardinality matches that of size, while it can be deceiving for infinite sets\footnote{For infinite sets, there are notions of countability and uncountability of sets which we will not discuss here.}. We will concentrate on finite sets for this part and use bijections to establish combinatorial counts.
We start with something that we are all familiar with, in order to bring out the nuances involved with proof by bijections. Notice that we know how to count this object by other means as well, but this is just a starting example.
\begin{proposition}
The number of subsets of a set of $n$ elements is exactly $2^n$.
\end{proposition}
\begin{proof}
Let $S$ be the given set of $n$ elements. Without loss of generality let us assume that $S = \{1,2,\ldots, n\}$. The bijection is nothing but the well-known idea of the characteristic vector of a set.
We establish a bijection between the following two sets.
$$ \phi : \left\{ A : \begin{array}{c} A \textrm{ is a subset of } \\ \textrm{ the set $S$ } \end{array} \right\} \to \left\{ x : \begin{array}{c} x \textrm{ is a string of length $n$} \\ \textrm{ over alphabet $\{0,1\}$ } \end{array} \right\}$$
We first define the function as follows. Let $A$ be any subset of $S$; define the string $w = \phi(A)$ as the $n$-bit string where for every $1 \le i \le n$:
\[
w_i =
\begin{cases}
0 & \textrm{ if $i \notin A$ }\\
1 & \textrm{ if $i \in A$ }
\end{cases}
\]
Notice that the function $\phi$ is well-defined (this may have to be checked explicitly for certain bijections that we define) since we are defining the bit $w_i$ for every $i \in [n]$.
We now argue that it is an injection. Suppose $A, B \subseteq S$ with $A \ne B$. Then there is an $i \in S$ which belongs to one of the sets but not the other; without loss of generality, $i \in A$ but $i \notin B$. By the above definition, the $i$-th bit of $\phi(A)$ will be $1$ while the $i$-th bit of $\phi(B)$ will be $0$. This implies $\phi(A) \ne \phi(B)$.
We also show that $\phi$ is a surjection. Given any $w \in \{0,1\}^n$, we can define a pre-image $A \subseteq S$ as $A = \{ i \mid w_i = 1 \}$. By definition, $\phi(A) = w$ and hence $w$ has a pre-image. This shows $\phi$ is a surjection. Hence $\phi$ is a bijection, and the number of subsets of $S$ equals the number of strings in $\{0,1\}^n$, which is $2^n$.
\end{proof}
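The characteristic-vector bijection is easy to implement; the following illustrative Python sketch (function names ours) builds $\phi$ and its inverse for a small $n$ and confirms it is a bijection.
\begin{verbatim}
# The characteristic-vector bijection between subsets of {1, ..., n} and
# binary strings of length n, together with its inverse.
from itertools import combinations

def phi(A, n):
    return "".join("1" if i in A else "0" for i in range(1, n + 1))

def phi_inverse(w):
    return frozenset(i for i, bit in enumerate(w, start=1) if bit == "1")

n = 4
subsets = [frozenset(c) for r in range(n + 1)
           for c in combinations(range(1, n + 1), r)]
images = {phi(A, n) for A in subsets}
assert len(images) == len(subsets) == 2 ** n      # injective, onto 2^n strings
assert all(phi_inverse(phi(A, n)) == A for A in subsets)
print("phi is a bijection for n =", n)
\end{verbatim}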
Let us argue a slight variant of the above example now. While there are other ways to establish this, we insist on using the method of counting by bijections.
\begin{proposition}
The number of even sized subsets of $[n]$ is equal to the number of odd sized subsets, and both are equal to $2^{n-1}$.
\end{proposition}
\begin{proof}
By observing that the bijection defined in the previous proof has the additional feature that the number of $1$s in $\phi(A)$ is exactly the cardinality of $A$, we conclude that it is sufficient to establish a bijection between the following two sets:
$$ \psi : \left\{ x : \begin{array}{c} x \in \{0,1\}^n \textrm{ having } \\ \textrm{even no. of 1s in it} \end{array} \right\} \to \left\{ w : \begin{array}{c} \textrm{ $w \in \{0,1\}^n$ having} \\ \textrm{odd no. of 1s in it} \end{array} \right\}$$
Fix any $i \in [n]$; we define a bijection with respect to $i$ (this says there are actually $n$ bijections between the two sets above, not just one!). Technically, we should be writing $\psi_i$, but we drop the subscript since it is not critical for the presentation.
\begin{description}
\item{\bf Definition:} We define the bijection as follows : let $e_i$ denote the string which has $1$ in the $i$-th position and $0$ elsewhere.
$$\psi(x) = x \oplus e_i $$
where $\oplus$ denotes bitwise xor to produce an $n$ bit string.
\item{\bf well-defined:} We explicitly check that the function is well-defined. Indeed, consider any $x \in \{0,1\}^n$ which has an even number of $1$s in it. The operation $x \oplus e_i$ produces a string $w \in \{0,1\}^n$. Since the $i$-th bit is flipped, $w$ must necessarily have an odd number of 1s in it.
\item{\bf injection:}
We show that $\psi$ is an injection. Consider $x, x' \in \{0,1\}^n$ such that $x \ne x'$. There must exist an index $j$ in which they differ. We have two cases:
\begin{description}
\item{{\bf Case 1:} $j = i$ :} The mapping flips the $i$-th bit of both $x$ and $x'$. Since $x$ and $x'$ differ in the $i$-th bit, so do the images $w = \psi(x)$ and $w' = \psi(x')$. Hence $\psi(x) \ne \psi(x')$.
\item{{\bf Case 2:} $j \ne i$ :} The operation does not change any bit other than the $i$-th one. Hence the images $w = \psi(x)$ and $w' = \psi(x')$ must also differ in their $j$-th bit. Hence $\psi(x) \ne \psi(x')$.
\end{description}
Hence, we conclude that $\psi$ is injective.
\item{\bf surjection:} Given any $w \in \{0,1\}^n$ which has odd weight, we exhibit $x \in \{0,1\}^n$ such that $\psi(x) = w$. Indeed, defining $x = w \oplus e_i$ meets the requirement: $x$ has even weight and $\psi(x) = w$. Hence $\psi$ is surjective.
\end{description}
Hence we conclude that $\psi$ is a bijection and that the two sets must be of the same cardinality. Since the two sets are disjoint and their union is of size $2^n$, it must be that both of them are of size $2^{n-1}$. This concludes the proof.
\end{proof}
Note that in the above proof, we wrote down the steps in proving the bijection explicitly. It is somewhat standard to skip over the ones which are obvious from the definitions, but it is a good practice to write these down in a formal proof so that the argument is not prone to errors.
\subsection{Discussion Session - Counting Cyclic Triplets}
We considered the following counting question in the discussion session to demonstrate that simple bijections and the idea of associating combinatorial objects form a very powerful combinatorial technique.
\begin{problem}
Imagine there are $2n+1$ players in a round-robin tournament. That is, each player plays against every other player. Assume that there are no ties in any match. We say that players $\{a,b,c\}$ forms a cyclic triplet if $a$ beats $b$, $b$ beats $c$, and $c$ beats $a$ (note that the order does not matter). We want to count the maximum number of cyclic triplets that is possible in the tournament.
\end{problem}
\begin{proof}
It is natural to model the above using graphs where each vertex is a player and there is a directed edge $(i,j)$ if player $i$ beats player $j$. This gives a directed graph with $2n+1$ vertices whose underlying undirected graph is the complete graph on $2n+1$ vertices. Such a graph is called a tournament. Cyclic triplets naturally correspond to directed triangles (directed 3-cycles) in these graphs. So the question we are addressing can equivalently be stated as: {\em in any tournament on $2n+1$ vertices, what is the maximum possible number of directed triangles?}
The idea is to look at associated objects called "corners" of the triangles. There are ${2n+1 \choose 3}$ triangles, and hence there are $3 {2n+1 \choose 3}$ many corners. The corners can be classified into three types.
\begin{description}
\item{\bf Type 1:} A corner where both edges are incoming edges.
\item{\bf Type 2:} A corner where both edges are outgoing edges.
\item{\bf Type 3:} A corner where one edge is incoming and the other is outgoing.
\end{description}
What happens in a cyclic triplet? It forms a directed triangle, hence all three of its corners are of Type 3. A non-cyclic triplet has exactly one corner of each type. This also establishes a bijection between the Type 1 and Type 2 corners, as follows.
$$ \psi : \left\{ \begin{array}{c} \textrm{Type 1} \\ \textrm{Corners} \end{array} \right\} \to \left\{ \begin{array}{c} \textrm{Type 2} \\ \textrm{Corners} \end{array} \right\} $$
defined as follows: given any corner $c$ of Type 1, define $\psi(c)$ as the Type 2 corner that appears in the same triangle as $c$. This is well defined because (1) $c$ cannot appear in a triangle corresponding to a cyclic triplet, and (2) exactly one Type 2 corner appears in a triangle corresponding to a non-cyclic triplet. This uniquely assigns a Type 2 corner to $c$ and makes the function well-defined, injective and surjective (since the process can be reversed too). In fact, the same argument also proves that the number of non-cyclic triplets is exactly the number of Type 1 corners, and also exactly equal to the number of Type 2 corners. Hence it is sufficient to count the number of Type 1 corners.
So now the strategy is clearer. To obtain an upper bound on the number of cyclic triplets, we express it as ${2n+1 \choose 3}$ minus the number of non-cyclic triplets. Then it suffices to obtain a lower bound on the number of non-cyclic triplets, and hence a lower bound on the number of Type 1 corners (or equivalently the number of Type 2 corners).
One natural attempt is to count the Type 1 corners alone. If $d_i$ is the in-degree of the $i$-th vertex, then the number of Type 1 corners is exactly:
$$t_1 = \bigsum_{i=1}^{2n+1} {d_i \choose 2}$$
However, it is unclear how to get a reasonably tight lower bound for this quantity.
Instead, the trick is to count the Type 1 and Type 2 corners simultaneously as:
$$t = t_1 + t_2 = \bigsum_{i=1}^{2n+1} \left[ {d_i \choose 2}+ {2n-d_i \choose 2} \right] $$
Since $t_1 = t_2$, the number of non-cyclic triplets equals $t_1 = \frac{t}{2}$, which is:
$$\frac{1}{2}\bigsum_{i=1}^{2n+1} \left[ {d_i \choose 2}+{2n-d_i \choose 2} \right] $$
Since we need a lower bound for this, we use the fact that each summand ${d_i \choose 2}+{2n-d_i \choose 2}$ is minimised when the two terms are equal, that is, when $d_i = n$. Hence the number of non-cyclic triplets is at least:
\begin{eqnarray*}
\frac{1}{2}\bigsum_{i=1}^{2n+1} \left[ {n \choose 2}+{n \choose 2} \right] & = & \frac{n(n-1)(2n+1)}{2} \\
\textrm{\# of Cyclic Triplets} & \le & {2n+1 \choose 3} - \frac{n(n-1)(2n+1)}{2} \\
& = & \frac{n(n+1)(2n+1)}{6}
\end{eqnarray*}
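For completeness, the minimisation fact used above can be checked by a direct calculation: for any integer $0 \le d \le 2n$,
$$ {d \choose 2} + {2n-d \choose 2} \;=\; \frac{d^2 + (2n-d)^2 - 2n}{2} \;=\; (d-n)^2 + n(n-1) \;\ge\; 2{n \choose 2}, $$
with equality exactly when $d = n$.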
A natural question is whether this bound is tight, or whether our counting introduced any slack. The bound is attained precisely when $d_i = n$ holds for every vertex. This can be achieved by the following explicit tournament, whose number of cyclic triplets is therefore exactly $\frac{n(n+1)(2n+1)}{6}$, showing that the above bound is tight.
The construction is as follows. Label the vertices $1, 2, \ldots, 2n+1$. Define $(i,j) \in E$ if and only if $(i-j) \bmod (2n+1) \in [n]$. This ensures that the in-degree and the out-degree of every vertex is exactly $n$, and hence this tournament achieves the above bound.
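As a quick sanity check of the construction, take $n = 1$, so that $2n+1 = 3$ and $[n] = \{1\}$: the edges are $(2,1)$, $(3,2)$ and $(1,3)$, forming a single directed $3$-cycle, so there is exactly one cyclic triplet; the formula also gives $\frac{1 \cdot 2 \cdot 3}{6} = 1$.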
\end{proof}
\section{From Bijections to Double Counting}
We will now introduce a new technique called {\em double counting} which has the method of bijections as its backbone.
\paragraph{Double Counting Method:} The method can be presented as follows. We design a single combinatorial (usually counting) question and answer it in two distinct (but provably correct) ways. Since the two answers are for the same counting problem, it is logical to equate them, and such an equality gives relations that are otherwise not apparent.
The whole idea can be viewed as a method of bijection itself. In many situations, the double counting may also reveal an implicit bijection between the two different ways of answering the question. While this is not necessary for the double counting method, it is revealing to think about the underlying bijection.
This is a very elegant and powerful tool. The creativity in the proof lies in designing the right question. Indeed, \textit{asking the right question is often more than half the work in constructing a mathematical proof}!
We demonstrate this by a simple example first.
\begin{proposition}
For any $k \le n$,
$${n \choose k} = {n \choose n-k}$$
\end{proposition}
\begin{proof}
The combinatorial counting question in this case can be the following:
\begin{description}
\item{\bf Q:} In how many ways can we form a committee of size $k$ from a set of $n$ people?
\item{\bf A1:} Directly choose the $k$ committee members from $n$ people. By definition, there are ${n \choose k}$ ways of doing this.
\item{\bf A2:} Directly choose the $n-k$ non-members of the committee from $n$ people and declare the remaining to be the committee members. There are ${n \choose n-k}$ ways of doing this too.
\end{description}
This completes the argument. Although not required for the proof, for the curious mind, the underlying bijection revealed here is complementation: each committee corresponds to its set of non-members.
\end{proof}
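For a concrete instance: with $n = 5$ and $k = 2$, both sides equal ${5 \choose 2} = {5 \choose 3} = 10$, and each committee of size $2$ corresponds under the bijection to its $3$ non-members.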
Note that there is an easy algebraic way of arguing the above identity. But as the expressions get more complicated, this proof technique is more revealing and elegant.
\begin{proposition}
For any $n$ and $k$,
$${n \choose k} = {n-1 \choose k} + {n-1 \choose k-1}$$
\end{proposition}
\begin{proof}
We can reuse the question itself from the proof of the earlier proposition.
\begin{description}
\item{\bf Q:} In how many ways can we form a committee of size $k$ from a set of $n$ people?
\item{\bf A1:} Directly choose the $k$ committee members from $n$ people. By definition, there are ${n \choose k}$ ways of doing this.
\item{\bf A2:} Let the potential members be $\{1,2, \ldots, n\}$. Classify the ways of choosing the $k$ committee members into two kinds: those that include $n$ and those that do not. Since these two kinds of committees are never the same, we can count both types and add them. More formally: \textit{condition on whether $n$ is in the committee or not}. If $n$ is in the committee, then only $k-1$ remaining members need to be chosen from the remaining $n-1$ people, which gives ${n-1 \choose k-1}$ ways. On the other hand, if $n$ is not in the committee, then all $k$ members must be chosen from the $n-1$ remaining people, which gives ${n-1 \choose k}$ ways. Adding these two gives the RHS as the second answer to the counting question.
\end{description}
This completes the argument.
\end{proof}
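As a small numerical check of the identity: for $n = 4$ and $k = 2$, there are ${4 \choose 2} = 6$ committees of size $2$ from $\{1,2,3,4\}$; exactly ${3 \choose 1} = 3$ of them contain person $4$ and exactly ${3 \choose 2} = 3$ of them do not, and indeed $6 = 3 + 3$.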
\begin{proposition}
For $k \le n$,
$$ k{n \choose k} = n {n-1 \choose k-1}$$
\end{proposition}
\begin{proof}
We can almost reuse the question itself from the proof of the earlier proposition.
\begin{description}
\item{\bf Q:} In how many ways can we form a committee of size $k$ from a set of $n$ people, and then choose a chair of the committee (who is also a part of the committee)?
\item{\bf A1:} Directly choose the $k$ committee members from $n$ people. By definition, there are ${n \choose k}$ ways of doing this. And then among the members chosen, choose a chair for the committee which can be done in $k$ different ways. This gives $k {n \choose k}$ ways of completing the task which is equal to the LHS.
\item{\bf A2:} First choose the chair from the $n$ potential members of the committee; this can be done in $n$ ways. Then choose the remaining $k-1$ members of the committee from the remaining $n-1$ potential members, which can be done in ${n-1 \choose k-1}$ ways. This gives $n {n-1 \choose k-1}$, which is the RHS.
\end{description}
This completes the argument.
\end{proof}
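As a quick check: for $n = 4$ and $k = 2$, the LHS counts $2{4 \choose 2} = 12$ chaired committees, while the RHS counts $4{3 \choose 1} = 12$ as well. Note also that rearranging the identity gives the often useful form ${n \choose k} = \frac{n}{k}{n-1 \choose k-1}$.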
\begin{exercise}
Prove the following identities using double counting method:
$$\sum_{k=0}^n k {n \choose k} = n 2^{n-1} \textrm{\hspace{15mm}} {m+n \choose k} = \sum_{i=0}^k {m \choose i}{n \choose k-i} \textrm{\hspace{15mm}} {n \choose k}{k \choose m} = {n \choose m}{n-m \choose k-m}$$
\end{exercise}
\begin{proposition}
$$\sum_{m=k}^n {m \choose k} = {n+1 \choose k+1}$$
\end{proposition}
\begin{proof}
We need to modify the question slightly here.
\begin{description}
\item{\bf Q:} In how many ways can we choose $k+1$ numbers from the set $\{1, 2, \ldots, n+1\}$?
\item{\bf A2:} The RHS is immediate by definition.
\item{\bf A1:} Count by conditioning on the largest element chosen in the set. Note that no subset is counted twice, since the largest element of a given set is uniquely defined. Now, for a fixed largest element $m+1$, the number of ways of choosing the remaining elements is the number of ways of choosing $k$ elements from the set $\{1, 2, \ldots, m\}$, since $m+1$ is the largest. This gives ${m \choose k}$ ways of completing the task when the largest element is $m+1$. Since $m$ has to be at least $k$ and can be at most $n$, the number of ways of choosing a set of $k+1$ numbers from the set is:
$$\sum_{m=k}^n {m \choose k}$$
which matches with the LHS.
\end{description}
This completes the argument.
\end{proof}
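As a concrete instance of the identity just proved: for $n = 4$ and $k = 2$, the LHS is ${2 \choose 2} + {3 \choose 2} + {4 \choose 2} = 1 + 3 + 6 = 10$, which matches the RHS ${5 \choose 3} = 10$.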
Use a similar argument to do the following:
\begin{exercise-prob}[See Problem Set 1~(Problem \ref{double-counting})]
\begin{show-ps1}{double-counting}
Use a double counting argument to establish the following identity:
$$ \sum_{m=k}^{n-k} {m \choose k} {n-m \choose k} = {n+1 \choose 2k+1} \textrm{ ~~~where~~ $0 \le k \le \frac{n}{2}$}
$$
Generalize the idea to prove:
$$ \sum_{j=r}^{n+r-k} {j-1 \choose r-1} {n-j \choose k-r} = {n \choose k} \textrm{ ~~~where~~ $1 \le r \le k$}
$$
\end{show-ps1}
\end{exercise-prob}