The problem above can be solved by padded decomposition. A \((\beta,\Delta)\)-padded decomposition of \(G = (V,E,w)\) is a stochastic decomposition of \(V\) into a set \(\mathcal{P}\) of clusters, each of diameter at most \(\Delta\), such that for every \(v\in V\), the probability that the ball of radius \(\gamma\Delta\) around \(v\) is entirely contained in the cluster of \(v\), denoted by \(\mathcal{P}(v)\), is large. More precisely:
\[\mathrm{Pr}[B_G(v,\gamma \Delta)\subseteq \mathcal{P}(v)] \geq e^{-\beta\gamma} \quad \text{for every }\gamma\ll 1\]

In a padded decomposition, our goal is to minimize \(\beta\), called the padding parameter. It is not hard to see that a \((\beta,\Delta)\)-padded decomposition gives a solution for the clustering problem with the same value \(\beta\). Furthermore, padded decompositions have applications to other problems, such as uniform sparsest cut, flow-cut gap, multicut, and embedding into \(\ell_{\infty}\).
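To make the definition concrete, here is a minimal sketch of one classical way to generate such random clusters: every candidate center draws an exponential shift, and each vertex joins the center minimizing shifted distance. This is only an illustration; the graph format, function names, and the all-pairs Dijkstra are my own simplifications, and the sketch omits the diameter truncation that a real construction needs.

```python
import heapq
import math
import random

def dijkstra(graph, source):
    # graph: {u: {v: weight}}; returns shortest-path distances from source
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def exp_shift_partition(graph, delta, beta):
    # Every vertex draws a shift ~ Exp(beta / delta); vertex u joins the
    # center c minimizing d(c, u) - shift(c). Larger beta means smaller
    # shifts, so clusters behave more like plain balls of radius ~delta.
    shift = {v: random.expovariate(beta / delta) for v in graph}
    dists = {v: dijkstra(graph, v) for v in graph}  # fine for toy examples
    return {u: min(graph, key=lambda c: dists[c].get(u, math.inf) - shift[c])
            for u in graph}
```

Rerunning with different random seeds gives different partitions; the padding guarantee is a statement about this distribution over partitions, not about any single run.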
For general graphs, Bartal constructed a padded decomposition with \(\beta = O(\log n)\). The famous KPR theorem implies that \(\beta = O(r^3)\) for \(K_r\)-minor-free graphs; this was later improved by a sequence of papers to \(\beta = O(r)\). (See here for a simple proof of the KPR theorem by James Lee.) A long-standing open problem is:
Open Problem: Can we construct a padded decomposition with a padding parameter \(\beta = O(\log r)\) in \(K_r\)-minor-free graphs?
While the problem is very difficult, there are two important special cases that we could try to solve: (1) graphs embedded on a surface of genus \(g\) and (2) graphs of treewidth at most \(k\). Why these two special cases? Because they are the building blocks of the Robertson-Seymour decomposition of \(K_r\)-minor-free graphs. (Though it does not seem that one could resolve the open problem above by using the Robertson-Seymour decomposition, as the dependency on \(r\) in the decomposition is galactic.)
The special case (1), graphs with genus \(g\), was studied by several papers, achieving better and better bounds, from \(\beta = 2^{O(g)}\) to \(O(g^2)\) and finally \(O(\log g)\) by Sidiropoulos.
Our recent preprint solves the special case (2), graphs with treewidth \(k\), achieving \(\beta = O(\log k)\). Prior to our work, there was no progress on special case (2); the best known bound was \(\beta = O(k)\), which follows from the bound for \(K_r\)-minor-free graphs.
About the techniques. In a STOC 23 paper, a subset of us (Tobias Friedrich, Davis Issac, Nikhil Kumar, Nadym Mallek, and Ziena Zeif) designed a clever ball-carving procedure to solve the multicut problem on graphs with treewidth \(k\). That paper shows that the multiflow-multicut gap is \(O(\log k)\). We then realized that the technique of the STOC 23 paper could be interpreted as constructing a certain kind of net, called a tree-ordered net.
A (classical) \(r\)-net is a subset \(N\) of vertices such that any two vertices in \(N\) have distance at least \(r\) (packing), while any vertex not in \(N\) has distance at most \(r\) from some vertex in \(N\) (covering). For a tree-ordered net, one associates the vertex set \(V\) with a partial order represented by a tree \(T\): for two vertices \(x,y\in V\), \(x\preceq y\) if \(x\) is a descendant of \(y\) in \(T\). Loosely speaking, in a \((\tau,r)\)-tree-ordered net \(N\), the covering property is: every vertex \(v\) has an ancestor \(x\in N\) such that the distance from \(v\) to \(x\) is at most \(r\). The packing property is: for every \(v\), the number of ancestors in \(N\) within distance \(O(r)\) is at most \(\tau\).
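For contrast, the classical \(r\)-net is easy to construct greedily: scan the points and keep any point that is far from everything kept so far. A sketch, where the function names and the distance-function interface are my own:

```python
def greedy_r_net(points, dist, r):
    # Greedy scan: keep p iff it is at distance >= r from every kept point.
    # Packing holds by construction; covering holds because any discarded
    # point was within distance r of some already-kept point.
    net = []
    for p in points:
        if all(dist(p, q) >= r for q in net):
            net.append(p)
    return net
```

For ten evenly spaced points on a line with \(r=3\), this keeps every third point, and both the packing and covering properties are easy to verify directly.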
The technique of the STOC 23 paper could be used to construct a \((\tau,\Delta)\)-tree-ordered net with \(\tau = \mathrm{poly}(k)\) for graphs with treewidth \(k\). (Well, to be more precise, we are only able to construct the net for graphs with tree-partition width \(k\); there is a subtle difference between treewidth and tree-partition width, and readers should refer to the manuscript for details.)
We observe that if one can construct a \((\tau,\Delta)\)-tree-ordered net, then one can construct a \((\beta,\Delta)\)-padded decomposition with \(\beta = O(\log \tau)\). Combining this with the aforementioned tree-ordered net result, we obtain \(\beta = O(\log k)\).
Given an edge-weighted tree \(T\) and a subset of terminals \(K\subseteq V(T)\), a non-Steiner tree cover with stretch \(\alpha \geq 1\) is a collection of trees \(\mathcal{T}\) such that:
Figure 1: A non-Steiner tree cover with 2 trees for \(T\) on the left. The stretch is \(\alpha = 4/3\), realized by the vertex pair \((e,g)\).
The goal is to construct a non-Steiner tree cover with a small size (number of trees) and small stretch \(\alpha\) for any given tree \(T\) and the terminal set \(K\).
The tree \(T\) can be seen as a Steiner tree for the terminal set \(K\): we often refer to vertices in \(T\setminus K\) as Steiner points. In some applications, we want to remove the Steiner points while preserving (approximately) the tree structure and the distances between terminals. One way to achieve this goal is a non-Steiner tree cover: no Steiner vertices, having a few trees (approximately preserving the tree structure), and low stretch (approximately preserving distances).
The discussion above may remind you of the Steiner point removal problem introduced by Gupta: given an edge-weighted tree \(T\) and a subset of terminals \(K\subseteq V(T)\), construct another tree \(X\) that only contains \(K\) and satisfies \(\delta_T(t_1,t_2)\leq \delta_X(t_1,t_2) \leq \alpha\cdot \delta_T(t_1,t_2)\) for every two terminals \(t_1,t_2\). The goal is to minimize \(\alpha\). The solution to the Steiner point removal problem can be seen as a non-Steiner tree cover with a single tree. Here, we allow more than one tree and hope to achieve a smaller stretch \(\alpha\). More generally, we are interested in understanding the trade-off between two parameters: stretch and the number of trees.
For the single-tree case (the Steiner point removal problem), Gupta achieved stretch \(\alpha = 8\), and a matching lower bound of \(8-o(1)\) was shown later by Chan, Xia, Konjevod, and Richa. There is a much easier lower bound of \(\alpha \geq 2\): the star graph. Specifically, \(T\) is a star with terminals being the leaves, and every edge has weight 1. For this case, the best single-tree solution has stretch \(2\): a star rooted at one terminal with the other terminals as leaves.
The star graph also yields a simple lower bound for non-Steiner tree covers: for \(\alpha < 2\), the number of trees must be \(n - 1 = \Omega(n)\), where \(n\) is the number of terminals. But for the star graph, a single tree achieves \(\alpha = 2\). Can we achieve stretch \(\alpha = 2\) with a constant number of trees? If not, with a constant number of trees, could we break the stretch \(8\) known for a single tree? In trying to understand these questions, we stumbled upon a surprising answer: for stretch \(2\), \(\Theta(\log n)\) trees are both necessary and sufficient. Furthermore, with a constant number of trees (depending on \(\epsilon\)), one can achieve stretch \(2+\epsilon\) for any constant \(\epsilon \in (0,1)\). This exhibits a sharp transition around stretch \(2\).
Theorem: Let \(\alpha\geq 1\) be the stretch parameter, and \(\epsilon\in (0,1)\) be any given constant. Let \(T\) be an edge-weighted tree and \(K\subseteq V(T)\) be any set of \(n\) terminals:
A few words about the technical ideas: Item 3 is relatively straightforward: the lower bound was discussed above, and the upper bound is by taking a single-source shortest path (SSSP) tree rooted at each terminal. (By the single-source shortest path tree rooted at a terminal \(t\), I mean the star obtained by connecting every other terminal \(t'\) to \(t\) via a direct edge with weight \(d_T(t,t')\).)
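As a sanity check on this construction, the sketch below builds one SSSP star per terminal and answers a query by minimizing over the stars; the star rooted at \(t_1\) already realizes \(d_T(t_1,t_2)\) exactly, which is why \(n\) stars give stretch 1. The tree format and helper names are my own.

```python
import heapq
import math

def tree_dists(tree, source):
    # Dijkstra; `tree` is {u: {v: w}} (works for any nonneg-weighted graph)
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in tree[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def sssp_star(tree, terminals, root):
    # The star: every other terminal attached to `root` with weight d_T(root, t)
    d = tree_dists(tree, root)
    return {t: d[t] for t in terminals if t != root}

def cover_query(stars, t1, t2):
    # Distance through each star's root; with one star per terminal the
    # star rooted at t1 already realizes d_T(t1, t2) exactly
    best = math.inf
    for root, leaves in stars.items():
        a = 0.0 if t1 == root else leaves.get(t1, math.inf)
        b = 0.0 if t2 == root else leaves.get(t2, math.inf)
        best = min(best, a + b)
    return best
```

On a small weighted path, querying any terminal pair through the full collection of stars returns the exact tree distance.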
For stretch \(2\), the upper bound is obtained by recursively taking the terminal \(t\) closest to the centroid vertex of \(T\) and making an SSSP tree rooted at \(t\). The recursion depth is \(O(\log n)\), giving \(O(\log n)\) trees. Proving the lower bound, that one needs \(\Omega(\log n)\) trees, is more complex. Indeed, we proved a stronger lower bound: there exists a tree \(T\) and \(n\) terminals such that any graph with stretch \(2\) that contains only the terminals must have \(\Omega(n \log n)\) edges. As the union of a non-Steiner tree cover with \(k\) trees is a graph with \(O(n\cdot k)\) edges, we get \(k = \Omega(\log n)\). The lower bound instance \(T\) is a comb graph (with appropriate edge weights), and the terminals are the leaves of \(T\).
For stretch \(2+\epsilon\), the construction of the tree cover is rather complicated. The basic idea is to root the tree \(T\), chop it into multiple layers, and appropriately group these layers. In the paper, we show how one can achieve stretch \(3+\epsilon\) with just chopping and grouping, and then we show how to improve the stretch to \(2+\epsilon\) with more aggressive grouping. If you find this vague but intriguing, check out the paper.
The study of tree cover for general graph metrics dates back to 1992, starting from the paper by Awerbuch and Peleg on communication-space trade-offs in routing; refer to the short survey on tree covers in the introduction of our paper for more details.
Tree covers have many algorithmic applications, but my favorite is in constructing an approximate distance oracle, which got me to think more about tree covers in the first place. Say we want to construct an approximate distance oracle for a graph \(G\) with a stretch factor (i.e., distance error) of \(t\). We first construct a tree cover with stretch \(t\) for (the shortest path metric of) \(G\). Then, we construct an exact distance oracle for each tree, which can be reduced to an LCA data structure. To answer a distance query between two vertices \(x\) and \(y\), we simply go through each tree in the cover, query the \(x\)-to-\(y\) distance in that tree, and return the minimum distance over all the trees. The number of trees in the cover dictates the space of the oracle and the query time: more precisely, \(O(n \lvert{\mathcal T}\rvert)\) space and \(O(\lvert{\mathcal T}\rvert)\) query time. The distance approximation factor is the stretch of the tree cover.
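A minimal sketch of the query procedure, with tree distances computed via a naive climb-to-root LCA; a real oracle would use a constant-time LCA data structure, and the representation and names here are mine.

```python
def tree_distance(parent, wt, u, v):
    # Exact distance in a rooted tree: depth(u) + depth(v) - 2*depth(lca).
    # parent: child -> parent (None at the root); wt[x] = weight of the
    # edge from x to parent[x]. The naive LCA climb is O(depth).
    def depth(x):
        d = 0.0
        while parent[x] is not None:
            d += wt[x]
            x = parent[x]
        return d
    anc = set()
    x = u
    while x is not None:
        anc.add(x)
        x = parent[x]
    l = v
    while l not in anc:
        l = parent[l]
    return depth(u) + depth(v) - 2 * depth(l)

def oracle_query(cover, u, v):
    # The oracle answer: minimum over the trees of the cover; the stretch
    # of the cover bounds the approximation error
    return min(tree_distance(p, w, u, v) for p, w in cover)
```

Each tree contributes one candidate distance, so the query time is proportional to the number of trees, matching the \(O(\lvert \mathcal T\rvert)\) bound above (up to the cost of each per-tree query).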
As we have seen, in distance oracles as well as other applications of tree cover, the stretch and the number of trees will be important parameters, and determining the precise trade-off between the two parameters is the most fundamental question.
I am personally interested in constructing tree covers for special metric spaces, such as Euclidean, planar and minor-free, and doubling. In these metrics, the stretch can be made \(1+\epsilon\) for any \(\epsilon \in (0,1)\), which makes tree covers attractive for various applications. The main question is: Can the number of trees be independent of \(n\), the number of points? It turned out that in all these settings, the number of trees can be made independent of \(n\). (This fact was known for Euclidean metrics a long time ago but only recently for doubling, planar, and minor-free metrics.)
Then the next question is: What is the precise dependency on \(\epsilon\) of the number of trees? In a recent preprint, we answered this question.
Theorem: The number of trees is \(\Theta(\epsilon^{1-d})\) for non-Steiner tree covers and \(\Theta(\epsilon^{(1-d)/2})\) for Steiner tree covers for point sets in \(\mathbb{R}^d\).
In \(\mathbb{R}^2\), the number of non-Steiner trees is \(O(1/\epsilon)\) while the number of Steiner trees is \(O(1/\sqrt{\epsilon})\). But what is the difference between non-Steiner and Steiner?
Recall in the definition (of the dominating property) that we only require \(X\subseteq V(T)\). If \(X = V(T)\) for every tree \(T\), then we get a non-Steiner tree cover, and if we allow \(V(T)\setminus X \not=\emptyset\), we get a Steiner tree cover, since \(T\) contains (Steiner) points that are not in \(X\). Things get a bit complicated when we consider Steiner points in \(V(T)\setminus X\): they could be anything. But here and in our paper, we restrict Steiner points to those in the ambient metric space \((M,\delta_M)\).
In the rest of the post, I will say a few words about the technique.
For the lower bound of \(\Omega(\epsilon^{1-d})\) trees for non-Steiner tree covers, the observation is that by taking the union of all the trees in the cover, we obtain a \((1+\epsilon)\)-spanner with \(\lvert{\mathcal T}\rvert n\) edges. On the other hand, there is a known lower bound of \(\Omega(\epsilon^{1-d})n\) on the number of edges of \((1+\epsilon)\)-spanners. Therefore, one gets \(\lvert{\mathcal T}\rvert = \Omega(\epsilon^{1-d})\). The same argument applies to get the lower bound for Steiner tree covers, but this time, we use known lower bounds for Steiner \((1+\epsilon)\)-spanners.
For the upper bound, the essential idea is to reduce to the construction of a tree cover for far points: given a point set \(P\) with diameter \(D\) and a constant \(\mu\), construct a tree cover that only needs to preserve (up to \(1+\epsilon\) factor) pairs whose distances are in \([D/\mu, D]\). (We call this type of tree cover a partial tree cover). Our reduction is based on the shifting technique in a beautiful paper by Chan.
Once the reduction is in place, we only need to construct a partial tree cover. If we are allowed to use Steiner points, then the construction is relatively simple: we basically divide the bounding box of \(P\) into vertical and horizontal slabs of width \(\Theta(D)\), and then for any two pairs of slabs, place Steiner points in the middle. The construction of non-Steiner tree covers is significantly more involved; the introduction of our paper has a high-level overview of the proof. We also obtain a substantially stronger result than what is stated in the theorem above: every tree in our non-Steiner tree cover has degree \(O(1)\), an absolute constant that does not depend on \(d\) or \(\epsilon\).
A unit disk graph with \(n\) vertices is the intersection graph of \(n\) disks on the plane: two vertices have an edge if their two corresponding disks intersect. Here is an example of a unit disk graph; the figure is by David Eppstein on Wikipedia:
A unit disk graph can have \(\Omega(n^2)\) edges, and hence we often work with its disk representation, which only takes \(O(n)\) space, hoping to also design algorithms running in \(O(n\,\mathrm{polylog}(n))\) time. For example, single-source shortest paths in unit disk graphs can be computed in \(O(n\log n)\) time. Could we design an algorithm for computing the diameter with the same running time? This problem currently seems out of reach. An easier problem is to compute the diameter of unit disk graphs in truly subquadratic time. This turns out to be a long-standing open problem.
Open Problem: Can the diameter of \(n\)-vertex unit disk graphs be computed in \(O(n^{2-\epsilon})\) time for any constant \(\epsilon > 0\)?
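For reference, here is the straightforward quadratic baseline that the open problem asks to beat: build the graph from the disk centers (two unit disks intersect exactly when their centers are within distance 2) and run a BFS from every vertex. The names are my own, and this sketch assumes unweighted (hop) distances and a connected graph.

```python
import math
from collections import deque
from itertools import combinations

def unit_disk_graph(centers):
    # vertices = disk centers (unit radius); edge iff centers within distance 2
    n = len(centers)
    adj = {i: [] for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(centers[i], centers[j]) <= 2.0:
            adj[i].append(j)
            adj[j].append(i)
    return adj

def diameter(adj):
    # Hop diameter via a BFS from every vertex: the quadratic baseline.
    # Assumes the graph is connected.
    best = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best
```

Both the graph construction and the all-BFS step take quadratic time here; the algorithmic question is how to exploit the geometry of the disks to avoid ever materializing all \(\Omega(n^2)\) edges.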
Our recent paper does not resolve this problem, but makes non-trivial progress. We showed that:
We also obtained several related results, such as computing the diameter of similar-size pseudo-disk graphs and approximate distance oracles; do check out the paper for details.
Note that there have been several negative results. The analog of unit disk graphs in higher dimensions is unit ball graphs. However, in dimension 3 or above, the diameter cannot be computed in truly subquadratic time assuming the Orthogonal Vectors (OV) conjecture, as shown in this paper. The same paper showed that for intersection graphs in dimension 2 of several other types of objects, such as congruent equilateral triangles or axis-parallel line segments, there is no truly subquadratic time algorithm for the diameter under the OV conjecture. So, something is very special about unit disk graphs.
A key new idea in our paper is VC dimension. VC dimension has been used to compute the diameter of planar graphs and minor-free graphs, including my own work in SODA24 along this line. There are several different ways to define VC set systems, but two of them are of special interest here.
Distance VC dimension: The ground set is the vertex set \(V\) of the graph, and the family of subsets \(\mathcal B\) is the family of balls:
\[\mathcal B = \{B_r(v): v\in V, r\in \mathbb{R}^+\}\]

Here \(B_r(v) = \{u: d_G(u,v)\leq r\}\) is the ball of radius \(r\) centered at \(v\). We show that:
Theorem 1: If \(G\) is an intersection graph of pseudo-disks, then \((V, \mathcal B)\) has VC dimension at most 4.
Our result holds for intersection graphs of pseudo-disks of any size, which vastly generalizes unit disk graphs. The proof uses a topological argument, and in my opinion it is the most interesting part of the paper.
An independent result by Duraj, Konieczny, and Potępa showed the same VC dimension bound for a special case of unit disk graphs, and, in general, geometric intersection graphs of objects that are closed, bounded, convex, and center symmetric. Their proof is different and based on geometry.
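On tiny examples, one can verify such bounds by brute force: enumerate all distinct balls and check which vertex subsets they shatter. A sketch (all names mine; exponential time, for sanity checks only):

```python
from itertools import combinations

def ball_sets(dist):
    # dist: n x n matrix of pairwise graph distances; returns all distinct
    # balls B_r(v). Only radii equal to actual distances matter.
    n = len(dist)
    radii = sorted({d for row in dist for d in row})
    return {frozenset(u for u in range(n) if dist[v][u] <= r)
            for v in range(n) for r in radii}

def is_shattered(subset, sets):
    # subset is shattered iff the sets cut out all 2^|subset| traces
    traces = {frozenset(subset) & s for s in sets}
    return len(traces) == 2 ** len(subset)

def vc_dimension(sets, ground):
    # largest k such that some k-subset of the ground set is shattered
    d = 0
    for k in range(1, len(ground) + 1):
        if any(is_shattered(c, sets) for c in combinations(ground, k)):
            d = k
    return d
```

On a 4-vertex path, for instance, every ball is an interval of vertices, and intervals shatter pairs but never triples (no interval can contain two vertices while excluding one between them), so the computed VC dimension is 2.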
Distance encoding VC dimension: The definition is somewhat hard to parse.
Definition: Let \(M\subseteq \mathbb{R}\) be a set of real numbers. Let \(S=(s_0, s_1, \ldots, s_{k-1})\) be a sequence of \(k\) vertices in an undirected weighted graph \(G=(V, E)\). For every vertex \(v\), define
\[X(v) = \{ (i, \Delta) : 1\leq i \leq k-1, \Delta \in M, d(v, s_i)-d(v, s_0)\leq \Delta \}.\]

Let \(\mathcal{LP} = \{X(v) : v\in V\}\) be a set of subsets of the ground set \([k-1]\times M\).
It’s a mouthful. Intuitively, the set \(X(v)\) captures the distance from \(v\) to vertices in \(S\) compared to the distance to \(s_0\). One can think of \(X(v)\) as encoding the distance vector from \(v\) to vertices in \(S\): each pair \((i, \Delta)\) tells us the (approximate) distance from \(v\) to \(s_i\) relative to the distance from \(v\) to \(s_0\). A variant of this set system was introduced for planar graphs in a beautiful paper of Li and Parter. The one defined above is a slight modification of the set system of Li and Parter, which appeared in my paper with Christian. This paper showed that the set system has bounded VC dimension in minor-free graphs. Here, we show that the VC dimension remains bounded in pseudo-disk graphs. The proof is almost the same as the proof of Theorem 1.
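To make the definition concrete, here is a brute-force computation of the sets \(X(v)\) on a toy graph; the graph format and names are my own.

```python
import heapq
import math

def dijkstra(graph, source):
    # graph: {u: {v: weight}}; returns shortest-path distances from source
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def distance_encoding(graph, S, M):
    # X(v) = {(i, D) : 1 <= i <= k-1, D in M, d(v, s_i) - d(v, s_0) <= D}
    dists = {s: dijkstra(graph, s) for s in S}
    system = {}
    for v in graph:
        base = dists[S[0]][v]
        system[v] = {(i, dlt) for i in range(1, len(S))
                     for dlt in M if dists[S[i]][v] - base <= dlt}
    return system
```

On a 3-vertex path with \(S\) consisting of the two endpoints, the set \(X(v)\) grows as \(v\) moves toward \(s_1\) and away from \(s_0\), matching the intuition that \(X(v)\) encodes relative distances to \(S\).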
Theorem 2: If \(G\) is an intersection graph of pseudo-disks, then \(([k-1]\times M, \mathcal{LP})\) has VC dimension at most 4.
There are two different techniques for computing the diameter using VC dimension. The first technique constructs a spanning path with a low stabbing number using the distance VC dimension (the system of balls); it was introduced by Ducoffe, Habib, and Viennot to compute the diameter of graphs with sublinear separators. The second technique combines the distance encoding VC dimension with \(r\)-division; it was used to compute the diameter of minor-free graphs.
Here, we follow the second technique, using the distance encoding VC dimension. Since unit disk graphs do not admit an \(r\)-division, we develop an analogous tool called clique-based \(r\)-clustering. The additive error of \(+2\) is due to the clique-based \(r\)-clustering step. If one could somehow eliminate it, one would solve the open problem mentioned above.
The constant \(\alpha\) is called the distortion of \(M\). We want to construct \(M\) such that \(\alpha\) is as small as possible, ideally \(\alpha = 1\). We call \(M\) a distance-preserving minor. Since \(M\) only contains the terminals, by constructing \(M\), we effectively remove Steiner points from \(G\), hence the name Steiner Point Removal (SPR).
We have the freedom to set the weights of the edges of \(M\) arbitrarily as long as the terminal distances are not contracted. In all known constructions, we often set the weight of each edge \((t_1,t_2) \in E(M)\) to be \(d_G(t_1,t_2)\).
So, what is a minor? The formal definition is: a minor of \(G\) is a graph obtained from \(G\) by deleting vertices, deleting edges, and contracting edges. However, this formal definition is often not useful in constructing a distance-preserving minor. A more intuitive way to think about minors is: every terminal \(t\in K\) has a corresponding subgraph \(H_t\) in \(G\) such that (a) all the subgraphs \(\{H_t: t\in K\}\) are pairwise vertex-disjoint and (b) there is an edge \((x,y) \in E(M)\) if there is an edge \((u,v)\) in \(G\) connecting a vertex \(u\in H_x\) and a vertex \(v\in H_y\).
Figure 1: (a) Four terminals in a graph, (b) the disjoint subgraphs associated with each terminal, and (c) a distance-preserving minor. The distortion is \(4/3\) realized by the terminal pair \((t_2,t_3)\).
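The intuitive definition above translates directly into code: given the assignment of vertices to the disjoint subgraphs \(H_t\), add an edge between two terminals whenever some edge of \(G\) crosses their subgraphs, weighted by the terminal distance. A hypothetical sketch (names mine; `term_dist` is assumed to return \(d_G\) between terminals):

```python
def terminal_minor(graph, assignment, term_dist):
    # graph: {u: {v: w}}; assignment: vertex u -> terminal t with u in H_t.
    # Edge (x, y) is in M iff some edge of G crosses H_x and H_y; following
    # the convention above, its weight is set to d_G(x, y).
    M = {}
    for u in graph:
        for v in graph[u]:
            x, y = assignment[u], assignment[v]
            if x != y:
                M.setdefault(x, {})[y] = term_dist(x, y)
                M.setdefault(y, {})[x] = term_dist(x, y)
    return M
```

Because each \(H_t\) is connected and the subgraphs are disjoint, this quotient is a genuine minor of \(G\); the whole difficulty of SPR is choosing the subgraphs so that the resulting shortest-path distances have small distortion.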
Why a distance-preserving minor? There are several ways one could construct a graph preserving distances, such as those studied in spanners and emulators literature. However, these graphs often do not preserve structural properties of \(G\): for example, if \(G\) is planar, then these graphs might not be planar. A distance-preserving minor aims to preserve both distances and minor structures: if \(G\) is planar or excludes a fixed minor, then \(M\) also has the same property.
Gupta was the first to study the problem of removing Steiner points in trees: given a set of terminals \(K\) in a tree \(T\), find another tree \(T_K\) spanning the terminals such that the terminal distances in \(T_K\) are approximately the same as the terminal distances in \(T\), up to a small distortion \(\alpha\). Gupta showed that \(\alpha = 8\) is achievable for trees and proved a lower bound of \(\alpha\geq 4(1-o(1))\). Chan, Xia, Konjevod, and Richa observed that the tree \(T_K\) in Gupta's construction is in fact a minor of \(T\), and further improved the lower bound to \(\alpha\geq 8(1-o(1))\), matching the upper bound.
Since preserving the minor structure is central in distance-preserving minor, a natural question is:
Question 1: Does every \(K_r\)-minor-free graph for any fixed \(r\) admit a distance-preserving minor with a constant distortion?
This question was raised in the work of Chan, Xia, Konjevod, and Richa as well as an unpublished manuscript by Basu and Gupta.
One could ask the same question for general graphs:
Question 2: What is the smallest distortion achieved for the SPR problem in general graphs?
Both questions have attracted significant research interest over the last decade. Until recently, Question 1 remained wide open: no constant upper bound was known, even for planar graphs. On the other hand, there has been steady progress on the upper bound of Question 2: Filtser showed a distortion of \(O(\log \lvert K\rvert)\) for general graphs. However, the best lower bound remained 8 in both cases.
In this post, I am happy to report very recent progress on both questions. Our recent paper resolved Question 1 positively, while Chen and Tan showed the first non-constant lower bound of \(\tilde{\Omega}(\sqrt{\log \lvert K\rvert})\) for Question 2, narrowing the gap with the upper bound of \(O(\log \lvert K\rvert)\) by Filtser. Coincidentally, both papers will appear in SODA 2024.
SPR in minor-free graphs is one of my favorite problems from grad school. I could not make any significant progress back then despite investing a non-trivial amount of time. An important intermediate step toward Question 1 is planar graphs. This problem remained very difficult: we only knew the answers for very special cases of planar graphs. Basu and Gupta showed a constant distortion for outerplanar graphs, in which every vertex lies on the outer face. More than a decade later, Hershkowitz and Li solved the problem for series-parallel graphs, which are planar graphs of treewidth 2.
I largely forgot about the SPR problem in minor-free graphs due to the previous futile attempt. In recent years, I focused more on developing geometric techniques for designing algorithms in planar and minor-free graphs, as described in my recent talk. One key problem in this direction is the tree cover problem: covering planar metrics by as few trees as possible. Together with amazing co-authors — Hsien-Chih Chang, Jonathan Conroy, Lazar Milenkovic, Shay Solomon and Cuong Than— we are able to show that \(O(1)\) trees suffice; the paper will be presented in upcoming FOCS 23.
How does tree cover relate to SPR? To solve the tree cover problem, we developed a new kind of partition that we dubbed shortcut partition. Intuitively, a shortcut partition is a partition of the vertex set into clusters such that for every two vertices \(u\) and \(v\), there exists a low-hop path in the cluster graph connecting the cluster containing \(u\) and the cluster containing \(v\). The inspiration for the shortcut partition is the scattering partition introduced by Filtser. We could show that a shortcut partition exists in planar graphs, while it remains an open problem to construct a scattering partition. In the paper introducing scattering partitions, Filtser showed that a good scattering partition implies a good solution to the SPR problem. This was our starting point; Hsien-Chih Chang and Jonathan Conroy then suggested that the shortcut partition may be used in place of the scattering partition in the SPR problem. This turned out to be true, and we resolved the SPR problem in planar graphs.
At this point, it was clear to us that to fully resolve Question 1, we only needed to construct a shortcut partition for minor-free graphs. And that is exactly what we did in our latest paper mentioned above.
In the tree cover paper, we constructed a shortcut partition for planar graphs by working with a planar embedding, starting from the outer surface and working toward the inner part of the graph. For minor-free graphs, we have to get around this “planarity barrier”, and we do so by constructing a buffered cop decomposition, a variant of the cop decomposition of minor-free graphs. Our decomposition is inspired by the work of Abraham, Gavoille, Gupta, Neiman, and Talwar. The final shortcut partition construction is rather delicate, and the heavy lifting is largely due to Jonathan Conroy.
Before closing this section, I would like to mention that the shortcut partition has several other algorithmic applications: distance oracles, tree covers, and embedding into bounded treewidth graphs. Exploring these applications should be the topic of another post. In the meantime, do check out our papers for details.
Kamma, Krauthgamer, and Nguyễn were the first to make significant headway on SPR for general graphs: they showed that the distortion is \(O(\log^5 \lvert K\rvert)\). Their result was later simplified and improved to \(O(\log^2 \lvert K\rvert)\) by Cheung. Filtser then improved the distortion to \(O(\log \lvert K\rvert)\).
Distortion \(O(\log \lvert K\rvert)\) seems to be the right answer for Question 2; problems that are somewhat related, such as padded decomposition or stochastic embeddings into trees, have logarithmic lower bounds. Of course, it is one thing to speculate on the lower bound; it is another thing to prove it formally. Until recently, the best distortion lower bound was \(8\), which holds even for graphs as simple as trees. This is why the result of Chen and Tan is exciting: they showed a distortion lower bound of \(\tilde{\Omega}(\sqrt{\log \lvert K\rvert})\). I take their result as a strong indication that the right answer to Question 2 is \(\Theta(\log \lvert K\rvert)\). What's more interesting is that the proof of Chen and Tan is very clever and simple; the main argument is just about 3 pages. Any summary of their argument would hardly be shorter than their full proof, so go ahead and read it.
I must give a big shout-out to Arnold Filtser, who has consistently worked on SPR in both general and minor-free graphs for years. Arnold's series of papers has inspired and paved the way for the recent developments. There are two interesting open problems:
Nailing down the exact distortion for the SPR problem in planar graphs. We showed \(O(1)\) distortion, but the constant is big, up to thousands. The current best lower bound is 8. I will stick my neck out to conjecture that 8 is the right distortion bound. I am happy to see either a proof or a disproof.
Improving the lower bound of Chen and Tan to \(\Omega(\log \lvert K\rvert)\). This would completely resolve Question 2.
I missed several talks, mostly on the 1st and the last day of the workshop, due to travel constraints. The talks I attended were all excellent. Here I give my brief take on each talk, which is likely erroneous and incomplete.
The first two talks I attended were given by Rasmus Kyng. Rasmus gave two tutorials on the breakthrough maxflow result. The 1st tutorial was about formulating the framework of the interior point method (IPM) for solving maxflow and min-cost flow: write down an LP, choose an appropriate barrier and potential function, find the gradient, update the current solution, and repeat. The number of iterations is \(m^{1+o(1)}\), and the (amortized) time to perform the update in each iteration is \(m^{o(1)}\), leading to nearly linear time overall. One key step is to solve the undirected min-ratio cycle problem, a combinatorial problem, in each iteration. Thus, IPM can be seen as reducing a problem on directed graphs to one on undirected graphs. The second tutorial was about designing a dynamic data structure for solving the min-ratio cycle problem quickly, in \(m^{o(1)}\) amortized time per iteration. The basic idea is to use a low-stretch spanning tree: given a spanning tree with low average stretch, there exists a fundamental cycle of the tree that is a good approximate min-ratio cycle. The data structure then has to maintain (a few) low-stretch spanning trees under updates to the edge lengths of the graph. Many interesting ideas go into the data structure; two major ones are maintaining a hierarchy of graphs (vertex sparsification) and dynamic spanners (edge sparsification). See part 1 and part 2 of Rasmus's talks.
The talk by Aaron Bernstein was a gentle introduction to matching sparsifiers. As the name suggests, a matching sparsifier of a graph is a sparse subgraph that approximately preserves the maximum matching. The talk started off with some possible definitions of a matching sparsifier and analyzed why they are too weak or too strong, despite being very intuitive. The talk suggested that a good definition should be robust, composable, easy to work with, and, of course, come with a good approximation guarantee. Aaron gave several definitions and analyzed their properties. I particularly liked the flow of this talk: it progressively built up a good notion of a matching sparsifier. All in all, a great talk; check it out here.
The blackboard talk by Sepehr Assadi on Ruzsa-Szemeredi graphs was a hit. To quote Sepehr:
Ruzsa-Szemeredi Graphs (RS graphs henceforth) are a family of extremal graphs with a magical property: they are “locally sparse” and yet “globally dense.”
I’ve seen RS graphs popping up in a couple of algorithm papers, but never had a chance to see RS graphs “in action.” The talk of Sepehr Assadi taught me a great deal; see it here.
Sayan Bhattacharya and Soheil Behnezhad gave two different talks on matching in two interrelated settings: dynamic and sublinear, respectively. Matching has been studied for ages, and it is unsurprising to see so many different kinds of results on matching. Sayan's talk gave visual guidance to navigate this jungle of results on dynamic matching. To quote Sayan, the big open problem in dynamic matching is:
Big Open Question: \((1+\epsilon)\)-approximation in \(O(\mathrm{poly}(\log n, 1/\epsilon))\) update time.
And it seems we are still very far from having a solution to this problem. Recent progress has established an intimate connection to sublinear algorithms for maximum matching, which is the topic of Soheil's talk.
Soheil surveyed recent developments in estimating the size of the maximum matching in sublinear time in the adjacency list model. Classical algorithms assume the maximum degree \(\Delta\) of the graph is small and seek a running time depending only on \(\Delta\) and \(\epsilon\), the approximation error. But none of these algorithms has a subquadratic running time when \(\Delta = \Omega(n)\). Modern algorithms run in sublinear time even when \(\Delta = \Omega(n)\). Check out the talks by Sayan and Soheil.
Hsien-Chih Chang and I talked about planar graphs. My talk focused on the similarity between the geometry of planar graphs and point sets on the Euclidean plane. A well-known fact is that shortest-path metrics of planar graphs do not embed well into the Euclidean plane (and higher dimensional Euclidean spaces.) But this is not the end of the story. In my talk, I described several positive results, illustrating the striking similarity between the geometry of planar graphs and the Euclidean plane: embeddings into small treewidth graphs, VC dimension, and tree cover. These results have been applied to solve various algorithmic problems in planar graphs.
Hsien-Chih gave a talk on our recent result, joint with others: a tree cover of \(O(1)\) trees for planar metrics. The star of the dish is the new concept of shortcut partition. Hsien-Chih gave a brief intuition for the concept, which, in some sense, attempts to capture (ir)regularities of planar graphs. We recently exploited the shortcut partition to solve the Steiner Point Removal (SPR) problem, as well as several other problems, in minor-free graphs. I hope to write more about this in the future. Check out my talk and Hsien-Chih’s talk.
Greg Bodwin’s talk is a gem, introducing the notion of bridge girth as a unification of network design. I saw a version of this talk before in the FOCS 22 metric embedding workshop I co-organized with Arnold Filtser, but it still interested me greatly the second time. There are a vast number of network design problems: spanners, distance oracles, emulators, hopsets, matching covers, and Steiner forest, to name a few. And they are all different. But somehow the mysterious bridge girth unifies them all. If you haven’t seen this talk, it’s time to check it out.
The last talk I was able to attend was given by Jan van den Brand on solving the multi-commodity flow problem with high accuracy. Here “high accuracy” means that one seeks a \((1+\epsilon)\)-approximation with a running time dependency of \(\mathrm{polylog}(1/\epsilon)\). Solving the 2-commodity flow problem in sparse graphs, exactly or approximately with high accuracy, is equivalent to linear programming. This hardness leaves open the possibility of a faster algorithm for multi-commodity flows in dense graphs, which is the new result in Jan’s talk: an improved algorithm for dense graphs via graph-based techniques. The main bottleneck is the so-called heavy hitters problem for the \(k\)-commodity incidence matrix. The idea is to reduce the \(k\)-commodity incidence matrix to a (graph) incidence matrix and then use expander decomposition to solve the problem, hence the name graph-based techniques. Here is the link to the talk.
There are many interesting talks that I missed for no good reason. Fortunately, all the videos were posted online.
To be more precise, I look for a proof that only requires the input graph excluding a fixed minor, \(K_5\) for example in the case of planar graphs. There are two proofs that I particularly like: one by Plotkin, Rao, and Smith [2] and another by Alon, Seymour, and Thomas [1], which work for any graph excluding a fixed minor. I will present both. The central concept is a \(K_h\)-minor model.
\(K_h\)-minor model: a \(K_h\)-minor model of a graph \(G\) is a collection of \(h\) vertex-disjoint connected subgraphs \({\mathcal K} = \{C_1,C_2,\ldots, C_h\}\) of \(G\) such that there is an edge between any two subgraphs. Parameter \(h\) is called the model size.
Figure 1: A graph (left) and its \(K_5\)-minor model (right).
A graph is \(K_h\)-minor-free if it does not have a \(K_h\)-minor model. Wagner's theorem (an equivalent form of Kuratowski's theorem) implies that planar graphs are \(K_5\)-minor-free (and \(K_{3,3}\)-minor-free). Herein we assume that the input graph is \(K_h\)-minor-free, and think of \(h\) as a small constant.
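To make the definition concrete, here is a minimal checker, a sketch in plain Python (the adjacency-set representation and all function names are mine, not from any paper), verifying the three conditions of a \(K_h\)-minor model: vertex-disjointness, connectivity of each part, and an edge between every pair of parts.

```python
from itertools import combinations

def is_connected(adj, verts):
    """Check that the subgraph induced on `verts` is connected (DFS)."""
    verts = set(verts)
    if not verts:
        return False
    start = next(iter(verts))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in verts and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == verts

def is_minor_model(adj, parts):
    """`parts` is a candidate list of vertex sets C_1, ..., C_h."""
    # (a) vertex-disjoint
    for A, B in combinations(parts, 2):
        if set(A) & set(B):
            return False
    # (b) each part induces a connected subgraph
    if not all(is_connected(adj, C) for C in parts):
        return False
    # (c) an edge between every two parts
    for A, B in combinations(parts, 2):
        if not any(v in adj[u] for u in A for v in B):
            return False
    return True

# K_4 itself is a K_4-minor model with singleton parts.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(is_minor_model(K4, [{0}, {1}, {2}, {3}]))  # True
```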
A balanced separator of a graph \(G = (V,E)\) is a subset of vertices \(S\subseteq V\) such that every connected component of \(G\setminus S\) has size at most \(2n/3\). There is nothing special about the constant \(2/3\); any constant smaller than \(1\), for example \(4/5\), works. In some places of this post, we will be handwavy on the constant.
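The balanced-separator condition is equally easy to sanity-check. A minimal sketch (plain Python, helper names my own): remove \(S\) and verify that every remaining connected component has at most \(2n/3\) vertices.

```python
def components(adj, removed):
    """Connected components of the graph after deleting `removed` vertices."""
    removed = set(removed)
    seen, comps = set(), []
    for s in adj:
        if s in removed or s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in removed and v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps

def is_balanced_separator(adj, S):
    n = len(adj)
    return all(len(c) <= 2 * n / 3 for c in components(adj, S))

# Path on 9 vertices: the middle vertex is a balanced separator.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 9} for i in range(9)}
print(is_balanced_separator(path, {4}))  # True: two components of size 4
print(is_balanced_separator(path, {8}))  # False: one component of size 8 > 6
```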
Basic ideas: We will maintain a minor model \({\mathcal K}\) of size \(r\) for some \(r\leq h-1\). At each step, we either increase the model size by \(1\) or find a subset of vertices to add to the separator. This could be achieved by considering a BFS tree \(T_v\) rooted at some vertex \(v\). There are two cases:
In the end, we find either a \(K_h\)-minor model, certifying that the input graph is not \(K_h\)-minor-free, or a balanced separator of small size. We will also add some vertices in the current minor model to the separator and hence, the fact that subgraphs of the minor model have a small size will help.
Figure 2: (a) Found a tree of small depth rooted at a vertex \(v\), then (b) add a new subgraph to the current \(K_3\)-model to obtain a \(K_4\)-model. (c) The tree rooted at \(v\) has large depth, then (d) find a layer \(X\) of small size to add to the separator.
A major subtlety in this basic strategy is accounting for the size of the separator in the second case, as the number of iterations could be very large. The two algorithms presented below have very different accounting techniques. Plotkin, Rao, and Smith [2] use charging: they charge the size of the separator (which basically is a BFS layer) to the smaller components after removing the separator, as the algorithm recurses on the bigger component. Alon, Seymour, and Thomas [1] incorporate the separator directly into the minor model. The end result is that the algorithm by Plotkin, Rao, and Smith produces a larger separator, by a factor of \(\sqrt{\log(n)}\) for constant \(h\).
Let \(H\) be a subgraph of \(G\), and \(C\) be another subgraph of \(G\) such that \(V(C)\cap V(H) = \emptyset\). We define \(N_H(C)\) as the set of vertices in \(H\) with neighbors in \(C\). For a subset of vertices \(X\subseteq V(G)\), denote by \(\kappa_G(X)\) the largest connected component of \(G\setminus X\).
The algorithm, ShallowSeparator, accepts an input graph \(G\) and a parameter \(\ell\). It either outputs a \(K_h\)-minor model where every subgraph in the model has depth \(O(\ell \log(n))\) (and hence is shallow), or produces a separator whose size depends on \(\ell\). The basic ideas were described above. In particular, lines 5-7 implement case 1: adding a subgraph of low diameter to the current minor model \({\mathcal K}\). The condition \(C_v\cap N_H(C)\not=\emptyset \quad\forall C\in {\mathcal K}\) in line 6 guarantees that \(C_v\) has an edge to every subgraph currently in \({\mathcal K}\). Observe that \(C_v\) has size \(O(h \ell \log (n))\), as \(C_v\) includes at most \(\lvert{\mathcal K}\rvert\leq h-1\) paths, each of length at most \(2\ell \log(n)\). (In the worst case, \(C_v\) includes \(\lvert{\mathcal K}\rvert\) paths rooted at \(v\), each connecting \(v\) to the set \(N_H(C)\) for some \(C\in {\mathcal K}\).)
Lines 10-14 implement the second case: find a separator \(X\) of \(H\) of small size. As the algorithm continues on the larger component (line 13), we would like to charge the size of \(X\) to the vertices not in the largest component. The condition in line 11 guarantees that each vertex outside the largest component receives a charge of at most \(1/\ell\), which in total contributes \(O(n/\ell)\) to the size of the separator. As vertices in the largest component were not charged, we can charge them in later iterations.
When we recurse on the largest component \(H\) in line 13, we only keep subgraphs in \({\mathcal K}\) that have edges to vertices of \(H\); those that do not have edges to \(H\) should be removed from \({\mathcal K}\). The procedure Trim\(({\mathcal K},H)\) does exactly that: remove subgraphs from \({\mathcal K}\) not having edges to \(H\).
ShallowSeparator\((G,\ell)\)
\(1.\) \({\mathcal K}\leftarrow \emptyset\), \(S\leftarrow \emptyset\), and \(H\leftarrow G\)
\(2.\) while \(\lvert V(H)\rvert\geq 2n/3\)
\(3.\) \(v\leftarrow\) arbitrary vertex in \(V(H)\)
\(4.\) \(T_{v}\leftarrow\)BFSTree\((v,H)\)
\(5.\) if \(\mathrm{depth}(T_v) \leq 2\ell \ln(n)\)
\(6.\) \(C_v\leftarrow\) minimal subtree of \(T_v\) s.t \(C_v\cap N_H(C)\not=\emptyset \quad\forall C\in {\mathcal K}\)
\(7.\) \({\mathcal K}\leftarrow {\mathcal K}\cup \{C_v\}\)
\(8.\) return \({\mathcal K}\) if \(\lvert{\mathcal K}\rvert = h\)
\(9.\) \(H\leftarrow \kappa_H(V(C_v))\)
\(10.\) else
\(11.\) Find \(X\) s.t \(0 < \lvert X\rvert \leq \frac{\lvert V(H)\rvert - \lvert\kappa_H(X)\rvert} {\ell}\)
\(12.\) \(S\leftarrow S\cup X\)
\(13.\) \(H\leftarrow \kappa_H(X)\)
\(14.\) \({\mathcal K}\leftarrow\)Trim\(({\mathcal K},H)\) \(\qquad \ll \text{remove subgraphs not adjacent to }H \gg\)
\(15.\) \(S\leftarrow S\cup (\cup_{C\in {\mathcal K}} V(C))\)
\(16.\) return \(S\)
Observe that, as long as the algorithm can find \(X\) in line 11, it will eventually terminate, since in every step the size of \(H\) is reduced by at least \(1\).
It is not clear why \(X\) exists in the above algorithm; we will return to this issue later. The important thing to keep in mind is the depth threshold of \(2\ell \log(n)\) in line 5. We now show that the separator has small size. The intuition of the proof was already discussed above. First, we show several properties of \(H\) and \({\mathcal K}\); one of which implies that \({\mathcal K}\) is a clique minor model in every step of the while loop.
Lemma 1: In every iteration of the while loop: (1) no subgraph in \({\mathcal K}\) shares a vertex with \(H\), (2) every subgraph in \({\mathcal K}\) has an edge to \(H\), and (3) every two subgraphs in \({\mathcal K}\) are vertex-disjoint and have an edge in \(G\) connecting them.
Proof: We prove all three claims by induction; initially, \({\mathcal K} = \emptyset\), and hence the lemma trivially holds. The places where \({\mathcal K}\) and \(H\) potentially change in each iteration are lines 7, 9, 13, and 14.
In line 7, we add \(C_v\) to \({\mathcal K}\). Line 6 and the induction hypothesis on claims (1) and (2) guarantee that (3) holds for \({\mathcal K}\) in the next iteration.
In lines 9 and 13, \(H\) is updated, and in both cases, \(H\) is vertex-disjoint from \({\mathcal K}\), and hence (1) holds in the next iteration.
Finally, in line 14, the Trim procedure guarantees (2).
We now bound the size of the separator.
Theorem 1: For any integer \(\ell\geq 1\), ShallowSeparator\((G,\ell)\) either returns a \(K_h\)-minor model of \(G\) where every subgraph in the model has diameter at most \(O(\ell \log n)\) or a balanced separator \(S\) such that: \(\lvert S\rvert = O\left(\frac{n}{\ell} + h^2\ell\log(n)\right)\)
Proof: Observe that if the algorithm returns \({\mathcal K}\) in line 8, claim (3) in Lemma 1 implies that \({\mathcal K}\) is a \(K_h\)-minor-model. Furthermore, every subgraph in \({\mathcal K}\) has diameter \(O(\ell \log n)\), as the depth of \(C_v\) in line 6 is \(O(\ell \log n)\). Henceforth, we assume that the algorithm does return a separator \(S\) in line 16; this means \(\lvert{\mathcal K}\rvert\leq h-1\) at every iteration.
Whenever we add \(X\) to the separator \(S\) in line 12, we can charge the size of \(X\) to the vertices in \(H\setminus \kappa_H(X)\), each getting a charge of at most \(1/\ell\). As we continue on \(\kappa_H(X)\) in line 13, each vertex will be charged at most once during the execution of the algorithm. Thus, the total number of vertices added to \(S\) in line 12 over the course of the while loop is at most \(n/\ell\).
Now we bound the number of vertices added in line 15. We define \(V({\mathcal K}) = \cup_{C\in {\mathcal K}} V(C)\). Whenever we add a subgraph \(C_v\) to \({\mathcal K}\), \(C_v\) has size \(O(h\ell \log(n))\) as observed above, and furthermore, \(C_v\) will never change. Thus, in line 15, \(\lvert V({\mathcal K})\rvert = O(h^2 \ell \log(n))\), implying the bound on the size of the separator.
Finally, we show that \(S\) is balanced: every connected component of \(G[V\setminus S]\) has size at most \(2n/3\). Let \(H_0\) be \(H\) in the final iteration; \(\lvert V(H_0)\rvert\geq 2n/3\). In this iteration, we set \(H = \kappa_{H_0}(X)\), which has size at most \(2n/3\). By induction, one could show that:
Claim 1: \({\mathcal K}\) and \(S\) (at any iteration) form a separator of \(H\). That is, the neighbor (in \(G\)) of every vertex \(v\in H\) is either in \(H\), \(V({\mathcal K})\) or \(S\).
Let \(S_0\) be \(S\) in the last iteration and before the update in line 12. The final separator will be \(S = V({\mathcal K})\cup S_0 \cup X\).
Let \(Z\) be any connected component of \(G[V\setminus S]\). If \(V(Z)\cap V(H_0) = \emptyset\), then \(\lvert V(Z)\rvert\leq n/3\) as \(\lvert V(H_0)\rvert\geq 2n/3\). If not, then by Claim 1, \(Z\) is a connected component of \(H_0\setminus X\). Since \(H\) is the largest component of \(H_0\setminus X\) and it has size at most \(2n/3\), \(\lvert V(Z)\rvert\leq 2n/3\) as desired.
By choosing \(\ell = \sqrt{n/\log(n)}/h\), we obtain the following corollary:
Corollary 1: Any \(K_h\)-minor-free graph of \(n\) vertices admits a balanced separator of size \(O(h\sqrt{n \log n})\).
Note that the proof of Theorem 1 implicitly assumes that \(X\) in line 11 always exists, which we now show.
To get some intuition, consider the following simple algorithm: visit \(T_v\) layer by layer, starting from the root \(v\) (at layer \(0\)), and stop whenever we encounter a layer of small size: its size is at most a \(1/\ell\) fraction of the total size of all previous layers. More formally, let \(Y_{j}\) be the set of all vertices at layer \(j\). Then we will stop at layer \(j^{\ast}\) where:
\[\lvert Y_{j^{\ast}}\rvert \leq \frac{1}{\ell}\sum_{j \leq j^{\ast}-1}\lvert Y_{j}\rvert \qquad (1)\]If there is no such \(j^{\ast}\), then this means every time we visit a layer \(j\), the total size of all layers from \(0\) to \(j\) grows by a factor of \((1+ \frac{1}{\ell})\) compared to the total size of all layers from \(0\) to \(j-1\). Thus, we have:
\[n \geq \sum_{j \leq \mathrm{depth}(T_v)}\lvert Y_{j}\rvert \geq \left(1+ \frac{1}{\ell}\right)^{\mathrm{depth}(T_v)}\]which gives \(\mathrm{depth}(T_v)\leq \ell\ln(n)\).
We could set \(X = Y_{j^{\ast}}\). If the largest component \(\kappa(X)\) belongs to levels larger than \(j^{\ast}\), that is \(\kappa(X)\subseteq (\cup_{j\geq j^{\ast}+1}Y_j)\), then we are done since \(\sum_{j \leq j^{\ast}-1}\lvert Y_{j}\rvert\leq \lvert V(H)\rvert- \lvert\kappa(X)\rvert\). However, this might not be the case. A simple fix is to find a layer \(j^{\ast}\) such that:
\[\lvert Y_{j^{\ast}}\rvert \leq \frac{1}{\ell}\sum_{j \leq j^{\ast}-1}\lvert Y_{j}\rvert \qquad \& \qquad \lvert Y_{j^{\ast}}\rvert \leq \frac{1}{\ell}\sum_{j \geq j^{\ast}+1}\lvert Y_{j}\rvert \qquad (2)\]Thus, regardless of whether the largest component \(\kappa(X)\) belongs to levels larger or smaller than \(j^{\ast}\), we always have \(\lvert X\rvert =\lvert Y_{j^{\ast}}\rvert\leq (\lvert V(H)\rvert- \lvert\kappa(X)\rvert)/\ell\).
To find \(j^{\ast}\) satisfying Equation (2), we simply check each layer of \(T_v\). We show that if we could not find such a layer \(j^{\ast}\), then the depth of \(T_v\) is small.
Lemma 3: If \(\mathrm{depth}(T_v)> 2\ell\ln(n)\), then there exists a layer \(j^{\ast}\) of \(T_v\) satisfying Equation (2).
Proof: We visit \(T_v\) layer by layer, starting from layer \(0\) (the root). We mark a layer \(j\) red if \(\lvert Y_{j}\rvert \geq \frac{1}{\ell}\sum_{t \leq j-1}\lvert Y_{t}\rvert\) and blue if \(\lvert Y_{j}\rvert \geq \frac{1}{\ell}\sum_{t \geq j+1}\lvert Y_{t}\rvert\). (If a layer could be marked both red and blue, we mark it red only.) Intuitively, whenever we encounter a red layer, the set of vertices seen so far grows by a factor of \((1+1/\ell)\). In contrast, whenever we encounter a blue layer, the set of vertices we have not seen so far shrinks by a factor of \((1+1/\ell)\).
If there is an uncolored layer, then we have found \(j^{\ast}\) satisfying Equation (2). Thus, we assume that every layer is colored. Our goal is to show that \(\mathrm{depth}(T_v)\leq 2\ell\ln(n)\), contradicting the assumption of the lemma.
First, we claim that there are at most \(\ell\ln(n)\) red layers. Let \(Z_{j} = \sum_{t \leq j}\lvert Y_{t}\rvert\). For every two consecutive red layers \(a\) and \(b\) where \(b \geq a+1\),
\[Z_{b} \geq (1+1/\ell)Z_{b-1} \geq (1+1/\ell) Z_{a}~.\]That is, every time we see a red layer, the total size of all the layers up to the red layer increases by a factor of \((1+1/\ell)\), which implies that the number of red layers is at most \(\ell\ln(n)\).
By the same argument, one could show that the number of blue layers is at most \(\ell\ln(n)\): every time we see a blue layer, the number of vertices in the blue and higher layers is reduced by a factor of \((1-1/\ell)\). Thus, the number of blue layers is at most \(\ell\ln(n)\), implying the bound on the depth.
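The layer-marking argument above directly suggests a linear scan for \(j^{\ast}\). Below is a sketch (plain Python, names mine) that returns a layer satisfying Equation (2), or None when every layer is red or blue.

```python
def find_layer(layer_sizes, ell):
    """Scan the BFS layer sizes |Y_0|, |Y_1|, ... and return an index j*
    satisfying condition (2): |Y_j*| is at most a 1/ell fraction of both
    the total size below it and the total size above it. Returns None
    when no layer qualifies (every layer is "red" or "blue")."""
    total = sum(layer_sizes)
    below = 0
    for j, y in enumerate(layer_sizes):
        above = total - below - y
        if ell * y <= below and ell * y <= above:
            return j  # uncolored layer: neither red nor blue
        below += y
    return None
```

On sizes that grow and then shrink, the scan finds the thin “waist”: `find_layer([1, 2, 4, 8, 4, 8, 4, 2, 1], 2)` returns `4`. On geometrically growing sizes like `[1, 2, 4, 8, 16]` it returns None, in line with the lemma above: if no layer qualifies, the tree must be shallow.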
Remark: if one optimizes the constant, the size of the separator is \(n/\ell + 2(h-1)(h-2)\ell \ln(n)\). For planar graphs, \(h = 5\) and the size of the separator is \(2\sqrt{6n\ln(n)}\) for an appropriate choice of \(\ell\).
The Plotkin-Rao-Smith (PRS) algorithm maintains a model \({\mathcal K}\) where each subgraph in \({\mathcal K}\) is a tree of low depth. In every step, the algorithm either adds a subset of vertices to the (current) separator \(S\) or adds new vertices to \({\mathcal K}\). The final separator is the union of the current separator and all the vertices in the minor model \({\mathcal K}\); a charging argument is used to bound \(\lvert S\lvert\).
In contrast, the Alon-Seymour-Thomas (AST) algorithm only maintains \({\mathcal K}\). Every subgraph \(C\in {\mathcal K}\) either has small size (at most \(O(h\ell)\)) or has at most \(n/\ell\) neighboring vertices in the current largest component; more formally, \(\lvert N_H(C)\rvert \leq n/\ell\) where \(H\) is the current largest component. The final separator contains either \(C\) or \(N_H(C)\), whichever has the smaller size. In the former case, \(C\) has size \(O(h\ell)\), and in the latter case, \(N_H(C)\) has size \(O(n/\ell)\). Thus, the size of the separator is \(O(h(h\ell + n/\ell)) = O(h^{3/2}\sqrt{n})\) for the choice of \(\ell = \sqrt{n/h}\).
The following lemma is key to the algorithm; we will delay its proof. It is an interesting exercise, though.
Lemma 2: Let \(\ell \geq 1\) be any parameter, and let \(A_1, A_2, \ldots, A_k\) be \(k\) subsets of vertices in \(V\). Then there exists either: (i) a tree \(T\) with at most \(k\ell\) edges such that \(V(T)\cap A_i\not=\emptyset\) for every \(i\in [k]\), or (ii) a set \(X\) of at most \(n/\ell\) vertices such that no connected component of \(G\setminus X\) intersects every \(A_i\).
Intuitively, Lemma 2 means that either we find a tree of small size that connects every set \(A_i\) or we find a small separator that separates at least two sets among \(\{A_i\}_i\) into two different connected components. We note that \(\{A_i\}_i\) might not be vertex-disjoint.
Remark 1: A special case that helps in understanding Lemma 2 is when each \(A_i\) is a single vertex. In this case, we could simply build a BFS tree from the vertex \(v = A_1\). If \(d_G(A_i,v)\leq \ell\) for every \(i\), then we simply truncate the tree at level \(\ell\) to get item (i). Otherwise, the separator is the level of smallest size among the \(\ell\) levels from \(1\) to \(\ell\), which has size at most \(n/\ell\).
We need more notation to describe the algorithm formally. For a subset of vertices \(A\subseteq V(H)\), let \(\mathrm{Reach}(A,H)\) be the set of all vertices that are reachable from \(A\) in \(H\); note that \(H\) might be disconnected.
Figure 3: (a) Found small separator \(X\), then (b) find a subgraph \(B\) in line 10 by extending the neighbor set of a subgraph \(C\in {\mathcal K}\), (c) extend \(C\) to include \(B\) and recurse on \(\kappa_H(B)\).
At every iteration, the algorithm applies Lemma 2 to the sets of neighbors \(N_H(C)\) of the subgraphs \(C\) in the model \({\mathcal K}\); this is the procedure TreeOrSep\((H, \{N_H(C)\}_{C\in {\mathcal K}})\) in line 3. It returns either a tree \(T\) of small size intersecting all these sets or a small separator \(X\) that separates at least two of the sets into two different components. In the former case, the algorithm adds \(T\) to the current model \({\mathcal K}\) and hence increases the size of \({\mathcal K}\) by \(1\). In the latter case, there will be a subgraph \(C\) such that its neighbor set \(N_H(C)\) is disjoint from the largest component \(\kappa_H(X)\). (Recall that \(\kappa_H(X)\) is the largest component of \(H\) after removing \(X\) from \(H\).) We then extend \(C\) in lines 10-11; the idea here is that after extending \(C\), \(N_H(C)\) will be a subset of \(X\), and hence \(\lvert N_H(C)\rvert\leq \lvert X\rvert \leq n/\ell\). See Figure 3.
Finally, we recurse on the largest component \(\kappa_H(B)\) in line 12 where \(B\) is the set of vertices we added to \(C\). (We do not recurse on \(\kappa_H(X)\) since we do not add \(X\) to the separator.) As discussed above, the algorithm either returns \(C\) if it has small size or its neighbors in the current largest component. Note that some subgraphs in \({\mathcal K}\) might not be adjacent to \(\kappa_H(B)\), and hence we remove these subgraphs (line 13).
ASTSeparator\((G,\ell)\)
\(1.\) \({\mathcal K}\leftarrow \emptyset\), and \(H\leftarrow G\)
\(2.\) while \(\lvert V(H)\rvert\geq 2n/3\)
\(3.\) \((T,X)\leftarrow\)TreeOrSep\((H, \{N_H(C)\}_{C \in {\mathcal K}})\)
\(4.\) if \(\lvert V(T)\rvert \leq \ell k\)
\(5.\) \({\mathcal K}\leftarrow {\mathcal K}\cup \{T\}\)
\(6.\) return \({\mathcal K}\) if \(\lvert{\mathcal K}\rvert = h\)
\(7.\) \(H\leftarrow H \setminus V(T)\)
\(8.\) else
\(9.\) Let \(C\in {\mathcal K}\) be such that \(N_H(C)\cap \kappa_H(X) =\emptyset\)
\(10.\) \(B\leftarrow \mathrm{Reach}(N_H(C),H\setminus (\kappa_H(X)\cup X))\)
\(11.\) \(C\leftarrow C\cup B\)
\(12.\) \(H\leftarrow \kappa_H(B)\)
\(13.\) \({\mathcal K}\leftarrow\)Trim\(({\mathcal K},H)\)
\(14.\) return \(\cup\{\arg\min(\lvert V(C)\rvert, \lvert N_H(C)\rvert): C\in {\cal K}\}\)
We now show that the AST algorithm returns a small separator or correctly certifies that \(G\) contains a \(K_h\)-minor.
Theorem 2: For any integer \(\ell\geq 1\), ASTSeparator\((G,\ell)\) either returns a \(K_h\)-minor model of \(G\) or a balanced separator \(S\) such that \(\lvert S\rvert \leq h^2\ell + \frac{nh}{\ell}\).
Proof: Let \(S\) be the returned separator. We focus on bounding the size of \(\lvert S\lvert\); the proof for its balance is similar to Theorem 1.
We show by induction that the model \({\mathcal K}\) and the graph \(H\) satisfy the following invariant during the execution of the algorithm:
Invariant: For every \(C\in {\mathcal K}\), either \(\lvert V(C)\rvert\leq (h-1)\ell\) or \(\lvert N_H(C)\rvert\leq n/\ell\).
The invariant clearly holds at the beginning since \({\mathcal K} = \emptyset\). Suppose that it holds at a current iteration in the while loop in line 2. In the next iteration, if we add a new subgraph \(T\) to \({\mathcal K}\), then \(T\) has size at most \(\lvert{\mathcal K}\rvert\ell \leq (h-1)\ell\). Otherwise, we extend a component \(C\) of \({\mathcal K}\), and we need to show that \(\lvert N_H(C)\rvert\leq n/\ell\) for the graph \(H\) in the next iteration.
Let \(Z = H\setminus (\kappa_H(X)\cup X)\) be the subgraph of \(H\) obtained by removing \(X\) and the largest component \(\kappa_H(X)\); \(Z\) might be disconnected. Consider the set \(B\) in line 10; see Figure 3. Observe that for every component \(Y\) of \(Z\) such that \(Y\cap N_H(C)\not= \emptyset\), every vertex of \(Y\) is in \(B\). Thus, \(N_H(B)\subseteq X\) and hence \(N_H(B\cup C)\subseteq X\); note that this remains true even when \(B = \emptyset\), since in this case \(N_H(C)\subseteq X\). Therefore, \(\lvert N_H(B\cup C)\rvert\leq \lvert X\rvert\leq n/\ell\) by Lemma 2, implying the invariant.
We now return to bounding the size of the separator. The invariant means that \(\min(\lvert V(C)\rvert, \lvert N_H(C)\rvert) \leq (h-1)\ell + n/\ell\) for every \(C\in{\mathcal K}\). Since \(\lvert{\mathcal K}\rvert\leq h-1\), the separator has size at most \((h-1)((h-1)\ell + n/\ell) \leq h^2\ell + hn/\ell\) as claimed.
By choosing \(\ell = \sqrt{n/h}\), we obtain the following corollary:
Corollary 2: Any \(K_h\)-minor-free graph of \(n\) vertices admits a balanced separator of size \(O(h^{3/2}\sqrt{n})\).
We now return to the proof of Lemma 2.
Proof of Lemma 2: Construct \(k\) copies of \(G\), denoted by \(G_1,G_2,\ldots, G_k\). For each \(i\in [1,k-1]\), and each \(v\in A_i\), we connect \(v\)’s copy in \(G_{i}\) to its copy in \(G_{i+1}\). Let the resulting graph be \(\widehat{G}\). Let \(R\) be the copy of \(A_1\) in \(G_1\) and \(S\) be the copy of \(A_k\) in \(G_{k}\). See Figure 4.
Figure 4: (a) Graph \(G\) with three sets \(A_1,A_2,A_3\), (b) graph \(\widehat{G}\), the path \(\widehat{P}\), and its projection in (c).
If \(d_{\widehat{G}}(R,S)\leq k\cdot \ell\), then there is a path \(\widehat{P}\) of length at most \(k\cdot \ell\) from a vertex \(x\in R\) to a vertex \(y\in S\) in \(\widehat{G}\). Let \(P\) be the projection of \(\widehat{P}\) in \(G\), defined as follows: if \((\widehat{u},\widehat{v})\in \widehat{P}\) where \(\widehat{u}\) and \(\widehat{v}\) are copies of two vertices \(u,v\) in \(G\), respectively, then we add \((u,v)\) to \(P\) (if \(\widehat{u}\) and \(\widehat{v}\) are copies of the same vertex, we do nothing). Clearly \(P\) is a connected subgraph of \(G\) containing at most \(k\ell\) edges such that \(V(P)\cap A_i\not=\emptyset\) for every \(i\in[k]\). Then any spanning tree \(T\) of \(P\) satisfies the lemma.
Otherwise, let:
\[L_{t} = \{\widehat{v}\in \widehat{G}: d_{\widehat{G}}(R,\widehat{v}) = t\}\]for every \(1\leq t\leq k\ell\). We refer to each \(L_{t}\) as a level. Observe that removing any \(L_{t}\) from \({\widehat G}\) will disconnect \(R\) from \(S\). Let \(L^{\star}\) be the level of minimum size; that is, \(\lvert L^{\star}\rvert= \min_{1\leq t\leq k\ell}\lvert L_{t}\rvert\).
Let \(X\) be the vertices of \(G\) corresponding to \(L^{\star}\). Then:
\[\lvert X \rvert \leq \lvert L^{\star}\rvert \leq \frac{\lvert V(\widehat{G})\rvert} {k\ell} = \frac{n}{\ell}\]We claim that no connected component \(C\subseteq G\setminus X\) intersects all \(A_i\). Suppose otherwise; then we form a path \(Q\) composed of \(k-1\) paths \(Q_1,Q_2,\ldots, Q_{k-1}\) as follows: \(Q_1\) is a path in \(C\) from an arbitrary vertex \(v\in A_1\) to another (arbitrary) vertex in \(A_2\), and for every other \(i\in [2,k-1]\), \(Q_i\) is a path in \(C\) from the endpoint of \(Q_{i-1}\) to an (arbitrary) vertex \(u \in A_{i+1}\). We can then map \(Q\) back to a path \(\widehat{Q}\) from a vertex in \(R\) to a vertex in \(S\) in \(\widehat{G}\) in a natural way. Moreover, since \(Q\) avoids \(X\), the path \(\widehat{Q}\) does not contain any vertex of \(L^{\star}\), contradicting the fact that removing \(L^{\star}\) from \(\widehat{G}\) disconnects \(R\) and \(S\).
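The proof is constructive and can be transcribed almost line by line. The sketch below (plain Python, 0-indexed, all names mine) builds the layered graph \(\widehat{G}\) implicitly, runs a BFS from the copies of \(A_1\), and returns either the projected path (a connected subgraph meeting every \(A_i\)) or the projection of the smallest BFS level.

```python
from collections import deque

def tree_or_sep(adj, sets, ell):
    """Lemma 2, constructively. `sets` is [A_1, ..., A_k] (0-indexed).
    Returns ('tree', vertex set of a small connected subgraph meeting
    every A_i) or ('sep', X) with |X| <= n/ell."""
    k = len(sets)

    def nbrs(node):
        i, v = node
        for u in adj[v]:          # edges inside copy i of G
            yield (i, u)
        if i + 1 < k and v in sets[i]:      # v in A_{i+1}: link copy i
            yield (i + 1, v)                # to copy i+1
        if i > 0 and v in sets[i - 1]:
            yield (i - 1, v)

    # BFS from R: the copies of A_1 in the first copy of G
    dist, parent, q = {}, {}, deque()
    for v in sets[0]:
        dist[(0, v)], parent[(0, v)] = 0, None
        q.append((0, v))
    while q:
        node = q.popleft()
        for w in nbrs(node):
            if w not in dist:
                dist[w], parent[w] = dist[node] + 1, node
                q.append(w)

    # closest vertex of S-hat: the copies of A_k in the last copy
    reach = [(k - 1, v) for v in sets[-1] if (k - 1, v) in dist]
    target = min(reach, key=lambda t: dist[t], default=None)
    if target is not None and dist[target] <= k * ell:
        verts = set()             # project the R-to-S path back to G
        while target is not None:
            verts.add(target[1])
            target = parent[target]
        return ('tree', verts)

    # otherwise every BFS level 1..k*ell separates R from S-hat;
    # project the smallest one back down to G
    levels = {}
    for (i, v), d in dist.items():
        if 1 <= d <= k * ell:
            levels.setdefault(d, set()).add(v)
    return ('sep', min(levels.values(), key=len))
```

For example, on a path \(0\)-\(1\)-\(2\)-\(3\)-\(4\) with \(A_1=\{0\}, A_2=\{2\}, A_3=\{4\}\), a large \(\ell\) yields the whole path as the tree, while \(\ell = 1\) yields a small separating level instead.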
[1] Alon, Noga, Paul Seymour, and Robin Thomas. “A separator theorem for nonplanar graphs.” Journal of the American Mathematical Society 3, no. 4 (1990): 801-808.
[2] Plotkin, Serge A., Satish Rao, and Warren D. Smith. “Shallow Excluded Minors and Improved Graph Decompositions.” In SODA 1994, pp. 462-470. 1994.
A \(d\)-shortcut set of a directed graph \(G=(V,E)\) is a set of directed edges \(E_H\) such that (i) the graph \(H = (V,E\cup E_H)\) has the same transitive closure as \(G\) and (ii) for every \(u\not=v\in V\), if \(v\) is reachable from \(u\), then there is a directed \(u\)-to-\(v\) path of at most \(d\) edges (a.k.a. hops).
Figure: Adding two shortcuts (the red dashed edges) reduces the hop distance from \(u\) to \(v\) to 4. For a shortcut set of linear size, DAGs are the most difficult instances: one can shortcut any strongly connected graph to diameter \(2\) with \(n-1\) edges.
The shortcut set problem was introduced by Thorup. The main goal is to understand the trade-off between the hop diameter \(d\) and the size of the shortcut set \(E_H\). The most well-studied regime is fixing \(\lvert E_H\rvert = \tilde{O}(n)\), and then minimizing \(d\). (Here, \(n\) is the number of vertices.) We will call this special regime the shortcut set problem.
For the shortcut set problem, a folklore sampling gives \(d = O(\sqrt{n})\): sample each vertex with probability \(\tilde{O}(1/\sqrt{n})\) to get a set \(S\), and add all (permissible) directed edges between vertices in \(S\). The key insight for bounding the diameter is: For any shortest path \(\pi_G(u,v)\) from \(u\) to \(v\) with at least \(2\sqrt{n}\) edges, \(S\) will likely hit both the prefix and suffix of length \(\sqrt{n}\) of \(\pi_G(u,v)\). (This algorithm is attributed to Ullman and Yannakakis.) A long-standing open problem is:
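For concreteness, here is a sketch of the folklore sampling (plain Python; the function names and the exact sampling probability passed in are my choices, following the description above rather than any particular paper's code).

```python
import random
from collections import deque

def reachable(adj, s):
    """Set of vertices reachable from s in a directed graph (BFS)."""
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def folklore_shortcuts(adj, p):
    """Folklore sampling: sample each vertex with probability p, then add
    a shortcut u -> v for every sampled pair with v reachable from u.
    With p ~ 1/sqrt(n) (up to log factors), this adds roughly n edges
    and, w.h.p., reduces the hop diameter to O(sqrt(n))."""
    hubs = [v for v in adj if random.random() < p]
    shortcuts = set()
    for u in hubs:
        r = reachable(adj, u)
        shortcuts.update((u, v) for v in hubs if v != u and v in r)
    return shortcuts
```

Since every added edge connects an already-reachable pair, the transitive closure is preserved by construction; only the hop counts change.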
Question 1: Is diameter bound \(\sqrt{n}\) optimal for shortcut sets of nearly linear size?
The answer is no due to the recent breakthrough by Kogan and Parter: using \(\tilde{O}(n)\) shortcuts, they reduced the diameter down to \(n^{1/3}\). Their result deservingly won the best paper award at SODA 22. A key conceptual contribution of their work is a shift in perspective: shortcutting by connecting vertices to paths of length roughly \(n^{1/3}\), instead of only making vertex-to-vertex connections as in the folklore sampling. Find this intriguing? Read the paper, watch Merav’s talk, or both. The paper has a bunch of other interesting results.
At this point, you might wonder: how far could one go about reducing the diameter? Thorup conjectured that the diameter could be reduced all the way down to \(\mathrm{poly}(\log(n))\) with a nearly linear number of shortcuts. This conjecture was disproved by Hesse, who was a graduate student at UMass Amherst at the time with our Neil Immerman :). Hesse constructed a directed graph with \(m = \Theta(n^{19/17})\) edges such that one has to use \(\Omega(mn^{1/17})\) shortcuts to reduce the diameter to below \(\Theta(n^{1/17})\). This lower bound has been improved further. Notably, Bodwin and Hoppenworth recently obtained a lower bound of \(\Omega(n^{1/4})\) on the diameter, which is very close to the upper bound of \(O(n^{1/3})\) by Kogan and Parter. This result leaves another fascinating open problem:
Open Problem: Closing the gap \(O(n^{1/3})\) vs. \(\Omega(n^{1/4})\) on the diameter for shortcut sets of nearly linear size.
A hopset is very similar to a shortcut set but applies to weighted graphs instead. More formally, a \(d\)-hopset of a directed, weighted graph \(G=(V,E)\) is a set of directed, weighted edges \(E_H\) such that (i) the graph \(H = (V,E\cup E_H)\) has the same distance metric as \(G\) and (ii) for every \(u\not=v\in V\), there is a shortest \(u\)-to-\(v\) path with at most \(d\) edges in \(H = (V,E\cup E_H)\).
Clearly, the hopset problem is at least as hard as the shortcut set problem: any hopset is a shortcut set with the same diameter bound.
A related notion is approximate hopset, in which we are given an additional parameter \(\epsilon \in (0,1)\), and for every \(u,v\), we would like to have a \((1+\epsilon)\)-approximate \(u\)-to-\(v\) path of at most \(d\) edges. We allow \(d\) to depend on \(\epsilon\)—think of \(\epsilon\) as a small constant.
One could check that the folklore sampling also works for hopsets (and hence for approximate hopsets as well): by adding \(\tilde{O}(n)\) edges, one could reduce the diameter bound to \(\sqrt{n}\). The same question arises: is the \(\sqrt{n}\) bound on the diameter optimal for hopsets and approximate hopsets of nearly linear size?
For the approximate hopset, the same paper by Kogan and Parter provided a no answer: They constructed a nearly linear hopset with a diameter bound of \(n^{2/5}\). In follow-up work, Bernstein and Wein improved the diameter to \(n^{1/3}\), matching the known bound for shortcut set; watch the talk here.
How about the (exact) hopset? One might expect that the diameter bound should also be improved to \(n^{1/3}\). Bodwin and Hoppenworth recently showed that \(\sqrt{n}\) is the best one could do (up to some polylog). I personally find this result very surprising. How do they achieve their lower bound? Read their paper; the exposition of the ideas in the paper is so well written that any attempt to summarize better would be in vain.
Somehow, the exact hopset is strictly harder than the approximate hopset and the shortcut set. Could approximate hopset be strictly harder than shortcut set? Bernstein and Wein provided state-of-the-art bounds for approximate hopsets, matching those of shortcut sets. As far as I understand, their proof does not provide a black-box reduction. Is such a reduction possible? I am curious to know the answer.
This post was inspired by my past attempt to track down the details of the Halperin-Zwick algorithm. Halperin and Zwick never published their algorithm, and all papers I am aware of cite their unpublished manuscript [6]. The algorithm by Halperin-Zwick is a simple modification of an earlier algorithm by Peleg and Schäffer [8], which, according to Uri Zwick, is the reason why they did not publish their result. The spanner by Peleg and Schäffer [8] has stretch \(4k-3\) for the same sparsity. The idea of the Halperin-Zwick algorithm was given as Exercise 3 in Chapter 16 of the book by Peleg [7].
First, let’s define spanners. Graphs in this post are connected.
\(t\)-Spanner: Given a graph \(G\), a \(t\)-spanner is a subgraph of \(G\), denoted by \(H\), such that for every two vertices \(u,v\in V(G)\):
\[d_H(u,v)\leq t\cdot d_G(u,v)\]Here \(d_H\) and \(d_G\) denote the graph distances in \(H\) and in \(G\), respectively. Graph \(G\) could be weighted or unweighted; we only consider unweighted graphs in this post. The distance constraint on \(H\) implies that \(H\) is connected and spanning.
Parameter \(t\) is called the stretch of the spanner. We often construct spanners with an odd stretch: \(t = 2k-1\) for some integer \(k\geq 1\). Why not even stretches? Short answer: there is no gain in terms of the worst-case bounds for even stretches [1].
Theorem (Halperin-Zwick): Let \(G\) be an unweighted graph with \(n\) vertices and \(m\) edges. Let \(k\geq 1\) be any given integer. There is an algorithm that runs in time \(O(m)\) and constructs a \((2k-1)\)-spanner of \(G\) with \(O(n^{1+1/k})\) edges.
It is often instructive to think about \(k=2\), i.e., constructing a \(3\)-spanner. And this is where we start.
Here we seek a \(3\)-spanner with \(O(n^{3/2})\) edges. There are two steps: clustering and connecting the clusters. Let’s focus on clustering first. The idea is to construct a set of radius-1 clusters (a set of stars) that have at least \(\sqrt{n}\) vertices each. This implies that the number of clusters is \(O(\sqrt{n})\), and hence we can afford to add one edge from each vertex to each cluster. The remaining vertices induce a graph with \(O(n^{3/2})\) edges; we can add all of them.
The clusters can be constructed greedily; the pseudocode of the algorithm is given below. We use \(N_G(v)\) to denote the set of neighbors of \(v\) in a graph \(G\).
Clustering\((G)\)
\(1.\) \({\mathcal C} \leftarrow \emptyset, \quad G_1\leftarrow G, \quad i\leftarrow 1\)
\(2.\) while \(G_i \not= \emptyset\)
\(3.\) \(x\leftarrow\) an arbitrary vertex in \(G_i\)
\(4.\) \(C_x\leftarrow \{x\}\)
\(5.\) if \(\lvert N_{G_i}(x)\rvert \geq \sqrt{n}\)
\(6.\) \(C_x\leftarrow C_x\cup N_{G_i}(x)\)
\(7.\) \({\mathcal C} \leftarrow {\mathcal C}\cup \{C_x\}\)
\(8.\) \(G_{i+1}\leftarrow G_i\setminus C_x, \quad i\leftarrow i+1\)
\(9.\) return \({\mathcal C}\)
We call the vertex \(x\) in line 4 the center of the cluster \(C_x\). We use \(E(C_x)\) to denote the edges of \(G\) connecting \(x\) to the other vertices in \(C_x\).
Observe that every cluster \(C\in {\mathcal C}\) has radius at most \(1\) and it has either at least \(\sqrt{n}\) vertices or exactly one vertex. We call \(C\) a heavy cluster if \(\lvert C \rvert\geq \sqrt{n}\), and a light cluster otherwise.
Observation 1: The number of heavy clusters in \({\mathcal C}\) is at most \(\sqrt{n}\).
To get a 3-spanner of \(G\), we simply add an edge from every vertex to each heavy cluster of \({\mathcal C}\), and an edge between every pair of light clusters. (Light clusters are singletons.)
3Spanner\((G)\)
\(1.\) \({\mathcal C} \leftarrow\)Clustering\((G)\)
\(2.\) \(H\leftarrow (V,\emptyset)\)
\(3.\) for each heavy cluster \(C\in {\mathcal C}\)
\(4.\) add \(E(C)\) to \(H\)
\(5.\) for each vertex \(v \in N_G(C)\)
\(6.\) \((v,u)\leftarrow\) an arbitrary edge from \(v\) to \(C\)
\(7.\) add \((u,v)\) to \(H\)
\(8.\) add to \(H\) all edges between light clusters
\(9.\) return \(H\)
In line 5, we use \(N_G(C)\) to denote the set of neighbors of \(C\): the vertices not in \(C\) that have at least one edge to \(C\). The running time is clearly \(O(m)\).
Sparsity analysis. Note that \(\lvert E(C)\rvert \leq \lvert C \rvert-1\). Thus, the total number of edges added to \(H\) in line 4 over all iterations is at most \(n-1\). Furthermore, for each heavy cluster, the number of edges added to \(H\) in the loop in line 5 is at most \(n\), and hence by Observation 1, the total number of edges added in lines 3-7 is \(O(n\sqrt{n}) = O(n^{3/2})\).
To bound the number of edges added in line 8, observe that, if we order light clusters by the order in which they are added to \({\mathcal C}\) in line 7 of algorithm Clustering, then each light cluster is incident to at most \(\sqrt{n}\) light clusters following it in this order. It follows that the total number of edges added in line 8 is \(O(n\sqrt{n}) = O(n^{3/2})\).
Stretch analysis. We show that \(d_H(u,v)\leq 3 d_G(u,v)\). By the triangle inequality, it suffices to show the inequality for every edge \((u,v)\) of \(G\); that is, we have to show that \(d_H(u,v)\leq 3\). This inequality trivially holds if \((u,v)\in E(H)\). Otherwise, since all edges between light clusters are in \(H\), at least one of \(u\) and \(v\) must be in a heavy cluster.
Figure 1: (a) stretch-3 path for edge \((u,v)\) and (b) stretch-\((2k-1)\) path for edge \((u,v)\)
If \(u\) and \(v\) are in the same heavy cluster \(C\), then \(d_H(u,v)\leq 2\) and the stretch guarantee holds. Otherwise, let \(C_x\) be the heavy cluster centered at \(x\) containing, say, \(v\). As \((u,v)\not\in H\), there must be another vertex \(w\in C_x\) such that \((u,w)\in H\), by the construction in line 6. Thus, the path \(u\rightarrow w\rightarrow x\rightarrow v\) is a path of length at most 3 in \(H\) between \(u\) and \(v\), as desired. See Figure 1(a).
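Here is a Python sketch of the whole stretch-3 construction (my own translation of Clustering and 3Spanner above, using adjacency sets, not code by Halperin-Zwick), together with a BFS helper for checking the stretch.

```python
import math
from collections import deque

def three_spanner(adj):
    """Sketch of Clustering(G) + 3Spanner(G).  `adj` maps each vertex to the
    set of its neighbors; returns the spanner edges as frozensets {u, v}."""
    n = len(adj)
    thr = math.isqrt(n)                            # the sqrt(n) threshold
    live = {v: set(nb) for v, nb in adj.items()}   # the shrinking graph G_i
    H, heavy, light = set(), [], set()

    def remove(cluster):                           # G_{i+1} <- G_i \ C
        for v in cluster:
            for u in live[v] - cluster:
                live[u].discard(v)
        for v in cluster:
            del live[v]

    while live:
        x = next(iter(live))                       # an arbitrary vertex of G_i
        if len(live[x]) >= thr:                    # heavy cluster: x + neighbors
            members = {x} | live[x]
            H.update(frozenset((x, v)) for v in live[x])  # star edges E(C_x)
            heavy.append(members)
            remove(members)
        else:                                      # light cluster: singleton {x}
            light.add(x)
            remove({x})
    for members in heavy:                          # one edge per outside neighbor
        for v in adj.keys() - members:
            for u in adj[v] & members:
                H.add(frozenset((v, u)))
                break
    for u in light:                                # all edges between light clusters
        H.update(frozenset((u, v)) for v in adj[u] & light)
    return H

def bfs_dist(adj, s):
    """Hop distances from s by BFS."""
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```

On a random connected graph, one can check that the output is a subgraph of \(G\) with stretch at most 3 for every pair of vertices.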
The algorithm for constructing a \((2k-1)\)-spanner with \(O(n^{1+1/k})\) edges is somewhat similar to the stretch-3 case, but we will need a finer analysis. A key observation, which we also used in the 3-spanner construction, is that for a cluster \(C\) (of any radius) and a vertex \(v\in N_G(C)\), it suffices to keep only one edge from \(v\) to \(C\). Thus, as long as \(N_G(C)\) has at most \(n^{1/k}\lvert C \rvert\) vertices, we can add an edge from each \(v\in N_G(C)\) to \(C\); the average number of edges added per vertex of \(C\) is at most \(n^{1/k}\).
What if \(\lvert N_G(C) \rvert \geq n^{1/k}\lvert C \rvert\)? In this case, we simply grow \(C\) by adding all of its neighbors. How many times can it grow? At most \(k-1\) times: every time \(C\) grows, its size increases by a factor strictly larger than \(n^{1/k}\), and there are only \(n\) vertices in the graph.
The pseudocode of the algorithm is given below; note that the for loop in line 7 executes only after the while loop in line 5 has terminated, i.e., once \(\lvert N_{G_i}(C_x)\rvert < n^{1/k}\lvert C_x\rvert\). The set \(A\) holds the edges between each cluster and its neighbors described above. The rest is essentially the same as the clustering for stretch 3.
Clustering\((G,k)\)
\(1.\) \({\mathcal C} \leftarrow \emptyset, \quad A\leftarrow \emptyset, \quad G_1\leftarrow G, \quad i\leftarrow 1\)
\(2.\) while \(G_i \not= \emptyset\)
\(3.\) \(x\leftarrow\) an arbitrary vertex in \(G_i\)
\(4.\) \(C_x\leftarrow \{x\}\)
\(5.\) while \(\lvert N_{G_i}(C_x)\rvert \geq n^{1/k} \lvert C_{x} \rvert\)
\(6.\) \(C_x\leftarrow C_x\cup N_{G_i}(C_x)\)
\(7.\) for each \(v \in N_{G_i}(C_x)\)
\(8.\) \((v,u)\leftarrow\) an arbitrary edge from \(v\) to \(C_x\)
\(9.\) add \((v,u)\) to \(A\)
\(10.\) \({\mathcal C} \leftarrow {\mathcal C}\cup \{C_x\}\)
\(11.\) \(G_{i+1}\leftarrow G_i\setminus C_x, \quad i\leftarrow i+1\)
\(12.\) return \(({\mathcal C},A)\)
Once we perform clustering, we only need to add the set \(A\) and the edges inside each cluster to the spanner.
Spanner\((G,k)\)
\(1.\) \(H\leftarrow (V,\emptyset)\)
\(2.\) \(({\mathcal C},A) \leftarrow\)Clustering\((G,k)\)
\(3.\) add \(A\) to \(H\)
\(4.\) for each cluster \(C\in {\mathcal C}\)
\(5.\) add \(E(C)\) to \(H\)
\(6.\) return \(H\)
Sparsity analysis. The number of edges added in the loop in line 4 is at most \(n-1\). Observe that for each cluster \(C_x\) added to \({\mathcal C}\) in line 10 of Clustering, the number of edges added to \(A\) in the loop in line 7 is at most \(n^{1/k}\lvert C_{x} \rvert\). Thus, \(\lvert A \rvert\leq n^{1/k}\sum_{C}\lvert C \rvert \leq n^{1+1/k}\). This implies that \(\lvert E(H) \rvert = O(n^{1+1/k})\).
Stretch analysis. Let \((u,v)\) be any edge of \(G\) such that \((u,v)\not\in H\). We need to show that \(d_H(u,v)\leq 2k-1\). Observe that:
Observation 2: Every cluster \(C_x\in {\mathcal C}\) has radius at most \(k-1\).
Proof: Every time the radius of \(C_x\) increases by \(1\), the size of \(C_x\) increases by a factor strictly larger than \(n^{1/k}\), by construction. Thus, after \(t\) rounds, \(n\geq \lvert C_{x} \rvert > n^{t/k}\), which gives \(t\leq k-1\).
Let \(C_x\) be the cluster containing \(v\). If \(u\in C_x\), then \(d_H(u,v)\leq 2(k-1)\) by Observation 2. Otherwise, suppose, w.l.o.g., that \(v\) is clustered before \(u\). Observe that \(u\in N_{G_{i}}(C_x)\) when \(C_x\) is formed, and hence an edge \((u,w)\) with \(w\in C_x\) is added to \(A\), which is eventually added to \(H\). See Figure 1(b). Thus, the path consisting of the edge \((u,w)\), the shortest path from \(w\) to \(x\), and the shortest path from \(x\) to \(v\), is a path of length at most \(2(k-1)+1 = 2k-1\) between \(u\) and \(v\) in \(H\), as desired.
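The general construction can be sketched similarly in Python (again my own translation of Clustering\((G,k)\) and Spanner\((G,k)\), not the authors' code): grow each cluster while its neighborhood is large, keep the in-cluster attachment edges as \(E(C_x)\), and keep one edge per remaining outside neighbor as the set \(A\).

```python
from collections import deque

def spanner(adj, k):
    """Sketch of Clustering(G, k) + Spanner(G, k).  `adj` maps each vertex to
    the set of its neighbors; returns a (2k-1)-spanner as frozenset edges."""
    n = len(adj)
    thr = n ** (1.0 / k)                     # the n^(1/k) growth threshold
    live = {v: set(nb) for v, nb in adj.items()}   # the shrinking graph G_i
    H = set()
    while live:
        x = next(iter(live))
        cluster, tree = {x}, []
        frontier = set(live[x])              # N_{G_i}(C_x)
        while len(frontier) >= thr * len(cluster):
            for v in frontier:               # attach v to some clustered neighbor
                tree.append((v, next(iter(live[v] & cluster))))
            cluster |= frontier
            frontier = set().union(*(live[v] for v in cluster)) - cluster
        H.update(frozenset(e) for e in tree)       # in-cluster edges E(C_x)
        for v in frontier:                   # the set A: one edge per neighbor
            H.add(frozenset((v, next(iter(live[v] & cluster)))))
        for v in cluster:                    # remove C_x from the live graph
            for u in live[v] - cluster:
                live[u].discard(v)
        for v in cluster:
            del live[v]
    return H

def bfs_dist(adj, s):
    """Hop distances from s by BFS."""
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```

The attachment edges give every cluster radius at most \(k-1\), so the stretch argument above goes through verbatim for this variant.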
It is not hard to construct a class of graphs such that for any graph \(G\) of size \(n\) in the class, the Halperin-Zwick algorithm produces a \((2k-1)\)-spanner for \(G\) that has \(\Omega(n^{1+1/k})\) edges. Could we go below the bound \(\Theta(n^{1+1/k})\) on the number of edges (by a different algorithm, say)? The consensus seems to be no, though currently we do not have a definite answer.
Spanners have a tight connection to the girth of graphs; a graph has girth \(g\) if the shortest simple cycle in the graph has length \(g\).
Observation 3: Let \(H\) be a graph of girth \(2k+1\). Then any \((2k-1)\)-spanner of \(H\) must contain every edge of \(H\).
Observation 3 essentially says that the only \((2k-1)\)-spanner of \(H\) is \(H\) itself. Thus, to show that the bound \(\Theta(n^{1+1/k})\) cannot be improved, it suffices to exhibit a graph with \(\Omega(n^{1+1/k})\) edges and girth \(2k+1\). Erdős’ Girth Conjecture asserts exactly this.
Erdős’ Girth Conjecture [5]: For any \(n \geq 1\) and \(k\geq 1\), there exists a graph with \(n\) vertices of girth \((2k+1)\) that has \(\Omega(n^{1+1/k})\) edges.
Erdős stated a lower bound of \(c_k\cdot n^{1+1/k}\) on the number of edges in the conjecture [5]; that is, the constant is allowed to degrade as \(k\) increases. The spanner literature often cites the stronger version above, where the constant remains the same for every \(k\). The conjecture is known to hold for a few small values of \(k\).
While Erdős’ Girth Conjecture remains wide open, we could ask: is it possible to construct a \((2k-1)\)-spanner that has girth at least \(2k+1\)? If yes, then the output spanner is (existentially) optimal regardless of the truth of Erdős’ Girth Conjecture.
It turns out that the following simple greedy algorithm, formally described in [2] and attributed to Marshall Bern, does the job: consider edges in increasing weight order and add an edge \(e\) to the current spanner if the distance between its endpoints in the spanner is larger than \((2k-1)w(e)\). The algorithm works for weighted graphs as well. It is an instructive exercise to show that the output graph is a \((2k-1)\)-spanner and has girth at least \(2k+1\).
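A minimal sketch of the greedy algorithm, assuming an unoptimized Dijkstra with early termination (my own code; faster implementations exist):

```python
import heapq

def greedy_spanner(n, edges, k):
    """Greedy (2k-1)-spanner: scan edges by nondecreasing weight and keep
    (u, v, w) only if the current spanner distance from u to v exceeds
    (2k-1) * w.  `edges` is a list of (u, v, w) triples; returns kept edges."""
    adj = {v: [] for v in range(n)}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        limit = (2 * k - 1) * w
        # Dijkstra from u in the current spanner, pruned beyond `limit`
        dist, pq, d_uv = {u: 0}, [(0, u)], float("inf")
        while pq:
            d, x = heapq.heappop(pq)
            if d > dist[x] or d > limit:
                continue
            if x == v:
                d_uv = d
                break
            for y, wy in adj[x]:
                if d + wy < dist.get(y, float("inf")):
                    dist[y] = d + wy
                    heapq.heappush(pq, (d + wy, y))
        if d_uv > limit:                  # distance too large: keep the edge
            kept.append((u, v, w))
            adj[u].append((v, w))
            adj[v].append((u, w))
    return kept
```

Since an edge is discarded only when the spanner already offers a path of weight at most \((2k-1)w(e)\), and spanner distances never increase, the stretch bound holds for every original edge and hence for every pair.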
The major downside of the greedy algorithm is its running time: the current best known implementation takes \(O(mn^{1+1/k})\) time. Even in unweighted graphs, to the best of my knowledge, the following problem remains open:
Open Problem: Construct a maximal subgraph of girth at least \((2k+1)\) in nearly linear time.
For an unweighted graph, a maximal subgraph of girth at least \((2k+1)\) is a \((2k-1)\)-spanner.
We have mentioned two algorithms for constructing a spanner. Another beautiful algorithm that I hope to cover in a future post is the randomized construction by Baswana and Sen [4]. A notable feature of the Baswana-Sen algorithm is that it can be implemented efficiently in both parallel and distributed models. The recent survey paper [1] contains almost all known algorithms for spanners and related problems.
The concept of a spanner was formally introduced by Peleg and Schäffer [8], though its conception came much earlier. Peleg and Schäffer constructed a \((4k-3)\)-spanner with \(O(n^{1+1/k})\) edges by connecting every pair of clusters in the output of Clustering\((G,k)\) by an edge. The clustering procedure itself is due to Awerbuch [3].
[1] Ahmed, R., Bodwin, G., Sahneh, F. D., Hamm, K., Jebelli, M. J. L., Kobourov, S., and Spence, R. (2020). Graph spanners: A tutorial review. Computer Science Review, 37, 100253.
[2] Althöfer, I., Das, G., Dobkin, D., Joseph, D., and Soares, J. (1993). On sparse spanners of weighted graphs. Discrete \& Computational Geometry, 9(1), 81-100.
[3] Awerbuch, B. (1985). Complexity of network synchronization. Journal of the ACM (JACM), 32(4), 804-823.
[4] Baswana, S., and Sen, S. (2007). A simple and linear time randomized algorithm for computing sparse spanners in weighted graphs. Random Structures & Algorithms, 30(4), 532-563.
[5] Erdős, P. (1964). Extremal problems in graph theory. In Proceedings of the Symposium on Theory of Graphs and its Applications (Smolenice, 1963), pages 29-36.
[6] Halperin, S., and Zwick, U. (1996). Unpublished manuscript.
[7] Peleg, D. (2000). Distributed computing: a locality-sensitive approach. Society for Industrial and Applied Mathematics.
[8] Peleg, D., and Schäffer, A. A. (1989). Graph spanners. Journal of graph theory, 13(1), 99-116.
Tree Shortcutting Problem: Given an edge-weighted tree \(T\), add (weighted) edges to \(T\), called shortcuts, to get a graph \(K\) such that: (i) \(d_K(u,v) = d_T(u,v)\) for every pair \(u,v\in V(T)\), and (ii) for every \(u,v\in V(T)\), there is a shortest \(u\)-to-\(v\) path in \(K\) containing at most \(k\) edges.
The goal is to minimize the product \(k \cdot \mathrm{tw}(K)\), where \(\mathrm{tw}(K)\) is the treewidth of \(K\).
Figure 1: An emulator \(K\) obtained by adding one edge to the tree \(T\) has hop bound \(3\) and treewidth \(2\). Compared to \(T\), the hop bound decreases by 1 while the treewidth increases by 1.
Such a graph \(K\) is called a low-hop and low-treewidth emulator of the tree \(T\). The parameter \(k\) is called the hop bound of \(K\). It is expected that there will be some trade-off between the hop bound and the treewidth. We are interested in minimizing \(k \cdot \mathrm{tw}(K)\). This product directly affects parameters in our application; see the conclusion section for more details.
For readers who are not familiar with treewidth, see here and here for an excellent introduction to treewidth and why it is an interesting graph parameter.
Remark 1: The tree shortcutting problem is already non-trivial for unweighted trees; the shortcuts must be weighted though. Furthermore, for each edge \((u,v)\) added to \(K\), the weight of the edge will be \(d_T(u,v)\). Thus, in the construction below, we do not explicitly assign weights to the added edges.
A version of the tree shortcutting problem where one seeks to minimize the number of edges of \(K\), given a hop bound \(k\), has been studied extensively (see 1,2,3,4, including my own work with others), arising in the context of spanners and the minimum spanning tree problem. What is interesting there IMO is that we see all kinds of crazily slow-growing functions in computer science: for \(k = 2\), the number of edges of \(K\) is \(\Theta(n \log n)\); for \(k = 3\), the number of edges of \(K\) is \(\Theta(n \log\log n)\); for \(k = 4\), the number of edges is \(\Theta(n \log^* n)\); \(\ldots\) [too difficult to describe]; and for \(k = \alpha(n)\), the number of edges is \(\Theta(n)\). Here, as you might guess, \(\alpha(\cdot)\) is the notorious (one-parameter) inverse Ackermann function. (The \(\Theta\) notation in the number of edges means there exist matching lower bounds.) I hope to cover this problem in a future blog post.
Now back to our tree shortcutting problem. Let \(n = \lvert V(T) \rvert\). There are two extreme regimes that I am aware of: (1) hop bound \(k = O(\log n)\) with treewidth \(\mathrm{tw}(K) = O(1)\), and (2) hop bound \(k = O(1)\) with treewidth \(\mathrm{tw}(K) = O(\log n)\); in both regimes, \(k\cdot \mathrm{tw}(K) = \Theta(\log n)\).
The two regimes might suggest a lower bound of \(k\cdot \mathrm{tw}(K) = \Omega(\log n)\) for every \(k\). In our recent paper [2], we show that this is not the case:
Theorem 1: There exists an emulator \(K\) for any \(n\)-vertex tree \(T\) such that \(h(K) = O(\log \log n)\) and \(\mathrm{tw}(K) = O(\log \log n)\).
Theorem 1 implies that one can get an emulator with \(k\cdot \mathrm{tw}(K) = O((\log \log n)^2)\), which is exponentially smaller than \(O(\log n)\). The goal of this post is to discuss the proof of Theorem 1. See the conclusion section for a more thorough discussion of other aspects of Theorem 1, in particular, the construction time and application.
For the tree shortcutting problem, it is often insightful to look into the path graph with \(n\) vertices, which is a special case. Once we solve the path graph, extending the ideas to trees is not that difficult.
The (unweighted) path graph \(P_n\) is a path of \(n\) vertices. To simplify the presentation, assume that \(\sqrt{n}\) is an integer. The construction is recursive and described in the pseudocode below. First, we divide \(P_n\) into \(\sqrt{n}\) subpaths of \(\sqrt{n}\) vertices each. Denote the endpoints of these subpaths by \(b_i = i\sqrt{n}\) for \(1\leq i\leq\sqrt{n}\). We call \(\{b_i\}_{i}\) boundary vertices. We have two types of recursions: (1) the top-level recursion – lines 2 and 3 – and (2) the subpath recursion – lines 5 to 9. The intuition behind the top-level recursion is a bit tricky, and we will get to it later. See Figure 2.
Figure 2: (a) The recursive construction applied to the path \(P_n\). (b) Gluing \(\mathcal T_B\) and all \(\{\mathcal T_i\}\) to obtain a tree decomposition \(\mathcal{T}\) of \(K\) via red edges.
The subpath recursion is natural: recursively shortcut each subpath \(P[b_i,b_{i+1}]\) (line 6). The subpath recursion returns the shortcut graph \(K_i\) and its tree decomposition \(\mathcal T_i\). Next, we add edges from each boundary vertex \(b_i, b_{i+1}\) of the subpath to all other vertices on the subpath (lines 7 and 8). This step guarantees that each vertex of the subpath can “jump” to the boundary vertices using only one edge. In terms of tree decomposition, it means that we add both \(b_i\) and \(b_{i+1}\) to every bag of \(\mathcal T_i\) (line 9).
The top-level recursion serves two purposes: (i) creating a low-hop emulator for boundary vertices (recall that each vertex can jump to a boundary vertex in the same subpath using one edge) and (ii) gluing \(\{\mathcal T_i\}\) together. More precisely, let \(P_{\sqrt{n}}\) be a path of boundary vertices, i.e., \(b_i\) is adjacent to \(b_{i+1}\) in \(P_{\sqrt{n}}\). We shortcut \(P_{\sqrt{n}}\) recursively, getting the shortcut graph \(K_B\) and its tree decomposition \(\mathcal T_B\) (line 3). Since \((b_i,b_{i+1})\) is an edge in \(P_{\sqrt{n}}\), there must be a bag in \(\mathcal T_B\) containing both \(b_i,b_{i+1}\); that is, the bag \(X\) in line 11 exists. See Figure 2(b). Recall that in line 9, every bag in \(\mathcal T_i\) contains both \(b_i,b_{i+1}\), and that \(K_B\) and \(K_i\) only share the two boundary vertices \(b_i,b_{i+1}\). Thus, we can connect \(X\) to an arbitrary bag of \(\mathcal T_i\), as done in line 12. This completes the shortcutting algorithm.
PathShortcutting\((P_n)\)
\(1.\) \(B \leftarrow \{0,\sqrt{n}, 2\sqrt{n}, \ldots, n\}\) and \(b_i \leftarrow i\sqrt{n}\) for every \(0\leq i \leq \sqrt{n}\)
\(2.\) \(P_{\sqrt{n}} \leftarrow\) unweighted path graph with vertex set \(B\).
\(3.\) \((K_B,\mathcal T_B) \leftarrow\)PathShortcutting\((P_{\sqrt{n}})\)
\(4.\) \(K\leftarrow K_B,\quad \mathcal{T}\leftarrow \mathcal T_B\)
\(5.\) for \(i\leftarrow 0\) to \(\sqrt{n}-1\)
\(6.\) \((K_i,\mathcal T_i) \leftarrow\)PathShortcutting\((P_{n}[b_i, b_{i+1}])\)
\(7.\) for each \(v\in P_{n}[b_i, b_{i+1}]\)
\(8.\) \(E(K_i)\leftarrow E(K_i)\cup\{(v,b_i), (v,b_{i+1})\}\)
\(9.\) add both \(b_i,b_{i+1}\) to every bag of \(\mathcal T_i\)
\(10.\) \(K\leftarrow K \cup K_i\)
\(11.\) Let \(X\) be a bag in \(\mathcal{T}\) containing both \(b_i,b_{i+1}\)
\(12.\) Add \(\mathcal T_i\) to \(\mathcal{T}\) by connecting \(X\) to an arbitrary bag of \(\mathcal T_i\)
\(13.\) return \((K,\mathcal{T})\)
It is not difficult to show that \(\mathcal{T}\) indeed is a tree decomposition of \(K\). Thus, we focus on analyzing the hop bound and the treewidth.
Remark 2: For notational convenience, we include \(0\) in the set \(B\) though \(0\not\in P_n\). When calling the recursion, one could simply drop 0.
Figure 3: A low-hop path from \(u\) to \(v\).
Analyzing the hop bound \(h(K)\). Let \(h(n)\) denote the hop bound, and let \(u\) and \(v\) be any two vertices of \(P_{n}\); w.l.o.g., assume that \(u \leq v\). Let \(b_{u}\) and \(b_v\) be boundary vertices of the subpaths containing \(u\) and \(v\), respectively, such that \(b_u,b_v \in P[u,v]\). See Figure 3. As mentioned above, line 8 of the algorithm guarantees that the two edges \((u,b_u)\) and \((b_v,v)\) are in \(K\), and the top-level recursion (line 3) guarantees that there is a shortest path of hop length at most \(h(\sqrt{n})\) between \(b_u\) and \(b_v\) in \(K_B\). Thus we have:
\[h(n) \leq h(\sqrt{n}) + 2\]which solves to \(h(n)= O(\log\log n)\).
Analyzing the treewidth \(\mathrm{tw}(K)\). Note that the widths of \(\mathcal T_B\) and of all the \(\{\mathcal T_i\}\) (before adding boundary vertices in line 9) are bounded by \(\mathrm{tw}(\sqrt{n})\). Line 9 increases the width of each \(\mathcal T_i\) by at most \(2\). Since the treewidth of \(K\) is the maximum of the widths of \(\mathcal T_B\) and all the \(\{\mathcal T_i\}\), we have:
\[\mathrm{tw}(n) \leq \mathrm{tw}(\sqrt{n}) + 2\]which solves to \(\mathrm{tw}(n)= O(\log\log n)\).
This completes the proof of Theorem 1 for the path graph \(P_n\).
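To double-check the two recurrences, here is a self-contained Python sketch of PathShortcutting (my own code; it builds only the edge set of \(K\) and omits the tree-decomposition bookkeeping; per Remark 1, an added edge \((u,v)\) implicitly has weight \(\lvert u-v\rvert\)).

```python
import math

def shortcut_path(vertices):
    """Recursively shortcut the path on `vertices` (an increasing list of
    integers).  Returns a set of edges (u, v) with u < v; the weight of an
    edge is implicitly |u - v|."""
    n = len(vertices)
    if n <= 4:                       # base case: a clique, hop bound 1
        return {(vertices[i], vertices[j])
                for i in range(n) for j in range(i + 1, n)}
    s = math.isqrt(n - 1)            # ~sqrt(n) subpaths of ~sqrt(n) vertices
    boundary = vertices[::s]
    if boundary[-1] != vertices[-1]:
        boundary.append(vertices[-1])
    edges = shortcut_path(boundary)  # top-level recursion on the boundary path
    for lo, hi in zip(boundary, boundary[1:]):
        block = [v for v in vertices if lo <= v <= hi]
        edges |= shortcut_path(block)        # subpath recursion
        for v in block[1:-1]:                # one-hop jumps to both boundaries
            edges.add((lo, v))
            edges.add((v, hi))
    return edges

def hop_limited_dist(vertices, edges, src, hops):
    """Bellman-Ford restricted to `hops` rounds on the weighted emulator."""
    INF = float("inf")
    d = {v: INF for v in vertices}
    d[src] = 0
    for _ in range(hops):
        nd = dict(d)
        for u, v in edges:
            w = v - u
            nd[v] = min(nd[v], d[u] + w)
            nd[u] = min(nd[u], d[v] + w)
        d = nd
    return d
```

For \(n = 81\), unrolling \(h(n)\leq h(\sqrt{n})+2\) gives a hop bound of at most \(7\); running hop-limited Bellman-Ford for \(7\) rounds recovers the exact distance between every pair.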
What is needed to extend the construction from the path graph to a general tree? There are two properties we exploited in the construction for \(P_n\): (1) \(P_n\) can be decomposed into \(\sqrt{n}\) subpaths of \(\sqrt{n}\) vertices each, and (2) each subpath has at most 2 boundary vertices.
Property 2 implies that the total number of boundary vertices is about \(\sqrt{n}\), which plays a key role in analyzing the top-level recursion.
It is well known that we can obtain a somewhat similar but weaker decomposition for trees: one can decompose a tree of \(n\) vertices into roughly \(\sqrt{n}\) connected subtrees such that the total number of boundary vertices is \(O(\sqrt{n})\). (A vertex is a boundary vertex of a subtree if it is incident to an edge not in the subtree.) This decomposition is weaker in the sense that a subtree could have more than 2, and indeed up to \(\Omega(\sqrt{n})\), boundary vertices. Is this enough?
Not quite. To glue the tree decomposition \(\mathcal T_i\) to \(\mathcal{T}\) (and effectively to \(\mathcal T_B\)), we rely on the fact that there is a bag \(X\in \mathcal T_B\) containing both boundary vertices in line 11. The analogy for trees would be: there exists a bag \(X\) containing all boundary vertices of each subtree. This is problematic if a subtree has \(\Omega(\sqrt{n})\) boundary vertices.
Then how about guaranteeing that each subtree has \(O(1)\) boundary vertices, say 3? Will this be enough? Unfortunately, no. To guarantee that 3 boundary vertices end up in the same bag, one has to add a clique on those 3 boundary vertices in the top-level recursion. This means that the graph on the boundary vertices, on which we recursively call the shortcutting procedure, is no longer a tree. Thus, we really need a decomposition where every subtree has at most 2 boundary vertices.
Lemma 1: Let \(T\) be any tree of \(n\) vertices. One can decompose \(T\) into a collection \(\mathcal{D}\) of \(O(\sqrt{n})\) subtrees such that every tree \(T’\in \mathcal{D}\) has \(\lvert V(T’)\rvert \leq \sqrt{n}\) and at most 2 boundary vertices.
Lemma 1 is all we need to prove Theorem 1, following exactly the same construction for the path graph \(P_n\); the details are left to readers.
Remark 3: In developing our tree shortcutting result, we were unaware that Lemma 1 was already known in the literature. A reviewer later pointed out that one can derive Lemma 1 from a weaker decomposition using least common ancestor closure [3], which I reproduce below.
Proof of Lemma 1: First, decompose \(T\) into a collection \(\mathcal{D}’\) of \(O(\sqrt{n})\) subtrees such that each tree in \(\mathcal{D}’\) has size at most \(\sqrt{n}\) and that the total number of boundary vertices is \(O(\sqrt{n})\). As mentioned above, this decomposition is well known; see Claim 1 in our paper [2] for a proof.
Let \(A_1\) be the set of boundary vertices; \(\lvert A_1\rvert = O(\sqrt{n})\). Root \(T\) at an arbitrary vertex. Let \(A_2\) be the set containing the least common ancestor of every pair of vertices in \(A_1\). Let \(B = A_1\cup A_2\).
It is not hard to see that \(\lvert A_2\rvert \leq \lvert A_1\rvert - 1 = O(\sqrt{n})\). Thus, \(\lvert B\rvert = O(\sqrt{n})\). Furthermore, every connected component of \(T\setminus B\) has at most \(\sqrt{n}\) vertices and has edges to at most 2 vertices in \(B\). The set \(B\) induces a decomposition \(\mathcal{D}\) of \(T\) claimed in Lemma 1.
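The least common ancestor closure step can be sketched as follows (my own naive quadratic code with made-up names, enough to illustrate the bound \(\lvert A_2\rvert \leq \lvert A_1\rvert - 1\)):

```python
def lca_closure(parent, root, A1):
    """Return A2, the set of least common ancestors of all pairs in A1.
    `parent` maps each non-root vertex of the rooted tree to its parent."""
    def depth(v):
        d = 0
        while v != root:
            v, d = parent[v], d + 1
        return d
    def lca(u, v):
        du, dv = depth(u), depth(v)
        while du > dv:                 # lift the deeper vertex
            u, du = parent[u], du - 1
        while dv > du:
            v, dv = parent[v], dv - 1
        while u != v:                  # walk up together until they meet
            u, v = parent[u], parent[v]
        return u
    A1 = list(A1)
    return {lca(u, v) for i, u in enumerate(A1) for v in A1[i + 1:]}
```

The bound \(\lvert A_2\rvert \leq \lvert A_1\rvert - 1\) follows because every pairwise LCA is the LCA of two vertices of \(A_1\) that are consecutive in DFS order, and there are only \(\lvert A_1\rvert - 1\) consecutive pairs.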
The emulator in Theorem 1 can be constructed in time \(O(n \log \log n)\); see Theorem 10 of our paper [2] for more details. The major open problem is:
Open problem: Is \(O((\log \log n)^2)\) the best possible bound for the product \(\mathrm{tw}(K)\cdot h(K)\)?
This open problem is intimately connected to another problem: embedding planar graphs of diameter \(D\) into low-treewidth graphs with additive distortion at most \(+\epsilon D\) for any \(\epsilon \in (0,1)\). More precisely, though not explicitly stated in [2], one of our main results is:
Theorem 2: If one can construct an emulator \(K\) of treewidth \(\mathrm{tw}(n)\) and hop bound \(h(n)\) for any tree of \(n\) vertices, then one can embed any planar graph with \(n\) vertices and diameter \(D\) into a graph of treewidth \(O(h(n)\cdot \mathrm{tw}(n)/\epsilon)\) with additive distortion \(+\epsilon D\), for any given \(\epsilon \in (0,1)\).
That is, the product of treewidth and hop bound directly bounds the treewidth of the embedding. Theorem 1 gives us an embedding with treewidth \(O((\log\log n)^2/\epsilon)\), which has various algorithmic applications [2]. My belief is that the bound \(O((\log \log n)^2)\) is the best possible.
[1] Bodlaender, H.L. and Hagerup, T. (1995). Parallel algorithms with optimal speedup for bounded treewidth. In ICALP ‘95, 268-279.
[2] Filtser, A. and Le, H. (2022). Low Treewidth Embeddings of Planar and Minor-Free Metrics. ArXiv preprint arXiv:2203.15627.
[3] Fomin, F.V., Lokshtanov, D., Saurabh, S. and Zehavi, M. (2019). Kernelization: theory of parameterized preprocessing. Cambridge University Press.