I’ve just uploaded to the arXiv my paper “Sendov’s conjecture for sufficiently high degree polynomials“. This paper is a contribution to an old conjecture of Sendov on the zeroes of polynomials:

Conjecture 1 (Sendov’s conjecture) Let {f: {\bf C} \rightarrow {\bf C}} be a polynomial of degree {n \geq 2} that has all zeroes in the closed unit disk {\{ z: |z| \leq 1 \}}. If {\lambda_0} is one of these zeroes, then {f'} has at least one zero in {\{z: |z-\lambda_0| \leq 1\}}.

It is common in the literature on this problem to normalise {f} to be monic, and to rotate the zero {\lambda_0} to be an element {a} of the unit interval {[0,1]}. As it turns out, the location of {a} on this unit interval {[0,1]} ends up playing an important role in the arguments.

Many cases of this conjecture are already known.

In particular, in high degrees the only cases left uncovered by prior results are when {a} is close (but not too close) to {0}, or when {a} is close (but not too close) to {1}; see Figure 1 of my paper.

Our main result covers the high degree case uniformly for all values of {a \in [0,1]}:

Theorem 2 There exists an absolute constant {n_0} such that Sendov’s conjecture holds for all {n \geq n_0}.

In principle, this reduces the verification of Sendov’s conjecture to a finite time computation, although our arguments use compactness methods and thus do not easily provide an explicit value of {n_0}. I believe that the compactness arguments can be replaced with quantitative substitutes that provide an explicit {n_0}, but the value of {n_0} produced is likely to be extremely large (certainly much larger than {9}).

Because of the previous results (particularly those of Chalebgwa and Chijiwa), we will only need to establish the following two subcases of the above theorem:

Theorem 3 (Sendov’s conjecture near the origin) Under the additional hypothesis {a = o(1/\log n)}, Sendov’s conjecture holds for sufficiently large {n}.

Theorem 4 (Sendov’s conjecture near the unit circle) Under the additional hypothesis {1-o(1) \leq a \leq 1 - \varepsilon_0^n} for a fixed {\varepsilon_0>0}, Sendov’s conjecture holds for sufficiently large {n}.

We approach these theorems using the “compactness and contradiction” strategy, assuming that there is a sequence of counterexamples with degrees {n} going to infinity, using various compactness theorems to extract various asymptotic objects in the limit {n \rightarrow \infty}, and somehow using these objects to derive a contradiction. There are many ways to effect such a strategy; we will use a formalism that I call “cheap nonstandard analysis” and which is common in the PDE literature, in which one repeatedly passes to subsequences as necessary whenever one invokes a compactness theorem to create a limit object. However, the particular choice of asymptotic formalism one selects is not of essential importance for the arguments.

I also found it useful to use the language of probability theory. Given a putative counterexample {f} to Sendov’s conjecture, let {\lambda} be a zero of {f} (chosen uniformly at random among the {n} zeroes of {f}, counting multiplicity), and let {\zeta} similarly be a uniformly random zero of {f'}. We introduce the logarithmic potentials

\displaystyle  U_\lambda(z) := {\bf E} \log \frac{1}{|z-\lambda|}; \quad U_\zeta(z) := {\bf E} \log \frac{1}{|z-\zeta|}

and the Stieltjes transforms

\displaystyle  s_\lambda(z) := {\bf E} \frac{1}{z-\lambda}; \quad s_\zeta(z) := {\bf E} \frac{1}{z-\zeta}.

Standard calculations using the fundamental theorem of algebra yield the basic identities

\displaystyle  U_\lambda(z) = \frac{1}{n} \log \frac{1}{|f(z)|}; \quad U_\zeta(z) = \frac{1}{n-1} \log \frac{n}{|f'(z)|}

and

\displaystyle  s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}; \quad s_\zeta(z) = \frac{1}{n-1} \frac{f''(z)}{f'(z)} \ \ \ \ \ (1)

and in particular the random variables {\lambda, \zeta} are linked to each other by the identity

\displaystyle  U_\lambda(z) - \frac{n-1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|. \ \ \ \ \ (2)
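
As a quick sanity check of the identities (1) and (2) (an illustration of mine, not taken from the paper), one can generate a random monic polynomial with zeroes in the closed unit disk and compare both sides numerically; the parameter choices below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n = 12
lam = rng.uniform(-1, 1, n) + 1j * rng.uniform(-1, 1, n)
lam /= np.maximum(1, np.abs(lam))              # push the zeroes into the closed unit disk
coeffs = np.poly(lam)                          # monic f with these zeroes
zeta = np.roots(np.polyder(coeffs))            # the n-1 zeroes of f'

z = 1.7 + 0.3j                                 # a test point outside the unit disk
U_lam = np.mean(np.log(1 / np.abs(z - lam)))   # logarithmic potential of lambda at z
U_zeta = np.mean(np.log(1 / np.abs(z - zeta))) # logarithmic potential of zeta at z
s_lam = np.mean(1 / (z - lam))                 # Stieltjes transform of lambda at z

print(U_lam - (n - 1) / n * U_zeta)            # the two printed numbers agree,
print(np.log(np.abs(s_lam)) / n)               # verifying identity (2) at z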

On the other hand, the hypotheses of Sendov’s conjecture (and the Gauss-Lucas theorem) place {\lambda,\zeta} inside the unit disk {\{ z:|z| \leq 1\}}. Applying Prokhorov’s theorem, and passing to a subsequence, one can then assume that the random variables {\lambda,\zeta} converge in distribution to some limiting random variables {\lambda^{(\infty)}, \zeta^{(\infty)}} (possibly defined on a different probability space than the original variables {\lambda,\zeta}), also living almost surely inside the unit disk. Standard potential theory then gives the convergence

\displaystyle  U_\lambda(z) \rightarrow U_{\lambda^{(\infty)}}(z); \quad U_\zeta(z) \rightarrow U_{\zeta^{(\infty)}}(z) \ \ \ \ \ (3)

and

\displaystyle  s_\lambda(z) \rightarrow s_{\lambda^{(\infty)}}(z); \quad s_\zeta(z) \rightarrow s_{\zeta^{(\infty)}}(z) \ \ \ \ \ (4)

at least in the local {L^1} sense. Among other things, we then conclude from the identity (2) and some elementary inequalities that

\displaystyle  U_{\lambda^{(\infty)}}(z) = U_{\zeta^{(\infty)}}(z)

for all {|z|>1}. This turns out to have an appealing interpretation in terms of Brownian motion: if one takes two Brownian motions in the complex plane, one originating from {\lambda^{(\infty)}} and one originating from {\zeta^{(\infty)}}, then the location where these Brownian motions first exit the unit disk {\{ z: |z| \leq 1 \}} will have the same distribution. (In our paper we actually replace Brownian motion with the closely related formalism of balayage.) This turns out to connect the random variables {\lambda^{(\infty)}}, {\zeta^{(\infty)}} quite closely to each other. In particular, with this observation and some additional arguments involving both the unique continuation property for harmonic functions and Grace’s theorem (discussed in this previous post), with the latter drawn from the prior work of Dégot, we can get very good control on these distributions:

Theorem 5
  • (i) If {a = o(1)}, then {\lambda^{(\infty)}, \zeta^{(\infty)}} almost surely lie in the semicircle {\{ e^{i\theta}: \pi/2 \leq \theta \leq 3\pi/2\}} and have the same distribution.
  • (ii) If {a = 1-o(1)}, then {\lambda^{(\infty)}} is uniformly distributed on the circle {\{ z: |z|=1\}}, and {\zeta^{(\infty)}} is almost surely zero.

In case (i) (and strengthening the hypothesis {a=o(1)} to {a=o(1/\log n)} to control some technical contributions of “outlier” zeroes of {f}), we can use this information about {\lambda^{(\infty)}} and (4) to ensure that the normalised logarithmic derivative {\frac{1}{n} \frac{f'}{f} = s_\lambda} has a non-negative winding number in a certain small (but not too small) circle around the origin, which by the argument principle is inconsistent with the hypothesis that {f} has a zero at {a = o(1)} and that {f'} has no zeroes near {a}. This is how we establish Theorem 3.

Case (ii) turns out to be more delicate. This is because there are a number of “near-counterexamples” to Sendov’s conjecture that are compatible with the hypotheses and conclusion of case (ii). The simplest such example is {f(z) = z^n - 1}, where the zeroes {\lambda} of {f} are uniformly distributed amongst the {n^{th}} roots of unity (including at {a=1}), and the zeroes of {f'} are all located at the origin. In my paper I also discuss a variant of this construction, in which {f'} has zeroes mostly near the origin, but also acquires a bounded number of zeroes at various locations {\lambda_1+o(1),\dots,\lambda_m+o(1)} inside the unit disk. Specifically, we take

\displaystyle  f(z) := \left(z + \frac{c_2}{n}\right)^{n-m} P(z) - \left(a + \frac{c_2}{n}\right)^{n-m} P(a)

where {a = 1 - \frac{c_1}{n}} for some constants {0 < c_1 < c_2} and

\displaystyle  P(z) := (z-\lambda_1) \dots (z-\lambda_m).
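
To get a concrete feel for this construction, here is a small numerical experiment (illustration only; the specific values of {n, m, c_1, c_2, \lambda_j} below are my own arbitrary choices). It computes the zeroes of the above polynomial {f}: they all lie within {O(1/n)} of the unit circle, but for a generic choice of the {\lambda_j} some of them land slightly outside the closed unit disk, consistent with the failure of (5) after averaging in {\theta}. (The zeroes of {f'} need no computation here: {f'} factors as {(z+c_2/n)^{n-m-1}} times a polynomial of degree {m}, so {n-m-1} of them sit at {-c_2/n} and the remaining {m} sit near the {\lambda_j}.)

import numpy as np

n, m = 200, 2
c1, c2 = 1.0, 2.0
a = 1 - c1 / n
lam = np.array([0.3 + 0.8j, 0.3 - 0.8j])              # inside the unit disk, with |lam - 1| > 1

P = np.poly(lam)                                      # P(z) = (z - lam_1)(z - lam_2)
f = np.convolve(np.poly(np.full(n - m, -c2 / n)), P)  # (z + c2/n)^{n-m} P(z), coefficients highest first
f[-1] -= (a + c2 / n) ** (n - m) * np.polyval(P, a)   # subtract the constant (a + c2/n)^{n-m} P(a)

zeroes = np.roots(f)
print(np.abs(np.polyval(f, a)))                       # essentially zero: z = a is a zero of f
print(np.min(np.abs(zeroes)), np.max(np.abs(zeroes))) # both are 1 + O(1/n); the max slightly exceeds 1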

By a perturbative analysis to locate the zeroes of {f}, one would eventually be able to arrive at a true counterexample to Sendov’s conjecture if these locations {\lambda_1,\dots,\lambda_m} were in the open lune

\displaystyle  \{ \lambda: |\lambda| < 1 < |\lambda-1| \}

and if one had the inequality

\displaystyle  c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| < 0 \ \ \ \ \ (5)

for all {0 \leq \theta \leq 2\pi}. However, if one takes the mean of this inequality in {\theta}, one arrives at the inequality

\displaystyle  c_2 - c_1 + \sum_{j=1}^m \log |1 - \lambda_j| < 0

which is incompatible with the hypotheses {c_2 > c_1} and {|\lambda_j-1| > 1}. In order to extend this argument to more general polynomials {f}, we require a stability analysis of the endpoint equation

\displaystyle  c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| = 0 \ \ \ \ \ (6)

where we now only assume the closed conditions {c_2 \geq c_1} and {|\lambda_j-1| \geq 1}. The above discussion then places all the zeroes {\lambda_j} on the arc

\displaystyle  \{ \lambda: |\lambda| < 1 = |\lambda-1|\} \ \ \ \ \ (7)

and if one also takes the second Fourier coefficient of (6) one obtains the vanishing second moment

\displaystyle  \sum_{j=1}^m \lambda_j^2 = 0.

These two conditions are incompatible with each other (except in the degenerate case when all the {\lambda_j} vanish), because all the non-zero elements {\lambda} of the arc (7) have argument in {\pm [\pi/3,\pi/2]}, so in particular their square {\lambda^2} will have negative real part. It turns out that one can adapt this argument to the more general potential counterexamples to Sendov’s conjecture (in the form of Theorem 4). The starting point is to use (1), (4), and Theorem 5(ii) to obtain good control on {f''/f'}, which one then integrates and exponentiates to get good control on {f'}, and then on a second integration one gets enough information about {f} to pin down the location of its zeroes to high accuracy. The constraint that these zeroes lie inside the unit disk then gives an inequality resembling (5), and an adaptation of the above stability analysis is then enough to conclude. The arguments here are inspired by the previous arguments of Miller, which treated the case when {a} was extremely close to {1} via a similar perturbative analysis; the main novelty is to control the error terms not in terms of the magnitude of the largest zero {\zeta} of {f'} (which is difficult to manage when {n} gets large), but rather by the variance of those zeroes, which ends up being a more tractable expression to keep track of.

Laura Cladek and I have just uploaded to the arXiv our paper “Additive energy of regular measures in one and higher dimensions, and the fractal uncertainty principle“. This paper concerns a continuous version of the notion of additive energy. Given a finite measure {\mu} on {{\bf R}^d} and a scale {r>0}, define the energy {\mathrm{E}(\mu,r)} at scale {r} to be the quantity

\displaystyle  \mathrm{E}(\mu,r) := \mu^4\left( \{ (x_1,x_2,x_3,x_4) \in ({\bf R}^d)^4: |x_1+x_2-x_3-x_4| \leq r \}\right) \ \ \ \ \ (1)

where {\mu^4} is the product measure on {({\bf R}^d)^4} formed from four copies of the measure {\mu} on {{\bf R}^d}. We will be interested in Cantor-type measures {\mu}, supported on a compact set {X \subset B(0,1)} and obeying the Ahlfors-David regularity condition

\displaystyle  \mu(B(x,r)) \leq C r^\delta

for all balls {B(x,r)} and some constants {C, \delta > 0}, as well as the matching lower bound

\displaystyle  \mu(B(x,r)) \geq C^{-1} r^\delta

when {x \in X} whenever {0 < r < 1}. One should think of {X} as a {\delta}-dimensional fractal set, and {\mu} as some vaguely self-similar measure on this set.

Note that once one fixes {x_1,x_2,x_3}, the variable {x_4} in (1) is constrained to a ball of radius {r}, hence we obtain the trivial upper bound

\displaystyle  \mathrm{E}(\mu,r) \leq C^4 r^\delta. \ \ \ \ \ (2)

If the set {X} contains a lot of “additive structure”, one can expect this bound to be basically sharp; for instance, if {\delta} is an integer, {X} is a {\delta}-dimensional unit disk, and {\mu} is Lebesgue measure on this disk, one can verify that {\mathrm{E}(\mu,r) \sim r^\delta} (where we allow implied constants to depend on {d,\delta}). However we show that if the dimension is non-integer, then one obtains a gain:

Theorem 1 If {0 < \delta < d} is not an integer, and {X, \mu} are as above, then

\displaystyle  \mathrm{E}(\mu,r) \lesssim_{C,\delta,d} r^{\delta+\beta}

for some {\beta>0} depending only on {C,\delta,d}.

Informally, this asserts that Ahlfors-David regular fractal sets of non-integer dimension cannot behave as if they are approximately closed under addition. In fact the gain {\beta} we obtain is quasipolynomial in the regularity constant {C}:

\displaystyle  \beta = \exp\left( - O_{\delta,d}( 1 + \log^{O_{\delta,d}(1)}(C) ) \right).

(We also obtain a localised version in which the regularity condition is only required to hold at scales between {r} and {1}.) Such a result was previously obtained (with more explicit values of the {O_{\delta,d}()} implied constants) in the one-dimensional case {d=1} by Dyatlov and Zahl; but in higher dimensions there do not appear to have been any results for this general class of sets {X} and measures {\mu}. In the paper of Dyatlov and Zahl it is noted that some dependence on {C} is necessary; in particular, {\beta} cannot be much better than {1/\log C}. This reflects the fact that there are fractal sets that do behave reasonably well with respect to addition (basically because they are built out of long arithmetic progressions at many scales); however, such sets are not very Ahlfors-David regular. Among other things, this result readily implies a dimension expansion result

\displaystyle  \mathrm{dim}( f( X, X) ) \geq \delta + \beta

for any non-degenerate smooth map {f: {\bf R}^d \times {\bf R}^d \rightarrow {\bf R}^d}, including the sum map {f(x,y) := x+y} and (in one dimension) the product map {f(x,y) := x \cdot y}, where the non-degeneracy condition required is that the gradients {D_x f(x,y), D_y f(x,y): {\bf R}^d \rightarrow {\bf R}^d} are invertible for every {x,y}. We refer to the paper for the formal statement.

Our higher-dimensional argument shares many features in common with that of Dyatlov and Zahl, notably a reliance on the modern tools of additive combinatorics (and specifically the Bogolyubov-Ruzsa lemma of Sanders). However, in one dimension we were also able to find a completely elementary argument, avoiding any particularly advanced additive combinatorics and instead primarily exploiting the order-theoretic properties of the real line, that gave a superior value of {\beta}, namely

\displaystyle  \beta := c \min(\delta,1-\delta) C^{-25}.
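
As a crude one-dimensional Monte Carlo illustration of this dichotomy (my own throwaway experiment, not code from the paper): sampling from the middle-thirds Cantor measure ({\delta = \log 2/\log 3}), the ratio {\mathrm{E}(\mu,r)/r^\delta} visibly decays as {r \rightarrow 0}, whereas for Lebesgue measure on {[0,1]} ({\delta=1}) it stays bounded away from zero.

import numpy as np

rng = np.random.default_rng(0)
K = 30                                       # ternary digits used to sample the Cantor measure

def cantor(size):
    digits = 2 * rng.integers(0, 2, size=(size, K))   # base-3 digits in {0, 2}
    return digits @ (3.0 ** -np.arange(1, K + 1))

def uniform(size):
    return rng.uniform(0.0, 1.0, size)

def energy(sample, r, M=100_000):
    # Monte Carlo estimate of mu^4( |x1 + x2 - x3 - x4| <= r )
    u = np.sort(sample(M) + sample(M))
    v = sample(M) + sample(M)
    hits = np.searchsorted(u, v + r, side='right') - np.searchsorted(u, v - r, side='left')
    return hits.sum() / (M * float(M))

delta = np.log(2) / np.log(3)
for r in [1e-2, 1e-3, 1e-4]:
    print(r, energy(cantor, r) / r ** delta, energy(uniform, r) / r)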

One of the main reasons for obtaining such improved energy bounds is that they imply a fractal uncertainty principle in some regimes. We focus attention on the model case of obtaining such an uncertainty principle for the semiclassical Fourier transform

\displaystyle  {\mathcal F}_h f(\xi) := (2\pi h)^{-d/2} \int_{{\bf R}^d} e^{-i x \cdot \xi/h} f(x)\ dx

where {h>0} is a small parameter. If {X, \mu, \delta} are as above, and {X_h} denotes the {h}-neighbourhood of {X}, then from the Hausdorff-Young inequality one obtains the trivial bound

\displaystyle  \| 1_{X_h} {\mathcal F}_h 1_{X_h} \|_{L^2({\bf R}^d) \rightarrow L^2({\bf R}^d)} \lesssim_{C,d} h^{\max\left(\frac{d}{2}-\delta,0\right)}.

(There are also variants involving pairs of sets {X_h, Y_h}, but for simplicity we focus on the uncertainty principle for a single set {X_h}.) The fractal uncertainty principle, when it applies, asserts that one can improve this to

\displaystyle  \| 1_{X_h} {\mathcal F}_h 1_{X_h} \|_{L^2({\bf R}^d) \rightarrow L^2({\bf R}^d)} \lesssim_{C,d} h^{\max\left(\frac{d}{2}-\delta,0\right) + \beta}

for some {\beta>0}; informally, this asserts that a function and its Fourier transform cannot simultaneously be concentrated in the set {X_h} when {\delta \leq \frac{d}{2}}, and that a function cannot be concentrated on {X_h} and have its Fourier transform be of maximum size on {X_h} when {\delta \geq \frac{d}{2}}. A modification of the disk example mentioned previously shows that such a fractal uncertainty principle cannot hold if {\delta} is an integer. However, in one dimension, the fractal uncertainty principle is known to hold for all {0 < \delta < 1}. The above-mentioned results of Dyatlov and Zahl were able to establish this for {\delta} close to {1/2}, and the remaining cases {1/2 < \delta < 1} and {0 < \delta < 1/2} were later established by Bourgain-Dyatlov and Dyatlov-Jin respectively. Such uncertainty principles have applications to hyperbolic dynamics, in particular in establishing spectral gaps for certain Selberg zeta functions.
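
For readers who like to experiment, a toy discrete analogue of this phenomenon is easy to compute (this is only a heuristic illustration of mine, not the semiclassical statement above): restrict the {N \times N} unitary discrete Fourier transform to a “discrete Cantor set” in {{\bf Z}/N{\bf Z}} of dimension {\delta = \log 2/\log 3 > 1/2}, for which the trivial bound on the restricted operator norm is {1}; one should observe a slow decay of this norm as {N} increases.

import numpy as np
from itertools import product

for k in range(3, 9):
    N = 3 ** k
    # "discrete Cantor set": residues whose base-3 digits all lie in {0, 2}
    X = np.array([sum(d * 3 ** j for j, d in enumerate(digits))
                  for digits in product([0, 2], repeat=k)])
    F = np.exp(-2j * np.pi * np.outer(X, X) / N) / np.sqrt(N)   # DFT matrix restricted to X x X
    print(N, len(X), np.linalg.norm(F, 2))                      # largest singular value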

It remains a largely open problem to establish a fractal uncertainty principle in higher dimensions. Our results allow one to establish such a principle when the dimension {\delta} is close to {d/2}, and {d} is assumed to be odd (to make {d/2} a non-integer). There is also work of Han and Schlag that obtains such a principle when one of the copies of {X_h} is assumed to have a product structure. We hope to obtain further higher-dimensional fractal uncertainty principles in subsequent work.

We now sketch how our main theorem is proved. In both one dimension and higher dimensions, the main point is to get a preliminary improvement

\displaystyle  \mathrm{E}(\mu,r_0) \leq \varepsilon r_0^\delta \ \ \ \ \ (3)

over the trivial bound (2) for any small {\varepsilon>0}, provided {r_0} is sufficiently small depending on {\varepsilon, \delta, d}; one can then iterate this bound by a fairly standard “induction on scales” argument (which roughly speaking can be used to show that energies {\mathrm{E}(\mu,r)} behave somewhat multiplicatively in the scale parameter {r}) to propagate the bound to a power gain at smaller scales. We found that a particularly clean way to run the induction on scales was via use of the Gowers uniformity norm {U^2}, and particularly via a clean Fubini-type inequality

\displaystyle  \| f \|_{U^2(V \times V')} \leq \|f\|_{U^2(V; U^2(V'))}

(ultimately proven using the Gowers-Cauchy-Schwarz inequality) that allows one to “decouple” coarse and fine scale aspects of the Gowers norms (and hence of additive energies).

It remains to obtain the preliminary improvement. In one dimension this is done by identifying some “left edges” of the set {X} that supports {\mu}: intervals {[x, x+K^{-n}]} that intersect {X}, but such that a large interval {[x-K^{-n+1},x]} just to the left of this interval is disjoint from {X}. Here {K} is a large constant and {n} is a scale parameter. It is not difficult to show (using in particular the Archimedean nature of the real line) that if one has the Ahlfors-David regularity condition for some {0 < \delta < 1} then left edges exist in abundance at every scale; for instance most points of {X} would be expected to lie in quite a few of these left edges (much as most elements of, say, the ternary Cantor set {\{ \sum_{n=1}^\infty \varepsilon_n 3^{-n}: \varepsilon_n \in \{0,2\} \}} would be expected to contain a lot of {0}s in their base {3} expansion). In particular, most pairs {(x_1,x_2) \in X \times X} would be expected to lie in a pair {[x,x+K^{-n}] \times [y,y+K^{-n}]} of left edges of equal length. The key point is then that if {(x_1,x_2) \in X \times X} lies in such a pair with {K^{-n} \geq r}, then there are relatively few pairs {(x_3,x_4) \in X \times X} at distance {O(K^{-n+1})} from {(x_1,x_2)} for which one has the relation {x_1+x_2 = x_3+x_4 + O(r)}, because {x_3,x_4} will both tend to be to the right of {x_1,x_2} respectively. This causes a decrement in the energy at scale {K^{-n+1}}, and by carefully combining all these energy decrements one can eventually cobble together the energy bound (3).

We were not able to make this argument work in higher dimensions (though perhaps the cases {0 < \delta < 1} and {d-1 < \delta < d} might not be completely out of reach from these methods). Instead we return to additive combinatorics methods. If the claim (3) failed, then by applying the Balog-Szemeredi-Gowers theorem we can show that the set {X} has high correlation with an approximate group {H}, and hence (by the aforementioned Bogolyubov-Ruzsa type theorem of Sanders, which is the main source of the quasipolynomial bounds in our final exponent) {X} will exhibit an approximate “symmetry” along some non-trivial arithmetic progression of some spacing length {r} and some diameter {R \gg r}. The {r}-neighbourhood {X_r} of {X} will then resemble the union of parallel “cylinders” of dimensions {r \times R}. If we focus on a typical {R}-ball of {X_r}, the set now resembles a Cartesian product of an interval of length {R} with a subset of a {d-1}-dimensional hyperplane, which behaves approximately like an Ahlfors-David regular set of dimension {\delta-1} (this already lets us conclude a contradiction if {\delta<1}). Note that if the original dimension {\delta} was non-integer then this new dimension {\delta-1} will also be non-integer. It is then possible to contradict the failure of (3) by appealing to a suitable induction hypothesis at one lower dimension.

Consider a disk {D(z_0,r) := \{ z: |z-z_0| < r \}} in the complex plane. If one applies an affine-linear map {f(z) = az+b} to this disk, one obtains

\displaystyle  f(D(z_0,r)) = D(f(z_0), |f'(z_0)| r).

For maps that are merely holomorphic instead of affine-linear, one has some variants of this assertion, which I am recording here mostly for my own reference:

Theorem 1 (Holomorphic images of disks) Let {D(z_0,r)} be a disk in the complex plane, and {f: D(z_0,r) \rightarrow {\bf C}} be a holomorphic function with {f'(z_0) \neq 0}.
  • (i) (Open mapping theorem or inverse function theorem) {f(D(z_0,r))} contains a disk {D(f(z_0),\varepsilon)} for some {\varepsilon>0}. (In fact there is even a holomorphic right inverse of {f} from {D(f(z_0), \varepsilon)} to {D(z_0,r)}.)
  • (ii) (Bloch theorem) {f(D(z_0,r))} contains a disk {D(w, c |f'(z_0)| r)} for some absolute constant {c>0} and some {w \in {\bf C}}. (In fact there is even a holomorphic right inverse of {f} from {D(w, c |f'(z_0)| r)} to {D(z_0,r)}.)
  • (iii) (Koebe quarter theorem) If {f} is injective, then {f(D(z_0,r))} contains the disk {D(f(z_0), \frac{1}{4} |f'(z_0)| r)}.
  • (iv) If {f} is a polynomial of degree {n}, then {f(D(z_0,r))} contains the disk {D(f(z_0), \frac{1}{n} |f'(z_0)| r)}.
  • (v) If one has a bound of the form {|f'(z)| \leq A |f'(z_0)|} for all {z \in D(z_0,r)} and some {A>1}, then {f(D(z_0,r))} contains the disk {D(f(z_0), \frac{c}{A} |f'(z_0)| r)} for some absolute constant {c>0}. (In fact there is even a holomorphic right inverse of {f} from {D(f(z_0), \frac{c}{A} |f'(z_0)| r)} to {D(z_0,r)}.)

Parts (i), (ii), (iii) of this theorem are standard, as indicated by the given links. I found part (iv) as (a consequence of) Theorem 2 of this paper of Degot, who remarks that it “seems not already known in spite of its simplicity”; an equivalent form of this result also appears in Lemma 4 of this paper of Miller. The proof is simple:

Proof: (Proof of (iv)) Let {w \in D(f(z_0), \frac{1}{n} |f'(z_0)| r)}, then we have a lower bound for the log-derivative of {f(z)-w} at {z_0}:

\displaystyle  \frac{|f'(z_0)|}{|f(z_0)-w|} > \frac{n}{r}

(with the convention that the left-hand side is infinite when {f(z_0)=w}). But by the fundamental theorem of algebra we have

\displaystyle  \frac{f'(z_0)}{f(z_0)-w} = \sum_{j=1}^n \frac{1}{z_0-\zeta_j}

where {\zeta_1,\dots,\zeta_n} are the roots of the polynomial {f(z)-w} (counting multiplicity). By the pigeonhole principle, there must therefore exist a root {\zeta_j} of {f(z) - w} such that

\displaystyle  \frac{1}{|z_0-\zeta_j|} > \frac{1}{r}

and hence {\zeta_j \in D(z_0,r)}. Thus {f(D(z_0,r))} contains {w}, and the claim follows. \Box
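
Part (iv) is also easy to test numerically; the quick check below (an illustration with arbitrary parameter choices, not taken from the cited papers) samples points {w} in the disk {D(f(z_0), \frac{1}{n} |f'(z_0)| r)} and confirms that {f - w} always has a zero in {D(z_0,r)}.

import numpy as np

rng = np.random.default_rng(0)
n, z0, r = 7, 0.4 + 0.2j, 1.3
f = rng.standard_normal(n + 1) + 1j * rng.standard_normal(n + 1)   # a random degree-n polynomial
w_radius = np.abs(np.polyval(np.polyder(f), z0)) * r / n            # the radius |f'(z0)| r / n

for _ in range(2000):
    w = np.polyval(f, z0) + w_radius * np.sqrt(rng.uniform()) * np.exp(2j * np.pi * rng.uniform())
    g = f.copy()
    g[-1] -= w                                      # coefficients of f(z) - w
    assert np.min(np.abs(np.roots(g) - z0)) < r     # some zero of f - w lies in D(z0, r)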

The constant {\frac{1}{n}} in (iv) is completely sharp: if {f(z) = z^n} and {z_0} is non-zero then {f(D(z_0,|z_0|))} contains the disk

\displaystyle D(f(z_0), \frac{1}{n} |f'(z_0)| r) = D( z_0^n, |z_0|^n)

but avoids the origin, thus does not contain any disk of the form {D( z_0^n, |z_0|^n+\varepsilon)}. This example also shows that despite parts (ii), (iii) of the theorem, one cannot hope for a general inclusion of the form

\displaystyle  f(D(z_0,r)) \supset D(f(z_0), c |f'(z_0)| r )

for an absolute constant {c>0}.

Part (v) is implicit in the standard proof of Bloch’s theorem (part (ii)), and is easy to establish:

Proof: (Proof of (v)) From the Cauchy inequalities one has {f''(z) = O(\frac{A}{r} |f'(z_0)|)} for {z \in D(z_0,r/2)}, hence by Taylor’s theorem with remainder {f(z) = f(z_0) + f'(z_0) (z-z_0) (1 + O( A \frac{|z-z_0|}{r} ) )} for {z \in D(z_0, r/2)}. By Rouche’s theorem, this implies that the function {f(z)-w} has a unique zero in {D(z_0, 2cr/A)} for any {w \in D(f(z_0), cr|f'(z_0)|/A)}, if {c>0} is a sufficiently small absolute constant. The claim follows. \Box

Note that part (v) implies part (i). A standard point picking argument also lets one deduce part (ii) from part (v):

Proof: (Proof of (ii)) By shrinking {r} slightly if necessary we may assume that {f} extends analytically to the closure of the disk {D(z_0,r)}. Let {c} be the constant in (v) with {A=2}; we will prove (ii) with {c} replaced by {c/2}. If we have {|f'(z)| \leq 2 |f'(z_0)|} for all {z \in D(z_0,r/2)} then we are done by (v), so we may assume without loss of generality that there is {z_1 \in D(z_0,r/2)} such that {|f'(z_1)| > 2 |f'(z_0)|}. If {|f'(z)| \leq 2 |f'(z_1)|} for all {z \in D(z_1,r/4)} then by (v) we have

\displaystyle  f( D(z_0, r) ) \supset f( D(z_1,r/2) ) \supset D( f(z_1), \frac{c}{2} |f'(z_1)| \frac{r}{2} )

\displaystyle \supset D( f(z_1), \frac{c}{2} |f'(z_0)| r )

and we are again done. Hence we may assume without loss of generality that there is {z_2 \in D(z_1,r/4)} such that {|f'(z_2)| > 2 |f'(z_1)|}. Iterating this procedure in the obvious fashion we either are done, or obtain a Cauchy sequence {z_0, z_1, \dots} in {D(z_0,r)} such that {f'(z_j)} goes to infinity as {j \rightarrow \infty}, which contradicts the analytic nature of {f} (and hence continuous nature of {f'}) on the closure of {D(z_0,r)}. This gives the claim. \Box

Here is another classical result stated by Alexander (and then proven by Kakeya and by Szego, and also implied by a classical theorem of Grace and Heawood) that is broadly compatible with parts (iii), (iv) of the above theorem:

Proposition 2 Let {D(z_0,r)} be a disk in the complex plane, and {f: D(z_0,r) \rightarrow {\bf C}} be a polynomial of degree {n \geq 1} with {f'(z) \neq 0} for all {z \in D(z_0,r)}. Then {f} is injective on {D(z_0, r\sin\frac{\pi}{n})}.

The radius {\sin \frac{\pi}{n}} is best possible, for the polynomial {f(z) = z^n} has {f'} non-vanishing on {D(1,1)}, but one has {f(\cos(\pi/n) e^{i \pi/n}) = f(\cos(\pi/n) e^{-i\pi/n})}, and {\cos(\pi/n) e^{i \pi/n}, \cos(\pi/n) e^{-i\pi/n}} lie on the boundary of {D(1,\sin \frac{\pi}{n})}.

If one narrows {\sin \frac{\pi}{n}} slightly to {\sin \frac{\pi}{2n}} then one can quickly prove this proposition as follows. Suppose for contradiction that there exist distinct {z_1, z_2 \in D(z_0, r\sin\frac{\pi}{2n})} with {f(z_1)=f(z_2)}, thus if we let {\gamma} be the line segment contour from {z_1} to {z_2} then {\int_\gamma f'(z)\ dz = 0}. However, by assumption we may factor {f'(z) = c (z-\zeta_1) \dots (z-\zeta_{n-1})} where all the {\zeta_j} lie outside of {D(z_0,r)}. Elementary trigonometry then tells us that the argument of {z-\zeta_j} only varies by less than {\frac{\pi}{n}} as {z} traverses {\gamma}, hence the argument of {f'(z)} only varies by less than {\pi}. Thus {f'(z)} takes values in an open half-plane avoiding the origin and so it is not possible for {\int_\gamma f'(z)\ dz} to vanish.

To recover the best constant of {\sin \frac{\pi}{n}} requires some effort. By taking contrapositives and applying an affine rescaling and some trigonometry, the proposition can be deduced from the following result, known variously as the Grace-Heawood theorem or the complex Rolle theorem.

Proposition 3 (Grace-Heawood theorem) Let {f: {\bf C} \rightarrow {\bf C}} be a polynomial of degree {n \geq 1} such that {f(1)=f(-1)}. Then {f'} has at least one zero in the closure of {D( 0, \cot \frac{\pi}{n} )}.

This is in turn implied by a remarkable and powerful theorem of Grace (which we shall prove shortly). Given two polynomials {f,g} of degree at most {n}, define the apolar form {(f,g)_n} by

\displaystyle  (f,g)_n := \sum_{k=0}^n (-1)^k f^{(k)}(0) g^{(n-k)}(0). \ \ \ \ \ (1)

Theorem 4 (Grace’s theorem) Let {C} be a circle or line in {{\bf C}}, dividing {{\bf C} \backslash C} into two open connected regions {\Omega_1, \Omega_2}. Let {f,g} be two polynomials of degree at most {n \geq 1}, with all the zeroes of {f} lying in {\Omega_1} and all the zeroes of {g} lying in {\Omega_2}. Then {(f,g)_n \neq 0}.

(Contrapositively: if {(f,g)_n=0}, then the zeroes of {f} cannot be separated from the zeroes of {g} by a circle or line.)

Indeed, a brief calculation reveals the identity

\displaystyle  f(1) - f(-1) = (f', g)_{n-1}

where {g} is the degree {n-1} polynomial

\displaystyle  g(z) := \frac{1}{n!} ((z+1)^n - (z-1)^n).

The zeroes of {g} are {i \cot \frac{\pi j}{n}} for {j=1,\dots,n-1}, so the Grace-Heawood theorem follows by applying Grace’s theorem with {C} equal to the boundary of {D(0, \cot \frac{\pi}{n})}.
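
Identities of this type are easy to check by computer; here is a quick throwaway verification (mine, not from the references) of the identity {f(1)-f(-1) = (f',g)_{n-1}}, using the apolar form (1) and numpy’s highest-degree-first coefficient convention.

import numpy as np
from math import comb, factorial

def apolar(f, g, n):
    # (f, g)_n = sum_{k=0}^n (-1)^k f^{(k)}(0) g^{(n-k)}(0); coefficients listed highest degree first
    return sum((-1) ** k
               * np.polyval(np.polyder(f, k), 0)
               * np.polyval(np.polyder(g, n - k), 0)
               for k in range(n + 1))

rng = np.random.default_rng(0)
n = 6
f = rng.standard_normal(n + 1)                                  # a random polynomial of degree n
g = np.array([(1 - (-1) ** k) * comb(n, k) / factorial(n)       # g(z) = ((z+1)^n - (z-1)^n)/n!
              for k in range(n + 1)])[1:]                       # degree n-1, highest degree first

print(np.polyval(f, 1) - np.polyval(f, -1))                     # the two printed numbers agree
print(apolar(np.polyder(f), g, n - 1))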

The same method of proof gives the following nice consequence:

Theorem 5 (Perpendicular bisector theorem) Let {f: {\bf C} \rightarrow {\bf C}} be a polynomial such that {f(z_1)=f(z_2)} for some distinct {z_1,z_2}. Then the zeroes of {f'} cannot all lie on one side of the perpendicular bisector of {z_1,z_2}. For instance, if {f(1)=f(-1)}, then the zeroes of {f'} cannot all lie in the halfplane {\{ z: \mathrm{Re} z > 0 \}} or the halfplane {\{ z: \mathrm{Re} z < 0 \}}.

I’d be interested in seeing a proof of this latter theorem that did not proceed via Grace’s theorem.

Now we give a proof of Grace’s theorem. The case {n=1} can be established by direct computation, so suppose inductively that {n>1} and that the claim has already been established for {n-1}. Given the involvement of circles and lines it is natural to suspect that a Möbius transformation symmetry is involved. This is indeed the case and can be made precise as follows. Let {V_n} denote the vector space of polynomials {f} of degree at most {n}, then the apolar form is a bilinear form {(,)_n: V_n \times V_n \rightarrow {\bf C}}. Each translation {z \mapsto z+a} on the complex plane induces a corresponding map on {V_n}, mapping each polynomial {f} to its shift {\tau_a f(z) := f(z-a)}. We claim that the apolar form is invariant with respect to these translations:

\displaystyle  ( \tau_a f, \tau_a g )_n = (f,g)_n.

Taking derivatives in {a}, it suffices to establish the skew-adjointness relation

\displaystyle  (f', g)_n + (f,g')_n = 0

but this is clear from the alternating form of (1).

Next, we see that the inversion map {z \mapsto 1/z} also induces a corresponding map on {V_n}, mapping each polynomial {f \in V_n} to its inversion {\iota f(z) := z^n f(1/z)}. From (1) we see that this map also (projectively) preserves the apolar form:

\displaystyle  (\iota f, \iota g)_n = (-1)^n (f,g)_n.

More generally, the group of Möbius transformations on the Riemann sphere acts projectively on {V_n}, with each Möbius transformation {T: {\bf C} \rightarrow {\bf C}} mapping each {f \in V_n} to {Tf(z) := g_T(z) f(T^{-1} z)}, where {g_T} is the unique (up to constants) rational function that makes this a map from {V_n} to {V_n} (its divisor is {n(T \infty) - n(\infty)}). Since the Möbius transformations are generated by translations and inversion, we see that the action of Möbius transformations projectively preserves the apolar form; also, this action of {T} on {V_n} moves the zeroes of each {f \in V_n} by {T} (viewing polynomials of degree less than {n} in {V_n} as having zeroes at infinity). In particular, the hypotheses and conclusions of Grace’s theorem are preserved by this Möbius action. We can then apply such a transformation to move one of the zeroes of {f} to infinity (thus making {f} a polynomial of degree {n-1}), so that {C} must now be a circle, with the zeroes of {g} inside the circle and the remaining zeroes of {f} outside the circle. But then

\displaystyle  (f,g)_n = (f, g')_{n-1}.

By the Gauss-Lucas theorem, the zeroes of {g'} are also inside {C}. The claim now follows from the induction hypothesis.

Ben Green and I have updated our paper “An arithmetic regularity lemma, an associated counting lemma, and applications” to account for a somewhat serious issue with the paper that was pointed out to us recently by Daniel Altman. This paper contains two core theorems:

  • An “arithmetic regularity lemma” that, roughly speaking, decomposes an arbitrary bounded sequence {f(n)} on an interval {\{1,\dots,N\}} as an “irrational nilsequence” {F(g(n) \Gamma)} of controlled complexity, plus some “negligible” errors (where one uses the Gowers uniformity norm as the main norm to control the negligibility of the error); and
  • An “arithmetic counting lemma” that gives an asymptotic formula for counting various averages {{\mathbb E}_{{\bf n} \in {\bf Z}^d \cap P} f(\psi_1({\bf n})) \dots f(\psi_t({\bf n}))} for various affine-linear forms {\psi_1,\dots,\psi_t} when the functions {f} are given by irrational nilsequences.

The combination of the two theorems is then used to address various questions in additive combinatorics.

There are no direct issues with the arithmetic regularity lemma. However, it turns out that the arithmetic counting lemma is only true if one imposes an additional property (which we call the “flag property”) on the affine-linear forms {\psi_1,\dots,\psi_t}. Without this property, there does not appear to be a clean asymptotic formula for these averages if the only hypothesis one places on the underlying nilsequences is irrationality. Thus when trying to understand the asymptotics of averages involving linear forms that do not obey the flag property, the paradigm of understanding these averages via a combination of the regularity lemma and a counting lemma seems to require some significant revision (in particular, one would probably have to replace the existing regularity lemma with some variant, despite the fact that the lemma is still technically true in this setting). Fortunately, for most applications studied to date (including the important subclass of translation-invariant affine forms), the flag property holds; however our claim in the paper to have resolved a conjecture of Gowers and Wolf on the true complexity of systems of affine forms must now be narrowed, as our methods only verify this conjecture under the assumption of the flag property.

In a bit more detail: the asymptotic formula for our counting lemma involved some finite-dimensional vector spaces {\Psi^{[i]}} for various natural numbers {i}, defined as the linear span of the vectors {(\psi^i_1({\bf n}), \dots, \psi^i_t({\bf n}))} as {{\bf n}} ranges over the parameter space {{\bf Z}^d}. Roughly speaking, these spaces encode some constraints one would expect to see amongst the forms {\psi^i_1({\bf n}), \dots, \psi^i_t({\bf n})}. For instance, in the case of length four arithmetic progressions when {d=2}, {{\bf n} = (n,r)}, and

\displaystyle  \psi_i({\bf n}) = n + (i-1)r

for {i=1,2,3,4}, then {\Psi^{[1]}} is spanned by the vectors {(1,1,1,1)} and {(1,2,3,4)} and can thus be described as the two-dimensional linear space

\displaystyle  \Psi^{[1]} = \{ (a,b,c,d): a-2b+c = b-2c+d = 0\} \ \ \ \ \ (1)

while {\Psi^{[2]}} is spanned by the vectors {(1,1,1,1)}, {(1,2,3,4)}, {(1^2,2^2,3^2,4^2)} and can be described as the hyperplane

\displaystyle  \Psi^{[2]} = \{ (a,b,c,d): a-3b+3c-d = 0 \}. \ \ \ \ \ (2)

As a special case of the counting lemma, we can check that if {f} takes the form {f(n) = F( \alpha n, \beta n^2 + \gamma n)} for some irrational {\alpha,\beta \in {\bf R}/{\bf Z}}, some arbitrary {\gamma \in {\bf R}/{\bf Z}}, and some smooth {F: {\bf R}/{\bf Z} \times {\bf R}/{\bf Z} \rightarrow {\bf C}}, then the limiting value of the average

\displaystyle  {\bf E}_{n, r \in [N]} f(n) f(n+r) f(n+2r) f(n+3r)

as {N \rightarrow \infty} is equal to

\displaystyle  \int_{a_1,b_1,c_1,d_1 \in {\bf R}/{\bf Z}: a_1-2b_1+c_1=b_1-2c_1+d_1=0} \int_{a_2,b_2,c_2,d_2 \in {\bf R}/{\bf Z}: a_2-3b_2+3c_2-d_2=0}

\displaystyle  F(a_1,a_2) F(b_1,b_2) F(c_1,c_2) F(d_1,d_2)

which reflects the constraints

\displaystyle  \alpha n - 2 \alpha(n+r) + \alpha(n+2r) = \alpha(n+r) - 2\alpha(n+2r)+\alpha(n+3r)=0

and

\displaystyle  (\beta n^2 + \gamma n) - 3 (\beta(n+r)^2+\gamma(n+r))

\displaystyle + 3 (\beta(n+2r)^2 +\gamma(n+2r)) - (\beta(n+3r)^2+\gamma(n+3r))=0.

These constraints follow from the descriptions (1), (2), using the containment {\Psi^{[1]} \subset \Psi^{[2]}} to dispense with the lower order term {\gamma n} (which then plays no further role in the analysis).

The arguments in our paper turn out to be perfectly correct under the assumption of the “flag property” that {\Psi^{[i]} \subset \Psi^{[i+1]}} for all {i}. The problem is that the flag property turns out to not always hold. A counterexample, provided by Daniel Altman, involves the four linear forms

\displaystyle  \psi_1(n,r) = r; \psi_2(n,r) = 2n+2r; \psi_3(n,r) = n+3r; \psi_4(n,r) = n.

Here it turns out that

\displaystyle  \Psi^{[1]} = \{ (a,b,c,d): c-d=3a; b-2a=2d\}

and

\displaystyle  \Psi^{[2]} = \{ (a,b,c,d): 24a+3b-4c-8d=0 \}

and {\Psi^{[1]}} is no longer contained in {\Psi^{[2]}}. The analogue of the asymptotic formula given previously for {f(n) = F( \alpha n, \beta n^2 + \gamma n)} is then valid when {\gamma} vanishes, but not when {\gamma} is non-zero, because the identity

\displaystyle  24 (\beta \psi_1(n,r)^2 + \gamma \psi_1(n,r)) + 3 (\beta \psi_2(n,r)^2 + \gamma \psi_2(n,r))

\displaystyle - 4 (\beta \psi_3(n,r)^2 + \gamma \psi_3(n,r)) - 8 (\beta \psi_4(n,r)^2 + \gamma \psi_4(n,r)) = 0

holds in the former case but not the latter. Thus the output of any purported arithmetic regularity lemma in this case is now sensitive to the lower order terms of the nilsequence and cannot be described in a uniform fashion for all “irrational” sequences. There should still be some sort of formula for the asymptotics from the general equidistribution theory of nilsequences, but it could be considerably more complicated than what is presented in this paper.
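
For the curious, the spaces {\Psi^{[i]}} and the flag property are easy to compute symbolically. The small sympy script below (an illustration of mine, not code from the paper) checks that the flag property {\Psi^{[1]} \subset \Psi^{[2]}} holds for four-term progressions but fails for Altman’s example.

import sympy as sp

n, r = sp.symbols('n r')

def psi_space(forms, i):
    # row space of the coefficient matrix of (psi_1^i, ..., psi_t^i) in the monomials of (n, r)
    powered = [sp.Poly(sp.expand(phi ** i), n, r) for phi in forms]
    monomials = sorted({m for p in powered for m in p.monoms()})
    return sp.Matrix([[p.nth(*m) for p in powered] for m in monomials]).rowspace()

def contained(rows1, rows2):
    M2 = sp.Matrix.vstack(*rows2)
    return all(sp.Matrix.vstack(M2, v).rank() == M2.rank() for v in rows1)

ap_forms = [n, n + r, n + 2 * r, n + 3 * r]            # four-term arithmetic progressions
altman_forms = [r, 2 * n + 2 * r, n + 3 * r, n]        # Altman's example

for forms in (ap_forms, altman_forms):
    print(contained(psi_space(forms, 1), psi_space(forms, 2)))   # True, then False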

Fortunately, the flag property does hold in several key cases, most notably the translation invariant case when {\Psi^{[1]}} contains {(1,\dots,1)}, as well as “complexity one” cases. Nevertheless non-flag property systems of affine forms do exist, thus limiting the range of applicability of the techniques in this paper. In particular, the conjecture of Gowers and Wolf (Theorem 1.13 in the paper) is now open again in the non-flag property case.

Several years ago, I developed a public lecture on the cosmic distance ladder in astronomy from a historical perspective (and emphasising the role of mathematics in building the ladder). I previously blogged about the lecture here; the most recent version of the slides can be found here. Recently, I have begun working with Tanya Klowden (a long time friend with a background in popular writing on a variety of topics, including astronomy) to expand the lecture into a popular science book, with the tentative format being non-technical chapters interspersed with some more mathematical sections to give some technical details. We are still in the middle of the writing process, but we have produced a sample chapter (which deals with what we call the “fourth rung” of the distance ladder – the distances and orbits of the planets – and how the work of Copernicus, Brahe, Kepler and others led to accurate measurements of these orbits, as well as Kepler’s famous laws of planetary motion). As always, any feedback on the chapter is welcome. (Due to various pandemic-related uncertainties, we do not have a definite target deadline for when the book will be completed, but presumably this will occur sometime in the next year.)

The book is currently under contract with Yale University Press. My coauthor Tanya Klowden can be reached at [email protected].

Rachel Greenfeld and I have just uploaded to the arXiv our paper “The structure of translational tilings in {{\bf Z}^d}“. This paper studies the tilings {1_F * 1_A = 1} of a finite tile {F} in a standard lattice {{\bf Z}^d}, that is to say sets {A \subset {\bf Z}^d} (which we call tiling sets) such that every element of {{\bf Z}^d} lies in exactly one of the translates {a+F, a \in A} of {F}. We also consider more general tilings {1_F * 1_A = k} of level {k} for a natural number {k} (several of our results consider an even more general setting in which {1_F * 1_A} is periodic but allowed to be non-constant).

In many cases the tiling set {A} will be periodic (by which we mean translation invariant with respect to some lattice (a finite index subgroup) of {{\bf Z}^d}). For instance one simple example of a tiling is when {F \subset {\bf Z}^2} is the unit square {F = \{0,1\}^2} and {A} is the lattice {2{\bf Z}^2 = \{ 2x: x \in {\bf Z}^2\}}. However one can modify some tilings to make them less periodic. For instance, keeping {F = \{0,1\}^2} one also has the tiling set

\displaystyle  A = \{ (2x, 2y+a(x)): x,y \in {\bf Z} \}

where {a: {\bf Z} \rightarrow \{0,1\}} is an arbitrary function. This tiling set is periodic in a single direction {(0,2)}, but is not doubly periodic. For the slightly modified tile {F = \{0,1\} \times \{0,2\}}, the set

\displaystyle  A = \{ (2x, 4y+2a(x)): x,y \in {\bf Z} \} \cup \{ (2x+b(y), 4y+1): x,y \in {\bf Z}\}

for arbitrary {a,b: {\bf Z} \rightarrow \{0,1\}} can be verified to be a tiling set, which in general will not exhibit any periodicity whatsoever; however, it is weakly periodic in the sense that it is the disjoint union of finitely many sets, each of which is periodic in one direction.
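
Claims like this are easy to confirm by computer. The short script below (my own illustration) builds a patch of the set {A} just described, for pseudorandom choices of {a} and {b}, and checks that every point of a window is covered by exactly one translate of {F = \{0,1\} \times \{0,2\}}.

import numpy as np

rng = np.random.default_rng(0)
F = [(0, 0), (1, 0), (0, 2), (1, 2)]                   # the tile {0,1} x {0,2}
R = 40                                                 # range of x, y used to build a patch of A
a = {x: int(rng.integers(0, 2)) for x in range(-R, R)}
b = {y: int(rng.integers(0, 2)) for y in range(-R, R)}

A = {(2 * x, 4 * y + 2 * a[x]) for x in range(-R, R) for y in range(-R, R)}
A |= {(2 * x + b[y], 4 * y + 1) for x in range(-R, R) for y in range(-R, R)}

counts = np.zeros((20, 20), dtype=int)                 # covering multiplicity on the window [0,20)^2
for (p, q) in A:
    for (u, v) in F:
        if 0 <= p + u < 20 and 0 <= q + v < 20:
            counts[p + u, q + v] += 1
print((counts == 1).all())                             # True: every point is covered exactly once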

The most well known conjecture in this area is the Periodic Tiling Conjecture:

Conjecture 1 (Periodic tiling conjecture) If a finite tile {F \subset {\bf Z}^d} has at least one tiling set, then it has a tiling set which is periodic.

This conjecture was stated explicitly by Lagarias and Wang, and also appears implicitly in this text of Grunbaum and Shephard. In one dimension {d=1} there is a simple pigeonhole principle argument of Newman that shows that all tiling sets are in fact periodic, which certainly implies the periodic tiling conjecture in this case. The {d=2} case was settled more recently by Bhattacharya, but the higher dimensional cases {d > 2} remain open in general.

We are able to obtain a new proof of Bhattacharya’s result that also gives some quantitative bounds on the periodic tiling set, which are polynomial in the diameter of the set if the cardinality {|F|} of the tile is bounded:

Theorem 2 (Quantitative periodic tiling in {{\bf Z}^2}) If a finite tile {F \subset {\bf Z}^2} has at least one tiling set, then it has a tiling set which is {M{\bf Z}^2}-periodic for some {M \ll_{|F|} \mathrm{diam}(F)^{O(|F|^4)}}.

Among other things, this shows that the problem of deciding whether a given subset of {{\bf Z}^2} of bounded cardinality tiles {{\bf Z}^2} or not is in the NP complexity class with respect to the diameter {\mathrm{diam}(F)}. (Even the decidability of this problem was not known until the result of Bhattacharya.)

We also have a closely related structural theorem:

Theorem 3 (Quantitative weakly periodic tiling in {{\bf Z}^2}) Every tiling set of a finite tile {F \subset {\bf Z}^2} is weakly periodic. In fact, the tiling set is the union of at most {|F|-1} disjoint sets, each of which is periodic in a direction of magnitude {O_{|F|}( \mathrm{diam}(F)^{O(|F|^2)})}.

We also have a new bound for the periodicity of tilings in {{\bf Z}}:

Theorem 4 (Universal period for tilings in {{\bf Z}}) Let {F \subset {\bf Z}} be finite, and normalized so that {0 \in F}. Then every tiling set of {F} is {qn}-periodic, where {q} is the least common multiple of all primes up to {2|F|}, and {n} is the least common multiple of the magnitudes {|f|} of all {f \in F \backslash \{0\}}.

We remark that the current best complexity bound of determining whether a subset of {{\bf Z}} tiles {{\bf Z}} or not is {O( \exp(\mathrm{diam}(F)^{1/3+o(1)}))}, due to Biro. It may be that the results in this paper can improve upon this bound, at least for tiles of bounded cardinality.

On the other hand, we discovered a genuine difference between level one tiling and higher level tiling, by locating a counterexample to the higher level analogue of (the qualitative version of) Theorem 3:

Theorem 5 (Counterexample) There exists an eight-element subset {F \subset {\bf Z}^2} and a level {4} tiling {1_F * 1_A = 4} such that {A} is not weakly periodic.

We do not know if there is a corresponding counterexample to the higher level periodic tiling conjecture (that if {F} tiles {{\bf Z}^d} at level {k}, then there is a periodic tiling at the same level {k}). Note that it is important to keep the level fixed, since one trivially always has a periodic tiling at level {|F|} from the identity {1_F * 1 = |F|}.

The methods of Bhattacharya used the language of ergodic theory. Our investigations also originally used ergodic-theoretic and Fourier-analytic techniques, but we ultimately found combinatorial methods to be more effective in this problem (and in particular led to quite strong quantitative bounds). The engine powering all of our results is the following remarkable fact, valid in all dimensions:

Lemma 6 (Dilation lemma) Suppose that {A} is a tiling of a finite tile {F \subset {\bf Z}^d}. Then {A} is also a tiling of the dilated tile {rF} for any {r} coprime to {n}, where {n} is the least common multiple of all the primes up to {|F|}.

Versions of this dilation lemma have previously appeared in work of Tijdeman and of Bhattacharya. We sketch a proof here. By the fundamental theorem of arithmetic and iteration it suffices to establish the case where {r} is a prime {p>|F|}. We need to show that {1_{pF} * 1_A = 1}. It suffices to show the claim {1_{pF} * 1_A = 1 \hbox{ mod } p}, since both sides take values in {\{0,\dots,|F|\} \subset \{0,\dots,p-1\}}. The convolution algebra {{\bf F}_p[{\bf Z}^d]} (or group algebra) of finitely supported functions from {{\bf Z}^d} to {{\bf F}_p} is a commutative algebra of characteristic {p}, so we have the Frobenius identity {(f+g)^{*p} = f^{*p} + g^{*p}} for any {f,g}. As a consequence we see that {1_{pF} = 1_F^{*p} \hbox{ mod } p}. The claim now follows by convolving the identity {1_F * 1_A = 1 \hbox{ mod } p} by {p-1} further copies of {1_F}, and using Fermat’s little theorem to simplify the resulting factor of {|F|^{p-1}} on the right-hand side to {1 \hbox{ mod } p}.
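
Here is a toy one-dimensional verification of the dilation lemma (a throwaway script of mine, not from the paper), for the tile {F = \{0,1,4,5\}}, which tiles {{\bf Z}} with the {8{\bf Z}}-periodic tiling set {A = \{0,2\} + 8{\bf Z}}; here {|F| = 4}, so the lemma applies to every {r} coprime to {n = 6}.

def covers_exactly_once(tile, residues, period, window=100):
    # checks that 1_tile * 1_A = 1 on a central window, where A = residues + period * Z
    A = {res + period * q for res in residues for q in range(-window, window)}
    return all(sum((x - f) in A for f in tile) == 1 for x in range(-window // 2, window // 2))

F = [0, 1, 4, 5]                                   # tiles Z with A = {0, 2} + 8Z
assert covers_exactly_once(F, [0, 2], 8)
for r in [5, 7, 11, 13, 25]:                       # dilations with r coprime to 6
    assert covers_exactly_once([r * f for f in F], [0, 2], 8)
print(covers_exactly_once([2 * f for f in F], [0, 2], 8))   # False: r = 2 is not coprime to 6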

In our paper we actually establish a more general version of the dilation lemma that can handle tilings of higher level or of a periodic set, and this stronger version is useful to get the best quantitative results, but for simplicity we focus attention just on the above simple special case of the dilation lemma.

By averaging over all {r} in an arithmetic progression, one already gets a useful structural theorem for tilings in any dimension, which appears to be new despite being an easy consequence of Lemma 6:

Corollary 7 (Structure theorem for tilings) Suppose that {A} is a tiling of a finite tile {F \subset {\bf Z}^d}, where we normalize {0 \in F}. Then we have a decomposition

\displaystyle  1_A = 1 - \sum_{f \in F \backslash 0} \varphi_f \ \ \ \ \ (1)

where each {\varphi_f: {\bf Z}^d \rightarrow [0,1]} is a function that is periodic in the direction {nf}, where {n} is the least common multiple of all the primes up to {|F|}.

Proof: From Lemma 6 we have {1_A = 1 - \sum_{f \in F \backslash 0} \delta_{rf} * 1_A} for any {r = 1 \hbox{ mod } n}, where {\delta_{rf}} is the Kronecker delta at {rf}. Now average over {r} (extracting a weak limit or generalised limit as necessary) to obtain the conclusion. \Box
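
Continuing the toy example {F = \{0,1,4,5\}}, {A = \{0,2\}+8{\bf Z}} from the verification after Lemma 6 (again just an illustration of mine): here {n = 6}, and the functions {\varphi_f} can be computed concretely by averaging {\delta_{rf} * 1_A} over {r = 1 \hbox{ mod } 6}; the script checks the decomposition (1) and the {6f}-periodicity of each {\varphi_f} on a window.

F = [0, 1, 4, 5]
in_A = lambda x: x % 8 in (0, 2)                   # A = {0, 2} + 8Z

def phi(f, x, terms=24):
    # average of 1_A(x - r f) over r = 1, 7, 13, ...; 24 terms is a multiple of the relevant period
    return sum(in_A(x - (1 + 6 * k) * f) for k in range(terms)) / terms

for x in range(-50, 50):
    total = sum(phi(f, x) for f in F if f != 0)
    assert abs(in_A(x) - (1 - total)) < 1e-9                       # the decomposition (1)
    for f in F:
        if f != 0:
            assert abs(phi(f, x) - phi(f, x + 6 * f)) < 1e-9       # phi_f is periodic in direction 6f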

The identity (1) turns out to impose a lot of constraints on the functions {\varphi_f}, particularly in one and two dimensions. On one hand, one can work modulo {1} to eliminate the {1_A} and {1} terms to obtain the equation

\displaystyle  \sum_{f \in F \backslash 0} \varphi_f = 0 \hbox{ mod } 1

which in two dimensions in particular puts a lot of structure on each individual {\varphi_f} (roughly speaking it makes the {\varphi_f \hbox{ mod } 1} behave in a polynomial fashion, after collecting commensurable terms). On the other hand we have the inequality

\displaystyle  \sum_{f \in F \backslash 0} \varphi_f \leq 1 \ \ \ \ \ (2)

which can be used to exclude “equidistributed” polynomial behavior after a certain amount of combinatorial analysis. Only a small amount of further argument is then needed to conclude Theorem 3 and Theorem 2.

For level {k} tilings the analogue of (2) becomes

\displaystyle  \sum_{f \in F \backslash 0} \varphi_f \leq k

which is a significantly weaker inequality and now no longer seems to prohibit “equidistributed” behavior. After some trial and error we were able to come up with a completely explicit example of a tiling that actually utilises equidistributed polynomials; indeed the tiling set we ended up with was a finite boolean combination of Bohr sets.

We are currently studying what this machinery can tell us about tilings in higher dimensions, focusing initially on the three-dimensional case.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “Foundational aspects of uncountable measure theory: Gelfand duality, Riesz representation, canonical models, and canonical disintegration“. This paper arose from our longer-term project to systematically develop “uncountable” ergodic theory – ergodic theory in which the groups acting are not required to be countable, the probability spaces one acts on are not required to be standard Borel, or Polish, and the compact groups that arise in the structural theory (e.g., the theory of group extensions) are not required to be separable. One of the motivations of doing this is to allow ergodic theory results to be applied to ultraproducts of finite dynamical systems, which can then hopefully be transferred to establish combinatorial results with good uniformity properties. An instance of this is the uncountable Mackey-Zimmer theorem, discussed in this companion blog post.

In the course of this project, we ran into the obstacle that many foundational results, such as the Riesz representation theorem, often require one or more of these countability hypotheses when encountered in textbooks. Other technical issues also arise in the uncountable setting, such as the need to distinguish the Borel {\sigma}-algebra from the (two different types of) Baire {\sigma}-algebra. We therefore needed to spend some time reviewing and synthesizing the known literature on some foundational results of “uncountable” measure theory, which led to this paper. As such, most of the results of this paper are already in the literature, either explicitly or implicitly, in one form or another (with perhaps the exception of the canonical disintegration, which we discuss below); we view the main contribution of this paper as presenting the results in a coherent and unified fashion. In particular we found that the language of category theory was invaluable in clarifying and organizing all the different results. In subsequent work we (and some other authors) will use the results in this paper for various applications in uncountable ergodic theory.

The foundational results covered in this paper can be divided into a number of subtopics (Gelfand duality, Baire {\sigma}-algebras and Riesz representation, canonical models, and canonical disintegration), which we discuss further below the fold.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Mackey-Zimmer theorem“. This paper is part of our longer term project to develop “uncountable” versions of various theorems in ergodic theory; see this previous paper of Asgar and myself for the first paper in this series (and another paper will appear shortly).

In this case the theorem in question is the Mackey-Zimmer theorem, previously discussed in this blog post. This theorem gives an important classification of group and homogeneous extensions of measure-preserving systems. Let us first work in the (classical) setting of concrete measure-preserving systems. Let {Y = (Y, \mu_Y, T_Y)} be a measure-preserving system for some group {\Gamma}, thus {(Y,\mu_Y)} is a (concrete) probability space and {T_Y : \gamma \rightarrow T_Y^\gamma} is a group homomorphism from {\Gamma} to the automorphism group {\mathrm{Aut}(Y,\mu_Y)} of the probability space. (Here we are abusing notation by using {Y} to refer both to the measure-preserving system and to the underlying set. In the notation of the paper we would instead distinguish these two objects as {Y_{\mathbf{ConcPrb}_\Gamma}} and {Y_{\mathbf{Set}}} respectively, reflecting two of the (many) categories one might wish to view {Y} as a member of, but for sake of this informal overview we will not maintain such precise distinctions.) If {K} is a compact group, we define a (concrete) cocycle to be a collection of measurable functions {\rho_\gamma : Y \rightarrow K} for {\gamma \in \Gamma} that obey the cocycle equation

\displaystyle  \rho_{\gamma \gamma'}(y) = \rho_\gamma(T_Y^{\gamma'} y) \rho_{\gamma'}(y) \ \ \ \ \ (1)

for each {\gamma,\gamma' \in \Gamma} and all {y \in Y}. (One could weaken this requirement by only demanding the cocycle equation to hold for almost all {y}, rather than all {y}; we will effectively do so later in the post, when we move to opposite probability algebra systems.) Any such cocycle generates a group skew-product {X = Y \rtimes_\rho K} of {Y}, which is another measure-preserving system {(X, \mu_X, T_X)} where
  • {X = Y \times K} is the Cartesian product of {Y} and {K};
  • {\mu_X = \mu_Y \times \mathrm{Haar}_K} is the product measure of {\mu_Y} and Haar probability measure on {K}; and
  • The action {T_X: \gamma \rightarrow T_X^\gamma} is given by the formula

    \displaystyle  T_X^\gamma(y,k) := (T_Y^\gamma y, \rho_\gamma(y) k). \ \ \ \ \ (2)

The cocycle equation (1) guarantees that {T_X} is a homomorphism, and the (left) invariance of Haar measure and Fubini’s theorem guarantee that the {T_X^\gamma} remain measure preserving. There is also the more general notion of a homogeneous skew-product {X = Y \rtimes_\rho K/L} in which the group {K} is replaced by the homogeneous space {K/L} for some closed subgroup {L} of {K}, noting that {K/L} still comes with a left-action of {K} and a Haar measure. Group skew-products are very “explicit” ways to extend a system {Y}, as everything is described by the cocycle {\rho} which is a relatively tractable object to manipulate. (This is not to say that the cohomology of measure-preserving systems is trivial, but at least there are many tools one can use to study them, such as the Moore-Schmidt theorem discussed in this previous post.)
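
To make these definitions concrete, here is a tiny numerical example of my own (illustration only): {\Gamma = {\bf Z}}, {Y = {\bf R}/{\bf Z}} with an irrational rotation, {K = {\bf R}/{\bf Z}} written additively, and the cocycle generated by {\rho_1(y) = y} (the classical skew shift). The script checks the cocycle equation (1) and the resulting homomorphism property of the action (2) at a few random points.

import numpy as np

alpha = np.sqrt(2) % 1.0
T_Y = lambda m, y: (y + m * alpha) % 1.0              # the base rotation T_Y^m on Y = R/Z

def rho(m, y):
    # the cocycle generated by rho_1(y) = y via the cocycle equation (additive notation, m >= 0)
    return sum(T_Y(j, y) for j in range(m)) % 1.0

def T_X(m, y, k):
    return T_Y(m, y), (rho(m, y) + k) % 1.0           # the skew-product action (2) on Y x K

def circle_close(s, t, tol=1e-9):
    d = (np.asarray(s) - np.asarray(t)) % 1.0
    return np.all(np.minimum(d, 1.0 - d) < tol)

rng = np.random.default_rng(0)
for _ in range(5):
    y, k, m, mp = rng.uniform(), rng.uniform(), 3, 4
    # cocycle equation (1): rho_{m+m'}(y) = rho_m(T_Y^{m'} y) + rho_{m'}(y) in R/Z
    assert circle_close(rho(m + mp, y), rho(m, T_Y(mp, y)) + rho(mp, y))
    # which makes the action a homomorphism: T_X^{m+m'} = T_X^m o T_X^{m'}
    assert circle_close(T_X(m + mp, y, k), T_X(m, *T_X(mp, y, k)))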

This group skew-product {X} comes with a factor map {\pi: X \rightarrow Y} and a coordinate map {\theta: X \rightarrow K}, which by (2) are related to the action via the identities

\displaystyle  \pi \circ T_X^\gamma = T_Y^\gamma \circ \pi \ \ \ \ \ (3)

and

\displaystyle  \theta \circ T_X^\gamma = (\rho_\gamma \circ \pi) \theta \ \ \ \ \ (4)

where in (4) we are implicitly working in the group of (concretely) measurable functions from {Y} to {K}. Furthermore, the combined map {(\pi,\theta): X \rightarrow Y \times K} is measure-preserving (using the product measure on {Y \times K}), indeed the way we have constructed things this map is just the identity map.

We can now generalize the notion of group skew-product by just working with the maps {\pi, \theta}, and weakening the requirement that {(\pi,\theta)} be measure-preserving. Namely, define a group extension of {Y} by {K} to be a measure-preserving system {(X,\mu_X, T_X)} equipped with a measure-preserving map {\pi: X \rightarrow Y} obeying (3) and a measurable map {\theta: X \rightarrow K} obeying (4) for some cocycle {\rho}, such that the {\sigma}-algebra of {X} is generated by {\pi,\theta}. There is also a more general notion of a homogeneous extension in which {\theta} takes values in {K/L} rather than {K}. Then every group skew-product {Y \rtimes_\rho K} is a group extension of {Y} by {K}, but not conversely. Here are some key counterexamples:

  • (i) If {H} is a closed subgroup of {K}, and {\rho} is a cocycle taking values in {H}, then {Y \rtimes_\rho H} can be viewed as a group extension of {Y} by {K}, taking {\theta: Y \rtimes_\rho H \rightarrow K} to be the vertical coordinate {\theta(y,h) = h} (viewing {h} now as an element of {K}). This will not be a skew-product by {K} because {(\theta,\pi)} pushes forward to the wrong measure on {Y \times K}: it pushes forward to {\mu_Y \times \mathrm{Haar}_H} rather than {\mu_Y \times \mathrm{Haar}_K}.
  • (ii) If one takes the same example as (i), but twists the vertical coordinate {\theta} to another vertical coordinate {\tilde \theta(y,h) := \Phi(y) \theta(y,h)} for some measurable “gauge function” {\Phi: Y \rightarrow K}, then {Y \rtimes_\rho H} is still a group extension by {K}, but now with the cocycle {\rho} replaced by the cohomologous cocycle

    \displaystyle  \tilde \rho_\gamma(y) := \Phi(T_Y^\gamma y) \rho_\gamma(y) \Phi(y)^{-1}.

    Again, this will not be a skew-product by {K}, because {(\pi,\theta)} pushes forward to a twisted version of {\mu_Y \times \mathrm{Haar}_H} that is supported (at least in the case where {Y} is compact and the cocycle {\rho} is continuous) on the {H}-bundle {\bigcup_{y \in Y} \{y\} \times \Phi(y) H}. (A short verification that {\tilde \rho} is indeed again a cocycle is given just after this list.)
  • (iii) With the situation as in (i), take {X} to be the union {X = Y \rtimes_\rho H \uplus Y \rtimes_\rho Hk \subset Y \times K} for some {k \in K} outside of {H}, where we continue to use the action (2) and the standard vertical coordinate {\theta: (y,k) \mapsto k} but now use the measure {\mu_Y \times (\frac{1}{2} \mathrm{Haar}_H + \frac{1}{2} \mathrm{Haar}_{Hk})}.
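As promised in (ii), here is a short verification (just unpacking the definition of {\tilde \rho} and using the cocycle equation (1)) that the twisted function {\tilde \rho} is indeed again a cocycle:

\displaystyle  \tilde \rho_{\gamma \gamma'}(y) = \Phi(T_Y^{\gamma \gamma'} y) \rho_{\gamma \gamma'}(y) \Phi(y)^{-1} = \Phi(T_Y^{\gamma} T_Y^{\gamma'} y) \rho_\gamma(T_Y^{\gamma'} y) \Phi(T_Y^{\gamma'} y)^{-1} \cdot \Phi(T_Y^{\gamma'} y) \rho_{\gamma'}(y) \Phi(y)^{-1} = \tilde \rho_\gamma(T_Y^{\gamma'} y) \tilde \rho_{\gamma'}(y),

so twisting by a gauge function preserves the cocycle equation, as it must.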

As it turns out, group extensions and homogeneous extensions arise naturally in the Furstenberg-Zimmer structural theory of measure-preserving systems; roughly speaking, every compact extension of {Y} is an inverse limit of group extensions. It is then of interest to classify such extensions.

Examples such as (iii) are annoying, but they can be excluded by imposing the additional condition that the system {(X,\mu_X,T_X)} is ergodic – all invariant (or essentially invariant) sets are of measure zero or measure one. (An essentially invariant set is a measurable subset {E} of {X} such that {T^\gamma E} is equal modulo null sets to {E} for all {\gamma \in \Gamma}.) For instance, the system in (iii) is non-ergodic because the set {Y \times H} (or {Y \times Hk}) is invariant but has measure {1/2}. We then have the following fundamental result of Mackey and Zimmer:

Theorem 1 (Countable Mackey Zimmer theorem) Let {\Gamma} be a group, {Y} be a concrete measure-preserving system, and {K} be a compact Hausdorff group. Assume that {\Gamma} is at most countable, {Y} is a standard Borel space, and {K} is metrizable. Then every (concrete) ergodic group extension of {Y} is abstractly isomorphic to a group skew-product (by some closed subgroup {H} of {K}), and every (concrete) ergodic homogeneous extension of {Y} is similarly abstractly isomorphic to a homogeneous skew-product.

We will not define precisely what “abstractly isomorphic” means here, but it roughly speaking means “isomorphic after quotienting out the null sets”. A proof of this theorem can be found for instance in .

The main result of this paper is to remove the “countability” hypotheses from the above theorem, at the cost of working with opposite probability algebra systems rather than concrete systems. (We will discuss opposite probability algebras in a subsequent blog post relating to another paper in this series.)

Theorem 2 (Uncountable Mackey Zimmer theorem) Let {\Gamma} be a group, {Y} be an opposite probability algebra measure-preserving system, and {K} be a compact Hausdorff group. Then every (abstract) ergodic group extension of {Y} is abstractly isomorphic to a group skew-product (by some closed subgroup {H} of {K}), and every (abstract) ergodic homogeneous extension of {Y} is similarly abstractly isomorphic to a homogeneous skew-product.

We plan to use this result in future work to obtain uncountable versions of the Furstenberg-Zimmer and Host-Kra structure theorems.

As one might expect, one locates a proof of Theorem 2 by finding a proof of Theorem 1 that does not rely too strongly on “countable” tools, such as disintegration or measurable selection, so that all of those tools can be replaced by “uncountable” counterparts. The proof we use is based on the one given in this previous post, and begins by comparing the system {X} with the group skew-product {Y \rtimes_\rho K}. As the examples (i), (ii) show, these two systems need not be isomorphic even in the ergodic case, due to the different probability measures employed. However, one can relate the two after performing an additional averaging in {K}. More precisely, there is a canonical factor map {\Pi: X \rtimes_1 K \rightarrow Y \rtimes_\rho K} given by the formula

\displaystyle  \Pi(x, k) := (\pi(x), \theta(x) k).
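For orientation, here is the one-line verification (just unpacking (2) with the trivial cocycle {1} on the source, together with (3) and (4)) that {\Pi} intertwines the two {\Gamma}-actions:

\displaystyle  \Pi(T_{X \rtimes_1 K}^\gamma(x,k)) = (\pi(T_X^\gamma x), \theta(T_X^\gamma x) k) = (T_Y^\gamma \pi(x), \rho_\gamma(\pi(x)) \theta(x) k) = T_{Y \rtimes_\rho K}^\gamma \Pi(x,k),

where the middle equality used (3) and (4), and the final one is (2) for the skew-product {Y \rtimes_\rho K}.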

This is a factor map not only of {\Gamma}-systems, but actually of {\Gamma \times K^{op}}-systems, where the opposite group {K^{op}} to {K} acts (on the left) by right-multiplication of the second coordinate (this reversal of order is why we need to use the opposite group here). The key point is that the ergodicity properties of the system {Y \rtimes_\rho K} are closely tied to the group {H} that is “secretly” controlling the group extension. Indeed, in example (i), the invariant functions on {Y \rtimes_\rho K} take the form {(y,k) \mapsto f(Hk)} for some measurable {f: H \backslash K \rightarrow {\bf C}}, while in example (ii), the invariant functions on {Y \rtimes_{\tilde \rho} K} take the form {(y,k) \mapsto f(H \Phi(y)^{-1} k)}. In either case, the invariant factor is isomorphic to {H \backslash K}, and can be viewed as a factor of the invariant factor of {X \rtimes_1 K}, which is isomorphic to {K}. Pursuing this reasoning (using an abstract ergodic theorem of Alaoglu and Birkhoff, as discussed in the previous post), one obtains the Mackey range {H}, and also obtains the quotient {\tilde \Phi: Y \rightarrow K/H} of {\Phi: Y \rightarrow K} to {K/H} in this process.

The main remaining task is to lift the quotient {\tilde \Phi} back up to a map {\Phi: Y \rightarrow K} that stays measurable, in order to “untwist” a system that looks like (ii) to make it into one that looks like (i). In countable settings this is where a “measurable selection theorem” would ordinarily be invoked, but in the uncountable setting such theorems are not available for concrete maps. However, it turns out that they still remain available for abstract maps: any abstractly measurable map {\tilde \Phi} from {Y} to {K/H} has an abstractly measurable lift from {Y} to {K}. To prove this, we first use a canonical model for opposite probability algebras (which we will discuss in a companion post to this one, to appear shortly) to work with continuous maps (on a Stone space) rather than abstractly measurable maps. The measurable map {\tilde \Phi} then induces a probability measure on {Y \times K/H}, formed by pushing forward {\mu_Y} by the graphing map {y \mapsto (y,\tilde \Phi(y))}. This measure in turn has several lifts up to a probability measure on {Y \times K}; for instance, one can construct such a measure {\overline{\mu}} via the Riesz representation theorem by demanding

\displaystyle  \int_{Y \times K} f(y,k)\ d\overline{\mu}(y,k) := \int_Y (\int_{\tilde \Phi(y) H} f(y,k)\ d\mathrm{Haar}_{\tilde \Phi(y) H}(k))\ d\mu_Y(y)

for all continuous functions {f}. This measure does not come from a graph of any single lift {\Phi: Y \rightarrow K}, but is in some sense an “average” of the entire ensemble of these lifts. But it turns out one can invoke the Krein-Milman theorem to pass to an extremal lifting measure which does come from an (abstract) lift {\Phi}, and this can be used as a substitute for a measurable selection theorem. A variant of this Krein-Milman argument can also be used to express any homogeneous extension as a quotient of a group extension, giving the second part of the Mackey-Zimmer theorem.
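To make this construction concrete in a toy model, here is a small finite sketch (all specific choices of {Y}, {K}, {H}, {\tilde \Phi} below are made up for illustration, with finite groups standing in for compact ones) showing that the averaged measure {\overline{\mu}} pushes forward to the graph measure of {\tilde \Phi} on {Y \times K/H}, and that it is the average of the graph measures of the genuine lifts {\Phi: Y \rightarrow K}, which are the extreme points exploited in the Krein-Milman step.

```python
# Toy finite illustration (hypothetical, not from the paper) of the averaged lifting
# measure mu_bar defined in the display above, and of its decomposition into graph
# measures of genuine lifts Phi: Y -> K (the extreme points in the Krein-Milman step).
from fractions import Fraction
from itertools import product

Y = range(3)                          # a three-point base space with uniform measure
K = range(4)                          # K = Z/4 (additive); H = {0, 2} is a subgroup
H = (0, 2)

def coset(k):
    """The coset k + H in K = Z/4, encoded as a sorted tuple (a point of K/H)."""
    return tuple(sorted((k + h) % 4 for h in H))

tilde_Phi = {0: coset(1), 1: coset(0), 2: coset(3)}   # an arbitrary map Y -> K/H

# The averaged lift: mu_bar(y, k) = (1/|Y|) * (uniform measure on the coset tilde_Phi(y))(k).
mu_bar = {(y, k): Fraction(1, len(Y)) * Fraction(1, len(H)) * (k in tilde_Phi[y])
          for y in Y for k in K}

# Pushing mu_bar forward to Y x K/H recovers the graph measure of tilde_Phi.
pushforward = {}
for (y, k), mass in mu_bar.items():
    pushforward[(y, coset(k))] = pushforward.get((y, coset(k)), 0) + mass
assert all(pushforward[(y, c)] == (Fraction(1, len(Y)) if c == tilde_Phi[y] else 0)
           for y in Y for c in {coset(k) for k in K})

# mu_bar is the average of the graph measures of all lifts Phi with Phi(y) in
# tilde_Phi(y); in this finite model those graph measures are the extreme points.
lifts = [dict(zip(Y, choice)) for choice in product(*[tilde_Phi[y] for y in Y])]
average = {(y, k): sum(Fraction(1, len(Y)) * (Phi[y] == k) for Phi in lifts) / len(lifts)
           for y in Y for k in K}
assert average == mu_bar
print("pushforward and extreme-point decomposition both check out")
```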

I have uploaded to the arXiv my paper “Exploring the toolkit of Jean Bourgain“. This is one of a collection of papers to be published in the Bulletin of the American Mathematical Society describing aspects of the work of Jean Bourgain; other contributors to this collection include Keith Ball, Ciprian Demeter, and Carlos Kenig. Because the other contributors will be covering specific areas of Jean’s work in some detail, I decided to take a non-overlapping tack, and focus instead on some basic tools of Jean that he frequently used across many of the fields he contributed to. Jean had a surprising number of these “basic tools” that he wielded with great dexterity, and in this paper I focus on just a few of them:

  • Reducing qualitative analysis results (e.g., convergence theorems or dimension bounds) to quantitative analysis estimates (e.g., variational inequalities or maximal function estimates).
  • Using dyadic pigeonholing to locate good scales to work in or to apply truncations (a toy numerical illustration of this item is sketched just after this list).
  • Using random translations to amplify small sets (low density) into large sets (positive density).
  • Combining large deviation inequalities with metric entropy bounds to control suprema of various random processes.
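As a toy illustration of the dyadic pigeonholing item above (my own made-up example, not one from the paper): a nonnegative quantity summed over the {O(\log \frac{1}{\delta})} dyadic scales between {\delta} and {1} must place at least a {\gtrsim 1/\log \frac{1}{\delta}} fraction of its total at some single “good” scale, so one can restrict attention to that scale at the cost of only a logarithmic factor.

```python
# Toy illustration (made up, not from the paper) of dyadic pigeonholing: bucket the
# pairwise distances of a random point set by dyadic scale, and locate a single
# "good" scale carrying at least an average (i.e. 1/log) share of all pairs.
import math
import random
from itertools import combinations

random.seed(0)
points = [random.random() for _ in range(500)]    # a made-up point set in [0, 1)
delta = 2.0 ** -10                                # finest scale under consideration
J = int(math.log2(1 / delta))                     # about log(1/delta) dyadic scales

# Scale j collects the pairs with 2^-(j+1) < |x - y| <= 2^-j; closer pairs are dropped.
counts = [0] * (J + 1)
for x, y in combinations(points, 2):
    d = abs(x - y)
    if d > delta:
        counts[min(int(-math.log2(d)), J)] += 1

# Pigeonhole: some dyadic scale carries at least a 1/(J+1) fraction of all pairs.
total = sum(counts)
good = max(range(J + 1), key=lambda j: counts[j])
assert counts[good] >= total / (J + 1)
print(f"good scale: 2^-{good}, carrying {counts[good]} of {total} pairs")
```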

Each of these techniques is individually not too difficult to explain, and was certainly employed on occasion by various mathematicians prior to Bourgain’s work; but Jean had internalized them to the point where he would instinctively use them as soon as they became relevant to a given problem at hand. I illustrate this at the end of the paper with an exposition of one particular result of Jean, on the Erdős similarity problem, in which his main result (that any sum {S = S_1+S_2+S_3} of three infinite sets of reals has the property that there exists a positive measure set {E} that does not contain any homothetic copy {x+tS} of {S}) is basically proven by a sequential application of these tools (except for dyadic pigeonholing, which turns out not to be needed here).

I had initially intended to also cover some other basic tools in Jean’s toolkit, such as the uncertainty principle and the use of probabilistic decoupling, but was having trouble keeping the paper coherent with such a broad focus (certainly I could not identify a single paper of Jean’s that employed all of these tools at once). I hope, though, that the examples given in the paper give some reasonable impression of Jean’s research style.

Starting on Oct 2, I will be teaching Math 246A, the first course in the three-quarter graduate complex analysis sequence at the math department here at UCLA.  This first course covers much of the same ground as an honours undergraduate complex analysis course, in particular focusing on the basic properties of holomorphic functions such as the Cauchy and residue theorems, the classification of singularities, and the maximum principle, but there will be more of an emphasis on rigour, generalisation and abstraction, and connections with other parts of mathematics.  The main text I will be using for this course is Stein-Shakarchi (with Ahlfors as a secondary text), but I will also be using the blog lecture notes I wrote the last time I taught this course in 2016. At this time I do not expect to significantly deviate from my past lecture notes, though I do not know at present how different the pace will be this quarter when the course is taught remotely. As with my 247B course last spring, the lectures will be open to the public, though other coursework components will be restricted to enrolled students.
