I’ve just uploaded to the arXiv my paper “The Ionescu-Wainger multiplier theorem and the adeles”. This paper revisits a useful multiplier theorem of Ionescu and Wainger on “major arc” Fourier multiplier operators on the integers {{\bf Z}} (or lattices {{\bf Z}^d}), and strengthens the bounds while also interpreting it from the viewpoint of the adelic integers {{\bf A}_{\bf Z}} (which were also used in my recent paper with Krause and Mirek).

For simplicity let us just work in one dimension. Any smooth function {m: {\bf R}/{\bf Z} \rightarrow {\bf C}} then defines a discrete Fourier multiplier operator {T_m: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} for any {1 \leq p \leq \infty} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf Z} f(\xi)

where {{\mathcal F}_{\bf Z} f(\xi) := \sum_{n \in {\bf Z}} f(n) e(n \xi)} is the Fourier transform on {{\bf Z}}; similarly, any test function {m: {\bf R} \rightarrow {\bf C}} defines a continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})} by the formula

\displaystyle  {\mathcal F}_{\bf R} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf R} f(\xi)

where {{\mathcal F}_{\bf R} f(\xi) := \int_{\bf R} f(x) e(x \xi)\ dx}. In both cases we refer to {m} as the symbol of the multiplier operator {T_m}.

We will be interested in discrete Fourier multiplier operators whose symbols are supported on a finite union of arcs. One way to construct such operators is by “folding” continuous Fourier multiplier operators into various target frequencies. To make this folding operation precise, given any continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})} and any frequency shift {\alpha \in {\bf R}/{\bf Z}}, we define the discrete Fourier multiplier operator {T_{m;\alpha}: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_{m;\alpha} f(\xi) := \sum_{\theta \in {\bf R}: \xi = \alpha + \theta} m(\theta) {\mathcal F}_{\bf Z} f(\xi)

or equivalently

\displaystyle  T_{m;\alpha} f(n) = \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

More generally, given any finite set {\Sigma \subset {\bf R}/{\bf Z}}, we can form a multifrequency projection operator {T_{m;\Sigma}} on {\ell^p({\bf Z})} by the formula

\displaystyle  T_{m;\Sigma} := \sum_{\alpha \in \Sigma} T_{m;\alpha}

thus

\displaystyle  T_{m;\Sigma} f(n) = \sum_{\alpha \in \Sigma} \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

This construction gives discrete Fourier multiplier operators whose symbol can be localised to a finite union of arcs. For instance, if {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]}, then {T_{m;\Sigma}} is a Fourier multiplier whose symbol is supported on the set {\bigcup_{\alpha \in \Sigma} \alpha + [-\varepsilon,\varepsilon]}.
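
As a concrete illustration of this folding, here is a small standalone numerical sketch (the choices of {\Sigma}, {\varepsilon}, and the bump function below are arbitrary, and a tent function stands in for a genuinely smooth bump) which assembles the periodised symbol of {T_{m;\Sigma}} and confirms that it is supported on {\bigcup_{\alpha \in \Sigma} \alpha + [-\varepsilon,\varepsilon]}:

```python
import numpy as np

# Symbol of T_{m;Sigma} on the frequency circle R/Z: at xi it equals
# the sum over alpha in Sigma and integers j of m(xi - alpha - j).  For m
# supported in [-eps, eps], this is supported on the arcs alpha + [-eps, eps].
eps = 0.02
Sigma = [0.0, 1/4, 1/3, 2/3]                       # hypothetical frequencies in R/Z

def m(theta):
    # tent function supported in [-eps, eps] (a smooth bump would be used in practice)
    return np.maximum(0.0, 1.0 - np.abs(theta) / eps)

xi = np.linspace(0.0, 1.0, 5000, endpoint=False)
symbol = sum(m(xi - alpha - j) for alpha in Sigma for j in (-1, 0, 1))

# circular distance from each xi to the nearest element of Sigma
dist = np.min(np.abs((xi[:, None] - np.array(Sigma)[None, :] + 0.5) % 1.0 - 0.5), axis=1)
print(np.all(dist[symbol > 0] <= eps + 1e-9))      # True: the symbol lives on the arcs
```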

There is a body of results relating the {\ell^p({\bf Z})} theory of discrete Fourier multiplier operators such as {T_{m;\alpha}} or {T_{m;\Sigma}} with the {L^p({\bf R})} theory of their continuous counterparts. For instance we have the basic result of Magyar, Stein, and Wainger:

Proposition 1 (Magyar-Stein-Wainger sampling principle) Let {1 \leq p \leq \infty} and {\alpha \in {\bf R}/{\bf Z}}.
  • (i) If {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2,1/2]}, then {\|T_{m;\alpha}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}, where {B(V)} denotes the operator norm of an operator {T: V \rightarrow V}.
  • (ii) More generally, if {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-\frac{1}{2Q},\frac{1}{2Q}]} for some natural number {Q}, then {\|T_{m;\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}.

When {p=2} the implied constant in these bounds can be set to equal {1}. In the paper of Magyar, Stein, and Wainger it was posed as an open problem whether this is the case for other {p}; in an appendix to this paper I show that the answer is negative if {p} is sufficiently close to {1} or {\infty}, but I do not know the full answer to this question.

This proposition allows one to get a good multiplier theory for symbols supported near cyclic groups {\frac{1}{Q}{\bf Z}/{\bf Z}}; for instance it shows that a discrete Fourier multiplier with symbol {\sum_{\alpha \in \frac{1}{Q}{\bf Z}/{\bf Z}} \phi(Q(\xi-\alpha))} for a fixed test function {\phi} is bounded on {\ell^p({\bf Z})}, uniformly in {p} and {Q}. For many applications in discrete harmonic analysis, one would similarly like a good multiplier theory for symbols supported in “major arc” sets such as

\displaystyle  \bigcup_{q=1}^N \bigcup_{\alpha \in \frac{1}{q}{\bf Z}/{\bf Z}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (1)

and in particular to get a good Littlewood-Paley theory adapted to major arcs. (This is particularly the case when trying to control “true complexity zero” expressions for which the minor arc contributions can be shown to be negligible; my recent paper with Krause and Mirek is focused on expressions of this type.) At present we do not have a good multiplier theory that is directly adapted to the classical major arc set (1) (though I do not know of rigorous negative results that show that such a theory is not possible); however, Ionescu and Wainger were able to obtain a useful substitute theory in which (1) was replaced by a somewhat larger set that had better multiplier behaviour. Starting with a finite collection {S} of pairwise coprime natural numbers, and a natural number {k}, one can form the major arc type set

\displaystyle  \bigcup_{\alpha \in \Sigma_{\leq k}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (2)

where {\Sigma_{\leq k} \subset {\bf R}/{\bf Z}} consists of all rational points in the unit circle of the form {\frac{a}{Q} \mod 1} where {Q} is the product of at most {k} elements from {S} and {a} is an integer. For suitable choices of {S} and {k} not too large, one can make this set (2) contain the set (1) while still having a somewhat controlled size (very roughly speaking, one chooses {S} to consist of (small powers of) large primes between {N^\rho} and {N} for some small constant {\rho>0}, together with something like the product of all the primes up to {N^\rho} (raised to suitable powers)).
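
To get a feel for the set {\Sigma_{\leq k}}, here is a tiny enumeration sketch (the choices of {S} and {k} are arbitrary toy values, and products are taken over distinct elements of {S}, the reading we adopt here):

```python
from itertools import combinations
from fractions import Fraction

S = [3, 5, 7, 11]        # a toy set of pairwise coprime numbers (not from the paper)
k = 2

# All products Q of at most k distinct elements of S (the empty product gives Q = 1).
Qs = {1}
for j in range(1, k + 1):
    for combo in combinations(S, j):
        Q = 1
        for q in combo:
            Q *= q
        Qs.add(Q)

# Sigma_{<=k} = all fractions a/Q mod 1 with Q as above (Fractions deduplicate automatically).
Sigma = {Fraction(a, Q) for Q in Qs for a in range(Q)}
print(sorted(Qs))        # the denominators that occur
print(len(Sigma))        # the number of distinct points of Sigma_{<=2} in R/Z
```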

In the regime where {k} is fixed and {\varepsilon} is small, there is a good theory:

Theorem 2 (Ionescu-Wainger theorem, rough version) If {p} is an even integer or the dual of an even integer, and {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]} for a sufficiently small {\varepsilon > 0}, then

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \lesssim_{p, k} (\log(1+|S|))^{O_k(1)} \|T_m\|_{B(L^p({\bf R}))}.

There is a more explicit description of how small {\varepsilon} needs to be for this theorem to work (roughly speaking, it is not much more than what is needed for all the arcs {\alpha + [-\varepsilon,\varepsilon]} in (2) to be disjoint), but we will not give it here. The logarithmic loss of {(\log(1+|S|))^{O_k(1)}} was reduced to {\log(1+|S|)} by Mirek. In this paper we refine the bound further to

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \leq O(r \log(2+kr))^k \|T_m\|_{B(L^p({\bf R}))}. \ \ \ \ \ (3)

when {p = 2r} or {p = (2r)'} for some integer {r}. In particular there is no longer any logarithmic loss in the cardinality of the set {S}.

The proof of (3) follows a strategy similar to that of previous proofs of Ionescu-Wainger type. By duality we may assume {p=2r}. We use the following standard sequence of steps:

  • (i) (Denominator orthogonality) First one splits {T_{m;\Sigma_{\leq k}} f} into various pieces depending on the denominator {Q} appearing in the element of {\Sigma_{\leq k}}, and exploits “superorthogonality” in {Q} to estimate the {\ell^p} norm by the {\ell^p} norm of an appropriate square function.
  • (ii) (Nonconcentration) One expands out the {p^{th}} power of the square function and estimates it by a “nonconcentrated” version in which various factors that arise in the expansion are “disjoint”.
  • (iii) (Numerator orthogonality) We now decompose based on the numerators {a} appearing in the relevant elements of {\Sigma_{\leq k}}, and exploit some residual orthogonality in this parameter to reduce to estimating a square-function type expression involving sums over various cosets {\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}.
  • (iv) (Marcinkiewicz-Zygmund) One uses the Marcinkiewicz-Zygmund theorem relating scalar and vector valued operator norms to eliminate the role of the multiplier {m}.
  • (v) (Rubio de Francia) One uses a reverse square function estimate of Rubio de Francia type to conclude.

The main innovations are that of using the probabilistic decoupling method to remove some logarithmic losses in (i), and recent progress on the Erdos-Rado sunflower conjecture (as discussed in this recent post) to improve the bounds in (ii). For (i), the key point is that one can express a sum such as

\displaystyle  \sum_{A \in \binom{S}{k}} f_A,

where {\binom{S}{k}} is the set of {k}-element subsets of an index set {S}, and {f_A} are various complex numbers, as an average

\displaystyle  \sum_{A \in \binom{S}{k}} f_A = \frac{k^k}{k!} {\bf E} \sum_{s_1 \in {\bf S}_1,\dots,s_k \in {\bf S}_k} f_{\{s_1,\dots,s_k\}}

where {S = {\bf S}_1 \cup \dots \cup {\bf S}_k} is a random partition of {S} into {k} subclasses (chosen uniformly over all such partitions), basically because every {k}-element subset {A} of {S} has a probability exactly {\frac{k!}{k^k}} of being completely shattered by such a random partition. This “decouples” the index set {\binom{S}{k}} into a Cartesian product {{\bf S}_1 \times \dots \times {\bf S}_k} which is more convenient for application of the superorthogonality theory. For (ii), the point is to efficiently obtain estimates of the form

\displaystyle  (\sum_{A \in \binom{S}{k}} F_A)^r \lesssim_{k,r} \sum_{A_1,\dots,A_r \in \binom{S}{k} \hbox{ sunflower}} F_{A_1} \dots F_{A_r}

where {F_A} are various non-negative quantities, and a sunflower is a collection of sets {A_1,\dots,A_r} that consist of a common “core” {A_0} and disjoint “petals” {A_1 \backslash A_0,\dots,A_r \backslash A_0}. The other parts of the argument are relatively routine; see for instance this survey of Pierce for a discussion of them in the simple case {k=1}.
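
Returning to the averaging identity used in step (i), here is a quick brute-force numerical check of it (a standalone sketch; the index set {S}, the parameter {k}, and the weights {f_A} are arbitrary small test choices):

```python
import itertools
import random
from math import factorial

# Verify: sum_{A in C(S,k)} f_A = (k^k/k!) E[ sum_{s_1 in S_1,...,s_k in S_k} f_{{s_1,...,s_k}} ],
# where S = S_1 u ... u S_k is a uniformly random partition into k labelled classes
# (each k-element set is shattered with probability exactly k!/k^k).

S = list(range(6))
k = 3
rng = random.Random(0)
f = {A: rng.uniform(-1, 1) for A in itertools.combinations(S, k)}

lhs = sum(f.values())

# Average over all k^{|S|} equally likely class assignments.
total = 0.0
for assignment in itertools.product(range(k), repeat=len(S)):
    classes = [[s for s, c in zip(S, assignment) if c == j] for j in range(k)]
    # elements drawn from distinct (disjoint) classes are automatically distinct
    total += sum(f[tuple(sorted(tup))] for tup in itertools.product(*classes))
expectation = total / k ** len(S)

print(lhs, (k ** k / factorial(k)) * expectation)   # the two values agree up to rounding
```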

In this paper we interpret the Ionescu-Wainger multiplier theorem as being essentially a consequence of various quantitative versions of the Shannon sampling theorem. Recall that this theorem asserts that if a (Schwartz) function {f: {\bf R} \rightarrow {\bf C}} has its Fourier transform supported on {[-1/2,1/2]}, then {f} can be recovered uniquely from its restriction {f|_{\bf Z}: {\bf Z} \rightarrow {\bf C}}. In fact, as can be shown from a little bit of routine Fourier analysis, if we narrow the support of the Fourier transform slightly to {[-c,c]} for some {0 < c < 1/2}, then the restriction {f|_{\bf Z}} has the same {L^p} behaviour as the original function, in the sense that

\displaystyle  \| f|_{\bf Z} \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R})} \ \ \ \ \ (4)

for all {0 < p \leq \infty}; see Theorem 4.18 of this paper of myself with Krause and Mirek. This is consistent with the uncertainty principle, which suggests that such functions {f} should behave like a constant at scales {\sim 1/c}.
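
As a quick sanity check of (4), here is a standalone numerical sketch (the test function, the value of {c}, and the truncation ranges are arbitrary illustrative choices) comparing the {\ell^p} norm of the integer samples of a band-limited function with its {L^p} norm:

```python
import numpy as np

# Test function: f(x) = sinc(c x)^2 = (sin(pi c x)/(pi c x))^2, whose Fourier
# transform is a triangle supported in [-c, c]; here c = 1/4 < 1/2.
c = 0.25
f = lambda t: np.sinc(c * t) ** 2         # np.sinc(z) = sin(pi z)/(pi z)

n = np.arange(-2000, 2001)                # integer samples (truncated; f decays like 1/x^2)
x = np.linspace(-2000, 2000, 2_000_001)   # quadrature grid for the L^p norm
dx = x[1] - x[0]

for p in [1, 2, 4]:
    lp_Z = np.sum(f(n) ** p) ** (1 / p)
    Lp_R = (np.sum(f(x) ** p) * dx) ** (1 / p)
    print(p, lp_Z, Lp_R, lp_Z / Lp_R)     # the ratio stays bounded above and below
```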

The quantitative sampling theorem (4) can be used to give an alternate proof of Proposition 1(i), basically thanks to the identity

\displaystyle  T_{m;0} (f|_{\bf Z}) = (T_m f)|_{\bf Z}

whenever {f: {\bf R} \rightarrow {\bf C}} is Schwartz and has Fourier transform supported in {[-1/2,1/2]}, and {m} is also supported on {[-1/2,1/2]}; this identity can be easily verified from the Poisson summation formula. A variant of this argument also yields an alternate proof of Proposition 1(ii), where the role of {{\bf R}} is now played by {{\bf R} \times {\bf Z}/Q{\bf Z}}, and the standard embedding of {{\bf Z}} into {{\bf R}} is now replaced by the embedding {\iota_Q: n \mapsto (n, n \hbox{ mod } Q)} of {{\bf Z}} into {{\bf R} \times {\bf Z}/Q{\bf Z}}; the analogue of (4) is now

\displaystyle  \| f \circ \iota_Q \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R} \times {\bf Z}/Q{\bf Z})} \ \ \ \ \ (5)

whenever {f: {\bf R} \times {\bf Z}/Q{\bf Z} \rightarrow {\bf C}} is Schwartz and has Fourier transform {{\mathcal F}_{{\bf R} \times {\bf Z}/Q{\bf Z}} f\colon {\bf R} \times \frac{1}{Q}{\bf Z}/{\bf Z} \rightarrow {\bf C}} supported in {[-c/Q,c/Q] \times \frac{1}{Q}{\bf Z}/{\bf Z}}, and {{\bf Z}/Q{\bf Z}} is endowed with probability Haar measure.

The locally compact abelian groups {{\bf R}} and {{\bf R} \times {\bf Z}/Q{\bf Z}} can both be viewed as projections of the adelic integers {{\bf A}_{\bf Z} := {\bf R} \times \hat {\bf Z}} (the product of the reals and the profinite integers {\hat {\bf Z}}). By using the Ionescu-Wainger multiplier theorem, we are able to obtain an adelic version of the quantitative sampling estimate (5), namely

\displaystyle  \| f \circ \iota \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf A}_{\bf Z})}

whenever {1 < p < \infty}, {f: {\bf A}_{\bf Z} \rightarrow {\bf C}} is Schwartz-Bruhat and has Fourier transform {{\mathcal F}_{{\bf A}_{\bf Z}} f: {\bf R} \times {\bf Q}/{\bf Z} \rightarrow {\bf C}} supported on {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}} for some sufficiently small {\varepsilon} (the precise bound on {\varepsilon} depends on {S, p, c} in a fashion not detailed here). This allows one to obtain an “adelic” extension of the Ionescu-Wainger multiplier theorem, in which the {\ell^p({\bf Z})} operator norm of any discrete multiplier operator whose symbol is supported on major arcs can be shown to be comparable to the {L^p({\bf A}_{\bf Z})} operator norm of an adelic counterpart to that multiplier operator; in principle this reduces “major arc” harmonic analysis on the integers {{\bf Z}} to “low frequency” harmonic analysis on the adelic integers {{\bf A}_{\bf Z}}, which is a simpler setting in many ways (mostly because the set of major arcs (2) is now replaced with a product set {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}}).

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

Theorem 1 (Birkhoff ergodic theorem) Let {(X,\mu,T)} be a measure-preserving system (by which we mean {(X,\mu)} is a {\sigma}-finite measure space, and {T: X \rightarrow X} is invertible and measure-preserving), and let {f \in L^p(X)} for any {1 \leq p < \infty}. Then the averages {\frac{1}{N} \sum_{n=1}^N f(T^n x)} converge pointwise for {\mu}-almost every {x \in X}.

Pointwise ergodic theorems have an inherently harmonic analysis content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.

The above theorem was generalized by Bourgain (conceding the endpoint {p=1}, where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

Theorem 2 (Pointwise ergodic theorem for polynomial averages) Let {(X,\mu,T)} be a measure-preserving system, and let {f \in L^p(X)} for any {1 < p < \infty}. Let {P \in {\bf Z}[{\mathrm n}]} be a polynomial with integer coefficients. Then the averages {\frac{1}{N} \sum_{n=1}^N f(T^{P(n)} x)} converge pointwise for {\mu}-almost every {x \in X}.

For bilinear averages, we have a separate 1990 result of Bourgain (for {L^\infty} functions), extended to other {L^p} spaces by Lacey, and with an alternate proof given by Demeter:

Theorem 3 (Pointwise ergodic theorem for two linear polynomials) Let {(X,\mu,T)} be a measure-preserving system with finite measure, and let {f \in L^{p_1}(X)}, {g \in L^{p_2}(X)} for some {1 < p_1,p_2 \leq \infty} with {\frac{1}{p_1}+\frac{1}{p_2} < \frac{3}{2}}. Then for any integers {a,b}, the averages {\frac{1}{N} \sum_{n=1}^N f(T^{an} x) g(T^{bn} x)} converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial) Let {(X,\mu,T)} be a measure-preserving system, and let {f \in L^{p_1}(X)}, {g \in L^{p_2}(X)} for some {1 < p_1,p_2 < \infty} with {\frac{1}{p_1}+\frac{1}{p_2} \leq 1}. Then for any polynomial {P \in {\bf Z}[{\mathrm n}]} of degree {d \geq 2}, the averages {\frac{1}{N} \sum_{n=1}^N f(T^{n} x) g(T^{P(n)} x)} converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with {\frac{1}{p_1}+\frac{1}{p_2}>1}, but we will not discuss these extensions here. A good model case to keep in mind is when {p_1=p_2=2} and {P(n) = n^2} (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the {d=2} case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property of the averages shared in common by Theorems 2, 4 is that they have “true complexity zero”, in the sense that they can only be large if the functions {f,g} involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that it can also be large if {f,g} are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to studying the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which {X = {\bf Z}} with counting measure, and {T(n) = n-1}). A model problem is to establish the maximal inequality

\displaystyle  \| \sup_N |A_N(f,g)| \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \ \ \ \ \ (1)

where {N} ranges over powers of two and {A_N} is the bilinear operator

\displaystyle  A_N(f,g)(x) := \frac{1}{N} \sum_{n=1}^N f(x-n) g(x-n^2).
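
For concreteness, here is a tiny brute-force evaluation of this operator on finitely supported test data (a standalone sketch; the sequences {f,g} are random and have nothing to do with the paper), together with a numerical check consistent with the single-scale bound (2) below:

```python
import numpy as np

# A_N(f,g)(x) = (1/N) * sum_{n=1..N} f(x-n) g(x-n^2) for finitely supported f, g
# (represented as dictionaries from integers to values).

def A_N(f, g, N, xs):
    return {x: sum(f.get(x - n, 0.0) * g.get(x - n * n, 0.0)
                   for n in range(1, N + 1)) / N for x in xs}

rng = np.random.default_rng(0)
N = 20
f = {m: rng.standard_normal() for m in range(0, 40)}
g = {m: rng.standard_normal() for m in range(-400, 40)}

vals = A_N(f, g, N, range(-50, 100))      # a range of x covering the support of A_N(f,g)
l1 = sum(abs(v) for v in vals.values())
l2f = np.sqrt(sum(v * v for v in f.values()))
l2g = np.sqrt(sum(v * v for v in g.values()))
print(l1, l2f * l2g)                      # consistent with ||A_N(f,g)||_{l^1} <= ||f||_{l^2} ||g||_{l^2}
```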

The single scale estimate

\displaystyle  \| A_N(f,g) \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})}

or equivalently (by duality)

\displaystyle  \frac{1}{N} \sum_{n=1}^N \sum_{x \in {\bf Z}} h(x) f(x-n) g(x-n^2) \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \|h\|_{\ell^\infty({\bf Z})} \ \ \ \ \ (2)

is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales {N}.

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when {f(x) = e(ax/q) F(x)}, {g(x) = e(bx/q) G(x)}, {h(x) = e(cx/q) H(x)} where {q=O(1)} is a small modulus, {a,b,c} are such that {a+b+c=0 \hbox{ mod } q}, {G} is a smooth cutoff to an interval {I} of length {O(N^2)}, and {F=H} is also supported on {I} and behaves like a constant on intervals of length {O(N)}. Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials {P} by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when {f,g,h} are supported on a common interval {I} of length {O(N^2)} and are normalised in {\ell^\infty} rather than {\ell^2}. (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the {f,h} factors; the corresponding statement for {g} was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the {\ell^\infty}-normalized version of (2)).

For our applications we had to extend the {\ell^\infty} inverse theory of Peluse and Prendiville to an {\ell^2} theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

\displaystyle  A^*_N(h,g)(x) = \frac{1}{N} \sum_{n=1}^N h(x+n) g(x+n-n^2)

can be well approximated in {\ell^1} by a function that has Fourier support on “major arcs” if {g,h} enjoy {\ell^\infty} control. To get the required extension to {\ell^2} in the {f} aspect one has to improve the control on the error from {\ell^1} to {\ell^2}; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent {\ell^p({\bf Z})} improving estimates of Han, Kovac, Lacey, Madrid, and Yang for linear averages such as {x \mapsto \frac{1}{N} \sum_{n=1}^N g(x+n-n^2)}, one can relax the {\ell^\infty} hypothesis on {g} to an {\ell^2} hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function {f}; a modification of the arguments also gives something similar for {g}.

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when {f,g} are supported near rational numbers {a/q} with {q \sim 2^l} for some moderately large {l}. The inverse theory gives good control (with an exponential decay in {l}) on individual scales {N}, and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of {A_N} to eventually handle all “small” scales, with {N} ranging up to say {2^{2^u}} where {u = C 2^{\rho l}} for some small constant {\rho} and large constant {C}. For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator {Q}, and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers {{\bf Z}} to the locally compact abelian group {{\bf R} \times {\bf Z}/Q{\bf Z}}. Actually it was conceptually clearer for us to work instead with the adelic integers {{\mathbf A}_{\bf Z} ={\bf R} \times \hat {\bf Z}}, which is the inverse limit of the {{\bf R} \times {\bf Z}/Q{\bf Z}}. Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

\displaystyle  A_{N,{\bf R}}(f,g)(x) := \frac{1}{N} \int_0^N f(x-t) g(x-t^2)\ dt

on {{\bf R}}, and the “arithmetic” bilinear operator

\displaystyle  A_{\hat {\bf Z}}(f,g)(x) := \int_{\hat {\bf Z}} f(x-y) g(x-y^2) d\mu_{\hat {\bf Z}}(y)

on the profinite integers {\hat {\bf Z}}, equipped with probability Haar measure {\mu_{\hat {\bf Z}}}. After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an {L^q} improving estimate

\displaystyle  \| A_{\hat {\bf Z}}(f,g) \|_{L^q(\hat {\bf Z})} \lesssim \|f\|_{L^2(\hat {\bf Z})} \|g\|_{L^2(\hat {\bf Z})}

for some {q>2}. Splitting the profinite integers {\hat {\bf Z}} into the product of the {p}-adic integers {{\bf Z}_p}, it suffices to establish this claim for each {{\bf Z}_p} separately (so long as we keep the implied constant equal to {1} for sufficiently large {p}). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic {L^q} improving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the {p}-adic field {{\bf Q}_p}, which are a variant of some estimates of Kowalski and Wright.

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the local Fourier uniformity conjecture for bounded multiplicative functions such as the Liouville function {\lambda}. One form of this conjecture is the assertion that

\displaystyle  \int_0^X \| \lambda \|_{U^k([x,x+H])}\ dx = o(X) \ \ \ \ \ (1)

as {X \rightarrow \infty} for any fixed {k \geq 0} and any {H = H(X) \leq X} that goes to infinity as {X \rightarrow \infty}, where {U^k([x,x+H])} is the (normalized) Gowers uniformity norm. Among other things this conjecture implies (logarithmically averaged versions of) the Chowla and Sarnak conjectures for the Liouville function (or the Möbius function); see this previous blog post.

The conjecture gets more difficult as {k} increases, and also becomes more difficult the more slowly {H} grows with {X}. The {k=0} conjecture is equivalent to the assertion

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n)| \ dx = o(HX)

which was proven (for arbitrarily slowly growing {H}) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.

For {k=1}, the conjecture is equivalent to the assertion

\displaystyle  \int_0^X \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)| \ dx = o(HX). \ \ \ \ \ (2)

This remains open for sufficiently slowly growing {H} (and it would be a major breakthrough in particular if one could obtain this bound for {H} as small as {\log^\varepsilon X} for any fixed {\varepsilon>0}, particularly if applicable to more general bounded multiplicative functions than {\lambda}, as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and myself were able to establish this conjecture in the range {H \geq X^\varepsilon} (in fact we have since worked out in the current paper that we can get {H} as small as {\exp(\log^{5/8+\varepsilon} X)}). In our current paper we establish the Fourier uniformity conjecture for higher {k} for the same range of {H}. This in particular implies local orthogonality to polynomial phases,

\displaystyle  \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) e(-P(n))| \ dx = o(HX) \ \ \ \ \ (3)

where {\mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} denotes the polynomials of degree at most {k-1}, but the full conjecture is a bit stronger than this, establishing the more general statement

\displaystyle  \int_0^X \sup_{g \in \mathrm{Poly}({\bf R} \rightarrow G)} |\sum_{x \leq n \leq x+H} \lambda(n) \overline{F}(g(n) \Gamma)| \ dx = o(HX) \ \ \ \ \ (4)

for any degree {k} filtered nilmanifold {G/\Gamma} and Lipschitz function {F: G/\Gamma \rightarrow {\bf C}}, where {g} now ranges over polynomial maps from {{\bf R}} to {G}. The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.) We quickly sketch first the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate

\displaystyle  |\sum_{x \leq n \leq x+H} \lambda(n) e(-P_x(n))| \gg H

for many {x \sim X} and some polynomial {P_x} depending on {x}. The difficulty here is to understand how {P_x} can depend on {x}. We write the above correlation estimate more suggestively as

\displaystyle  \lambda(n) \sim_{[x,x+H]} e(P_x(n)).

Because of the multiplicativity {\lambda(np) = -\lambda(n)} at small primes {p}, one expects to have a relation of the form

\displaystyle  e(P_{x'}(p'n)) \sim_{[x/p,x/p+H/p]} e(P_x(pn)) \ \ \ \ \ (5)

for many {x,x'} for which {x/p \approx x'/p'} for some small primes {p,p'}. (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between {P_x} and {P_{x'}} for “edges” {x,x'} in a rather sparse “graph” connecting the elements of say {[X/2,X]}. Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction with a certain technical but important “Chinese remainder theorem” step to modify the {P_x} to eliminate a rather serious “aliasing” issue that was already discussed in this previous post) to functional equations of the form

\displaystyle  P_x(a_x \cdot) \approx P_x(b_x \cdot)

for some large and close (but not identical) integers {a_x,b_x}, where {\approx} should be viewed, in a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity), as “differing by a slowly varying polynomial”, and the polynomials {P_x} should now be viewed as defined on the reals rather than the integers. This functional equation can be solved to obtain a relation of the form

\displaystyle  P_x(t) \approx T_x \log t

for some real number {T_x} of polynomial size, and with further analysis of the relation (5) one can make {T_x} basically independent of {x}. This simplifies (3) to something like

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}| \ dx = o(HX)

and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because {n \mapsto \lambda(n) n^{-iT}} is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian {G} the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence {g_x \in \mathrm{Poly}({\bf R} \rightarrow G)} attached to many {x \sim X}, and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

\displaystyle  g_x(a_x \cdot) \Gamma \approx g_x(b_x \cdot) \Gamma \ \ \ \ \ (6)

where the relation {\approx} is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as

\displaystyle  g_x(t) = \gamma^{\frac{\log(a_x t)}{\log(a_x/b_x)}}

for some {\gamma \in \Gamma}, which do not necessarily lead to multiplicative characters like {n^{-iT}} as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of {e(\lfloor \alpha \log n \rfloor \beta \log n)} as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6) for each {x}, but many, leading to dilation invariances

\displaystyle  g_x((1+\theta) t) \Gamma \approx g_x(t) \Gamma

for a “dense” set of {\theta}. From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that {G} is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) {g_x(t)} must behave like {\gamma_x^{\log t}} for some central element {\gamma_x} of {G}. This eventually brings one back to the multiplicative characters {n^{-iT}} that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

\displaystyle  s(k) := |\{ (\lambda(n+1),\dots,\lambda(n+k)): n \in {\bf N} \}|

of length {k} sign patterns in the Liouville function. The Chowla conjecture implies that {s(k) = 2^k}, but even the weaker conjecture of Sarnak that {s(k) \gg (1+\varepsilon)^k} for some {\varepsilon>0} remains open. Until recently, the best asymptotic lower bound on {s(k)} was {s(k) \gg k^2}, due to McNamara; with our result, we can now show {s(k) \gg_A k^A} for any {A} (in fact we can get {s(k) \gg_\varepsilon \exp(\log^{8/5-\varepsilon} k)} for any {\varepsilon>0}). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor {p 1_{p|n}-1}). However the observation is that if there are extremely few sign patterns of length {k}, then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length {k} sign patterns. If there are not extremely few sign patterns of length {k} then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.
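
As a purely elementary illustration, here is a standalone sketch that counts how many length-{k} sign patterns of {\lambda} actually occur among {n \leq N} for small, arbitrarily chosen {N} and {k}; of course this only certifies a lower bound on {s(k)}:

```python
import numpy as np

N, k = 10**5, 8

# Liouville function lambda(n) = (-1)^{Omega(n)} via a smallest-prime-factor sieve,
# using complete multiplicativity: lambda(n) = -lambda(n / spf(n)).
spf = np.zeros(N + 1, dtype=np.int64)
for i in range(2, N + 1):
    if spf[i] == 0:                       # i is prime
        block = spf[i::i]
        block[block == 0] = i

lam = np.ones(N + 1, dtype=np.int64)
for n in range(2, N + 1):
    lam[n] = -lam[n // spf[n]]

patterns = {tuple(lam[n:n + k]) for n in range(1, N - k + 2)}
print(len(patterns), 2 ** k)              # how many of the 2^k possible patterns appear up to N
```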

The second application is to obtain cancellation for various polynomial averages involving the Liouville function {\lambda} or von Mangoldt function {\Lambda}, such as

\displaystyle  {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \lambda(n+P_2(m)) \dots \lambda(n+P_k(m))

or

\displaystyle  {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \Lambda(n+P_2(m)) \dots \Lambda(n+P_k(m))

where {P_1,\dots,P_k} are polynomials of degree at most {d}, no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials {P} had distinct degrees, in which one could use the {k=0} theory of Matomäki and Radziwill; now that higher {k} is available at the scale {H=X^{1/d}} we can remove this restriction.

A family {A_1,\dots,A_r} of sets for some {r \geq 1} is a sunflower if there is a core set {A_0} contained in each of the {A_i} such that the petal sets {A_i \backslash A_0, i=1,\dots,r} are disjoint. If {k,r \geq 1}, let {\mathrm{Sun}(k,r)} denote the smallest natural number with the property that any family of {\mathrm{Sun}(k,r)} distinct sets of cardinality at most {k} contains {r} distinct elements {A_1,\dots,A_r} that form a sunflower. The celebrated Erdös-Rado theorem asserts that {\mathrm{Sun}(k,r)} is finite; in fact Erdös and Rado gave the bounds

\displaystyle  (r-1)^k \leq \mathrm{Sun}(k,r) \leq (r-1)^k k! + 1. \ \ \ \ \ (1)
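
For concreteness, here is a small brute-force sketch (exponential-time and purely illustrative, with an arbitrary example family) that tests the sunflower property via the equivalent characterisation that, for {r \geq 2}, all pairwise intersections equal the common intersection, which then serves as the core:

```python
from itertools import combinations

# A family A_1,...,A_r (r >= 2) is a sunflower iff every pairwise intersection
# equals the common intersection (which then serves as the core A_0).

def is_sunflower(sets):
    core = frozenset.intersection(*sets)
    return all((A & B) == core for A, B in combinations(sets, 2))

def find_sunflower(family, r):
    # brute force search for r distinct member sets forming a sunflower
    for cand in combinations(family, r):
        if is_sunflower(cand):
            return cand
    return None

family = [frozenset(s) for s in ({1, 2}, {1, 3}, {1, 4}, {2, 3}, {5, 6})]
print(find_sunflower(family, 3))   # finds e.g. ({1,2}, {1,3}, {1,4}) with core {1}
```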

The sunflower conjecture asserts in fact that the upper bound can be improved to {\mathrm{Sun}(k,r) \leq O(1)^k r^k}. This remains open at present despite much effort (including a Polymath project); after a long series of improvements to the upper bound, the best general bound known currently is

\displaystyle  \mathrm{Sun}(k,r) \leq O( r \log(kr) )^k \ \ \ \ \ (2)

for all {k,r \geq 2}, established in 2019 by Rao (building upon a recent breakthrough a month previously of Alweiss, Lovett, Wu, and Zhang). Here we remove the easy cases {k=1} or {r=1} in order to make the logarithmic factor {\log(kr)} a little cleaner.

Rao’s argument used the Shannon noiseless coding theorem. It turns out that the argument can be arranged in the very slightly different language of Shannon entropy, and I would like to present it here. The argument proceeds by locating the core and petals of the sunflower separately (this strategy is also followed in Alweiss-Lovett-Wu-Zhang). In both cases the following definition will be key. In this post all random variables, such as random sets, will be understood to be discrete random variables taking values in a finite range. We always use boldface symbols to denote random variables, and non-boldface for deterministic quantities.

Definition 1 (Spread set) Let {R > 1}. A random set {{\bf A}} is said to be {R}-spread if one has

\displaystyle  {\mathbb P}( S \subset {\bf A}) \leq R^{-|S|}

for all sets {S}. A family {(A_i)_{i \in I}} of sets is said to be {R}-spread if {I} is non-empty and the random variable {A_{\bf i}} is {R}-spread, where {{\bf i}} is drawn uniformly from {I}.

The core can then be selected greedily in such a way that the remainder of a family becomes spread:

Lemma 2 (Locating the core) Let {(A_i)_{i \in I}} be a family of subsets of a finite set {X}, each of cardinality at most {k}, and let {R > 1}. Then there exists a “core” set {S_0} of cardinality at most {k} such that the set

\displaystyle  J := \{ i \in I: S_0 \subset A_i \} \ \ \ \ \ (3)

has cardinality at least {R^{-|S_0|} |I|}, and such that the family {(A_j \backslash S_0)_{j \in J}} is {R}-spread. Furthermore, if {|I| > R^k} and the {A_i} are distinct, then {|S_0| < k}.

Proof: We may assume {I} is non-empty, as the claim is trivial otherwise. For any {S \subset X}, define the quantity

\displaystyle  Q(S) := R^{|S|} |\{ i \in I: S \subset A_i\}|,

and let {S_0} be a subset of {X} that maximizes {Q(S_0)}. Since {Q(\emptyset) = |I| > 0} and {Q(S)=0} when {|S| >k}, we see that {0 \leq |S_0| \leq k}. If the {A_i} are distinct and {|I| > R^k}, then we also have {Q(S) \leq R^k < |I| = Q(\emptyset)} when {|S|=k}, thus in this case we have {|S_0| < k}.

Let {J} be the set (3). Since {Q(S_0) \geq Q(\emptyset)>0}, {J} is non-empty. It remains to check that the family {(A_j \backslash S_0)_{j \in J}} is {R}-spread. But for any {S \subset X} and {{\bf j}} drawn uniformly at random from {J} one has

\displaystyle  {\mathbb P}( S \subset A_{\bf j} \backslash S_0 ) = \frac{|\{ i \in I: S_0 \cup S \subset A_i\}|}{|\{ i \in I: S_0 \subset A_i\}|} = R^{|S_0|-|S_0 \cup S|} \frac{Q(S)}{Q(S_0)}.

Since {Q(S) \leq Q(S_0)} and {|S_0|-|S_0 \cup S| \geq - |S|}, we obtain the claim. \Box
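
The proof above is completely constructive, and for very small examples one can carry it out by brute force. Here is a standalone sketch (the toy family and the value of {R} are arbitrary, and the search is exponential in {|X|}) of the core-selection step, together with a direct check of Definition 1 for the resulting petal family:

```python
from itertools import chain, combinations

def powerset(X):
    return chain.from_iterable(combinations(X, m) for m in range(len(X) + 1))

def locate_core(family, R):
    # Greedy core selection from Lemma 2: maximize Q(S) = R^{|S|} #{i : S subset A_i}.
    X = set().union(*family)
    Q = lambda S: R ** len(S) * sum(1 for A in family if S <= A)
    S0 = max((frozenset(S) for S in powerset(X)), key=Q)
    J = [i for i, A in enumerate(family) if S0 <= A]
    return S0, J

def is_R_spread(family, R):
    # Brute-force check of Definition 1 for a uniformly drawn member of the family.
    X = set().union(*family)
    return all(sum(1 for A in family if set(S) <= A) / len(family) <= R ** (-len(S)) + 1e-12
               for S in powerset(X))

family = [frozenset(A) for A in ({1, 2}, {1, 3}, {1, 4}, {1, 5}, {2, 3})]
R = 1.5
S0, J = locate_core(family, R)
petals = [family[j] - S0 for j in J]
print(S0, petals, is_R_spread(petals, R))   # core {1}; the petal family is R-spread
```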

In view of the above lemma, the bound (2) will then follow from

Proposition 3 (Locating the petals) Let {r, k \geq 2} be natural numbers, and suppose that {R \geq C r \log(kr)} for a sufficiently large constant {C}. Let {(A_i)_{i \in I}} be a finite family of subsets of a finite set {X}, each of cardinality at most {k}, which is {R}-spread. Then there exist {i_1,\dots,i_r \in I} such that the sets {A_{i_1},\dots,A_{i_r}} are disjoint.

Indeed, to prove (2), we assume that {(A_i)_{i \in I}} is a family of more than {R^k} distinct sets of cardinality at most {k}, for some {R \geq Cr \log(kr)}; by discarding redundant elements and sets we may assume that {I} is finite and that all the {A_i} are contained in a common finite set {X}. Apply Lemma 2 to find a set {S_0 \subset X} of cardinality {|S_0| < k} such that the family {(A_j \backslash S_0)_{j \in J}} is {R}-spread. By Proposition 3 we can find {j_1,\dots,j_r \in J} such that {A_{j_1} \backslash S_0,\dots,A_{j_r} \backslash S_0} are disjoint; since these sets have cardinality {k - |S_0| > 0}, this implies that the {j_1,\dots,j_r} are distinct. Hence {A_{j_1},\dots,A_{j_r}} form a sunflower as required.

Remark 4 Proposition 3 is easy to prove if we strengthen the condition on {R} to {R > k(r-1)}. In this case, we have {\mathop{\bf P}_{i \in I}( x \in A_i) < 1/(k(r-1))} for every {x \in X}, hence by the union bound we see that for any {i_1,\dots,i_j \in I} with {j \leq r-1} there exists {i_{j+1} \in I} such that {A_{i_{j+1}}} is disjoint from the set {A_{i_1} \cup \dots \cup A_{i_j}}, which has cardinality at most {k(r-1)}. Iterating this, we obtain the conclusion of Proposition 3 in this case. This recovers a bound of the form {\mathrm{Sun}(k,r) \leq (k(r-1))^k+1}, and by pursuing this idea a little further one can recover the original upper bound (1) of Erdös and Rado.

It remains to prove Proposition 3. In fact we can locate the petals one at a time, placing each petal inside a random set.

Proposition 5 (Locating a single petal) Let the notation and hypotheses be as in Proposition 3. Let {{\bf V}} be a random subset of {X}, such that each {x \in X} lies in {{\bf V}} with an independent probability of {1/r}. Then with probability greater than {1-1/r}, {{\bf V}} contains one of the {A_i}.

To see that Proposition 5 implies Proposition 3, we randomly partition {X} into {{\bf V}_1 \cup \dots \cup {\bf V}_r} by placing each {x \in X} into one of the {{\bf V}_j}, {j=1,\dots,r}, chosen uniformly and independently at random. By Proposition 5 and the union bound, we see that with positive probability, it is simultaneously true for all {j=1,\dots,r} that {{\bf V}_j} contains one of the {A_i}. Selecting one such {A_i} for each {{\bf V}_j}, we obtain the required disjoint petals.

We will prove Proposition 5 by gradually increasing the density of the random set and arranging the sets {A_i} to get quickly absorbed by this random set. The key iteration step is

Proposition 6 (Refinement inequality) Let {R > 1} and {0 < \delta < 1}. Let {{\bf A}} be a random subset of a finite set {X} which is {R}-spread, and let {{\bf V}} be a random subset of {X} independent of {{\bf A}}, such that each {x \in X} lies in {{\bf V}} with an independent probability of {\delta}. Then there exists another random subset {{\bf A}'} of {X} with the same distribution as {{\bf A}}, such that {{\bf A}' \backslash {\bf V} \subset {\bf A}} and

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| \leq \frac{5}{\log(R\delta)} {\mathbb E} |{\bf A}|.

Note that a direct application of the first moment method gives only the bound

\displaystyle  {\mathbb E} |{\bf A} \backslash {\bf V}| \leq (1-\delta) {\mathbb E} |{\bf A}|,

but the point is that by switching from {{\bf A}} to an equivalent {{\bf A}'} we can replace the {1-\delta} factor by a quantity significantly smaller than {1}.

One can iterate the above proposition, repeatedly replacing {{\bf A}, X} with {{\bf A}' \backslash {\bf V}, X \backslash {\bf V}} (noting that this preserves the {R}-spread nature of {{\bf A}}) to conclude

Corollary 7 (Iterated refinement inequality) Let {R > 1}, {0 < \delta < 1}, and {m \geq 1}. Let {{\bf A}} be a random subset of a finite set {X} which is {R}-spread, and let {{\bf V}} be a random subset of {X} independent of {{\bf A}}, such that each {x \in X} lies in {{\bf V}} with an independent probability of {1-(1-\delta)^m}. Then there exists another random subset {{\bf A}'} of {X} with the same distribution as {{\bf A}}, such that

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| \leq (\frac{5}{\log(R\delta)})^m {\mathbb E} |{\bf A}|.

Now we can prove Proposition 5. Let {m \geq 1} be a parameter to be chosen shortly. Applying Corollary 7 with {{\bf A}} drawn uniformly at random from the {(A_i)_{i \in I}}, and setting {1-(1-\delta)^m = 1/r}, or equivalently {\delta = 1 - (1 - 1/r)^{1/m}}, we have

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| \leq (\frac{5}{\log(R\delta)})^m k.

In particular, if we set {m = \lceil \log kr \rceil}, so that {\delta \sim \frac{1}{r \log kr}}, then by choice of {R} we have {\frac{5}{\log(R\delta)} < \frac{1}{2}}, hence

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| < \frac{1}{r}.

In particular, by Markov’s inequality, with probability greater than {1 - \frac{1}{r}} there must exist {A_i} such that {|A_i \backslash {\bf V}| = 0}, giving the proposition.

It remains to establish Proposition 6. This is the difficult step, and requires a clever way to find the variant {{\bf A}'} of {{\bf A}} that has better containment properties in {{\bf V}} than {{\bf A}} does. The main trick is to make a conditional copy {({\bf A}', {\bf V}')} of {({\bf A}, {\bf V})} that is conditionally independent of {({\bf A}, {\bf V})} subject to the constraint {{\bf A} \cup {\bf V} = {\bf A}' \cup {\bf V}'}. The point here is that this constraint implies the inclusions

\displaystyle  {\bf A}' \backslash {\bf V} \subset {\bf A} \cap {\bf A}' \subset {\bf A} \ \ \ \ \ (4)

and

\displaystyle  {\bf A}' \backslash {\bf A} \subset {\bf V}. \ \ \ \ \ (5)

Because of the {R}-spread hypothesis, it is hard for {{\bf A}} to contain any fixed large set. If we could apply this observation in the contrapositive to {{\bf A} \cap {\bf A}'} we could hope to get a good upper bound on the size of {{\bf A} \cap {\bf A}'} and hence on {{\bf A}' \backslash {\bf V}} thanks to (4). One can also hope to improve such an upper bound by also employing (5), since it is also hard for the random set {{\bf V}} to contain a fixed large set. There are however difficulties with implementing this approach due to the fact that the random sets {{\bf A} \cap {\bf A}', {\bf A}' \backslash {\bf A}} are coupled with {{\bf A}, {\bf V}} in a moderately complicated fashion. In Rao’s argument a somewhat complicated encoding scheme was created to give information-theoretic control on these random variables; below the fold we accomplish a similar effect by using Shannon entropy inequalities in place of explicit encoding. A certain amount of information-theoretic sleight of hand is required to decouple certain random variables to the extent that the Shannon inequalities can be effectively applied. The argument bears some resemblance to the “entropy compression method” discussed in this previous blog post; there may be a way to more explicitly express the argument below in terms of that method. (There is also some kinship with the method of dependent random choice, which is used for instance to establish the Balog-Szemerédi-Gowers lemma, and was also translated into information theoretic language in these unpublished notes of Van Vu and myself.)

Kari Astala, Steffen Rohde, Eero Saksman and I have (finally!) uploaded to the arXiv our preprint “Homogenization of iterated singular integrals with applications to random quasiconformal maps”. This project started (and was largely completed) over a decade ago, but for various reasons it was not finalised until very recently. The motivation for this project was to study the behaviour of “random” quasiconformal maps. Recall that a (smooth) quasiconformal map is a homeomorphism {f: {\bf C} \rightarrow {\bf C}} that obeys the Beltrami equation

\displaystyle  \frac{\partial f}{\partial \overline{z}} = \mu \frac{\partial f}{\partial z}

for some Beltrami coefficient {\mu: {\bf C} \rightarrow D(0,1)}; this can be viewed as a deformation of the Cauchy-Riemann equation {\frac{\partial f}{\partial \overline{z}} = 0}. Assuming that {f(z)} is asymptotic to {z} at infinity, one can (formally, at least) solve for {f} in terms of {\mu} using the Beurling transform

\displaystyle  Tf(z) := \frac{\partial}{\partial z} \left(\frac{\partial}{\partial \overline{z}}\right)^{-1} f(z) = -\frac{1}{\pi} p.v. \int_{\bf C} \frac{f(w)}{(w-z)^2}\ dw

by the Neumann series

\displaystyle  \frac{\partial f}{\partial \overline{z}} = \mu + \mu T \mu + \mu T \mu T \mu + \dots.
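
To see where this series comes from (a standard formal manipulation, using the normalisation that {f(z)-z} decays at infinity together with the intertwining identity {T \frac{\partial h}{\partial \overline{z}} = \frac{\partial h}{\partial z}} for suitably decaying {h}, so that {\frac{\partial f}{\partial z} = 1 + T \frac{\partial f}{\partial \overline{z}}}): writing {g := \frac{\partial f}{\partial \overline{z}}}, the Beltrami equation becomes

\displaystyle  g = \mu (1 + T g)

and formally inverting {1 - \mu T} gives {g = (1 - \mu T)^{-1} \mu = \mu + \mu T \mu + \mu T \mu T \mu + \dots}, recovering the series above.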

We looked at the question of the asymptotic behaviour of {f} if {\mu = \mu_\delta} is a random field that oscillates at some fine spatial scale {\delta>0}. A simple model to keep in mind is

\displaystyle  \mu_\delta(z) = \varphi(z) \sum_{n \in {\bf Z}^2} \epsilon_n 1_{n\delta + [0,\delta]^2}(z) \ \ \ \ \ (1)

where {\epsilon_n = \pm 1} are independent random signs and {\varphi: {\bf C} \rightarrow D(0,1)} is a bump function. For models such as these, we show that a homogenisation occurs in the limit {\delta \rightarrow 0}; each multilinear expression

\displaystyle  \mu_\delta T \mu_\delta \dots T \mu_\delta \ \ \ \ \ (2)

converges weakly in probability (and almost surely, if we restrict {\delta} to a lacunary sequence) to a deterministic limit, and the associated quasiconformal map {f = f_\delta} similarly converges weakly in probability (or almost surely). (Results of this latter type were also recently obtained by Ivrii and Markovic by a more geometric method which is simpler, but is applied to a narrower class of Beltrami coefficients.) In the specific case (1), the limiting quasiconformal map is just the identity map {f(z)=z}, but if one for instance replaces the {\epsilon_n} by non-symmetric random variables then one can have significantly more complicated limits. The convergence theorem for multilinear expressions such as (2) is not specific to the Beurling transform {T}; any other translation and dilation invariant singular integral can be used here.

The random expression (2) is somewhat reminiscent of a moment of a random matrix, and one can start computing it analogously. For instance, if one has a decomposition {\mu_\delta = \sum_{n \in {\bf Z}^2} \mu_{\delta,n}} such as (1), then (2) expands out as a sum

\displaystyle  \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mu_{\delta,n_1} T \mu_{\delta,n_2} \dots T \mu_{\delta,n_k}

The random fluctuations of this sum can be treated by a routine second moment estimate, and the main task is to show that the expected value

\displaystyle  \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} \dots T \mu_{\delta,n_k}) \ \ \ \ \ (3)

becomes asymptotically independent of {\delta}.

If all the {n_1,\dots,n_k} were distinct then one could use independence to factor the expectation to get

\displaystyle  \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1}) T \mathop{\bf E}(\mu_{\delta,n_2}) \dots T \mathop{\bf E}(\mu_{\delta,n_k})

which is a relatively straightforward expression to calculate (particularly in the model (1), where all the expectations here in fact vanish). The main difficulty is that there are a number of configurations in (3) in which various of the {n_j} collide with each other, preventing one from easily factoring the expression. A typical problematic contribution for instance would be a sum of the form

\displaystyle  \sum_{n_1,n_2 \in {\bf Z}^2: n_1 \neq n_2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_1} T \mu_{\delta,n_2}). \ \ \ \ \ (4)

This is an example of what we call a non-split sum. This can be compared with the split sum

\displaystyle  \sum_{n_1,n_2 \in {\bf Z}^2: n_1 \neq n_2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_2}). \ \ \ \ \ (5)

If we ignore the constraint {n_1 \neq n_2} in the latter sum, then it splits into

\displaystyle  f_\delta T g_\delta

where

\displaystyle  f_\delta := \sum_{n_1 \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_1})

and

\displaystyle  g_\delta := \sum_{n_2 \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_2} T \mu_{\delta,n_2})

and one can hope to treat this sum by an induction hypothesis. (Actually dealing with constraints such as {n_1 \neq n_2} requires an inclusion-exclusion argument that creates some notational headaches but is ultimately manageable.) As the name suggests, the non-split configurations such as (4) cannot be factored in this fashion, and are the most difficult to handle. A direct computation using the triangle inequality (and a certain amount of combinatorics and induction) reveals that these sums are somewhat localised, in that dyadic portions such as

\displaystyle  \sum_{n_1,n_2 \in {\bf Z}^2: |n_1 - n_2| \sim R} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_1} T \mu_{\delta,n_2})

exhibit power decay in {R} (when measured in suitable function space norms), basically because of the large number of times one has to transition back and forth between {n_1} and {n_2}. Thus, morally at least, the dominant contribution to a non-split sum such as (4) comes from the local portion when {n_2=n_1+O(1)}. From the translation and dilation invariance of {T} this type of expression then simplifies to something like

\displaystyle  \varphi(z)^4 \sum_{n \in {\bf Z}^2} \eta( \frac{n-z}{\delta} )

(plus negligible errors) for some reasonably decaying function {\eta}, and this can be shown to converge to a weak limit as {\delta \rightarrow 0}.

In principle all of these limits are computable, but the combinatorics is remarkably complicated, and while there is certainly some algebraic structure to the calculations, it does not seem to be easily describable in terms of an existing framework (e.g., that of free probability).

This set of notes discusses aspects of one of the oldest questions in Fourier analysis, namely the nature of convergence of Fourier series.

If {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} is an absolutely integrable function, its Fourier coefficients {\hat f: {\bf Z} \rightarrow {\bf C}} are defined by the formula

\displaystyle  \hat f(n) := \int_{{\bf R}/{\bf Z}} f(x) e^{-2\pi i nx}\ dx.

If {f} is smooth, then the Fourier coefficients {\hat f} are absolutely summable, and we have the Fourier inversion formula

\displaystyle  f(x) = \sum_{n \in {\bf Z}} \hat f(n) e^{2\pi i nx}

where the series here is uniformly convergent. In particular, if we define the partial summation operators

\displaystyle  S_N f(x) := \sum_{|n| \leq N} \hat f(n) e^{2\pi i nx}

then {S_N f} converges uniformly to {f} when {f} is smooth.
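
To see the contrast with non-smooth {f} numerically, here is a standalone sketch (the step function and the grid parameters are arbitrary illustrative choices) computing {S_N f} for the indicator of a half-interval; the {L^2} error decays in {N} while the uniform error does not:

```python
import numpy as np

# Partial Fourier sums S_N f for the 1-periodic step function f = 1_{[0,1/2)}.
# Closed form: \hat f(0) = 1/2 and \hat f(n) = (1 - (-1)^n)/(2 pi i n) for n != 0.

def fhat(n):
    return 0.5 if n == 0 else (1 - (-1) ** n) / (2j * np.pi * n)

x = np.linspace(0.0, 1.0, 4096, endpoint=False)
f = (x < 0.5).astype(float)

for N in (4, 16, 64, 256):
    SN = sum(fhat(n) * np.exp(2j * np.pi * n * x) for n in range(-N, N + 1)).real
    print(N, np.sqrt(np.mean((SN - f) ** 2)), np.max(np.abs(SN - f)))
    # the L^2 error decreases with N, but the sup error stays bounded away from 0
    # (jump discontinuity / Gibbs phenomenon)
```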

What if {f} is not smooth, but merely lies in an {L^p({\bf R}/{\bf Z})} class for some {1 \leq p \leq \infty}? The Fourier coefficients {\hat f} remain well-defined, as do the partial summation operators {S_N}. The question of convergence in norm is relatively easy to settle:

Exercise 1
  • (i) If {1 < p < \infty} and {f \in L^p({\bf R}/{\bf Z})}, show that {S_N f} converges in {L^p({\bf R}/{\bf Z})} norm to {f}. (Hint: first use the boundedness of the Hilbert transform to show that {S_N} is bounded in {L^p({\bf R}/{\bf Z})} uniformly in {N}.)
  • (ii) If {p=1} or {p=\infty}, show that there exists {f \in L^p({\bf R}/{\bf Z})} such that the sequence {S_N f} is unbounded in {L^p({\bf R}/{\bf Z})} (so in particular it certainly does not converge in {L^p({\bf R}/{\bf Z})} norm to {f}). (Hint: first show that {S_N} is not bounded in {L^p({\bf R}/{\bf Z})} uniformly in {N}, then apply the uniform boundedness principle in the contrapositive.)

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

Theorem 2 (Pointwise almost everywhere convergence)
  • (i) (Kolmogorov, 1923) There exists {f \in L^1({\bf R}/{\bf Z})} such that {S_N f(x)} is unbounded in {N} for almost every {x}.
  • (ii) (Carleson, 1966; conjectured by Lusin, 1913) For every {f \in L^2({\bf R}/{\bf Z})}, {S_N f(x)} converges to {f(x)} as {N \rightarrow \infty} for almost every {x}.
  • (iii) (Hunt, 1967) For every {1 < p \leq \infty} and {f \in L^p({\bf R}/{\bf Z})}, {S_N f(x)} converges to {f(x)} as {N \rightarrow \infty} for almost every {x}.

Note from Hölder’s inequality that {L^2({\bf R}/{\bf Z})} contains {L^p({\bf R}/{\bf Z})} for all {p\geq 2}, so Carleson’s theorem covers the {p \geq 2} case of Hunt’s theorem. We remark that the precise threshold near {L^1} between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function {f} or the summation method {S_N} by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a frequency modulation symmetry in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)


In contrast to previous notes, in this set of notes we shall focus exclusively on Fourier analysis in the one-dimensional setting {d=1} for simplicity of notation, although all of the results here have natural extensions to higher dimensions. Depending on the physical context, one can view the physical domain {{\bf R}} as representing either space or time; we will mostly think in terms of the former interpretation, even though the standard terminology of “time-frequency analysis”, which we will make more prominent use of in later notes, clearly originates from the latter.

In previous notes we have often performed various localisations in either physical space or Fourier space {{\bf R}}, for instance in order to take advantage of the uncertainty principle. One can formalise these operations in terms of the functional calculus of two basic operations on Schwartz functions {{\mathcal S}({\bf R})}, the position operator {X: {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})} defined by

\displaystyle  (Xf)(x) := x f(x)

and the momentum operator {D: {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}, defined by

\displaystyle  (Df)(x) := \frac{1}{2\pi i} \frac{d}{dx} f(x). \ \ \ \ \ (1)

(The terminology comes from quantum mechanics, where it is customary to also insert a small constant {h} on the right-hand side of (1) in accordance with de Broglie’s law. Such a normalisation is also used in several branches of mathematics, most notably semiclassical analysis and microlocal analysis, where it becomes profitable to consider the semiclassical limit {h \rightarrow 0}, but we will not emphasise this perspective here.) The momentum operator can be viewed as the counterpart to the position operator, but in frequency space instead of physical space, since we have the standard identity

\displaystyle  \widehat{Df}(\xi) = \xi \hat f(\xi)

for any {\xi \in {\bf R}} and {f \in {\mathcal S}({\bf R})}. We observe that both operators {X,D} are formally self-adjoint in the sense that

\displaystyle  \langle Xf, g \rangle = \langle f, Xg \rangle; \quad \langle Df, g \rangle = \langle f, Dg \rangle

for all {f,g \in {\mathcal S}({\bf R})}, where we use the {L^2({\bf R})} Hermitian inner product

\displaystyle  \langle f, g\rangle := \int_{\bf R} f(x) \overline{g(x)}\ dx.
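
For instance, for the momentum operator {D} this is just integration by parts (the boundary terms vanishing thanks to the Schwartz decay):

\displaystyle  \langle Df, g \rangle = \frac{1}{2\pi i} \int_{\bf R} f'(x) \overline{g(x)}\ dx = -\frac{1}{2\pi i} \int_{\bf R} f(x) \overline{g'(x)}\ dx = \int_{\bf R} f(x) \overline{\frac{1}{2\pi i} g'(x)}\ dx = \langle f, Dg \rangle,

using {\overline{1/(2\pi i)} = -1/(2\pi i)} in the third equality; the corresponding claim for {X} is immediate since the weight {x} is real-valued.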

Clearly, for any polynomial {P(x)} of one real variable {x} (with complex coefficients), the operator {P(X): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})} is given by the spatial multiplier operator

\displaystyle  (P(X) f)(x) = P(x) f(x)

and similarly the operator {P(D): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})} is given by the Fourier multiplier operator

\displaystyle  \widehat{P(D) f}(\xi) = P(\xi) \hat f(\xi).

Inspired by this, if {m: {\bf R} \rightarrow {\bf C}} is any smooth function that obeys the derivative bounds

\displaystyle  \left|\frac{d^j}{dx^j} m(x)\right| \lesssim_{m,j} \langle x \rangle^{O_{m,j}(1)} \ \ \ \ \ (2)

for all {j \geq 0} and {x \in {\bf R}} (that is to say, all derivatives of {m} grow at most polynomially), then we can define the spatial multiplier operator {m(X): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})} by the formula

\displaystyle  (m(X) f)(x) := m(x) f(x);

one can easily verify from several applications of the Leibniz rule that {m(X)} maps Schwartz functions to Schwartz functions. We refer to {m(x)} as the symbol of this spatial multiplier operator. In a similar fashion, we define the Fourier multiplier operator {m(D)} associated to the symbol {m(\xi)} by the formula

\displaystyle  \widehat{m(D) f}(\xi) := m(\xi) \hat f(\xi).

For instance, any constant coefficient linear differential operator {\sum_{k=0}^n c_k \frac{d^k}{dx^k}} can be written in this notation as

\displaystyle \sum_{k=0}^n c_k \frac{d^k}{dx^k} =\sum_{k=0}^n c_k (2\pi i D)^k;

however there are many Fourier multiplier operators that are not of this form, such as the fractional derivative operators {\langle D \rangle^s = (1- \frac{1}{4\pi^2} \frac{d^2}{dx^2})^{s/2}} for non-integer values of {s}, which are Fourier multiplier operators with symbol {\langle \xi \rangle^s}. It is also very common to use spatial cutoffs {\psi(X)} and Fourier cutoffs {\psi(D)} for various bump functions {\psi} to localise functions in either space or frequency; we have seen several examples of such cutoffs in action in previous notes (often in the higher dimensional setting {d>1}).
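
Since these operators are diagonalised by the Fourier transform, they are also easy to experiment with numerically. Here is a short Python sketch (my own illustration, not taken from the notes; the window length {L}, the resolution {n}, and the Gaussian test function are arbitrary choices) which applies {\langle D \rangle^{s}} on a long periodic grid used as a crude stand-in for the real line:

import numpy as np

# Apply a Fourier multiplier m(D) on a periodic grid of length L with n points,
# used here as a rough numerical stand-in for the real line.
L, n, s = 40.0, 2**12, 0.5
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
xi = np.fft.fftfreq(n, d=L / n)                      # the frequency variable xi

def multiplier(symbol_vals, f_vals):
    """Apply the Fourier multiplier whose symbol takes the values symbol_vals on the grid xi."""
    return np.fft.ifft(symbol_vals * np.fft.fft(f_vals))

f = np.exp(-np.pi * x**2)                            # a Gaussian test function
Dsf = multiplier((1 + xi**2) ** (s / 2), f)          # <D>^s f
back = multiplier((1 + xi**2) ** (-s / 2), Dsf)      # <D>^{-s} <D>^s f
print(np.max(np.abs(back - f)))                      # tiny, since the two symbols multiply to 1

The last line also illustrates numerically the multiplicativity {(m_1 m_2)(D) = m_1(D) m_2(D)} of this calculus, which we discuss next.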

We observe that the maps {m \mapsto m(X)} and {m \mapsto m(D)} are ring homomorphisms, thus for instance

\displaystyle  (m_1 + m_2)(D) = m_1(D) + m_2(D)

and

\displaystyle  (m_1 m_2)(D) = m_1(D) m_2(D)

for any {m_1,m_2} obeying the derivative bounds (2); also {m(D)} is formally adjoint to {\overline{m}(D)} in the sense that

\displaystyle  \langle m(D) f, g \rangle = \langle f, \overline{m}(D) g \rangle

for {f,g \in {\mathcal S}({\bf R})}, and similarly for {m(X)} and {\overline{m}(X)}. One can interpret these facts as part of the functional calculus of the operators {X,D}, which can be viewed as densely defined self-adjoint operators on {L^2({\bf R})}. However, in this set of notes we will not develop the spectral theory necessary in order to fully set out this functional calculus rigorously.

In the field of PDE and ODE, it is also very common to study variable coefficient linear differential operators

\displaystyle  \sum_{k=0}^n c_k(x) \frac{d^k}{dx^k} \ \ \ \ \ (3)

where the {c_0,\dots,c_n} are now functions of the spatial variable {x} obeying the derivative bounds (2). A simple example is the quantum harmonic oscillator Hamiltonian {-\frac{d^2}{dx^2} + x^2}. One can rewrite the general operator (3) in our notation as

\displaystyle  \sum_{k=0}^n c_k(X) (2\pi i D)^k

and so it is natural to interpret this operator as a combination {a(X,D)} of both the position operator {X} and the momentum operator {D}, where the symbol {a: {\bf R} \times {\bf R} \rightarrow {\bf C}} of this operator is the function

\displaystyle  a(x,\xi) := \sum_{k=0}^n c_k(x) (2\pi i \xi)^k. \ \ \ \ \ (4)

Indeed, from the Fourier inversion formula

\displaystyle  f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi}\ d\xi

for any {f \in {\mathcal S}({\bf R})} we have

\displaystyle  (2\pi i D)^k f(x) = \int_{\bf R} (2\pi i \xi)^k \hat f(\xi) e^{2\pi i x \xi}\ d\xi

and hence on multiplying by {c_k(x)} and summing we have

\displaystyle (\sum_{k=0}^n c_k(X) (2\pi i D)^k) f(x) = \int_{\bf R} a(x,\xi) \hat f(\xi) e^{2\pi i x \xi}\ d\xi.
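
For instance, by (4) the harmonic oscillator Hamiltonian {-\frac{d^2}{dx^2} + x^2} mentioned earlier corresponds to the symbol

\displaystyle  a(x,\xi) = -(2\pi i \xi)^2 + x^2 = 4\pi^2 \xi^2 + x^2.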

Inspired by this, we can introduce the Kohn-Nirenberg quantisation by defining the operator {a(X,D) = a_{KN}(X,D): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})} by the formula

\displaystyle  a(X,D) f(x) = \int_{\bf R} a(x,\xi) \hat f(\xi) e^{2\pi i x \xi}\ d\xi \ \ \ \ \ (5)

whenever {f \in {\mathcal S}({\bf R})} and {a: {\bf R} \times {\bf R} \rightarrow {\bf C}} is any smooth function obeying the derivative bounds

\displaystyle  \left|\frac{\partial^j}{\partial x^j} \frac{\partial^l}{\partial \xi^l} a(x,\xi)\right| \lesssim_{a,j,l} \langle x \rangle^{O_{a,j}(1)} \langle \xi \rangle^{O_{a,j,l}(1)} \ \ \ \ \ (6)

for all {j,l \geq 0} and {x,\xi \in {\bf R}} (note carefully that the exponent in {x} on the right-hand side is required to be uniform in {l}). This quantisation clearly generalises both the spatial multiplier operators {m(X)} and the Fourier multiplier operators {m(D)} defined earlier, which correspond to the cases when the symbol {a(x,\xi)} is a function of {x} only or {\xi} only respectively. Thus we have combined the physical space {{\bf R} = \{ x: x \in {\bf R}\}} and the frequency space {{\bf R} = \{ \xi: \xi \in {\bf R}\}} into a single domain, known as phase space {{\bf R} \times {\bf R} = \{ (x,\xi): x,\xi \in {\bf R} \}}. The term “time-frequency analysis” encompasses analysis based on decompositions and other manipulations of phase space, in much the same way that “Fourier analysis” encompasses analysis based on decompositions and other manipulations of frequency space. We remark that the Kohn-Nirenberg quantisation is not the only choice of quantisation one could use; see Remark 19 below.
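
As with the multiplier operators earlier, one can get a concrete feel for the quantisation (5) by crude numerics. The following Python sketch (again an illustration of mine, with arbitrarily chosen window and resolution) evaluates (5) by Riemann sums on a periodic window, and tests it on the harmonic oscillator symbol {a(x,\xi) = 4\pi^2 \xi^2 + x^2} from earlier: since the Gaussian {e^{-x^2/2}} satisfies {(-\frac{d^2}{dx^2} + x^2) e^{-x^2/2} = e^{-x^2/2}}, applying {a(X,D)} should approximately return the same Gaussian:

import numpy as np

# Crude discretisation of the Kohn-Nirenberg quantisation (5) by Riemann sums,
# on the window [-L/2, L/2) standing in for the real line.  O(n^2) cost; illustration only.
L, n = 30.0, 1024
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
xi = np.fft.fftfreq(n, d=L / n)

def kohn_nirenberg(a, f_vals):
    # hat f(xi) = int f(y) e^{-2 pi i y xi} dy, approximated with spacing dx = L/n
    f_hat = np.exp(-2j * np.pi * np.outer(xi, x)) @ f_vals * (L / n)
    # a(X,D) f(x) = int a(x,xi) hat f(xi) e^{2 pi i x xi} dxi, with spacing dxi = 1/L
    kernel = a(x[:, None], xi[None, :]) * np.exp(2j * np.pi * np.outer(x, xi))
    return kernel @ f_hat / L

a = lambda x, xi: 4 * np.pi**2 * xi**2 + x**2        # symbol of -d^2/dx^2 + x^2
f = np.exp(-x**2 / 2)
print(np.max(np.abs(kohn_nirenberg(a, f) - f)))      # small: f is an eigenfunction with eigenvalue 1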

Exercise 1

  • (i) Show that for {a} obeying (6), the operator {a(X,D)} does indeed map {{\mathcal S}({\bf R})} to {{\mathcal S}({\bf R})}.
  • (ii) Show that the symbol {a} is uniquely determined by the operator {a(X,D)}. That is to say, if {a,b} are two functions obeying (6) with {a(X,D) f = b(X,D) f} for all {f \in {\mathcal S}({\bf R})}, then {a=b}. (Hint: apply {a(X,D)-b(X,D)} to a suitable truncation of a plane wave {x \mapsto e^{2\pi i x \xi}} and then take limits.)

In principle, the quantisations {a(X,D)} are potentially very useful for such tasks as inverting variable coefficient linear operators, or localising a function simultaneously in physical and Fourier space. However, a fundamental difficulty arises: the map from symbols {a} to operators {a(X,D)} is now no longer a ring homomorphism; in particular

\displaystyle  (a_1 a_2)(X,D) \neq a_1(X,D) a_2(X,D) \ \ \ \ \ (7)

in general. Fundamentally, this is due to the fact that pointwise multiplication of symbols is a commutative operation, whereas composition of operators such as {X} and {D} is not necessarily commutative. This lack of commutativity can be measured by introducing the commutator

\displaystyle  [A,B] := AB - BA

of two operators {A,B}, and noting from the product rule that

\displaystyle  [X,D] = -\frac{1}{2\pi i} \neq 0.

(In the language of Lie groups and Lie algebras, this tells us that {X,D} are (up to complex constants) the standard Lie algebra generators of the Heisenberg group.) From a quantum mechanical perspective, this lack of commutativity is the root cause of the uncertainty principle that prevents one from simultaneously localizing in both position and momentum past a certain point. Here is one basic way of formalising this principle:

Exercise 2 (Heisenberg uncertainty principle) For any {x_0, \xi_0 \in {\bf R}} and {f \in \mathcal{S}({\bf R})}, show that

\displaystyle  \| (X-x_0) f \|_{L^2({\bf R})} \| (D-\xi_0) f\|_{L^2({\bf R})} \geq \frac{1}{4\pi} \|f\|_{L^2({\bf R})}^2.

(Hint: evaluate the expression {\langle [X-x_0, D - \xi_0] f, f \rangle} in two different ways and apply the Cauchy-Schwarz inequality.) Informally, this exercise asserts that the spatial uncertainty {\Delta x} and the frequency uncertainty {\Delta \xi} of a function obey the Heisenberg uncertainty relation {\Delta x \Delta \xi \gtrsim 1}.
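
It is also instructive to check that the constant {\frac{1}{4\pi}} cannot be improved: for the Gaussian {f(x) := e^{-\pi x^2}} (and {x_0 = \xi_0 = 0}) one computes {Df(x) = ix e^{-\pi x^2}}, so that

\displaystyle  \| X f \|_{L^2({\bf R})} \| D f\|_{L^2({\bf R})} = \int_{\bf R} x^2 e^{-2\pi x^2}\ dx = \frac{1}{4\pi} \int_{\bf R} e^{-2\pi x^2}\ dx = \frac{1}{4\pi} \|f\|_{L^2({\bf R})}^2,

and equality is attained; translating and modulating this Gaussian handles general {x_0,\xi_0}.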

Nevertheless, one still has the correspondence principle, which asserts that in certain regimes (which, with our choice of normalisations, correspond to the high-frequency regime), quantum mechanics continues to behave like a commutative theory, and one can sometimes proceed as if the operators {X,D} (and the various operators {a(X,D)} constructed from them) commute up to “lower order” errors. This can be formalised using the pseudodifferential calculus, which we give below the fold, in which we restrict the symbol {a} to certain “symbol classes” of various orders (which then restricts {a(X,D)} to be pseudodifferential operators of various orders), and obtain approximate identities such as

\displaystyle  (a_1 a_2)(X,D) \approx a_1(X,D) a_2(X,D)

where the error between the left and right-hand sides is of “lower order” and can in fact enjoy a useful asymptotic expansion. As a first approximation to this calculus, one can think of functions {f \in {\mathcal S}({\bf R})} as having some sort of “phase space portrait” {\tilde f(x,\xi)} which somehow combines the physical space representation {x \mapsto f(x)} with its Fourier representation {\xi \mapsto \hat f(\xi)}, and pseudodifferential operators {a(X,D)} behave approximately like “phase space multiplier operators” in this representation in the sense that

\displaystyle  \widetilde{a(X,D) f}(x,\xi) \approx a(x,\xi) \tilde f(x,\xi).

Unfortunately the uncertainty principle (or the non-commutativity of {X} and {D}) prevents us from making these approximations perfectly precise, and it is not always clear how to even define a phase space portrait {\tilde f} of a function {f} precisely (although there are certain popular candidates for such a portrait, such as the FBI transform (also known as the Gabor transform in the signal processing literature), or the Wigner quasiprobability distribution, each of which has its own advantages and disadvantages). Nevertheless, even if the concept of a phase space portrait is somewhat fuzzy, it is of great conceptual benefit both within mathematics and outside of it. For instance, the musical score one assigns to a piece of music can be viewed as a phase space portrait of the sound waves generated by that music.
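
To give at least one concrete formula: up to normalisation conventions (which vary in the literature), the Gabor transform with Gaussian window of a function {f} can be written as

\displaystyle  \tilde f(x,\xi) := \int_{\bf R} f(y) e^{-\pi (y-x)^2} e^{-2\pi i y \xi}\ dy,

which measures the extent to which {f} resembles a wave packet concentrated near position {x} and frequency {\xi}; its squared magnitude {|\tilde f(x,\xi)|^2} is essentially the spectrogram used to display audio signals, which is the musical score analogy just mentioned in graphical form.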

To complement the pseudodifferential calculus we have the basic Calderón-Vaillancourt theorem, which asserts that pseudodifferential operators of order zero are Calderón-Zygmund operators and thus bounded on {L^p({\bf R})} for {1 < p < \infty}. The standard proof of this theorem is a classic application of one of the basic techniques in harmonic analysis, namely the exploitation of almost orthogonality; the proof we will give here will achieve this through the elegant device of the Cotlar-Stein lemma.

Pseudodifferential operators (especially when generalised to higher dimensions {d \geq 1}) are a fundamental tool in the theory of linear PDE, as well as related fields such as semiclassical analysis, microlocal analysis, and geometric quantisation. There is an even wider class of operators that is also of interest, namely the Fourier integral operators, which roughly speaking not only approximately multiply the phase space portrait {\tilde f(x,\xi)} of a function by some multiplier {a(x,\xi)}, but also move the portrait around by a canonical transformation. However, the development of the theory of these operators is beyond the scope of these notes; see for instance the texts of Hörmander or Eskin.

This set of notes is only the briefest introduction to the theory of pseudodifferential operators. Many texts are available that cover the theory in more detail, for instance this text of Taylor.


The square root cancellation heuristic, briefly mentioned in the preceding set of notes, predicts that if a collection {z_1,\dots,z_n} of complex numbers has phases that are sufficiently “independent” of each other, then

\displaystyle  |\sum_{j=1}^n z_j| \approx (\sum_{j=1}^n |z_j|^2)^{1/2};

similarly, if {f_1,\dots,f_n} are a collection of functions in a Lebesgue space {L^p(X,\mu)} that oscillate “independently” of each other, then we expect

\displaystyle  \| \sum_{j=1}^n f_j \|_{L^p(X,\mu)} \approx \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p(X,\mu)}.
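
For a quick numerical illustration of this heuristic, with “independence” modelled by uniformly random phases, one can run a short Python experiment such as the following (the sample sizes and random seed are of course arbitrary):

import numpy as np

# Sum n unit vectors with independent uniform random phases and compare
# |sum z_j| against (sum |z_j|^2)^{1/2} = sqrt(n).
rng = np.random.default_rng(0)
for n in (10**2, 10**4, 10**6):
    z = np.exp(2j * np.pi * rng.random(n))
    print(n, abs(z.sum()) / np.sqrt(n))              # ratio typically of unit size

The printed ratios stay of unit size rather than growing like {\sqrt{n}}, which is what one would see if all the phases were aligned.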

We have already seen one instance in which this heuristic can be made precise, namely when the phases of {z_j,f_j} are randomised by a random sign, so that Khintchine’s inequality (Lemma 4 from Notes 1) can be applied. There are other contexts in which a square function estimate

\displaystyle  \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p(X,\mu)} \lesssim \| \sum_{j=1}^n f_j \|_{L^p(X,\mu)}

or a reverse square function estimate

\displaystyle  \| \sum_{j=1}^n f_j \|_{L^p(X,\mu)} \lesssim \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p(X,\mu)}

(or both) are known or conjectured to hold. For instance, the useful Littlewood-Paley inequality implies (among other things) that for any {1 < p < \infty}, we have the reverse square function estimate

\displaystyle  \| \sum_{j=1}^n f_j \|_{L^p({\bf R}^d)} \lesssim_{p,d} \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p({\bf R}^d)}, \ \ \ \ \ (1)

whenever the Fourier transforms {\hat f_j} of the {f_j} are supported on disjoint annuli {\{ \xi \in {\bf R}^d: 2^{k_j} \leq |\xi| < 2^{k_j+1} \}}, and we also have the matching square function estimate

\displaystyle  \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p({\bf R}^d)} \lesssim_{p,d} \| \sum_{j=1}^n f_j \|_{L^p({\bf R}^d)}

if there is some separation between the annuli (for instance if the {k_j} are {2}-separated). We recall the proofs of these facts below the fold. In the {p=2} case, we of course have Pythagoras’ theorem, which tells us that if the {f_j} are all orthogonal elements of {L^2(X,\mu)}, then

\displaystyle  \| \sum_{j=1}^n f_j \|_{L^2(X,\mu)} = (\sum_{j=1}^n \| f_j \|_{L^2(X,\mu)}^2)^{1/2} = \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^2(X,\mu)}.

In particular, this identity holds if the {f_j \in L^2({\bf R}^d)} have disjoint Fourier supports in the sense that their Fourier transforms {\hat f_j} are supported on disjoint sets. For {p=4}, the technique of bi-orthogonality can also give square function and reverse square function estimates in some cases, as we shall also see below the fold.
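
To give the flavour of the {p=4} argument (stated loosely here): one can expand

\displaystyle  \| \sum_{j=1}^n f_j \|_{L^4({\bf R}^d)}^2 = \| \sum_{j=1}^n \sum_{k=1}^n f_j \overline{f_k} \|_{L^2({\bf R}^d)},

and observe that the Fourier transform of each product {f_j \overline{f_k}} is supported in the difference set {\mathrm{supp} \hat f_j - \mathrm{supp} \hat f_k} of the original Fourier supports. If these difference sets overlap at most boundedly many times as {(j,k)} varies, then {L^2} almost orthogonality gives

\displaystyle  \| \sum_{j,k=1}^n f_j \overline{f_k} \|_{L^2({\bf R}^d)} \lesssim (\sum_{j,k=1}^n \| f_j \overline{f_k} \|_{L^2({\bf R}^d)}^2)^{1/2} = \| \sum_{j=1}^n |f_j|^2 \|_{L^2({\bf R}^d)} = \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^4({\bf R}^d)}^2,

which is a reverse square function estimate at {p=4}.
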
In recent years, it has begun to be realised that in the regime {p > 2}, a variant of reverse square function estimates such as (1) is also useful, namely decoupling estimates such as

\displaystyle  \| \sum_{j=1}^n f_j \|_{L^p({\bf R}^d)} \lesssim_{p,d} (\sum_{j=1}^n \|f_j\|_{L^p({\bf R}^d)}^2)^{1/2} \ \ \ \ \ (2)

(actually in practice we often permit small losses such as {n^\varepsilon} on the right-hand side). An estimate such as (2) is weaker than (1) when {p\geq 2} (or equal when {p=2}), as can be seen by starting with the triangle inequality

\displaystyle  \| \sum_{j=1}^n |f_j|^2 \|_{L^{p/2}({\bf R}^d)} \leq \sum_{j=1}^n \| |f_j|^2 \|_{L^{p/2}({\bf R}^d)},

and taking the square root of both sides to conclude that

\displaystyle  \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p({\bf R}^d)} \leq (\sum_{j=1}^n \|f_j\|_{L^p({\bf R}^d)}^2)^{1/2}. \ \ \ \ \ (3)

However, the flip side of this weakness is that (2) can be easier to prove. One key reason for this is the ability to iterate decoupling estimates such as (2), in a way that does not seem to be possible with reverse square function estimates such as (1). For instance, suppose that one has a decoupling inequality such as (2), and furthermore each {f_j} can be split further into components {f_j= \sum_{k=1}^m f_{j,k}} for which one has the decoupling inequalities

\displaystyle  \| \sum_{k=1}^m f_{j,k} \|_{L^p({\bf R}^d)} \lesssim_{p,d} (\sum_{k=1}^m \|f_{j,k}\|_{L^p({\bf R}^d)}^2)^{1/2}.

Then by inserting these bounds back into (2) we see that we have the combined decoupling inequality

\displaystyle  \| \sum_{j=1}^n\sum_{k=1}^m f_{j,k} \|_{L^p({\bf R}^d)} \lesssim_{p,d} (\sum_{j=1}^n \sum_{k=1}^m \|f_{j,k}\|_{L^p({\bf R}^d)}^2)^{1/2}.

This iterative feature of decoupling inequalities means that such inequalities work well with the method of induction on scales, that we introduced in the previous set of notes.

In fact, decoupling estimates share many features in common with restriction theorems; in addition to induction on scales, there are several other techniques that first emerged in the restriction theory literature, such as wave packet decompositions, rescaling, and bilinear or multilinear reductions, that turned out to also be well suited to proving decoupling estimates. As with restriction, the curvature or transversality of the different Fourier supports of the {f_j} will be crucial in obtaining non-trivial estimates.

Strikingly, in many important model cases, the optimal decoupling inequalities (except possibly for epsilon losses in the exponents) are now known. These estimates have in turn had a number of important applications, such as establishing certain discrete analogues of the restriction conjecture, or the first proof of the main conjecture for Vinogradov mean value theorems in analytic number theory.

These notes only serve as a brief introduction to decoupling. A systematic exploration of this topic can be found in this recent text of Demeter.

I was greatly saddened to learn that John Conway died yesterday from COVID-19, aged 82.

My own mathematical areas of expertise are somewhat far from Conway’s; I have played for instance with finite simple groups on occasion, but have not studied his work on moonshine and the monster group.  But I have certainly encountered his results every so often in surprising contexts; most recently, when working on the Collatz conjecture, I looked into Conway’s wonderfully preposterous FRACTRAN language, which can encode any Turing machine as an iteration of a Collatz-type map, showing in particular that there are generalisations of the Collatz conjecture that are undecidable in axiomatic frameworks such as ZFC.  [EDIT: also, my belief that the Navier-Stokes equations admit solutions that blow up in finite time is also highly influenced by the ability of Conway’s game of life to generate self-replicating “von Neumann machines“.]

I first met John as an incoming graduate student in Princeton in 1992; indeed, a talk he gave, on “Extreme proofs” (proofs that are in some sense “extreme points” in the “convex hull” of all proofs of a given result), may well have been the first research-level talk I ever attended, and one that set a high standard for all the subsequent talks I went to, with Conway’s ability to tease out deep and interesting mathematics from seemingly frivolous questions making a particular impact on me.  (Some version of this talk eventually became this paper of Conway and Shipman many years later.)

Conway was fond of hanging out in the Princeton graduate lounge at the time of my studies there, often tinkering with some game or device, and often enlisting any nearby graduate students to assist him with some experiment or other.  I have a vague memory of being drafted into holding various lengths of cloth with several other students in order to compute some element of a braid group; on another occasion he challenged me to a board game he recently invented (now known as “Phutball“) with Elwyn Berlekamp and Richard Guy (who, by sad coincidence, both also passed away in the last 12 months).  I still remember being repeatedly obliterated in that game, which was a healthy and needed lesson in humility for me (and several of my fellow graduate students) at the time.  I also recall Conway spending several weeks trying to construct a strange periscope-type device to try to help him visualize four-dimensional objects by giving his eyes vertical parallax in addition to the usual horizontal parallax, although he later told me that the only thing the device made him experience was a headache.

About ten years ago we ran into each other at some large mathematics conference, and lacking any other plans, we had a pleasant dinner together at the conference hotel.  We talked a little bit of math, but mostly the conversation was philosophical.  I regrettably do not remember precisely what we discussed, but it was very refreshing and stimulating to have an extremely frank and heartfelt interaction with someone with Conway’s level of insight and intellectual clarity.

Conway was arguably an extreme point in the convex hull of all mathematicians.  He will very much be missed.

My student, Jaume de Dios, has set up a web site to collect upcoming mathematics seminars from any institution that are open online.  (For instance, it has a talk that I will be giving in an hour.)   There is a form for adding further talks to the site; please feel free to contribute (or make other suggestions) in order to make the seminar list more useful.

UPDATE: Here are some other lists of mathematical seminars online:

Perhaps further links of this type could be added in the comments.  It would perhaps make sense to somehow unify these lists into a single one that can be updated through crowdsourcing.

EDIT: See also IPAM’s advice page on running virtual seminars.
