A peculiar factorization

Proving theorems in mathematics is frequently a tedious endeavour, because you’re usually just taking well-known techniques and applying them in a slightly different setting, producing results that are ever more specialized. Oftentimes you ‘know’ exactly what the answer should be and how the proof will go, expecting at most minor technical difficulties along the way.

However, once in a while your intuitions and expectations are shattered, or you encounter something that is mind-blowing. This is one such circumstance.

Recently I started working with the following family of polynomials

f_c(x) = x^6 - 3x^5 + cx^4 + (5 - 2c)x^3 + cx^2 - 3x + 1.

Since this is a sextic, there is no general procedure to find its roots, and there is no “algebraic” reason why it should factor at all. I did not expect to be able to factor it, much less find an explicit procedure to solve the corresponding sextic equation using radicals. To understand these polynomials, I plotted them for several values of c and discovered the following facts, all unexpected, but not equally bizarre:

(1) The roots of f_c(x) are always either all real or all complex (non-real); they are never mixed. This is not surprising at all if you know where the family comes from, but without that knowledge it could be quite shocking.

(2) If the roots are all complex, then there is always a conjugate pair \alpha_1, \overline{\alpha_1} such that the real part of \alpha_1 is 1/2. This, to me at least, was totally unexpected.

(3) If \alpha_2, \overline{\alpha_2} and \alpha_3, \overline{\alpha_3} are the two other pairs of complex roots, labelled so that each \alpha_i has positive imaginary part, then in fact the imaginary parts of \alpha_2 and \alpha_3 are equal. Moreover, the real parts of \alpha_2 and \alpha_3 always sum to one! This is truly a wtf moment.

Using this data, one can (after a lot of work and computer algebra) show that f_c(x) in fact always factors as follows:

f_c(x) = (x^2 - x + a)(x^2 - x/a + 1/a)(x^2 - (2 - 1/a)x + 1),

where a is a root of the cubic equation x^3 + (3 - c)x^2 + 3x - 1 = 0.

Since cubic and quadratic equations are always solvable in radicals, it follows that f_c(x) is always solvable in radicals. Moreover, the procedure above is extremely explicit and simple to implement!
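To illustrate how explicit the procedure is, here is a minimal sympy sketch (the variable names and the sample value c = 7 are my own choices): it verifies the factorization symbolically, using the cubic relation rewritten as c = (a^3 + 3a^2 + 3a - 1)/a^2, and then solves a numerical instance by solving the cubic first and the three quadratics afterwards.

import sympy as sp

x, a, c = sp.symbols('x a c')

# The sextic and its claimed factorization.
f_c = x**6 - 3*x**5 + c*x**4 + (5 - 2*c)*x**3 + c*x**2 - 3*x + 1
product = (x**2 - x + a) * (x**2 - x/a + 1/a) * (x**2 - (2 - 1/a)*x + 1)

# The cubic x^3 + (3 - c)x^2 + 3x - 1 = 0 rearranges to c = (a^3 + 3a^2 + 3a - 1)/a^2.
c_of_a = (a**3 + 3*a**2 + 3*a - 1) / a**2
print(sp.cancel(f_c.subs(c, c_of_a) - product))  # prints 0, confirming the identity

# Solve f_c(x) = 0 for a sample value of c: first the cubic, then the three quadratics.
c_val = 7
cubic = sp.Poly(x**3 + (3 - c_val)*x**2 + 3*x - 1, x)
a_val = min(cubic.nroots(), key=lambda r: abs(sp.im(r)))  # a real root of the cubic
roots = []
for q in (x**2 - x + a_val, x**2 - x/a_val + 1/a_val, x**2 - (2 - 1/a_val)*x + 1):
    roots += sp.Poly(q, x).nroots()
print(roots)  # the six roots of f_7(x)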

I have no explanation for why this happened. If you do, please let me know; I am very curious!

An interesting problem

Recently, my Facebook feed has been showing me “problems of the week” from the Canadian Mathematical Society, in connection with the Canadian Open Mathematics Challenge. This week’s problem seemed quite fun, so I tried it. The question is stated in the link above.

The fact that f(x) must be weakly increasing is obvious from the statement. Suppose that 0 \leq x < y \leq 4. Then 0 \leq y-x \leq 4, whence f(y) \geq f(x) + f(y-x) from the hypothesis on f. Since f is non-negative on [0,4], the result follows.

The second conclusion is a bit more interesting. We first show that f(1/2^k) \leq 1/2^{k+1} for all integers k \geq -2. The case k = -2 is immediate from the hypothesis. From there, we see that f(4) \geq 2f(2), whence f(2) \leq 1. The desired conclusion then follows by induction.
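For concreteness, the inductive step runs as follows: the hypothesis on f, applied to the pair x = 1/2^k, y = 1/2^{k-1} (so that y - x = 1/2^k as well), gives

f(1/2^{k-1}) \geq f(1/2^k) + f(1/2^k) = 2f(1/2^k),

so that f(1/2^k) \leq (1/2) f(1/2^{k-1}) \leq (1/2) \cdot 1/2^k = 1/2^{k+1}.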

Now suppose that there exists some x \in [0,4] such that f(x) > x. Then we have 2 \geq f(4-x) + f(x) > x, by the non-negativity of f and our hypothesis on x. Hence, we have restricted our x to the interval [0,2). Iterating this argument, we find that 1 \geq f(2) \geq f(2-x) + f(x) > x, so in fact x \in [0,1). Iterating this and recalling the previous paragraph, we find that x < 1/2^k for all positive integers k, so that x = 0. But from the inequality f(4) \geq f(4) + f(0) and the non-negativity of f, it follows that f(0) = 0, contradicting f(x) > x. Hence no such x exists.

An introduction to the determinant method: Part III

In this latest installment on the determinant method (see here and here for the previous two posts), I will finally discuss something I have actually completed work on: the p-adic determinant method in the setting of weighted projective spaces.

In the original formulation of the determinant method due to Heath-Brown, he showed that the quality of the estimates heavily depends on the dimension of the variety. In particular, the factor one uses to control the number of auxiliary polynomials needed depends on the quantity 1/d^{1/n}, where n is the dimension of the variety; the larger n gets, the closer this gets to unity if the degree d is fixed. Therefore, one can hope to do better by considering a variety defined by homogeneous polynomials as a projective variety, since then the dimension drops by one.

Aside from this, Heath-Brown did not appear to use any other notions from algebraic geometry, and his proof in his 2002 Annals paper used ad-hoc arguments that also did not seem to relate much to algebraic geometry. However, Broberg soon reformulated Heath-Brown’s theorem in terms of algebraic geometry and thus significantly solidified the theoretical backbone of Heath-Brown’s machinery. Subsequently, Salberger established Heath-Brown’s theorem in the language of algebraic geometry. In this setting, it became clear how to generalize Heath-Brown’s theorem to other types of projective spaces, in particular weighted projective spaces.

The advantage is that we are able to deal with varieties with good algebraic structure, but for which the natural counting function for points of bounded height involves very skew boxes. The standard determinant method for projective spaces works best with boxes that are roughly cubes. Prototypical examples are the following two problems:

Question 1: Let f(x_1, x_2, x_3) be a ternary form of even degree d. Count the number of rational points on the variety defined by

f(x_1, x_2, x_3) = t^2.

Question 2: Let f(x,y) be a binary form of degree d \geq 3 with integral coefficients, which is not too degenerate. Estimate the density of pairs of integers (x,y) such that f(x,y) is square-free.

In the first question, it is natural to define the counting function as N_f(B) = \# \{(x_1, x_2, x_3) \in \mathbb{Z}^3 : \gcd(x_1, x_2, x_3) = 1, |x_i| \leq B\}. Making this choice, we realize that |t| could be as large as B^{d/2}. Thus, we would be considering very lopsided boxes. In the second question, after some non-trivial work due to George Greaves, one can show that the question is reduced to counting solutions to the equation

f(x,y) = vz^2

where |x|, |y| \leq B, |z| \ll B^\beta where 2 < \beta < d/2, and |v| \ll B^{d - 2 \beta} (see my paper).

The first type of variety should be naturally considered to be a surface in the weighted projective space \mathbb{P}(1,1,1,d/2), and the latter should be considered as a surface in \mathbb{P}(1,1,d-4, 2).
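To see the lopsidedness concretely, here is a toy brute-force sketch for Question 1. The quartic form f = x_1^2 x_2^2 + x_3^4 and all names below are my own choices, picked only so that f = t^2 has plenty of solutions; the point is that even though the x_i are confined to the box |x_i| \leq B, the corresponding values of t range up to roughly B^{d/2} = B^2.

from math import gcd, isqrt

# Sample quartic form (d = 4), chosen so that f(x1, x2, x3) = t^2 has many solutions.
def f(x1, x2, x3):
    return (x1 * x2) ** 2 + x3 ** 4

def survey(B):
    solutions, largest_t = 0, 0
    for x1 in range(-B, B + 1):
        for x2 in range(-B, B + 1):
            for x3 in range(-B, B + 1):
                if gcd(gcd(abs(x1), abs(x2)), abs(x3)) != 1:
                    continue                      # primitive triples only
                value = f(x1, x2, x3)
                t = isqrt(value)
                if t * t == value:                # f(x1, x2, x3) is a perfect square
                    solutions += 1
                    largest_t = max(largest_t, t)
    return solutions, largest_t

for B in (5, 10, 20):
    solutions, largest_t = survey(B)
    print(f"B = {B:2d}: N_f(B) = {solutions:6d}, largest |t| = {largest_t:4d}, B^2 = {B * B}")

The box in (x_1, x_2, x_3, t) is therefore roughly B \times B \times B \times B^2, very far from a cube.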

The advantage of considering these as surfaces in a suitable weighted projective space is that we can now take advantage of the structure of weighted projective spaces to `equalize’ the box size, so that the large outlier bounds are properly taken care of. The trade-off is that the 1/d^{1/r} term in the exponent of the determinant method is replaced with (w/d)^{1/r}, where w is the product of all of the weights and r is the dimension of the variety. This trade-off in fact produces superior results, as opposed to considering the variety as a hypersurface in \mathbb{A}^4, say.

The way to approach the determinant method in the weighted projective setting is to obtain the correct analogues of the Hilbert function. This was done by Dolgachev in a seminal 1982 paper, where many of the basic notions of weighted projective spaces are worked out. Having established the correct algebro-geometric machinery for weighted projective spaces, it is then a relatively simple matter to translate the work of Salberger into the new setting. This is the focus of this paper.

A little piece of financial math

Like many homeowners, I don’t put a lot of thought into my mortgage payments. Indeed, it suffices for me to know that if I pay the bank the amount that they asked for, then they’ll stay off my back for at least a month. Most of the Canadian banks have a mortgage calculator on their website, and I’ve never put much thought as to how it works.

This changed when my wife, who’s getting certified as a professional accountant, asked me how to do a calculation that turns out to be equivalent to computing a mortgage payment. The derivation is quite intricate and surprisingly satisfying mathematically, so I thought I would share it.

Let us set up the problem as follows. Suppose that you borrowed an amount k from a lender, and the interest rate is r per period (expressed as a decimal, so the first period accrues interest rk). You are to pay the lender back in n equal payments, covering all interest and principal. How do you calculate the amount of each payment, and the total cost of borrowing (excluding the time value of money, which is not part of the problem)?

This question seems quite intractable, as there are many unknowns. However, in a realistic situation k, r, n are all known. In concrete terms, suppose you borrowed k = 250,000 dollars to buy a house, at an interest rate of 0.25% per month (roughly 3% per annum). You are to pay the bank back over a period of 25 years, or n = 300 months. Therefore, our final formula should make use of the parameters n, k, r; in other words, we wish to derive the payment amount P as a function of n, k, r.

To do this, let us think about the first payment, say P_1. We must pay an amount rk in interest and an amount a_1 in principal. Therefore, we have

P_1 = rk + a_1.

With the next payment, P_2, we pay interest equal to r(k - a_1), since the principal has been reduced by a_1, and a principal amount equal to a_2 = P_2 - r(k-a_1). Making use of the fact that each payment is equal, i.e. P_1 = P_2 = P, we have

a_2 = P - r(k - P + rk) = P - rk + rP - r^2 k = (1+r)P - k(r + r^2) = (1+r)a_1.

Likewise, for the third payment, we have

a_3 = P - r(k - a_1 - a_2) = P - r(k - a_1 - (1+r)a_1) = a_1 + ra_1 + r(1+r)a_1 = a_1(1+r)^2.

We now have enough data to make an induction hypothesis, namely

a_l = a_1 (1+r)^{l-1}.

If we assume the induction hypothesis, then we have

a_{l+1} = P - r(k - a_1 - \cdots - a_l) = a_1 + r a_1 + \cdots + r(1+r)^{l-1}a_1 = a_1(1+r)^l.

Now we use the fact that after n payments, the principal must be paid off in full. In other words, we must have

\sum_{j=1}^n a_j = k,

which is equivalent to

a_1 \sum_{j=0}^{n-1} (1+r)^j = k.

The partial geometric series on the left hand side can be evaluated to be

a_1 \left(\frac{(1+r)^n - 1}{r}\right),

whence

a_1 = \frac{kr}{(1+r)^n - 1}.

Recall that a_1 = P - rk, and we obtain

P = rk\left(1 + \frac{1}{(1+r)^n - 1}\right) = \frac{rk(1+r)^n}{(1+r)^n - 1}.
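As a sanity check, here is a minimal Python sketch (the function names are mine) that computes P from n, k, r via the formula above, reports the total cost of borrowing n·P - k, and double-checks the formula by simulating the amortization schedule payment by payment.

def payment(k, r, n):
    # P = r*k*(1 + r)**n / ((1 + r)**n - 1), the formula derived above
    growth = (1 + r) ** n
    return r * k * growth / (growth - 1)

def final_balance(k, r, n):
    # Amortize the loan payment by payment and return the balance after n payments.
    P = payment(k, r, n)
    balance = k
    for _ in range(n):
        balance += r * balance   # interest accrues on the outstanding principal
        balance -= P             # then the payment is applied
    return balance

k, r, n = 250_000, 0.0025, 300   # the example above: 0.25% per month over 25 years
P = payment(k, r, n)
print(f"periodic payment:         {P:,.2f}")
print(f"total cost of borrowing:  {n * P - k:,.2f}")
print(f"balance after n payments: {final_balance(k, r, n):.6f}")   # ~0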

Lessons learned from WW2

The conflict known as World War 2 (officially starting with the invasion of Poland by Germany in 1939, even though many parts of the world were already at war years before that date) is without a doubt the most brutal war, in terms of sheer number of people killed, in the history of the world. It is no surprise that every nation touched by the war learned a harsh lesson from it. What is debatable, however, is whether that lesson was the same for each participant. I claim that the answer is most certainly ‘no’, and, most importantly, that the two most powerful nations in the world today learned drastically different lessons.

If the United States learned anything from WW2, it is that if you leave a dangerous despot in charge for too long, you are only giving him enough time to build up his forces and wreak even greater damage later. In other words, the US learned that non-interventionist policies lead to large-scale conflict. I remember that in our schools, this specific point was made against Neville Chamberlain: if we let a ‘Hitler’ build a powerful autocratic regime for too long, we will face a devastating war. This explains the subsequent US response to the USSR and can be used to explain the Vietnam War, both Iraq Wars, and the war in Afghanistan. These are all manifestations of a ‘preemptive strike’ doctrine, designed to remove dangerous enemy leaders before they are strong enough to pose a serious threat. The USA has also used this tactic in less inspiring ways, including the removal of the democratically elected socialist leader Salvador Allende in Chile.

What China learned is that if social order collapses and internal conflict erupts, the nation will be too weak and exhausted to face foreign threats appropriately. Therefore, it has emphasized social order and national unity above all else for the last seventy years. Interestingly, China did not conclude that Japan was able to invade China so successfully and so devastatingly because China failed to strike first, but rather because China was so weak that Japan saw it as an opportunity. Despite its bitter history, China never devised a first-strike doctrine against would-be opponents. Indeed, China’s foreign policy is almost entirely non-interventionist.

Arguably, China and the USA are the two biggest voices on the United Nations Security Council, which is charged with resolving conflicts like the Syrian conflict. Because of the vastly divergent views of the two most dominant members (not to mention that both hold veto power), it is unlikely that the UNSC will adopt any tangible action that actually leads to a reasonable solution. Indeed, what a ‘solution’ might look like is very different for the two nations.

The United States’ desired solution would be to remove Bashar al-Assad from power, install a ‘moderate’ (i.e. western-leaning) government that will oversee democratic elections, then deploy western forces to prop up the system, then begin the work of rebuilding the country. However, the removal of the ‘bad regime’ is the most important desired outcome for the US; the other items are of lesser priority.

China’s desired solution would be to supply the current government with resources to combat its enemies (whether western-leaning or not), ensure stability, and keep foreign troops out. At the end of the day, the Assad regime would remain and the country would go back to the way it was before the civil war started five years ago.

In terms of ending the current crisis, both solutions reach the immediate goal of stopping the hemorrhaging of refugees from Syria, but the final outcomes are polar opposites. Public opinion in the west will never support China’s solution, so the only possibility is a UNSC stalemate and a weakly worded resolution that ‘condemns’ the actions of various vague parties.

If this conflict is to be stopped, then one side has to win. If every group of combatants keeps receiving weapons and supplies because someone ideologically aligned with them is willing to supply them, the conflict will never end and the root problem will never be addressed, until the most dedicated and most fanatical group of fighters comes out on top. If the history of China is any guide, that group is not necessarily the best equipped. The most tenacious and dedicated faction will prevail, and that faction is likely the one fighting not for money but for some ideology, however deranged it is. By that, of course, I mean the Islamic State.

So if the world’s two superpowers will not yield, they might both lose, big time, to their most dangerous common enemy. For the sake of both countries’ national interests and for the broader interests of humanity, it is important that at least one side be willing to back off. However, the US has been expecting the world to ‘compromise’ toward its desired solution for so long, and China has been fed up with this for decades, so this is an unlikely scenario.

Bhargavaology and Chabauty imply weak Stewart’s conjecture ‘almost surely’

This is a continuation of a previous post. Recall that Stewart’s conjecture asserts that there exists a constant c_0 such that for all binary forms F(x,y) \in \mathbb{Z}[x,y] with non-zero discriminant and degree r at least 3, there exists a number C_F, depending on the coefficients of F, such that for all integers |h| \geq C_F, the equation F(x,y) = h has at most c_0 solutions. The so-called weak Stewart’s conjecture is the same statement with the absolute constant c_0 replaced by a constant c(r), which is allowed to depend only on the degree of F but is otherwise independent of (the coefficients of) F. The weaker conjecture is ‘almost surely’ a consequence of a theorem of Bhargava and Gross, which is the main content of this paper, and of recent work of Michael Stoll using Chabauty’s method; see here. Indeed, by the Bhargava-Gross theorem, we know that `almost all’ (in the sense of all but a proportion o(2^{-g}), when ordered by naive height) hyperelliptic curves of genus g, with g large, have Jacobian rank less than g-3, since the average rank is only 3/2, and from there we obtain the bound c(2g+2) \leq 8(g+1)(g-1) + (4g-12)g = 12g^2 - 12g - 8.
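As a quick check of the arithmetic in the last bound (nothing deeper than the algebra), one can expand it symbolically:

import sympy as sp

g = sp.symbols('g')
print(sp.expand(8*(g + 1)*(g - 1) + (4*g - 12)*g))   # 12*g**2 - 12*g - 8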

The formulation is as follows. The equation F(x,y) = hz^2 defines a quadratic twist of the hyperelliptic curve F(x,y) = z^2. Assuming that the average over the family of quadratic twists does not deviate too much from the average over all curves, we should see that every curve in the family of quadratic twists has small rank ‘most of the time’, and hence obtain the weak version of Stewart’s conjecture.

However, we are far from being able to establish such a result. The most basic obstruction is that there could exist a set of exceptional curves with very large Jacobian rank; this set has density o(2^{-g}), but that density may still be positive. Secondly, it is very difficult to control the average, much less an upper bound, of the Jacobian rank of the twists of a given curve, as there are infinitely many congruence conditions at play.

Nevertheless, with a sharper tool to deal with potential large rank cases, or even an argument to dispense with the large genus case altogether, Chabauty’s method and Bhargavaology would imply the weak version of Stewart’s conjecture. Exciting times!

Thue equation “funnel”

In this post I will be detailing a problem I have thought about for about a year, but have not made any progress on. The problem is one posed by my advisor, Professor Cameron Stewart, in the following paper:

C.L. Stewart, On the number of solutions of polynomial congruences and Thue equations, Journal of the AMS 4 (1991), no. 4, 793-835.

His conjecture can be seen on page 816 of the above paper:

“… we conjecture that there exists an absolute constant c_0 such that for any binary form F \in \mathbb{Z}[x,y] with nonzero discriminant and degree at least three there exists a number C, which depends on F, such that if h is an integer larger than C then the Thue equation [F(x,y) = h] has at most c_0 solutions in coprime integers x and y.”

In the same paper, Stewart proves that the equation F(x,y) = h has no more than O((\deg F)^{1 + \omega(g)}) solutions, where g is a suitable large divisor of h and \omega(g) denotes the number of distinct prime factors of g. The crucial point is that quite often, we can pick a g with few prime divisors, so that \omega(g) is small (see also this recent preprint due to Shabnam Akhtari: http://arxiv.org/abs/1508.03602).

However, this situation should not be typical. The basic idea is that when h is large with respect to the coefficients of F, the equation F(x,y) = h should behave `generically’, meaning that the plane curve defined by the Thue equation should have few rational points. Indeed, generically the curve F(x,y) = h should be of general type (i.e. have genus at least 2), so one should not expect too many rational points, by Faltings’ theorem.

Nevertheless, if one checks Stewart’s argument (this particular part is not necessarily new; Bombieri and Schmidt had used roughly the same idea in other papers), then one sees that a crucial ingredient is a p-adic reduction lemma. In particular, for a prime p and a positive integer k such that p^k || h, one can transform the equation F(x,y) = h into at most d = \deg F many equations of the form F_i(x,y) = hp^{-k}. Indeed, this is where the power of d appears in Stewart’s theorem. Another crucial input of Stewart, arguably the most novel part of his paper, is that he shows that the equation F(x,y) = n has few solutions provided that n is sufficiently small with respect to various invariants of F; in particular, he does not require that n = 1.

The deficiency of Stewart’s argument, as with all other related results, is that it fundamentally obtains the ‘wrong’ bound: one should expect fewer solutions to the equation F(x,y) = h when h is large, not more. In particular, all of these arguments involve some sort of reduction or descent, reducing the equation to one where the right hand side is a small integer and thereby susceptible to various diophantine approximation arguments, at a cost that is not too severe. This type of argument is unlikely to be useful in resolving Stewart’s conjecture.

However, recent groundbreaking work due to Manjul Bhargava offers hope. In this paper, Bhargava shows that most hyperelliptic curves (in the sense of some natural density with respect to the coefficients of the curve, written as z^2 = f(x,y) with f a binary form of even degree with integral coefficients) have no rational points. One of the most ingenious constructions in the paper is an explicit representation of the binary form f(x,y) whenever the curve

z^2 = f(x,y)

has a rational point. Indeed, if the curve above has a rational point (z_0, x_0, y_0) say, then there exist integer matrices A,B \in \text{GL}_{2n}(\mathbb{Z}) with 2n = \deg f such that

f(x,y) = (-1)^n \det(Ax - By).

He then showed that most binary forms cannot be represented in the above way, thereby showing that most hyperelliptic curves have no rational points.
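To get a feel for the shape of such a representation, here is a small sympy sketch; the 4 x 4 matrices A and B are arbitrary examples of mine (this is not Bhargava’s construction), and the point is simply that (-1)^n \det(Ax - By) is always a binary form of degree 2n in x and y.

import sympy as sp

x, y = sp.symbols('x y')
n = 2   # so the matrices are 2n x 2n = 4 x 4

# Arbitrary sample integer matrices, chosen only for illustration.
A = sp.Matrix([[1, 2, 0, 1],
               [0, 1, 3, 0],
               [2, 0, 1, 1],
               [1, 1, 0, 2]])
B = sp.Matrix([[0, 1, 1, 0],
               [1, 0, 2, 1],
               [0, 3, 0, 1],
               [2, 0, 1, 0]])

f = sp.expand((-1)**n * (A * x - B * y).det())
print(f)                                  # a binary quartic form in x and y
print(sp.Poly(f, x, y).total_degree())    # 2n = 4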

If one carries out Stewart’s p-adic reduction argument all the way down to 1, one sees the following: start with a generic equation F(x,y) = h, apply a descent argument, and end up with a bunch of equations of the form \mathcal{F}(x,y) = 1. In particular, the curve defined by

\mathcal{F}(x,y) = z^2

has a rational point! Therefore, it must be one of the rare binary forms with a representation of the form (-1)^n \det(Ax - By). I call this effect ‘funneling’, because we start with a generic, typical object and end up with a special object; all of the initial data had to pass through a ‘general to special’ funnel. Since the final object is so rare, the initial object could not be too abundant; otherwise we would ‘violate’ Bhargava’s result.

Of course, the above paragraph is too coarse and fuzzy to formulate precise statements, much less prove a rigorous theorem. However, I still believe that this `funnel’ should exist and that it will lead to a solution to Stewart’s conjecture.

If you have any insights or ideas, please feel free to contact me; I would be extremely pleased if this question gets an answer.