Reduced susceptibility to SARS-CoV-2 in metropolitan regions

The coronavirus pandemic is wreaking public health, social, and economic havoc across the globe, and to date a variety of strategies have been implemented to attempt to control the spread of disease [1, 2]. A critical unknown for policy planning is the number of people who have been infected and are no longer susceptible [3]. Tests for active SARS-CoV-2 infection or antibody presence can provide an indication, but both are prone to selection bias, under-representative population sampling and insufficient reliability [4, 5]. Here, we present an alternative to determine residual susceptibilities based on the analysis of observed population-wide disease dynamics data. For four highly-affected countries, we directly compared the dynamics in the largest metropolitan regions with the rest of the countries. We show that substantial susceptibility reductions are measurable in the metropolitan regions, which all continued in a phase of exponential growth of case numbers for a relatively longer time before public health interventions were introduced. Compared to these interventions, the reduction in metropolitan region susceptibility had a substantial role in the post-growth decline in infection rates. Reduced population susceptibility has far reaching consequences on future policy responses and disease forecasts including vaccine trial planning and, in the case of a second epidemic wave, higher population-normalised mortality rates for non-metropolitan regions.

Coronaviruses have long been an endemic source of human and animal infections. A novel coronavirus SARS-CoV, the cause of severe acute respiratory syndrome (SARS), appeared in 2002, but due to aggressive early containment measures [6,7], SARS-CoV was eradicated. Another novel coronavirus MERS-CoV appeared in 2012 and, while not eradicated, the disease is contained with a small number of new cases mainly limited to animal sources in the Middle East [8]. Coronavirus SARS-CoV-2 appeared in late 2019 [9][10][11], causing the illness COVID-19 which can be fatal.
Due largely to human mobility patterns and infection characteristics, COVID-19 has become a pandemic [12], and the death toll continues to rise.
Knowing the extent of population exposure to SARS-CoV-2 has important implications for public policy measures. Yet, while SARS-CoV-2 has an extensive global reach, the portion of populations currently and previously infected remains unclear. Viral assays only capture recent infections, and even then can be unreliable. Serologic antibody testing to ascertain levels of prior exposure is burdensome, with unreliable testing validity and the potential to lead to systematic under-counting [13,14]. Tracking epidemic data across populations and evaluating the time evolution of key markers such as death rates provides a powerful alternative devoid of these shortcomings. We utilise this approach to estimate the fraction of populations remaining susceptible to infection, which is a key parameter governing the course of an epidemic.
Generally, when a fully susceptible population is exposed to a new and contagious pathogen, an epidemic starts with an initial seeding phase when a small number of individuals are infected. Unconnected local outbreaks with unpredictable spread can occur due to variability in individual behaviour, the effects of potential 'super-spreading' events, and the non-uniform success of containment measures. This is followed by a phase of free exponential growth, during which 'track and trace' based containment is no longer feasible. Exponential spread of infections I(t) = e (β−γ)t over time t happens at a rate 1/τ = β − γ > 0, when the average rate of new secondary infections caused by each primary infectious individual (β) exceeds the rate of recovery (γ). 1 By reported daily death counts [15], the disease dynamics of COVID-19 have fulfilled the condition of exponential growth in nearly all parts of the world. While onset of the exponential phase varies due to different seeding times, we find that the time constant of exponential growth is very similar over a wide geographical range. Fig. 1a illustrates this using data from four highly-affected countries: Spain, Italy, the United Kingdom, and the United States [16]. The exponential growth is sustained as long as S, defined as ratio of susceptible individuals to the whole population, is close to 100%, and β and γ are kept constant. Slowing spread and ultimately reversal to an exponential decline can be driven by Observed impact of susceptibility reduction on infection dynamics Variability in the nature and timing of public health interventions and resulting population behaviours are eventually reflected in variations across countries in the slowdown and exponential decline of the daily death counts. Fig. 1b illustrates, for the specific example of the United Kingdom and London, the timelines of public awareness (measured through frequencies of relevant internet search queries [18]), government restriction stringency (measured as aggregate 'Oxford' stringency index [19]), and visits to public meeting places (restaurants [20], grocery stores, pharmacies, and public transit stations [21]). Qualitatively, all measures correlate with one another, with both voluntary behavioural changes and the stringency of public health interventions resulting in reduced frequency of visits to locations of high infection risk.
While it is difficult to quantitatively associate a timedependent reduction of β(t) with these data, the observed location data indicate that the public response between areas within one country is similar in timing and magnitude of deviation from baseline. Due to the seeding of the epidemic in each country's main metropolitan region (Italy -Milan/Lombardy 3 ; Spain -Madrid; England -London 4 ; USA -New York City) occurring earlier than in the rest of the country, the time between epidemic onset and reduction of β (driven by public behaviour change) differs in an otherwise comparable situation. We therefore consider pairwise comparisons of main metropolitan regions and rest of countries a suitable tool to investigate the relative impact of reductions in S.  [16,22] for the four investigated countries, broken down into the data from the main metropolitan region and the rest of the country.
Each of the corresponding eight data sets shows an initial exponential growth phase, followed by a transition to an exponential decline. Seeding of the epidemic consistently happened earlier in the metropolitan regions than elsewhere in each country, leading to higher population-normalised daily death counts in these areas. Yet, the time constant of the exponential growth is equal within errors for each metropolitan-country pair (except in the United States due to its heterogeneity), suggesting that the contact rate among a fully susceptible population is not critically (lower panels), (even when taking into account a delayed impact of ≈ 20 days of their effect on the daily death counts [25,26]), implies that both β(t) and S(t) do not vary much during this phase.
In this case, the decay time constants can be used to yield the ratio of fractions of the population that remain susceptible to the virus, provided β = βm = βc remains equal across each country. We directly compare the relative timing of the epidemic evolution with public policy and behaviour changes (Fig. 2). The results corroborates the finding that a reduction of S had already begun to reduce new infection rates in the metropolitan regions when public behaviour changes started to have a significant impact. The difference in the effect of S for all eight geographic areas is shown in Fig. 3, where the peak number of daily deaths is plotted against the time delay between when visits to public meeting places are reduced and when death rates begin to slow (see Methods). The country-level data cluster around a common delay time that is similar to the reported average time between viral exposure and death [25,26]. This suggests that the main Figure 3: The peak daily death counts plotted against the days between public behaviour change and the end of exponential growth leading to a peak rate. A correction factor from excess death data was applied to the reported rates (open circles) to derive the expected rates (solid circles), which are linked. The 4 countries exclude the corresponding metropolitan regions. The inset illustrates, for the example of England, a death count calibration by comparing reported COVID-19 deaths to the all-cause excess death data.
driver for the initial reduction in death (infection) rates is changes in public behaviour. This is not the case for the metropolitan region data, which in turn cluster around a common peak daily death count. This clustering suggests that the larger early spread of the disease was sufficient to reduce infection and death rates through reduction of S in addition to the effects based on behavioural changes.

Modelling and forecasting
To further investigate the dynamics of the epidemic in each area, we employ the well-established compartmental Susceptible-Infected-Recovered (SIR) model [27,28], and solve the differ-  We also model the hypothetical scenario of a full return to pre-intervention conditions, constituting the second wave (dashdotted lines d1). In this projection, infection rates begin to rise but the second peak for the metropolitan regions is reduced with respect to the first peak, while the second peak for countries is of higher magnitude than the first one, which is because the height of the peak scales with S.
In Fig. 4 (inset) the resulting evolution of S(t) for London and the rest of England is shown, together with the results of serology antibody tests [29], which have recently been suspected to underestimate the actual (yet potentially temporary) immunity levels due to the lack of sensitivity of the tests to T-cell based immunity [13]. In both methods, the metropolitan region exhibits a lower absolute value of susceptibility when compared with the rest of the country.

Discussion
In conclusion, we have found that the dynamics of the COVID-19 pandemic have been influenced both by policy interventions and by reduced population susceptibility, with a relatively stronger contribution from susceptibility changes in metropolitan regions.
Our data provide an upward revision to the reported prevalence estimates derived from laboratory testing. In addition, while infection and thus death rates are a function of S and β, the effect of S reduction is more durable, and β will rise when restrictions are relaxed unless protective measures prove to be fully compensatory. If they are not, a second peak may occur. The reduction in S would not necessarily avoid this, but would mitigate peak height. However, our data also indicate that non-metropolitan areas are relatively more vulnerable in a second wave, which has implications for the distribution of healthcare resources.
Turning now to the exponential decay phase, if the fraction of population that is susceptible S is instead different from 1, and has some other constant value, then the contact rate is simply modified accordingly and the final characteristic exponential decay time is given by provided β f is also constant (for example during a period with little or no change in public behaviour). This allows us to examine the ratio of susceptible populations between a metropolitan region (m) and the rest of its corresponding country (c) through by measuring the time constants of the exponential declines. If β f is further assumed to take the same absolute value in the two areas, this equation simplifies to providing a direct way to estimate the ratio of relative susceptibilities in each area.
Relating public health interventions (PHIs) to epidemic data Variations in the stringency of country-wide public health interventions among countries have been extensively analysed by the Oxford group [11]. These data are not available at a regional (sub-country) level, and are not a direct measure of actual public behavior. We use cellular device location data [12] to overcome both these limitations, and to evaluate the effect of PHIs.
Such data are available for several high infection risk location categories, including retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. We plot these location data to show the relative change compared to the baseline value from the first five weeks of 2020 for each of the eight areas studied. Due to similarities of the relative timings and changes in magnitude of these measures, in principle any of them could be used as proxy for β(t). Based on the comparably low noise level as well as the consistent and clear functional shape of the location data at public transit stations, we chose these data for our analyses. Transit data, shown in Fig. 2 (lower panels), includes use of subway stations, taxi stands, sea ports, and other travel-related locations.
Empirically we find that the transit station location data are well described by a sigmoid function in two steps. First, we individually fit the data at early and late times to a linear function using ordinary least-squares regression to obtain Du and D l . These values are then used as fixed parameters of the full function in a second fitting stage. We use the full fits to extract both the peak value of the fit to D(t) and the time t 10% when D(t) reaches 10% of the value that would have been reached at the same time if the exponential growth trajectory had continued. This is a measure of when the dynamics has significantly departed from the exponential regime. This point is located close to the peak, but is not biased by the slope of the exponential decline in the same way that the peak time is. The error of t 10% is determined again as the one standard deviation functional prediction interval.
We use the relative delay ∆t = t 10% − t 50% as a measure of the influence of public behaviour change on β and the rate of new exposure to the virus. The expectation for a PHI-driven scenario, versus an effect of a reduction in S, is that ∆t is comparable to the typical time from exposure to death, which is on the order of 20 days [13,14].
Corrected death counts For COVID-19 dynamics, the attributed daily death rates are a more reliable metric than the number of confirmed cases, as they are not dependent on testing practices, and should be less prone to under-reporting. However, daily death rates are still subject to variations in reporting methods and in how each region defines what is classified as a COVIDrelated death [15]. Alternatively, excess deaths can be calculated by examining how many additional deaths (from all causes) have occurred above a baseline value that would be expected for the same time of year had the epidemic not happened [16]. While excess deaths statistics include deaths indirectly related to COVID-19 (for example, 'collateral' deaths due to healthcare systems being overwhelmed, or reduced visits to emergency departments), and the baseline value for expected deaths may not exactly reflect the current situation [17] (for example, in lockdown conditions the number of road traffic accidents will be lower than typical historical values), the number of excess deaths is nevertheless widely agreed to be the most reliable indicator to reflect the state of the epidemic, and alleviates some of the shortcomings associated with the reported daily deaths data [15,18].
We compare the time evolution of D(t) (aggregated to weekly level) to that of weekly excess deaths during the spring of 2020 relative to the median value of the historical data up to the past five years, where available [19]. Accounting for the delay that it takes in processing and registering the deaths in each case, we apply a constant correction factor to the reported daily deaths.
These factors faithfully reflect the registered excess deaths, and by area were 1.9 (Italy), 2.2 (England), 1.4 (Spain), 1.5 (US), and we found that these same factors applied equally well to each of the corresponding metropolitan regions, respectively. For the purpose of fitting analysis, we continue to work with the scaled reported deaths because these data are available at a daily granularity, whereas the excess deaths are only provided weekly.
Typically, these models require many input parameters such as fine-grained demographic and populations' behavioural details.
They are often critically dependent on these parameters with corresponding large variations between studies in terms of predicted outcomes and inevitable divergence from actual observations made after the forecasts [25]. In our work we instead aimed to understand the reduction of S and its impact in a largely model-independent way. In order to still derive quantitative trends and estimates, we chose the SIR model as the simplest epidemic model that takes time-dependent susceptibility (immunity) into account.
The well-established SIR framework is a deterministic model where the dot indicates a derivative with respect to time. These equations have been solved analytically for a constant β in [26].
Since public policy interventions and public behaviour changes (whether or not they are influenced by these interventions) affect the value of β, we introduce a time dependence of that parameter into the model β = β(t). Since no known intervention affects the duration of infectiousness, we treat γ as constant. Following the empirically found functional shape of the public response as representatively measured through use of public transit, we also model β(t) as a sigmoid with The shape of this function implies that there was only one period of relevant change to β from its unrestricted initial value βi to a later final value β f . This is justified as long as public policy relaxations do not take effect on D(t) and as long as compliance with them remains high (constant). We therefore restrict our main analysis to times that are not affected by such changes and only consider D(t) data until the end of May 2020, as no significant corresponding changes in TS(t) are discernible prior to early-May (considering the delay between exposure and death).
This simple model is based on several idealised assumptions.
It does not account for inhomogeneity of the population or of the mixing dynamics [27], and so stochastic, regional and demographic effects are averaged out to effective parameters. In particular, it is known that µ is a parameter whose value strongly increases with age, but also with strong dependence on health status [28]. This means that not only the demographic composition of an investigated population needs to be considered, but also the dynamics of epidemic spread relative to it (e.g. the timing of seeding the virus in settings or population groups with spe- Additionally in this process, coupling is introduced between each metropolitan region and its corresponding country by constraining the values of β(t) over time to be equal between the areas in each pair, as informed by the location data ( Fig. 1b and Fig. 2  Free optimisation parameters then relate to timing of epidemic seeding and the nature of the change in β (tint, tw).
To account for demographic differences, we use the distributions of ages from population statistics in each of the eight geographical areas along with the reported age-dependence of the IFR [13,28] Table 1: Exponential growth/decline time constants. Linear fits to the logarithm of daily deaths per million population data are used to avoid heteroskedastic bias. Fits to the initial growth phase for the total country level data in Fig. 1a produce time constant values which are all consistent with each other, with a mean ofτ = (3.4 ± 0.2) days. Country level data are then broken down into main metropolitan region and rest of country (Fig. 2), with corresponding fits to both the exponential growth and decline phases listed. Errors are 2 standard deviation confidence intervals (CI) from fits.  Table 2: Output values of the SIR model with a time-dependent β(t). The results of the pairwise-coupled optimisation procedure in each country are shown for the asymptotic (long-time) value of susceptibility S and resultant infection fatality rate µ. The coupling between metropolitan region and rest of country is set by constraining country-wide equal levels of public health intervention (β = β m = β c ), and a fixed ratio of infection fatality rates (derived from population age structure statistics). The ratio of the values found for S in the two areas is compared to the model-independent estimate derived from the time constants of exponential decline of the daily death counts.