logbook.txt

SIAM Chapter ICMC - Resources

#---------------------------------------------------------------------------------------------------
# Where are we heading?
#---------------------------------------------------------------------------------------------------

  What is the problem being approached by the group? {OBJECTIVE}
   To develop an epidemiological model to inform the selection of interventions to contain COVID-19 spread in the Brazilian territory. In other words, to develop a model to assign a relative measure of efficiency to available interventions.

  Why are we spending our time to investigate this problem? {MOTIVATION}
    To contribute to ongoing efforts to ameliorate socioeconomic impacts of COVID-19.

  Materials: health surveillance data obtained from an online platform [1]. {CONFIRM}

  Method: the group will adopt a number of methods to tackle the problem:

    {STATE; I believe we are moving in this direction}
    (1) A machine learning method to fit a SIRD-like model to surveillance data. Contact rate assumed as being variable, other model parameters assumed constant in time. The aim is short-term forecast of new infections, recoveries, and deaths.

    {LIMITATION}
    It must be noted that this arrangement rests on an implicit premise: the assumption that no intervention will be made by health authorities. {CONTROVERSE} This premise restricts the model's use to flagging that some intervention ought to be made to avoid an unwanted development, such as excess demand for bed-days, ICU-days, or medical supplies. The model would be more useful if it could support the selection (and calibration) of available interventions to contain the spread of the disease (as in [10]). This would require a change in perspective: from a predictive task to a simulation task.
    {KEYWORDS; SIRD model, macro-level; machine learning; risk management}

#---------------------------------------------------------------------------------------------------
# SIRD-like epidemiological models
#---------------------------------------------------------------------------------------------------

  Typical variables of SIRD-like models:

    [Population, N]: The number of individuals residing in a territory
    [Susceptible, S(t)]: The number of individuals susceptible to infection at time t
    [Infectious (or Infective, or Infected), I(t)]: The number of individuals that are (possibly) transmitting the disease to other individuals at time t
    [Recovered, R(t)]: The number of individuals that recovered from the disease and are now immune to new infection. Also, recovered individuals are not transmitting the disease to other individuals
    [Deceased, D(t)]: The number of individuals that died as an outcome of the infection
    -- Note: the above are counting variables

    [$\beta$]: transmission rate [1] or contact rate [3]  -- with dimensionality 1/([u][T])
    [$\gamma_r$]: recovery rate [1]                       -- with dimensionality 1/[T]
    [$\gamma_d$]: death rate [1] or mortality rate        -- with dimensionality 1/[T]
    -- Note: the above are model parameters

    [$R_0$]: Basic reproduction number, $R_0 = \frac{\beta}{\gamma}$, presumably with $\gamma = \gamma_r + \gamma_d$  -- with dimensionality 1/[u] {CONFIRM; I was expecting a dimensionless quantity}

    [attack rate]: {TO DO} [10]
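    A possible reading of the {CONFIRM} note on $R_0$, sketched under an assumption: the dimensionality 1/([u][T]) given for $\beta$ suggests the mass-action form of the SIRD equations written below. This form is assumed here, not taken from [1], and should be checked against the equations actually used there.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - (\gamma_r + \gamma_d) I, \\
\frac{dR}{dt} &= \gamma_r I, \\
\frac{dD}{dt} &= \gamma_d I, \\
\left.\frac{dI}{dt}\right|_{S \approx N} > 0 &\iff \frac{\beta N}{\gamma_r + \gamma_d} > 1 .
\end{aligned}
```

    Under this form the threshold quantity is $\beta N / (\gamma_r + \gamma_d)$, which is dimensionless. The plain ratio $\beta / \gamma$ is dimensionless only when the infection term is written as $\beta S I / N$, in which case $\beta$ has dimensionality 1/[T].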

  Making explicit the assumptions of the SIRD model:
    (1) {TO DO} see [4,5]

  Typical SIR-like models:

    {TO DO} check wikipedia
      https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology
      https://en.wikipedia.org/wiki/List_of_COVID-19_simulation_models

    {TO DO} check wolfram
      [18:18, 21/05/2021] https://community.wolfram.com/groups/-/m/t/1896178

    [SIRDC] Susceptible-Infectious-Resolving-Dead-reCovered [3]
    [SQUIDER] [6]

  A couple of distinctions that are relevant to clarify typical sources of modelling errors in epidemiology:

    (1) epidemic vs endemic models [4]: "Epidemic models are used to describe rapid outbreaks that occur in less than one year, while endemic models are used for studying diseases over longer periods, during which there is a renewal of susceptibles by births or recovery from temporary immunity."

    (2) macro-level vs micro-level models: macro-level models seek to describe the disease dynamics at the population level (e.g., SIR-like models have population-level variables), while micro-level models seek to simulate the disease dynamics at the individual level (e.g. individual-based models [11]). {CONTROVERSE} What do you mean by "level"? See discussion in [12].

      (2.1) Ferguson's research group at Imperial College has been using the latter: "We parameterize an individual-based simulation model of pandemic influenza transmission for GB and the US using high-resolution population density data and data on travel patterns. We extend the model by incorporating realistic seeding of infection (via international travel) in the modelled countries, and by explicitly modelling air travel within the US (air travel being relatively insignificant in GB due to its much smaller size). The model represents transmission in households, schools and workplaces, and the wider community." [10]

      {TO DO} Ask IBGE/CS if 2022 nationwide census will collect data to localise the model in [10].

#---------------------------------------------------------------------------------------------------
# Questions and attempted answers
#---------------------------------------------------------------------------------------------------

Do the resulting surveillance data look like the data reported daily by news outlets?
  Yes, at least at the country level.

Looking at the surveillance data, is it possible to identify different "waves"?
  First, we need an operational definition of "wave". In [19], the authors seem to identify an epidemic wave with an accelerated growth in the curve of new cases, namely ∆C(t), that can be visually identified. This is (to me) an unsatisfactory definition, but let's play along. The authors in [21] propose "a working definition and operationalization of waves to enable common ground to understand and communicate COVID-19 crisis": "An upward (downward) period refers to a period of at least n (=14) consecutive days when Rt is bigger (smaller) than 1."
  Answer: I was able to see two waves in Manaus, as authors in [19] indicate, roughly according to criteria in [21].
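  The criterion quoted from [21] can be sketched as a run-length check over a daily Rt series. This is a minimal sketch: it assumes the Rt series has already been estimated by some other means, and the names `classify` and `find_periods` are hypothetical, not taken from [21].

```python
from itertools import groupby


def classify(rt_value):
    """Label a single day by its Rt value; Rt exactly 1 belongs to no period."""
    if rt_value > 1.0:
        return "upward"
    if rt_value < 1.0:
        return "downward"
    return None


def find_periods(rt, n=14):
    """Return (label, first_day, last_day) for every run of at least n
    consecutive days with Rt > 1 ("upward") or Rt < 1 ("downward").
    Day indices are 0-based and last_day is inclusive."""
    periods = []
    day = 0
    for label, run in groupby(rt, key=classify):
        length = sum(1 for _ in run)
        if label is not None and length >= n:
            periods.append((label, day, day + length - 1))
        day += length
    return periods


# Synthetic Rt values, for illustration only.
rt_series = [1.2] * 20 + [0.8] * 10 + [1.1] * 14 + [0.9] * 15
periods = find_periods(rt_series)
print(periods)
```

  In this synthetic series the 10-day stretch with Rt < 1 is too short to count as a downward period, so it is skipped.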

As a result of Projeto S, all adult individuals from Serrana, SP, have been vaccinated. Health authorities argue that deaths have decreased 95% as a result of massive immunisation [22]. Compared to Itapevi, SP, which has not received a similar intervention, this decrease is hardly seen in the curve for D(t). What is going on?
  Answer: I don't know.

Is it true that a higher mortality rate has been consistently observed in regions with a high concentration of anti-vaxxers?
  Answer: I don't know.

The success of any data-driven model depends, of course, on the quality of available data. Are we rationally justified to believe that the available data is adequate to attain our research objective?
  Well, I would bet that both measured variables, number of new cases and deaths, are badly underestimated. Evidence for the latter can be found in [18].

Have SIR-like models been recently employed in the literature on Epidemiology?
  Yes, it seems:
  https://link.springer.com/article/10.1007/s10654-021-00748-2
  https://www.sciencedirect.com/science/article/pii/S0895435621000871
  https://www.nature.com/articles/s41598-020-80007-8

Have SIR-like models been taught to students in health-related courses?
  Well, I could not find any reference to SIR-like models in Gordis (new version) [13] and Stefens [14], which are undergraduate textbooks used in courses on epidemiology [15]. I wonder why that is so.

An important assumption of the SIR model is that recovered individuals are permanently immune to the disease. How confident are we rationally warranted to be that this holds for COVID-19? After all, there is evidence suggesting that recurrence and reinfection may occur:
  https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00662-0/fulltext
  https://www.journalofinfection.com/article/S0163-4453(21)00043-8/fulltext (Brazil)
  https://www.sciencedirect.com/science/article/pii/S187603412100006X
  https://www.sciencedirect.com/science/article/pii/S1047279721000612?via%3Dihub
  https://www.nature.com/articles/d41586-020-02948-4

#---------------------------------------------------------------------------------------------------
# References
#---------------------------------------------------------------------------------------------------

[1] Fabio Amaral and Wallace Casaca and Cassio M. Oishi and José A. Cuminato. "Towards Providing Effective Data-Driven Responses to Predict the Covid-19 in São Paulo and Brazil." Sensors 21, no. 2 (2021): 540. https://www.mdpi.com/1424-8220/21/2/540
{ICMC}{SIRD model}{see ref 41 for stiff/non-stiff SODEs; more on this in https://www.mathworks.com/company/newsletters/articles/stiff-differential-equations.html; https://en.wikipedia.org/wiki/Stiff_equation}

[2] P. H. P. Cintra and M. F. Citeli and F. N. Fontinele. "Mathematical models for describing and predicting the covid-19 pandemic crisis." arXiv preprint arXiv:2006.02507 (2020). https://arxiv.org/abs/2006.02507
{UnB}{SEIR model, SIR model}

[3] Fernández-Villaverde, Jesús, and Charles I. Jones. Estimating and simulating a SIRD model of COVID-19 for many countries, states, and cities. No. w27128. National Bureau of Economic Research, 2020. https://www.nber.org/system/files/working_papers/w27128/w27128.pdf
{US National Bureau of Economic Research}{SIRD model; SIRDC, SEIRD, SEIR}

[4] Hethcote, Herbert W. "The mathematics of infectious diseases." SIAM review 42, no. 4 (2000): 599-653. https://doi.org/10.1137/S0036144500371907

[5] Okabe, Yutaka, and Akira Shudo. "A Mathematical Model of Epidemics - A Tutorial for Students." Mathematics 8, no. 7 (2020): 1174. https://www.mdpi.com/2227-7390/8/7/1174

[6] Khan, Z., Van Bussel, F., & Hussain, F. (2020). A predictive model for Covid-19 spread – with application to eight US states and how to end the pandemic. Epidemiology and Infection, 148, E249. doi:10.1017/S0950268820002423 {SIR model, SQUIDER model}

[7] Adam, David. "Special report: The simulations driving the world's response to COVID-19." Nature 580, no. 7802 (2020): 316-319. https://www.nature.com/articles/d41586-020-01003-6

[8] Ferguson, Neil M., Daniel Laydon, Gemma Nedjati-Gilani, Natsuko Imai, Kylie Ainslie, Marc Baguelin, Sangeeta Bhatia et al. "Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College COVID-19 Response Team." Imperial College COVID-19 Response Team (2020). https://spiral.imperial.ac.uk:8443/handle/10044/1/77482
{Imperial College}

[9] Halloran, M. Elizabeth, Neil M. Ferguson, Stephen Eubank, Ira M. Longini, Derek AT Cummings, Bryan Lewis, Shufu Xu et al. "Modeling targeted layered containment of an influenza pandemic in the United States." Proceedings of the National Academy of Sciences 105, no. 12 (2008): 4639-4644. https://doi.org/10.1073/pnas.0706849105
{Imperial College}

[10] Ferguson, Neil M., Derek AT Cummings, Christophe Fraser, James C. Cajka, Philip C. Cooley, and Donald S. Burke. "Strategies for mitigating an influenza pandemic." Nature 442, no. 7101 (2006): 448-452. http://doi.org/10.1038/nature04795
{Imperial College}

[11] Railsback, Steven F., and Volker Grimm. Agent-Based and Individual-Based Modeling: A Practical Introduction. Princeton; Oxford: Princeton University Press, 2011.

[12] Woodward, James. "Levels: What are they and what are they good for?." Levels of Analysis in Psychopathology: Cross Disciplinary Perspectives (2020): 424-449.

[13] David Celentano and Moyses Szklo. 2018. Gordis Epidemiology (6th ed.). Elsevier, Philadelphia, PA, US.

[14] Stefens: https://usp.minhabiblioteca.com.br/#/books/9788595023154/cfi/8!/4/4@0.00:0.00

[15] Epidemiology course on e-disciplinas; HEP0143 (the bibliography uses the Gordis textbook);

[16] COVID, IHME, and Christopher JL Murray. "Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months." MedRxiv (2020). https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full

[17] A. S. Peddireddy et al., "From 5Vs to 6Cs: Operationalizing Epidemic Data Management with COVID-19 Surveillance," IEEE International Conference on Big Data (Big Data), 2020, pp. 1380-1387, doi: 10.1109/BigData50022.2020.9378435.

[18] Painel de análise do excesso de mortalidade por causas naturais no Brasil, CONASS - Conselho Nacional de Secretários de Saúde. 31 May 2021. https://www.conass.org.br/indicadores-de-obitos-por-causas-naturais/

[19] Naveca, F.G., Nascimento, V., de Souza, V.C. et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nat Med (2021). https://doi.org/10.1038/s41591-021-01378-7

[20] (WHO) Weekly situation report, edition 41 of May 25, 2021
{Variant of Concern (VOC); Variant of Interest (VOI)}

[21] Zhang, Stephen X., Francisco Arroyo Marioli, and Renfei Gao. "A Second Wave? What Do People Mean By COVID Waves?-A Working Definition of Epidemic Waves." MedRxiv (2021).

[22] https://butantan.gov.br/noticias/projeto-s-imunizacao-em-serrana-faz-casos-de-covid-19-despencarem-80-e-mortes-95

#---------------------------------------------------------------------------------------------------
# Additional resources
#---------------------------------------------------------------------------------------------------

A machine learning method that can be explored to solve the inverse problem at hand:
  @Andre Lima  I think this is the paper about the method used in the seismic data inversion problem https://arxiv.org/pdf/2008.12833.pdf
  That's the one. And I think the code he used is here: https://github.com/gabrielspadon/ReGENN

Surveillance archives (I was not able to download data from these tools)
  [20:17, 21/04/2021] Diany Pressato: https://github.com/CSSEGISandData/COVID-19
  [20:17, 21/04/2021] Diany Pressato: http://www.spcovid.net.br
  [7/19/2021 1:20:16 PM] https://brasil.io/dataset/covid19/caso_full/

Books

  (book) Finite difference methods for ordinary and partial differential equations
  (book) Mathematical models in population biology and epidemiology (2nd ed)
  (book) Stochastic population and epidemic models

#---------------------------------------------------------------------------------------------------
# Other resources pointed by team members
#---------------------------------------------------------------------------------------------------

Discrete Differential Geometry: https://www.youtube.com/playlist?list=PL9_jI1bdZmz0hIrNCMQW1YmZysAiIYSSS

Social impacts of COVID-19: https://www.un.org/development/desa/dspd/world-social-report/2021-2.html

[20:18, 21/04/2021] Diany Pressato:
  https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html#scipy.integrate.odeint

[19:34, 05/05/2021] Diany Pressato: https://colab.research.google.com/drive/11neEJ-HC2DFwPRdZ9gMH7QFH5_E9cwor

[20:17, 05/05/2021] Diany Pressato: short summary of the meeting:
https://drive.google.com/file/d/11H7pxtwMvtfPtrpoMWMKrDzcdBBGHdlM/view?usp=sharing

#---------------------------------------------------------------------------------------------------
# Ongoing tasks
#---------------------------------------------------------------------------------------------------

Task I took on:
1-) we have no data on the recovered. It would be worth looking at the reference behind the Infotracker data for computing the recovered, which is based on the methodology of the University of Melbourne ( http://covid19forecast.science.unimelb.edu.au/ ); in short, we need to estimate the parameter Rdata;

  According to the note published at https://www.viser.com.br/covid-19/sp-covid-info-tracker, "to estimate A(t) and Rec(t) as presented on the online platform, Methodology 2 was adopted". This methodology is described in [17].

[6/27/2021 12:41:45 PM] Found the code of the model used by the Imperial College team to produce the simulations that were presented to the UK government back on 16 March 2020:

  (1) the code is available in this repository:
    https://github.com/mrc-ide/covid-sim/tree/7282c948b940c8bd90d6afaa1575afb3848aa8b5

  (2) for a little bit of useful history, see:
    ./articles/Modeling the pandemic - The simulations driving the world's response to COVID-19.pdf
    ./articles/Influential pandemic simulation verified by code checkers.pdf


[7/15/2021 3:00:26 PM]

Beginning to tackle the problem of modelling the dynamics of the COVID epidemic. Some ideas:

# create class to tackle predictive tasks for univariate time series
# (1) you pass a set of univariate time series -- e.g., SIRD series for several cities
# (2) specify a (set of) model(s) -- we want to explore ensembles of (interpretable) estimators
# (3) fit the model to each given series (and have it assessed -- save test results)
# (4) ask for a prediction of length h
# (5) create graphical aids to help making sense of the obtained results
# (*) document the premises and relevant literature along the code
#     (for easier reporting, in case this leads to a publication)
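
Steps (1) to (4) above can be sketched as a small class. This is only a minimal sketch under stated assumptions: the class name `UnivariateForecastBench`, the two baseline forecasters, and the use of mean absolute error on a hold-out of the last h points are all hypothetical placeholders, not the candidate models or the evaluation framework discussed in these notes.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A forecaster maps (history, horizon h) to a list of h predicted values.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_last_value(history: Sequence[float], h: int) -> List[float]:
    """Repeat the last observed value h times."""
    return [history[-1]] * h


def drift(history: Sequence[float], h: int) -> List[float]:
    """Extend the straight line through the first and last observations.
    Needs at least two observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (k + 1) for k in range(h)]


class UnivariateForecastBench:
    """Hypothetical container for steps (1)-(4): series, models, assessment, prediction."""

    def __init__(self, series: Dict[str, Sequence[float]], models: Dict[str, Forecaster]):
        self.series = {name: list(values) for name, values in series.items()}
        self.models = dict(models)
        self.test_results: Dict[Tuple[str, str], float] = {}

    def assess(self, h: int) -> Dict[Tuple[str, str], float]:
        """Hold out the last h points of each series; store the MAE per (series, model)."""
        for name, values in self.series.items():
            train, test = values[:-h], values[-h:]
            for model_name, model in self.models.items():
                forecast = model(train, h)
                mae = sum(abs(f - t) for f, t in zip(forecast, test)) / h
                self.test_results[(name, model_name)] = mae
        return self.test_results

    def predict(self, name: str, model_name: str, h: int) -> List[float]:
        """Forecast h steps ahead using the whole series as history."""
        return self.models[model_name](self.series[name], h)


# Synthetic linear series, for illustration only.
bench = UnivariateForecastBench(
    {"city_a": [float(k) for k in range(30)]},
    {"naive": naive_last_value, "drift": drift},
)
results = bench.assess(h=14)
print(results)
```

Keeping each model as a plain function of (history, h) is one way to make it easy to add interpretable estimators later; a real implementation would likely need separate fit and predict stages.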

# path to follow: threads/thrusts of development

# (1) let's start by using some of the series in the ICMC LABIC's dataset,
#     then move to Brasil.io COVID dataset. To this effect, we need to change the preprocess.py module:
#     (a) we need a map to relate territories to their estimated populations
#         (and maybe also some indicators gathered from the last general elections?)
#     (b) we need to create an option to have time series segmented by territory
#         (i.e., each territory is combined with their SIRD series)

# (2) select a single, simple-to-understand (i.e., interpretable) model,
#     then move towards memory-based models,
#     then move towards representation-based models,
#     and then to ensemble of models
#     -- an initial list of candidate models:
#        kNN-TSPi,
#        SIRD,
#        SARIMA,
#        kNN-SAX,
#        kNN-SAX with our probabilistic representation,
#        ReGENN (https://drive.google.com/file/d/1Lq3j3PwwBG_4ekDOoQHHkhWGbGY01DM9/view)
#               (https://arxiv.org/pdf/2008.12833.pdf)
#        models based on ideas explored by Geoff Webb's team
#        (https://www.youtube.com/watch?v=SOnHXymw48k&t=1071s)

# (3) let's start by replicating Parmezan et al.'s framework of evaluation,
#     and then move towards performance metrics used in Funk et al.
#     -- we need a dataset of time series. let's use series of different cities

# *** Is it true that [the dynamics of the disease in City-1 is more similar to that of City-2 than that of City-3 if City-1 is nearer to City-2 than to City-3]?

# *** Can we use the change in direction (i.e., increase or decrease) in a series for the City-1 to predict the change in direction for a series for a neighbouring City-2? I am seeking a way to approximate systematic population movements. Spadon's methods may be useful in answering this question.
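
One simple way to start probing the second question is to compare the day-to-day direction of one city's series with the direction of a neighbouring city's series some days later. The sketch below is only an assumption about how this could be measured; the function names and the toy series are hypothetical, and it does not use Spadon's methods.

```python
def directions(series):
    """Sign of each day-to-day change: +1 increase, -1 decrease, 0 no change."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def lagged_agreement(leader, follower, lag):
    """Fraction of days on which the follower's direction equals
    the leader's direction `lag` days earlier."""
    lead_dir = directions(leader)
    follow_dir = directions(follower)
    pairs = list(zip(lead_dir, follow_dir[lag:]))
    if not pairs:
        raise ValueError("series too short for the requested lag")
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy series: city_2 repeats the pattern of city_1 two days later.
city_1 = [1, 2, 3, 2, 1, 2, 3, 2, 1]
city_2 = [5, 5, 1, 2, 3, 2, 1, 2, 3]
agreement_lag_2 = lagged_agreement(city_1, city_2, lag=2)
agreement_lag_0 = lagged_agreement(city_1, city_2, lag=0)
print(agreement_lag_2, agreement_lag_0)
```

A high agreement at some positive lag would only be suggestive; it says nothing by itself about population movements between the two cities.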

# (4) this one is a bit tricky. let's adopt h=14 (2 weeks). Rationale:
#     (a) Funk et al. concluded that predictions beyond this timeframe are of little use for guiding public policy
#     (b) It seems that 2 weeks is a reasonable timeframe to update/review public policies in Brazil;
#         e.g., SS/SP used this timeframe in their intervention plan: http://www.legislacao.sp.gov.br/legislacao/dg280202.nsf/5fb5269ed17b47ab83256cfb00501469/35ea1f3341ab9b9c83258577004cd65e?OpenDocument&Highlight=0,64.994


# ideas to communicate and engage
# (1) [gamification] given a basic framework (developed here), let's set up (or join an existing) Kaggle competition; create a video to explain the framework and use our meetings to address doubts and check progress

[7/17/2021 11:19:58 PM] Downloaded Parmezan's dataset from http://sites.labic.icmc.usp.br/icmc_tspr/index.php/datasets

[7/19/2021 12:31:18 PM]
This thread will be frozen until next week, so I outline the next steps to be taken.
(1) The inverse.py script succeeded in solving the inverse problem for COVID data (from Brasil-io), but it fails with simulated data (generated using SIRD.py script). It seems that it has to do with the fact that the simulation uses RK2 to simulate the SIRD model, and Euler's method is being used in the inverse.py script. Next step: adapt inverse.py to use RK2 as well. This will require going back to the bench to reformulate the estimation and reconstruction procedures.
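
The suspected mismatch in (1) can be illustrated with a toy example. This is a sketch under stated assumptions, not the code of inverse.py or SIRD.py: it assumes a mass-action SIRD model, takes Heun's method as the RK2 variant, and uses a hypothetical `estimate_beta_euler` that inverts the Euler update of S(t) to recover the contact rate. All function names and parameter values are illustrative.

```python
def sird_rhs(state, beta, gamma_r, gamma_d):
    """Right-hand side of a mass-action SIRD model (assumed form)."""
    s, i, r, d = state
    new_infections = beta * s * i
    return [-new_infections,
            new_infections - (gamma_r + gamma_d) * i,
            gamma_r * i,
            gamma_d * i]


def euler_step(f, y, h):
    """One explicit Euler step: y + h * f(y)."""
    return [yi + h * ki for yi, ki in zip(y, f(y))]


def rk2_step(f, y, h):
    """One Heun (explicit trapezoidal) step, one of several RK2 variants."""
    k1 = f(y)
    k2 = f([yi + h * ki for yi, ki in zip(y, k1)])
    return [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]


def estimate_beta_euler(y_now, y_next, h):
    """Invert the Euler update of S to recover beta from two consecutive states."""
    s_now, i_now = y_now[0], y_now[1]
    return -(y_next[0] - s_now) / (h * s_now * i_now)


# Illustrative values only, not fitted to any dataset.
N = 1_000_000.0
BETA, GAMMA_R, GAMMA_D = 0.4 / N, 0.1, 0.01
y0 = [N - 100.0, 100.0, 0.0, 0.0]
h = 1.0  # one day


def f(y):
    return sird_rhs(y, BETA, GAMMA_R, GAMMA_D)


beta_from_euler_data = estimate_beta_euler(y0, euler_step(f, y0, h), h)
beta_from_rk2_data = estimate_beta_euler(y0, rk2_step(f, y0, h), h)

print(f"true beta * N:             {BETA * N:.4f}")
print(f"Euler data, Euler inverse: {beta_from_euler_data * N:.4f}")
print(f"RK2 data, Euler inverse:   {beta_from_rk2_data * N:.4f}")
```

Under these assumptions, the Euler-based inverse recovers the contact rate from Euler-generated data but returns a noticeably different value when the data come from an RK2 step. This is consistent with the explanation proposed in (1), although it does not confirm that this is what happens in inverse.py.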

(2) Inspecting the results for the inverse problem for the simulated data made me realise that we are seeking a 'theoryful' estimator (in Valiant's sense), instead of a 'theoryless' estimator (like machine-learning or statistical methods), because we need interpretable models in health applications.

(3) What about inviting Spadon to give us a talk about ReGENN?