From 1ae7f724bc1ee907661cbaedd8796a0dee56d370 Mon Sep 17 00:00:00 2001 From: bizovi Date: Sun, 23 Feb 2025 13:35:30 +0200 Subject: [PATCH] Built site for gh-pages --- .nojekyll | 2 +- search.json | 2 +- sim/1_intro.html | 61 +++++++++++++++++++++++++++++++++++++++++++++++- sitemap.xml | 2 +- 4 files changed, 63 insertions(+), 4 deletions(-) diff --git a/.nojekyll b/.nojekyll index 2ec94bd..e469210 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -6b2ba647 \ No newline at end of file +98128fdb \ No newline at end of file diff --git a/search.json b/search.json index ec159ac..b9746c4 100644 --- a/search.json +++ b/search.json @@ -267,7 +267,7 @@ "href": "sim/1_intro.html#stories-and-case-studies", "title": "Simulation of economic processes", "section": "Stories and case-studies", - "text": "Stories and case-studies\nSo far, you studied probability, statistics, operations research, and economics from an ultimately mathematical and procedural point of view (i.e. steps to solve a problem). This is a solid foundation to have, but it needs to be balanced out with practical aspects of modeling, data analysis, and programming.\nI will be using stories and real world case-studies to highlight the practical relevance of theoretical ideas like Central Limit Theorem (CLT), Law of Large Numbers (LLN), p-values, conditioning, common distributions, etc. Moreover, by simulation, we’ll reinforce our knowledge and understanding, which will help us avoid most common pitfalls in statistical modeling.\nA funny thing is that we can use simulation to better understand probability and statistical inference, and at the same time, probability justifies why simulation works. I can’t emphasize enough how much your learning will improve if you will apply what you learn here in your econometrics, data analysis, and quantitative economics classes. If you do not trust me, trust Richard Feynman, who said: “What I cannot build, I do not understand.”\nYou can think of this course as having two parts. First, we use simulation as a problem-solving approach to gain insight into an economic problem. The second part develops specialized methods for estimation, sampling, approximation, and optimization – which can be viewed as tools to overcome a range of technical problems in statistical modeling.\n\n\n\n\n\n\nThe bare minimum\n\n\n\nSimulation is perhaps the most beginner-friendly course you had, because we need to know just two things to get started.\n\nHow to generate iid, uniformly distributed pseudo-random numbers. This problem is solved, since all programming languages have good RNG (random number generators).16\nHow to generate samples from any probability distribution which is “well-behaved”. Fortunately, a theorem called “Universality of the Uniform” proves that we can and gives us the method for doing so. In R or Python we have access to efficient implementations for the vast majority of distributions we’ll ever need.\n\nThis is not sufficient to understand why simulation works, how to apply it effectively, or how to sample from complicated distributions which don’t have a closed-form solution. However, you can still go a long way in practice with these two simple facts.\n\n\n\n\n16 You should still know how are they computed behind the scenes and what happens when they are not-so-random", + "text": "Stories and case-studies\nSo far, you studied probability, statistics, operations research, and economics from an ultimately mathematical and procedural point of view (i.e. steps to solve a problem). 
This is a solid foundation to have, but it needs to be balanced out with practical aspects of modeling, data analysis, and programming.\nI will be using stories and real-world case-studies to highlight the practical relevance of theoretical ideas like the Central Limit Theorem (CLT), the Law of Large Numbers (LLN), p-values, conditioning, common distributions, etc. Moreover, by simulation, we’ll reinforce our knowledge and understanding, which will help us avoid the most common pitfalls in statistical modeling.\nA funny thing is that we can use simulation to better understand probability and statistical inference, and at the same time, probability justifies why simulation works. I can’t emphasize enough how much your learning will improve if you apply what you learn here in your econometrics, data analysis, and quantitative economics classes. If you do not trust me, trust Richard Feynman, who said: “What I cannot build, I do not understand.”\nYou can think of this course as having two parts. First, we use simulation as a problem-solving approach to gain insight into an economic problem. The second part develops specialized methods for estimation, sampling, approximation, and optimization – which can be viewed as tools to overcome a range of technical problems in statistical modeling.\n\n\n\n\n\n\nThe bare minimum\n\n\n\nSimulation is perhaps the most beginner-friendly course you have had, because we need to know just two things to get started.\n\nHow to generate iid, uniformly distributed pseudo-random numbers. This problem is solved, since all programming languages have good RNGs (random number generators).16\nHow to generate samples from any probability distribution which is “well-behaved”. Fortunately, a theorem called “Universality of the Uniform” proves that we can and gives us the method for doing so. In R or Python we have access to efficient implementations for the vast majority of distributions we’ll ever need.\n\nThis is not sufficient to understand why simulation works, how to apply it effectively, or how to sample from complicated distributions which don’t have a closed-form solution. However, you can still go a long way in practice with these two simple facts.\n\n\n16 You should still know how they are computed behind the scenes and what happens when they are not-so-randomFirst, we’ll need to set up the programming environment (R and RStudio), create a script or (quarto) notebook and we’re ready to go! Take your time to understand how to navigate the IDE, run commands, investigate the errors, and read the documentation. We want to solve problems and program, so technical issues like how to run a line of code or where to find the output shouldn’t stand in our way.\n\n\n\n\n\n\nThe full-luxury development setup\n\n\n\n\nR v4.4.2 (later than 4.3.x)\nRStudio as our main IDE\nQuarto or Rmarkdown for literate programming (only needed towards the end of the course)\nInstalling tidyverse will get us most of the packages we need:\n\ndplyr for data wrangling, purrr for functional programming, and stringr / glue for making our life easier with text\nggplot for data visualization\n\n\nR is a beginner-friendly language, but has many gotchas because of its weak data types. Tidyverse is an important ecosystem of packages which solves a lot of the issues in base R and makes our life easier and coding much more pleasant.\nIf you’re really serious about becoming a data scientist or an ML engineer, you will have to study and practice a lot on your own. 
This is a non-exhaustive list of practical skills you will need in the future.\n\ngit & GitHub for versioning your code and collaborating on a code-base\nrenv for managing environments and packages\nduckdb to practice SQL and data engineering on an analytical, columnar, in-process database\ndata.table and arrow for processing huge amounts of data\nhow to build interactive applications in Shiny and model-serving APIs in plumber\nhow to use the command line interface and automate stuff in Linux\nhow to package and deploy your training and prediction pipelines, possibly to a cloud provider\n\n\n\nWithout further ado, here are the stories and case-studies we’re going to discuss and implement in code, along with theoretical ideas they highlight.\nWe’ll start with a warm-up. The birthday paradox is a straightforward but counter-intuitive result which is a good opportunity to review key concepts from combinatorics and apply the naive definition of probability. We’ll derive the analytical solution, visualize it, and compare it with our simulation. This simple exercise will teach us most of the things we will need for the near future: how to sample, generate sequences, work with arrays, functions, data frames, and repeat a calculation many times.\n\nNewsvendor problem and inventory optimization is a great case-study of decision-making under uncertainty\nShowing up to a safari. This is a cute story which will teach us about the intuitions behind the Binomial distribution, probability trees, independence, and sensitivity analysis\nSimulations which exemplify convergence in probability and the law of large numbers. We’ll discuss why Monte Carlo methods work and what happens if we forget to take sample size into consideration\nUS schools, the most dangerous equation. This is a great example of what can go wrong when we draw conclusions from data via a sloppy analysis.17 CLT is just one of the key theoretical ideas from statistics which could’ve prevented the policy makers from starting a costly and wasteful project.\nQuality control and measurement error. We’ll discuss the story of Gosset at the Guinness brewery, the original purpose for which the t-test was invented, the philosophy and practice of hypothesis testing. A key idea is one of action in the long run and error control (not being too wrong too often). This perspective of statistics justifies our definition of “changing your actions in the face of evidence”.\nWhat p-values can we expect? Most people misunderstand the idea of p-values. We will use an example of a scientific study and its probability of finding a true, significant effect. In simulation, we can see the distribution of p-values, which is impossible to know for sure in practice, even after a rigorous meta-analysis.\nWikipedia A/B test and Landon vs Roosevelt elections. These stories serve as drills so that you remember how to calculate confidence intervals for proportions and interpret them correctly. They also serve as a warning of what can go wrong if we fail to randomize.\nBayes’ rule, medical testing, and coding bugs. Bayesian thinking and the idea of updating your beliefs is key for rational decision-making under uncertainty. You will also get into the habit of articulating and eliciting your prior knowledge (before seeing the data) about a problem.\nBeta-Binomial model of left-handedness and quality control. This will be perhaps the only time in your bachelor’s degree where you will encounter a fully Bayesian approach to inference (and not just an application of Bayes’ rule). 
We will learn what probability distributions are appropriate in modeling proportions and a principled approach to dealing with misclassification error.\n\nI will mention how we can model counts of asthma deaths and kidney cancer with the Gamma-Poisson model, and how it can be applied to customer purchasing behavior.\nBy a similar reasoning we’ll see how we can model waiting times in a Starbucks with a Gamma-Exponential model. This is precisely the reason why you studied all of those discrete and continuous distributions in your probability class – they have a purpose and are useful!\n\nLinear regression and confounding. You have probably heard a hundred times that correlation doesn’t imply causation. I will show three simple causal mechanisms and graphs of influence which can mislead us into a wrong conclusion: confounders, mediators, and colliders. We’ll discuss real-world studies about divorce rates, gender discrimination, and ethical intuitions.\n\nIn the context of linear regression, we’ll use the bootstrap as an alternative way to compute confidence intervals. It’s a must-have technique in your quantitative toolbox, which will be useful any time you don’t have a theoretically proven sampling distribution.\n\nJustifying the sample size for an A/B test. Power calculations are what trip up most product managers and analysts, who end up confused by the online calculators or complicated formulas (which I’ve seen mostly misused). This is where simulation shines and will be helpful in choosing the appropriate sample size we need to collect (or how long the experiment should run), while being very clear about our assumptions and expectations.\n\n17 I really like the metaphor of “fooled by randomness”At last, after we get a sense of how Markov Chain Monte Carlo works, we will gain a new superpower – to sample from unknown distributions. We can apply it to the modeling of grouped / clustered / correlated data, as in the classical case-study of radon (toxic gas) concentrations in U.S. homes in various counties.\nThe idea of partial pooling will help us to automatically adjust our inferences (means and variances) with respect to the sample size of each group (county). This is an example of a multilevel or hierarchical model, for which Bayesian inference has many benefits. However, we don’t have analytical solutions and will have to use a much more sophisticated algorithm to generate samples from the posterior distribution of parameters.\nTreat this topic of multilevel modeling as the culmination of the course, which also serves as a powerful complementary approach to what you study in econometrics.",
    "crumbs": [
      "| 1. Business |",
      "~ Simulation of economic processes"
diff --git a/sim/1_intro.html b/sim/1_intro.html
index f02f1b7..9045a13 100644
--- a/sim/1_intro.html
+++ b/sim/1_intro.html
@@ -594,9 +594,68 @@

Stories and case-studies

This is not sufficient to understand why simulation works, how to apply it effectively, or how to sample from complicated distributions which don’t have a closed-form solution. However, you can still go a long way in practice with these two simple facts.
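To make the second fact concrete, here is a minimal sketch of the inverse-CDF idea behind the “Universality of the Uniform”. It is my own illustration rather than code from the course, and the Exponential(rate = 2) target is an arbitrary choice.

```r
# Universality of the Uniform: if U ~ Unif(0, 1) and F is an invertible CDF,
# then F^{-1}(U) has distribution F.
set.seed(42)
n <- 10000
u <- runif(n)                          # fact 1: iid uniform pseudo-random numbers

rate <- 2                              # Exponential(rate): F(x) = 1 - exp(-rate * x)
x_inverse_cdf <- -log(1 - u) / rate    # fact 2: push the uniforms through F^{-1}

x_builtin <- rexp(n, rate = rate)      # R's own sampler, for comparison
summary(x_inverse_cdf)
summary(x_builtin)                     # the two summaries should be very close
```

That the two summaries agree is the practical content of the theorem: uniform draws plus a quantile function are enough to sample from the distribution.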


16 You should still know how they are computed behind the scenes and what happens when they are not-so-random

First, we’ll need to set up the programming environment (R and RStudio), create a script or (quarto) notebook and we’re ready to go! Take your time to understand how to navigate the IDE, run commands, investigate the errors, and read the documentation. We want to solve problems and program, so technical issues like how to run a line of code or where to find the output shouldn’t stand in our way.
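As an illustration of how small such a first script can be, here is a hypothetical example of my own (not part of the course materials) that estimates the chance that two dice sum to 7; paste it into a fresh script or Quarto chunk and run it line by line.

```r
# A tiny first simulation: probability that two fair dice sum to 7
set.seed(123)                              # makes the run reproducible
n_rolls <- 100000

die1 <- sample(1:6, n_rolls, replace = TRUE)
die2 <- sample(1:6, n_rolls, replace = TRUE)

mean(die1 + die2 == 7)                     # should be close to 6/36 = 0.1667
```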

The full-luxury development setup

  • R v4.4.2 (later than 4.3.x)
  • RStudio as our main IDE
  • Quarto or Rmarkdown for literate programming (only needed towards the end of the course)
  • Installing tidyverse will get us most of the packages we need:
    • dplyr for data wrangling, purrr for functional programming, and stringr / glue for making our life easier with text
    • ggplot for data visualization

R is a beginner-friendly language, but has many gotchas because of its weak data types (a small example is sketched right after this callout). Tidyverse is an important ecosystem of packages which solves a lot of the issues in base R and makes our life easier and coding much more pleasant.


If you’re really serious about becoming a data scientist or an ML engineer, you will have to study and practice a lot on your own. This is a non-exhaustive list of practical skills you will need in the future.

  • git & GitHub for versioning your code and collaborating on a code-base
  • renv for managing environments and packages
  • duckdb to practice SQL and data engineering on an analytical, columnar, in-process database
  • data.table and arrow for processing huge amounts of data
  • how to build interactive applications in Shiny and model-serving APIs in plumber
  • how to use the command line interface and automate stuff in Linux
  • how to package and deploy your training and prediction pipelines, possibly to a cloud provider
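To make the “gotchas” warning in the callout concrete, here is a small illustration of base R’s silent type coercion; it is my own example, not taken from the course.

```r
# Mixing types in a vector silently converts everything to character
x <- c(1, 2, "3")
class(x)                       # "character", not numeric
# sum(x)                       # errors, because x is now a character vector

# Comparisons coerce too: the number 2 becomes the string "2"
"2" == 2                       # TRUE

# Factors hide another trap: as.numeric() returns level codes, not labels
f <- factor(c("10", "20", "30"))
as.numeric(f)                  # 1 2 3, probably not what you wanted
as.numeric(as.character(f))    # 10 20 30
```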

Without further ado, here are the stories and case-studies we’re going to discuss and implement in code, along with theoretical ideas they highlight.


We’ll start with a warm-up. The birthday paradox is a straightforward but counter-intuitive result which is a good opportunity to review key concepts from combinatorics and apply the naive definition of probability. We’ll derive the analytical solution, visualize it, and compare it with our simulation. This simple exercise will teach us most of the things we will need for the near future: how to sample, generate sequences, work with arrays, functions, data frames, and repeat a calculation many times.
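As a taste of what that comparison looks like, here is a minimal sketch assuming 23 people and 10,000 replications (my own choices, not the course’s reference solution):

```r
# Birthday paradox: P(at least two of n people share a birthday)
set.seed(2025)
n_people <- 23
n_sims   <- 10000

# Simulation: draw n_people birthdays, check for a duplicate, repeat many times
shared <- replicate(n_sims, {
  birthdays <- sample(1:365, n_people, replace = TRUE)
  any(duplicated(birthdays))
})
mean(shared)                                       # roughly 0.507

# Analytical answer: 1 - (365/365) * (364/365) * ... * ((365 - n + 1)/365)
1 - prod((365 - n_people + 1):365) / 365^n_people
```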


17 I really like the metaphor of “fooled by randomness”

At last, after we get a sense of how Markov Chain Monte Carlo works, we will gain a new superpower – to sample from unknown distributions. We can apply it to the modeling of grouped / clustered / correlated data, as in the classical case-study of radon (toxic gas) concentrations in U.S. homes in various counties.


The idea of partial pooling will help us to automatically adjust our inferences (means and variances) with respect to the sample size of each group (county). This is an example of a multilevel or hierarchical model, for which Bayesian inference has many benefits. However, we don’t have analytical solutions and will have to use a much more sophisticated algorithm to generate samples from the posterior distribution of parameters.
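The flavor of partial pooling can be previewed with a deliberately simplified sketch: made-up county data, with the within-county and between-county variances assumed known, so the shrinkage weights are explicit instead of being estimated from the posterior. Each county mean is pulled toward the grand mean, and the smallest counties are pulled the hardest.

```r
# Partial pooling, simplified: precision-weighted compromise between each
# county's own mean and the grand mean (variances assumed known here).
set.seed(7)
county_n   <- c(2, 5, 20, 100)            # hypothetical sample sizes per county
true_means <- c(1.5, 1.0, 0.8, 1.2)       # hypothetical county-level effects
sigma_y    <- 0.8                         # within-county sd (assumed known)
tau        <- 0.3                         # between-county sd (assumed known)

y <- lapply(seq_along(county_n), function(j) rnorm(county_n[j], true_means[j], sigma_y))
county_means <- sapply(y, mean)
grand_mean   <- mean(unlist(y))

# Shrinkage weight: more observations => more weight on the county's own data
w <- county_n / (county_n + sigma_y^2 / tau^2)
partial_pooled <- w * county_means + (1 - w) * grand_mean

round(rbind(no_pooling = county_means, partial_pooling = partial_pooled), 2)
```

In the full radon model those variances are unknown parameters with their own posterior distributions, which is exactly why the more sophisticated sampling machinery mentioned above is needed.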


Treat this topic of multilevel modeling as the culmination of the course, which also serves as a powerful complementary approach to what you study in econometrics.

- 16 You should still know how are they computed behind the scenes and what happens when they are not-so-random

+ diff --git a/sitemap.xml b/sitemap.xml index 8a46226..84d4e92 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -26,7 +26,7 @@ https://course.economic-cybernetics.com/sim/1_intro.html - 2025-02-23T09:15:44.546Z + 2025-02-23T11:33:45.854Z https://course.economic-cybernetics.com/references.html