10_inference_in_bayesian_networks_1 #13

Open
wants to merge 17 commits into master
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
*.ini
201 changes: 201 additions & 0 deletions notebooks/10_inference_in_bayesian_networks_1/index.md
@@ -0,0 +1,201 @@
# Inference in Bayes Nets 1

Well done with your Lecture Note! You've done a great job!

These few comments will help your LN get the best possible score:

Author

Thanks for your review. We appreciate your suggestions.

  • Image sizes were specified, but GitHub ignored them. I replaced the div HTML tag with p, which is now displayed correctly.
  • The whole document is now justified, and yes, it looks much better!

About the second suggestion: GitHub does not support math equations (there is an open issue for that), but Mr. Zehtab (@vahidzee) has mentioned on Telegram that the equations will show up in Webifier after deployment. We have already tested the equations in our editor (VSCode), and they look correct. If there is still a problem with them, we will be happy to fix it.

## Table of Contents

- [Introduction](#introduction)
- [Inference by Enumeration](#inference-by-enumeration)
- [Algorithm Explanation](#algorithm-explanation)
- [Algorithm Steps](#algorithm-steps)
- [Algorithm Pseudocode](#algorithm-pseudocode)
- [Algorithm Time Complexity](#algorithm-time-complexity)
- [Algorithm Example](#algorithm-example)
- [Inference by Variable Elimination (Marginalizing Early)](#inference-by-variable-elimination-marginalizing-early)
- [Algorithm Explanation](#algorithm-explanation-1)
- [Algorithm Steps](#algorithm-steps-1)
- [Algorithm Pseudocode](#algorithm-pseudocode-1)
- [Algorithm Time Complexity](#algorithm-time-complexity-1)
- [Ordering Polytree Variables for VE](#ordering-polytree-variables-for-ve)
- [Cut-set Conditioning](#cut-set-conditioning)
- [Algorithm Example](#algorithm-example-1)
- [Conclusions](#conclusions)
- [References](#references)

<div style="margin: auto; width: 50%;"><img src="./assets/intro.png" /></div>

## Introduction

The basic task for a Bayesian network is to compute the posterior probability distribution of a set of query variables, given an observation of a set of evidence variables. This process is known as inference, but it is also called Bayesian updating, belief updating, or reasoning. There are two ways to approach it: exact and approximate. Both are NP-hard in the worst case. An exact method gives an exact result, while an approximate method tries to get as close to the correct answer as possible. In this lecture, we discuss exact inference methods; approximate (sampling) methods will be discussed in the next lecture.

## Inference by Enumeration

The enumeration algorithm is a simple, brute-force algorithm for computing the distribution of a variable in a Bayes net. In this algorithm, we partition all Bayes net variables into three groups:

1. evidence variables
2. hidden variables
3. query variables

This algorithm takes the query variables and evidence variables as input, and outputs the distribution of the query variables. The evidence $e$ is whatever values you already know about the variables in the Bayes net. Evidence simplifies your work because, instead of having to consider those variables' whole distributions, you can assign them particular values: they are no longer variables but constants. In the most general case, there is no evidence.

### Algorithm Explanation

This algorithm has to compute a distribution over $X$, which, because $X$ is a discrete variable, means computing the probability that $X$ takes on each of its possible values (the values in its domain). The algorithm does this by simply looping over all the possible values and computing the probability of each one. Note that if there is no evidence, it is literally just computing the probability $P(X=x_i)$ for each $x_i$ in $X$'s domain. If there is evidence, it computes $P(e, X=x_i)$ for each $x_i$ in $X$'s domain, i.e., the probability that $X$ has the value $x_i$ and the evidence holds. In that case, we use the definition of conditional probability, $$P(X=x_i \mid e) = \frac{P(e, X=x_i)}{P(e)}.$$ Once we have computed $P(e, X=x_i)$ for all $x_i$, we can simply normalize those values to obtain the correct distribution $P(X \mid e)$.
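
As a small worked example with made-up numbers: suppose $X$ is binary and enumeration yields $P(e, X=\text{true}) = 0.12$ and $P(e, X=\text{false}) = 0.04$. Then $P(e) = 0.12 + 0.04 = 0.16$, and normalizing gives $P(X=\text{true} \mid e) = 0.12 / 0.16 = 0.75$ and $P(X=\text{false} \mid e) = 0.04 / 0.16 = 0.25$.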

### Algorithm Steps

We can summarize the explained algorithm in the following steps:

1. Select the entries consistent with the evidence.
2. Sum out the hidden variables to get the joint distribution of the query and evidence variables.
3. Normalize the distribution to get the distribution of query variables.

### Algorithm Pseudocode

```python
def enumeration_ask(X, e, bn):
    """
    Input:
        X: query variable
        e: observed values for the evidence variables
        bn: given Bayes net
    Output:
        P(X | e)
    """
    q = ProbDist(X)  # a distribution over X, where q[xi] will hold P(e, X=xi)
    for xi in X.domain:
        # probability of the evidence extended with the assignment X = xi
        q[xi] = enumerate_all(e + [(X, xi)], bn.vars)
    return q.normalize()  # normalizing turns P(e, X=xi) into P(X=xi | e)


def enumerate_all(e, vars):
    """
    Input:
        e: an assignment of values to some of the variables
        vars: list of the remaining variables, in topological order
    Output:
        the probability that the variables take values consistent with e,
        summed over all variables in vars that e does not assign
    """
    if not vars:
        return 1.0
    Y = vars[0]
    if e.contains(Y):
        # Y is already assigned; multiply in its CPT entry P(Y | parents(Y))
        return probability_condition_parents(Y, e) * enumerate_all(e, vars[1:])
    else:
        # Y is hidden; sum over all of its possible values
        total = 0.0
        for yi in Y.domain:
            total += probability_condition_parents(Y, e + [(Y, yi)]) * enumerate_all(e + [(Y, yi)], vars[1:])
        return total
```
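
The pseudocode above leaves `ProbDist`, `probability_condition_parents`, and the Bayes-net object abstract; they are placeholders rather than a particular library. As a minimal, self-contained sketch of the same idea, the following snippet runs inference by enumeration on a toy two-variable network (Rain → WetGrass) built from plain dictionaries; the network structure and numbers are made up purely for illustration.

```python
# Toy Bayes net: Rain -> WetGrass, with made-up CPT entries.
p_rain = {True: 0.2, False: 0.8}          # P(Rain)
p_wet = {                                 # P(WetGrass=w | Rain=r), keyed by (w, r)
    (True, True): 0.9,  (False, True): 0.1,
    (True, False): 0.3, (False, False): 0.7,
}

def joint(rain, wet):
    """P(Rain=rain, WetGrass=wet): the product of the CPT entries."""
    return p_rain[rain] * p_wet[(wet, rain)]

def ask_rain(wet_observed):
    """P(Rain | WetGrass=wet_observed) by brute-force enumeration."""
    # Steps 1-2: for each value of the query variable, keep only entries
    # consistent with the evidence (no hidden variables remain in this toy net).
    unnormalized = {r: joint(r, wet_observed) for r in (True, False)}
    # Step 3: normalize by P(WetGrass=wet_observed).
    z = sum(unnormalized.values())
    return {r: p / z for r, p in unnormalized.items()}

print(ask_rain(True))  # {True: 0.4285..., False: 0.5714...}
```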

### Algorithm Time Complexity

In the worst case, there is no evidence, so we have to loop over all possible values of all variables. Hence, this algorithm has complexity $O(d^n)$, where $d$ is the size of the variables' domains and $n$ is the number of variables.
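
To see where this bound comes from, note that what `enumerate_all` evaluates is the standard sum-of-products expression

$$P(X=x, e) = \sum_{h_1} \cdots \sum_{h_k} \prod_{i=1}^{n} P\big(X_i \mid \mathrm{parents}(X_i)\big),$$

where $h_1, \dots, h_k$ are the hidden variables and each CPT entry is read off with its variables fixed to the values in the current assignment. With $k$ hidden variables of domain size $d$, the nested sums range over $d^k$ complete assignments, and with no evidence $k$ is essentially $n$.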

### Algorithm Example

[Here](https://youtu.be/BrK7X_XlGB8) is a video of the algorithm running on a simple example.

## Inference by Enumeration is Slow

In enumeration, we first build the whole joint distribution and only then marginalize out the hidden variables. This is slow precisely because of the large joint distribution we have to compute.

If we instead marginalize out the hidden variables early, while the joint distribution is still only partially built, we get a much faster algorithm. This method is called inference by variable elimination.

<div style="margin: auto; width: 100%;"><img src="./assets/comparison.png" /></div>

## Inference by Variable Elimination (Marginalizing Early)

The point of the variable-elimination algorithm is that it is more bottom-up than top-down. Instead of first writing out the probabilities we need and then expanding every term each of them depends on, we compute small intermediate results, substitute them into the terms that depend on them, and repeatedly simplify the expression until it involves only the variable we are looking for.

The variable-elimination algorithm uses things called factors. A factor is basically a CPT, except that the entries are not necessarily probabilities (but they would be if you normalized them). You can think of a factor as a matrix with a dimension for each variable, where $Factor[VAL1][VAL2][…]$ is (proportional to) a probability such as
$$P(VAR1=VAL1, VAR2=VAL2, …)$$
or you can think of it as a table with one row for each possible combination of assignments of values to the variables.

We also define two operations on factors:
1. Join
2. Eliminate

Join combines two factors. For example, given two factors $F_1$ and $F_2$, their join is the factor over the union of their variables whose entries are obtained by multiplying, for each combination of values, the entries of $F_1$ and $F_2$ that agree on the values of the shared variables.

Eliminate removes a variable from a factor. For example, to eliminate $X$ from a factor $F$, we group the rows of $F$ by the values of the non-$X$ variables and sum the entries within each group. This is exactly marginalization.

> - Join is exactly like SQL join.
> - Eliminate is like a group by and SUM aggregation function in SQL.
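
As a concrete (if simplified) sketch of these two operations, a factor can be represented as a tuple of variables plus a table mapping value tuples to numbers; join and eliminate are then only a few lines of Python. The `Factor` class, the `domains` argument, and the function names below are our own illustrative choices, not a standard API.

```python
from itertools import product

class Factor:
    """A factor over `vars`: `table` maps a tuple of values (one per variable) to a number."""
    def __init__(self, vars, table):
        self.vars = tuple(vars)
        self.table = dict(table)

def join(f1, f2, domains):
    """Pointwise product of two factors (analogous to an SQL join on the shared variables)."""
    joined_vars = tuple(dict.fromkeys(f1.vars + f2.vars))  # union of variables, order preserved
    table = {}
    for values in product(*(domains[v] for v in joined_vars)):
        row = dict(zip(joined_vars, values))
        entry1 = f1.table[tuple(row[v] for v in f1.vars)]
        entry2 = f2.table[tuple(row[v] for v in f2.vars)]
        table[values] = entry1 * entry2
    return Factor(joined_vars, table)

def eliminate(factor, var):
    """Sum `var` out of a factor (group by the other variables and sum, as in SQL)."""
    idx = factor.vars.index(var)
    kept_vars = factor.vars[:idx] + factor.vars[idx + 1:]
    table = {}
    for values, p in factor.table.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(kept_vars, table)
```

For instance, joining $P(R)$ with $P(W \mid R)$ and then eliminating $R$ produces a factor over $W$ alone, which is exactly the marginalization described above.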

### Algorithm Explanation

This algorithm, like the previous one, takes a query variable X and returns a distribution over X given some evidence e. First, it initializes the list of factors; prior to any simplification, these are just the conditional probability tables of each variable, instantiated with the evidence e. Then, for each hidden variable, it joins all the factors that mention that variable and sums the variable out: the summing-out step takes all the factors that depend on a given variable and replaces them with a single new factor that does not depend on it (by summing over all possible values of that variable). By the end of the loop, every variable except the query variable X has been summed out, so we can simply multiply the remaining factors together and normalize to get the distribution.

### Algorithm Steps

1. Initialize the list of factors, which are the local CPTs instantiated by the evidence.
2. While there are any hidden variables:
- Join all the factors containing the hidden variable.
- Eliminate the hidden variable.
3. Join all remaining factors.
4. Normalize the resulting factor.

### Algorithm Pseudocode

```python
def elimination_ask(X, e, bn):
    """
    Input:
        X: query variable
        e: observed values for the evidence variables
        bn: given Bayes net
    Output:
        P(X | e)
    """
    # start with one factor per CPT, instantiated with the evidence e
    factors = [cpt.instantiate(e) for cpt in bn.cpts]
    for var in bn.vars:
        if var not in e and var != X:  # var is a hidden variable
            # join every factor that mentions var, then sum var out
            relevant_factors = [f for f in factors if var in f.vars]
            for f in relevant_factors:
                factors.remove(f)
            factors.append(join(relevant_factors).eliminate(var))
    # multiply the remaining factors together and normalize to get P(X | e)
    return join(factors).normalize()
```

### Algorithm Time Complexity

The computational and space complexity of variable elimination is determined by the size of the largest factor. In the worst case, this algorithm has exponential complexity, just like the enumeration algorithm, but the elimination ordering can greatly affect the size of the largest factor. For example, in the following Bayes net, assuming the query is $P(X_n \mid Y_1, \dots, Y_n)$, the largest factor produced by each of the following orderings is different:

- $Z, X_1, …, X_n \rightarrow 2^{n+1}$
- $X_1, …, X_n, Z \rightarrow 2^{2}$

<div style="margin: auto; width: 40%;"><img src="./assets/ve-ordering.png" /></div>

In general, there is no ordering that yields only small factors for every network.

### Ordering Polytree Variables for VE

A polytree is a directed graph with no undirected cycles. For this special kind of graph, we can give an ordering of the nodes that keeps all factors small, so the joint distribution of the variables can be computed efficiently:

1. Drop edge directions
2. Pick an arbitrary node as the root
3. Do depth first search from the root
4. Sort the resulting nodes in topological order
5. Reverse the order

Now, if we eliminate variables in this order, we never get a factor larger than the original factors, which makes the time complexity of the VE algorithm linear.
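
A minimal sketch of this ordering procedure, under the assumption that the polytree is given as an undirected adjacency map (steps 1-2) and using an iterative depth-first traversal from the chosen root; the function name and graph representation are our own:

```python
def polytree_elimination_order(neighbors, root):
    """
    neighbors: dict mapping each node to the nodes it shares an edge with,
               ignoring edge directions (steps 1-2 above).
    root:      an arbitrarily chosen root node.
    Returns the nodes ordered so that leaves are eliminated first and the
    root last (steps 3-5: DFS visit order from the root, reversed).
    """
    visit_order, seen = [], {root}
    stack = [root]
    while stack:                          # iterative depth-first search
        node = stack.pop()
        visit_order.append(node)
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return list(reversed(visit_order))    # eliminate outward-in, root last

# Hypothetical usage on a small polytree A - B - C, with D also attached to B:
# polytree_elimination_order({'A': ['B'], 'B': ['A', 'C', 'D'],
#                             'C': ['B'], 'D': ['B']}, root='A')
# -> ['C', 'D', 'B', 'A']
```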

### Cut-set Conditioning

We can cut the Bayes net at an instantiated variable; doing so can transform a multiply connected graph into a polytree, for which we can find a good elimination order. If the cut-set variables are not actually observed, we instantiate them to each of their possible values in turn, solve the resulting polytree problem for each assignment, and combine the results. You can see an example of this below.

<div style="margin: auto; width: 70%;"><img src="./assets/conditioning-example.png" /></div>
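
In equations, if $C$ denotes the set of cut-set variables, conditioning amounts to solving one polytree problem per joint assignment $c$ of $C$ and summing the results:

$$P(X \mid e) \;\propto\; \sum_{c} P(X, C=c, e),$$

where each term $P(X, C=c, e)$ is computed on the polytree obtained by fixing $C=c$. The total cost is therefore the polytree inference cost multiplied by the number of cut-set assignments, which is exponential in the size of the cut set.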

### Algorithm Example

[Here](https://youtu.be/w4sJ8SazmFo) is a video of elimination of a variable from a set of factors.

## Conclusions

We reviewed two major exact inference algorithms, the enumeration algorithm and the variable elimination algorithm. The enumeration algorithm is a simple algorithm that is easy to understand and implement, but it is not very efficient. On the other hand, the variable elimination algorithm is a more complex algorithm that is more efficient, but it is also harder to understand and implement. For both introduced algorithms, the worst-case time complexity is exponential. So in practice, using sampling is usually a better choice, which you will learn more about in the next lecture.

## References

- [Class Presentation](http://ce.sharif.edu/courses/99-00/1/ce417-2/resources/root/Slides/PDF/Session%2013_14.pdf)
- [Visualizing Inference in Bayesian Networks](http://www.kbs.twi.tudelft.nl/Publications/MSc/2006-JRKoiter-Msc.html)
- [Exact Inference in Bayes Nets](http://courses.csail.mit.edu/6.034s/handouts/spring12/bayesnets-pseudocode.pdf)
- [Variable Elimination](https://ermongroup.github.io/cs228-notes/inference/ve/)
- [Bayesian Networks - Inference (Part II)](https://ccc.inaoep.mx/~esucar/Clases-mgp/Notes/c7-bninf-p2.pdf)
29 changes: 29 additions & 0 deletions notebooks/10_inference_in_bayesian_networks_1/matadata.yml
@@ -0,0 +1,29 @@
title: Inference in Bayes Nets 1

header:
  title: Inference in Bayes Nets 1 # title of your notebook
  description: An Introduction to Inference in Bayesian Networks Part 1 (Up to Sampling)

authors:
  label:
    position: top
  content:
    - name: Parsa Mohammadian
      role: Author
      contact:
        - link: https://github.com/parsa2820
          icon: fab fa-github
    - name: Sara Azarnoush
      role: Author
      contact:
        - link: https://github.com/saaz742
          icon: fab fa-github
    - name: Kasra Amani
      role: Author
      contact:
        - link: https://github.com/iTskAsra
          icon: fab fa-github

comments:
  label: false
  kind: comments
2 changes: 2 additions & 0 deletions notebooks/index.yml
@@ -54,4 +54,6 @@ notebooks:
    kind: S2021, LN, Notebook
  #- notebook: notebooks/17_markov_decision_processes/
  - notebook: notebooks/18_reinforcement_learning/
    kind: S2021, LN, Notebook
  - notebook: notebooks/10_inference_in_bayesian_networks_1/
    kind: S2021, LN, Notebook