We continue with the proof of the [[Theorem 6|Myhill-Nerode Theorem]], which was started in [[Verification 12]].
We assume that the equivalence $\approx_L$ has only finitely many equivalence classes (finite index).
==Goal:==
Construct a [[Deterministic Finite State Automata|DFA]] $A$ with $L(A) = L$.
How do we construct such an [[Deterministic Finite State Automata|Automaton]]?
$Q=\{[u]_{\approx_L} \mid u \in A^\star\}$
What does this mean? It means that we create a state for every equivalence class of the equivalence relation $\approx_L$.
$q_0=[\epsilon]_{\approx_L}$
We choose the initial state to be the equivalence class of the empty word, i.e. $[\epsilon]_{\approx_L} = \{x \in A^\star \mid x \approx_L \epsilon\}$.
$F=\{[u]_{\approx_L} \mid u \in L\}$
We make final states of all equivalence classes that contain at least one word of the language $L$.
$\delta: Q \times A \rightarrow Q$, $\quad \delta([u]_{\approx_L},a):=[u \cdot a]_{\approx_L}$
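As a tiny illustration (my own example, not from the lecture): take $A = \{a, b\}$ and $L$ = all words that end in $a$. Then $\approx_L$ has exactly two classes, and the construction yields

$$
\begin{aligned}
Q &= \{[\epsilon], [a]\}, \qquad q_0 = [\epsilon], \qquad F = \{[a]\},\\
\delta([\epsilon], a) &= [a], \quad \delta([\epsilon], b) = [\epsilon], \quad \delta([a], a) = [a], \quad \delta([a], b) = [\epsilon],
\end{aligned}
$$

where $[\epsilon]$ contains all words not ending in $a$ and $[a]$ all words ending in $a$.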
==But we have a problem!==
We agreed that $[u]_{\approx_L}$ is a state $q$ and $[u \cdot a]_{\approx_L}$ is a state $q'$. But $\delta$ is defined via a representative $u$ of the class. In general, for an arbitrary [[Equivalence problem|equivalence relation]] we could have a case where $u \approx v$ but $u \cdot a \not\approx v \cdot a$, so the value of $\delta$ would depend on the chosen representative and $\delta$ would not be well defined.
But we have a special case: we have a right-invariant equivalence, as per the definition: $u \approx_L v$ implies $u \cdot a \approx_L v \cdot a$ for every $a \in A$.
Previously we have just assumed that $\approx_L$ is right invariant. To prove that, our goal is to show: if $u \approx_L v$ then $u \cdot a \approx_L v \cdot a$. So what is the definition of $\approx_L$? We have $u \approx_L v$ iff for all $w \in A^\star$: $u w \in L \Leftrightarrow v w \in L$.
==This is a little bit obvious== but needs to be shown; see the derivation below.
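Written out (a standard one-liner, not verbatim from the lecture): if $u \approx_L v$, then for every $a \in A$ and every $w \in A^\star$

$$
(u \cdot a)\,w \in L \;\Longleftrightarrow\; u\,(a w) \in L \;\Longleftrightarrow\; v\,(a w) \in L \;\Longleftrightarrow\; (v \cdot a)\,w \in L,
$$

so $u \cdot a \approx_L v \cdot a$, and $\delta$ is well defined.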
Now the last step is to show that $L(A) = L$.
==Note the invariant:== $\hat{\delta}(q_0, w) = [w]_{\approx_L}$ for every word $w \in A^\star$, where $\hat{\delta}$ is the extended transition function: reading $w$ from the initial state ends in the class of $w$.
```mermaid
graph LR
start --> q0
q0(("q0=[epsilon]"))--w-->qw(("qw=[w]"))
qw--a-->qwa(("qwa=[wa]"))
```
We prove this by induction on the length of $w$; the base case $|w| = 0$ and the step from $w$ to $w \cdot a$ are spelled out below.
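Written out (standard bookkeeping, not verbatim from the lecture):

$$
\begin{aligned}
|w| = 0:\quad & \hat{\delta}(q_0, \epsilon) = q_0 = [\epsilon]_{\approx_L},\\
w \;\to\; w \cdot a:\quad & \hat{\delta}(q_0, w \cdot a) = \delta\big(\hat{\delta}(q_0, w), a\big) = \delta\big([w]_{\approx_L}, a\big) = [w \cdot a]_{\approx_L},
\end{aligned}
$$

where the middle equality is the induction hypothesis and the last one is the definition of $\delta$.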
And when is a word $w$ accepted? Exactly when $\hat{\delta}(q_0, w) = [w]_{\approx_L} \in F$, i.e. when $[w]_{\approx_L}$ contains a word of $L$. Since all words of one class agree on membership in $L$ (take the empty suffix in the definition of $\approx_L$), this is the case iff $w \in L$, so $L(A) = L$.
With this the [[Theorem 6|Myhill-Nerode Theorem]] is proven!
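To make the construction concrete, here is a small Python sketch (my own illustration, not part of the lecture). It builds $Q$, $q_0$, $F$ and $\delta$ exactly as above for one example language; the example language, the helper names and the finite set of distinguishing suffixes used to approximate $\approx_L$ are assumptions chosen by hand:

```python
# Sketch: the quotient construction Q = {[u]}, q0 = [ε], F = {[u] | u ∈ L},
# δ([u], a) = [u·a] for one concrete language. The class [u] is represented by
# a finite "signature": membership of u·s for a handful of suffixes s, which
# happens to separate all classes of this particular L.
A = ["a", "b"]                        # alphabet
L = lambda w: w.count("a") % 2 == 0   # example: words with an even number of a's
SUFFIXES = ["", "a"]                  # hand-picked distinguishing suffixes for this L

def signature(u):
    """Finite fingerprint of the class [u]."""
    return tuple(L(u + s) for s in SUFFIXES)

reps = {signature(""): ""}            # class -> representative word, start with [ε]
todo = [""]
delta = {}
while todo:                           # explore classes reachable by appending letters
    u = todo.pop()
    for a in A:
        cls = signature(u + a)
        if cls not in reps:
            reps[cls] = u + a
            todo.append(u + a)
        delta[(signature(u), a)] = cls            # δ([u], a) = [u·a]

q0 = signature("")                                # q0 = [ε]
F = {cls for cls, u in reps.items() if L(u)}      # classes containing a word of L

def accepts(w):
    q = q0
    for a in w:
        q = delta[(q, a)]
    return q in F

assert accepts("abba") and not accepts("ab")      # L(A) = L on these samples
```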
When we talk of learning we normally mean supervised learning: the architecture is fixed and we only optimize the weights of the nodes, so that in the end the [[Neural network]] can do something useful for you.
But this time we work with [[Deterministic Finite State Automata|Automata]], because they have other applications:
What are the differences? The data: for [[Neural network]]s the input is very often a fixed-size tuple of real numbers. With [[Deterministic Finite State Automata|Automata]] the size is not fixed; the data can even be infinite.
| | [[Neural network]] | [[Deterministic Finite State Automata\|Automata]] |
|---|---|---|
| Data | simple data: fixed-size tuples of real numbers | complex data, e.g. words of unbounded length |
| What can be learned? | complex functions | simple functions |
| Architecture | architecture is fixed, only the weights are learned | states + transitions are learned |
The teacher only provides labels and data, but does not interact with the neural network. ![[Verification 13_image_1.png|300]]
In active learning, by contrast, there is a back and forth: the learner asks the teacher and the teacher responds.
An example: the learner creates an image of a dog. The teacher then says whether it is a dog. If not, the learner adapts how it creates the image.
Sometimes the active scenario is not ==practical==, but it tends to be more efficient.
![[Verification 13_image_2.png|300]]
- 1956: Moore introduces active learning. Application: we only know the output of the machine and want to figure out what the input looks like.
- 1986: Angluin gives an algorithm for learning an automaton based on queries.
- 1993: PAC learning: find a statistical approximation of an automaton based on queries.
- 1995: learning of [[omega language]]s.
- 1996: learning of word transformations, i.e. automata that take a word as input and produce a word as output.
- 2015: someone from Amazon proposes spectral techniques for learning.
==Step 1==
The ==teacher== has a secret regular language $L_0$, given by a [[Deterministic Finite State Automata|DFA]] $A_0$.
Note: why a DFA? Comparing the languages of two DFAs only takes [[P-Time]].
==Step 2==
The learner can choose between two queries while estimating its [[Deterministic Finite State Automata|DFA]] $A$:
- either a membership query: "is $w$ in $L_0$?"
- or an equivalence query: "is $L(A) = L_0$?", or in other words: are $A_0$ and $A$ [[Equivalence problem|equivalent]]?
This is a cooperative game: The teacher is not the opponent of the learner.
==Step 3== The teacher answers accordingly:
- "yes" if $w \in L_0$, "no" otherwise
- "yes" if $L(A) = L_0$, i.e. the learner wins and the game ends. Otherwise the teacher returns the shortest possible counterexample: it chooses a word $w \in (L_0-L(A)) \cup (L(A)-L_0)$, the [[symmetric difference]] of the two languages. These are all the words that either belong to $L_0$ but not to $L(A)$ (i.e. $L_0-L(A)$) or belong to $L(A)$ but not to $L_0$ (i.e. $L(A)-L_0$).
The fact that the teacher chooses the shortest counterexample has an influence on the complexity of the problem.
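To make the game concrete, here is a small Python sketch of the teacher's side (my own illustration, not part of the lecture; the class name, the DFA encoding and the breadth-first search over the product automaton are all assumptions). It answers membership queries by running the secret DFA and answers equivalence queries with a shortest counterexample:

```python
from collections import deque

# Sketch of the teacher's side of the game (hypothetical names and encoding).
# A DFA is a triple (initial state, accepting states, transition table) with
# delta[(state, letter)] -> state, over the alphabet A.
A = ["a", "b"]

class Teacher:
    def __init__(self, secret_dfa):
        self.q0, self.F, self.delta = secret_dfa   # the secret automaton A0

    def member(self, w):
        """Membership query: is w in L0?"""
        q = self.q0
        for a in w:
            q = self.delta[(q, a)]
        return q in self.F

    def equivalent(self, hyp):
        """Equivalence query: None if L(A) = L0, otherwise a shortest word in
        the symmetric difference, found by BFS over the product automaton."""
        hq0, hF, hdelta = hyp
        start = (self.q0, hq0)
        seen, queue = {start}, deque([(start, "")])
        while queue:
            (p, q), w = queue.popleft()
            if (p in self.F) != (q in hF):          # w is accepted by exactly one side
                return w
            for a in A:
                nxt = (self.delta[(p, a)], hdelta[(q, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, w + a))
        return None                                 # the two languages agree

# Secret language L0: words with an even number of a's.
A0 = (0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
teacher = Teacher(A0)
accept_all = (0, {0}, {(0, "a"): 0, (0, "b"): 0})   # hypothesis accepting every word
assert teacher.member("aa") and not teacher.member("a")
assert teacher.equivalent(accept_all) == "a"        # shortest counterexample
```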
==Observation:== Membership queries alone are not enough for the learner to win; in particular, the languages of [[Deterministic Finite State Automata|DFA]]s can contain infinitely many words. Therefore we rule this out: the learner can only win with an equivalence query.
==Other observation== Can we win using only equivalence queries? Yes, we can always win with only equivalence queries in finitely many steps! There are countably many [[Deterministic Finite State Automata|Automata]], so we can simply go through all of them and will at some point find our desired [[Regular Languages|Regular language]].
How fast can we win? How many queries do we need until we win using only equivalence queries? The number of queries is exponential in the number of states of the target (secret) [[Deterministic Finite State Automata|Automaton]]: before finding an [[Deterministic Finite State Automata|Automaton]] with 10 states we first have to go through the automata with 1 state, 2 states, ... until we arrive at 10 states. A sketch of this enumeration strategy follows below.
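As a sketch of this brute-force strategy (my own illustration, reusing the hypothetical `Teacher` instance `teacher` and alphabet `A` from the previous sketch):

```python
from itertools import product

# Sketch of the "equivalence queries only" strategy: enumerate all DFAs with
# 1, 2, 3, ... states and ask the teacher about each one.
def all_dfas(n):
    """All DFAs with states 0..n-1 and initial state 0 over the alphabet A."""
    keys = [(q, a) for q in range(n) for a in A]
    for targets in product(range(n), repeat=len(keys)):    # every transition table
        delta = dict(zip(keys, targets))
        for bits in product([False, True], repeat=n):      # every set of final states
            yield (0, {q for q in range(n) if bits[q]}, delta)

def learn_by_enumeration(teacher):
    n, queries = 1, 0
    while True:
        for hyp in all_dfas(n):                            # exponentially many DFAs per n
            queries += 1
            if teacher.equivalent(hyp) is None:            # teacher says "yes": we win
                return hyp, queries
        n += 1

hyp, queries = learn_by_enumeration(teacher)
print(queries, "equivalence queries")                      # already dozens for the 2-state target
```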
So what do membership queries buy us? ==When using equivalence queries and membership queries together, one can find the automaton in [[P-Time|polynomial time]].==
> [!note] [[Theorem 7]]
> The learner always has a strategy to win the game in a number of rounds that is [[P-Time|polynomial]] in the number of states of $A_0$.
The [[Minimally adequat teacher|minimally adequate teacher]] is a perfect teacher that knows the language. Why bother playing the game then? Why should he not just hand over the [[Deterministic Finite State Automata|Automaton]] to the learner?
Well, this is true, and in practice the [[Minimally adequat teacher|minimally adequate teacher]] is seldom present. We do not have any info on what the [[Deterministic Finite State Automata|Automaton]] actually looks like.
Often we have a black box though: a machine on which we can check whether the word sent during a membership query is accepted. For instance, we can try the input on a coffee machine whose internal state machine we want to estimate.
Then, over time, we can approximate the [[Deterministic Finite State Automata|Automaton]] by checking again and again the words accepted by the learner's proposed machine. If the [[Deterministic Finite State Automata|Automaton]] accepts all the words, then we can say: well, it looks good; this might be the machine.
In practice these methods are used in ==protocol state fuzzing==.
If there is a regular language, you can build it as a union of equivalence classes. Each puzzle piece is one equivalence class. Each [[Equivalence problem|equivalence class]] contains words.
![[Verification 13_image_3.png|600]]
![[Verification 13_image_4.png|600]]
We will introduce a new equivalence: we do not test the [[Theorem 6|Myhill-Nerode equivalence]] for all words but only for a set of words $T \subseteq A^\star$:
![[Verification 13_image_5.png]]
Of course, as $T$ grows, the relativized equivalence becomes finer and approximates $\approx_L$ better.
If $T = A^\star$, we get exactly the Myhill-Nerode equivalence $\approx_L$.
If we define $T$ to be a small finite set, we only get a coarse approximation with few classes.
What does this mean? It means we have a control knob for how accurately we want to approximate the language $L$.
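Formally (writing $\approx_L^T$ for the relativized equivalence; the symbol is my own shorthand, the slides use a picture):

$$
u \approx_L^T v \;:\Longleftrightarrow\; \forall w \in T:\; u w \in L \Leftrightarrow v w \in L,
\qquad
T \subseteq T' \;\Rightarrow\; \big(\,u \approx_L^{T'} v \;\Rightarrow\; u \approx_L^{T} v\,\big),
\qquad
\approx_L^{A^\star} \;=\; \approx_L .
$$

Enlarging $T$ can only split classes, never merge them, which is exactly the control knob.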
Myhill-Nerode is a very fine equivalence (the blue and grey puzzle pieces). The relativized equivalences are coarser (the red borders over the puzzle pieces in the image below). ![[Verification 13_image_7.png|600]]
==Definitions==
A learner constructs automata from pairs of finite word sets: a set of candidate state representatives (the red dots below) together with a set $T$ of test words.
[[T-Minimal]]
A set of representative words is [[T-Minimal]] if no two different words in it are equivalent when tested only with the words in $T$, i.e. each representative stands for a different class.
![[Verification 13_image_8.png|300 ]]
[[T-complete]]
[[T-complete]] means that for every word in the representative set and every letter $a \in A$, the one-letter extension is again equivalent (with respect to the tests in $T$) to some representative, so every transition has a target state.
![[Verification 13_image_9.png|700]]
In the end we will define the transitions as the arrows and the red dots as the states, resulting in a learned automaton that accepts the language.
Now the learner can add more states by increasing $T$, and reduce the number of states by removing elements from $T$.
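Putting these pieces together, here is a small Python sketch (my own illustration; the helper names, the pair of sets and the precise reading of [[T-Minimal]] and [[T-complete]] are assumptions based on the description above) of how a learner could turn a representative set and a test set $T$, plus membership queries, into an automaton:

```python
# Sketch: hypothesis automaton from a finite set of representative words plus a
# finite set T of test words, using only membership queries. row(u) plays the
# role of the class of u under the T-relativized equivalence.
A = ["a", "b"]
member = lambda w: w.count("a") % 2 == 0      # stand-in for membership queries to L0

def row(u, T):
    return tuple(member(u + t) for t in T)    # finite fingerprint of the class of u

def t_minimal(reps, T):
    rows = [row(u, T) for u in reps]
    return len(set(rows)) == len(rows)        # no two representatives are T-equivalent

def t_complete(reps, T):
    targets = {row(u, T) for u in reps}
    return all(row(u + a, T) in targets for u in reps for a in A)  # every u·a has a target

def hypothesis(reps, T):
    """Red dots = rows of the representatives, arrows = one-letter extensions."""
    assert "" in reps and t_minimal(reps, T) and t_complete(reps, T)
    q0 = row("", T)
    F = {row(u, T) for u in reps if member(u)}
    delta = {(row(u, T), a): row(u + a, T) for u in reps for a in A}
    return q0, F, delta

reps, T = ["", "a"], [""]                     # enough for the even-number-of-a's example
q0, F, delta = hypothesis(reps, T)
print(len({row(u, T) for u in reps}), "states")  # 2 states; a larger T can only add states
```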