An attention mechanism is considered Turing complete if it can model any kind of sequence computation. Global LEAP, with its $O(1)$ path length and multiple heads, should be enough to prove that LEAP can replicate any Turing-computable computation simply by performing the same steps as a Turing machine (using ideas and assumptions similar to those of Pérez, J., Marinković, J., & Barceló, P. (2019), such as arbitrary precision, an unbounded number of recursive steps, and hard attention). The local/windowed attention would then simply allow for more parallel computation when only local computations are needed.
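As a rough illustration of the hard-attention assumption (a minimal sketch, not LEAP's actual implementation), a hard-attention head replaces the softmax with an argmax so that each position copies exactly one value vector unchanged; this exact-retrieval behavior is the selection primitive the Pérez et al. (2019) construction uses to read a simulated tape cell.

```python
import torch

def hard_attention(q, k, v):
    # q, k, v: (seq_len, d) tensors for a single head
    scores = q @ k.T              # (seq_len, seq_len) attention scores
    idx = scores.argmax(dim=-1)   # each query selects the single best-matching position
    return v[idx]                 # copy that position's value row exactly (no averaging)

def soft_attention(q, k, v):
    # Standard soft attention for comparison: values are averaged, not copied exactly
    scores = q @ k.T
    return torch.softmax(scores, dim=-1) @ v

# Example: hypothetical shapes just for illustration
q, k, v = torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8)
print(hard_attention(q, k, v).shape)  # torch.Size([5, 8])
```

With soft attention the retrieved vector is a weighted mixture, so an exact symbol can only be recovered in the limit of arbitrary precision, which is why the hard-attention (or arbitrary-precision) assumption matters for the Turing-completeness argument.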
If this can be shown, it may be less important to perform one-to-one comparisons with GPT-2, since there would be theory backing up the expressiveness of the architecture/attention mechanism.