A Q-learning-style RL algorithm that uses a vector-symbolic architecture with hypervectors to solve the cart-pole environment. It now includes Double Q-learning and a target model to improve the stability of reward convergence. The agent routinely learns to keep the pole upright for more than 200 timesteps within roughly 100 episodes, and peak performance reaches tens of thousands of timesteps upright.
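
To make the idea concrete, here is a minimal sketch (not the repository's actual implementation) of how such an agent might work: each continuous state variable is discretized into bins, each bin gets a random bipolar hypervector, the per-variable hypervectors are bundled into a single state hypervector, and Q-values are read out as dot-product similarity against one learned weight hypervector per action. The Double-Q update lets the online model pick the next action while the target model evaluates it. All names, dimensions, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

D = 2048        # hypervector dimensionality (assumed value)
N_BINS = 16     # bins per state variable (assumed value)
rng = np.random.default_rng(0)

# One random bipolar hypervector per (state variable, bin) pair.
level_hvs = rng.choice([-1.0, 1.0], size=(4, N_BINS, D))

# Rough cart-pole state bounds: position, velocity, angle, angular velocity.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]

def encode_state(state):
    """Bundle (sum) the bin hypervectors of all four state variables."""
    hv = np.zeros(D)
    for i, (x, (lo, hi)) in enumerate(zip(state, BOUNDS)):
        b = int(np.clip((x - lo) / (hi - lo) * N_BINS, 0, N_BINS - 1))
        hv += level_hvs[i, b]
    return hv

class HDQAgent:
    """Hypervector Q-learning with Double-Q updates and a target model."""
    def __init__(self, n_actions=2, lr=0.1, gamma=0.99):
        self.w = np.zeros((n_actions, D))         # online model
        self.w_target = np.zeros((n_actions, D))  # target model
        self.lr, self.gamma = lr, gamma

    def q(self, s_hv, target=False):
        w = self.w_target if target else self.w
        return w @ s_hv / D   # similarity-based Q readout

    def update(self, s_hv, a, r, s2_hv, done):
        # Double-Q: online model selects the next action,
        # target model evaluates it.
        a_next = int(np.argmax(self.q(s2_hv)))
        bootstrap = 0.0 if done else self.gamma * self.q(s2_hv, target=True)[a_next]
        td_error = (r + bootstrap) - self.q(s_hv)[a]
        # Move the chosen action's weight hypervector toward the state.
        self.w[a] += self.lr * td_error * s_hv

    def sync_target(self):
        self.w_target = self.w.copy()

# Tiny smoke demo on a synthetic transition.
agent = HDQAgent()
s = encode_state([0.0, 0.0, 0.05, 0.0])
s2 = encode_state([0.02, 0.1, 0.04, -0.1])
agent.update(s, a=1, r=1.0, s2_hv=s2, done=False)
agent.sync_target()
q_vals = agent.q(s)
```

In practice the sketch would be wrapped in a Gymnasium `CartPole-v1` loop with epsilon-greedy exploration and a periodic `sync_target()` call, but the core learning rule is just the hypervector TD update shown above.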