Causal performer slower than causal regular attention #66

JamesDeAntonis · 2021-04-23T16:25:53Z

For some reason, our causal Performer runs slower than causal regular attention. You observe that Performer is faster, even in the causal case, right? I'm curious how to troubleshoot this. (We don't use the full PerformerLM, just CrossAttention and SelfAttention; not sure if that's relevant.)

lucidrains · 2021-04-24T02:40:53Z

@JamesDeAntonis do you mean during training or eval?

JamesDeAntonis · 2021-04-26T21:59:57Z

We observed it in both. I heard from here that the reason is caching? Are you still planning to implement it?

lucidrains · 2021-04-27T16:36:16Z

@JamesDeAntonis training is as fast as it can be. Basically, if you are training with a context length under 2048, you should expect it to be the same speed or slower.
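For context on the sub-2048 remark, here is a rough NumPy sketch (not performer-pytorch's implementation) contrasting the two causal forms. The quadratic version builds an n×n score matrix and runs as a couple of large matrix multiplies. The linear version needs a running prefix sum of outer products `phi(k_j) v_j^T`, which this sketch materializes as an (n, m, d) tensor. At short lengths that memory-bound cumulative sum can cost as much as the n×n matmul, which is consistent with causal Performer training being no faster. The function `feature_map` (elu(x) + 1) is an assumption standing in for FAVOR+ random features, so m equals d here.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map elu(x) + 1, used as a stand-in for
    # FAVOR+ random features; it is elementwise, so m == d in this sketch.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_softmax_attention(q, k, v):
    # Quadratic form: builds an (n, n) score matrix, O(n^2 d) time.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(mask, -np.inf, scores)          # hide future positions
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def causal_linear_attention(q, k, v, eps=1e-6):
    # Prefix-sum form: out_i = phi(q_i) S_i / (phi(q_i) . z_i), where
    # S_i = sum_{j<=i} phi(k_j) v_j^T and z_i = sum_{j<=i} phi(k_j).
    qf, kf = feature_map(q), feature_map(k)           # (n, m)
    kv = np.einsum("nm,nd->nmd", kf, v)               # (n, m, d) outer products
    s = np.cumsum(kv, axis=0)                         # running state per position
    z = np.cumsum(kf, axis=0)                         # running normalizer
    num = np.einsum("nm,nmd->nd", qf, s)
    den = np.einsum("nm,nm->n", qf, z)[:, None]
    return num / (den + eps)
```

Both functions return an (n, d) array. Only the linear one avoids the n×n scores, so its advantage only shows up once n is large relative to m·d.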

Eval should be really fast, though, and that's something I could work on. It should end up as fast as an RNN, with a fixed cost per generated token. I'll take a look at it later this week!
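To illustrate what "as fast as an RNN" can mean at eval time, here is a minimal sketch of causal linear attention in recurrent form. It caches the running sums S and z, so each new token costs O(m·d) regardless of its position, whereas regular causal attention with a key/value cache still attends over every previous key. The class `LinearAttentionCache`, its `step` method, and the elu(x) + 1 `feature_map` are hypothetical illustrations, not performer-pytorch's API.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map elu(x) + 1, standing in for FAVOR+
    # random features; elementwise, so the feature dimension m equals d here.
    return np.where(x > 0, x + 1.0, np.exp(x))

class LinearAttentionCache:
    """Hypothetical incremental decoding state for causal linear attention."""

    def __init__(self, m, d):
        self.s = np.zeros((m, d))  # running sum of phi(k_j) v_j^T
        self.z = np.zeros(m)       # running sum of phi(k_j)

    def step(self, q_t, k_t, v_t, eps=1e-6):
        # Fold the new key/value into the cached state: O(m d) work,
        # independent of how many tokens have already been decoded.
        kf = feature_map(k_t)
        self.s += np.outer(kf, v_t)
        self.z += kf
        # Read out the attention output for the current query.
        qf = feature_map(q_t)
        return (qf @ self.s) / (qf @ self.z + eps)
```

Calling `step` once per token yields the same outputs as the full prefix-sum computation, but the state stays a fixed (m, d) matrix plus an m-vector instead of growing with sequence length.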
