https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
https://arxiv.org/abs/2010.11929
https://arxiv.org/abs/1810.04805
https://arxiv.org/abs/1706.03762
https://arxiv.org/abs/2210.05189
https://arxiv.org/abs/2308.16512
https://github.com/neuralmagic/vllm-flash-attention
https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf