https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
https://arxiv.org/abs/2010.11929
https://arxiv.org/abs/1810.04805
https://arxiv.org/abs/1706.03762
https://arxiv.org/abs/2210.05189
https://arxiv.org/abs/2308.16512
https://github.com/neuralmagic/vllm-flash-attention
https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf