Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
Paper: arXiv:2510.01817
Experimental models with Sparse Query Attention layers, reducing training time/cost by ~3-10% compared to GQA and MQA while maintaining the same level of performance. Whereas MQA and GQA shrink the number of key/value heads (to cut KV-cache memory), SQA instead reduces the number of query heads, which directly lowers the FLOPs of the attention score computation. A hedged sketch follows.
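The sketch below is a minimal, illustrative SQA layer in PyTorch, not the authors' implementation: the parameter names (`num_query_heads`, `num_kv_heads`), the GQA-style repetition of key/value heads to match the reduced query-head count, and the narrower output projection are all assumptions made for clarity.

```python
# Minimal Sparse Query Attention (SQA) sketch: fewer query heads than a
# standard multi-head layer, so the attention-score cost scales with H_q < H.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseQueryAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int,
                 num_query_heads: int, num_kv_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        assert num_query_heads % num_kv_heads == 0
        self.head_dim = embed_dim // num_heads        # per-head width unchanged
        self.num_query_heads = num_query_heads        # H_q < H: fewer score maps, fewer FLOPs
        self.num_kv_heads = num_kv_heads
        self.q_proj = nn.Linear(embed_dim, num_query_heads * self.head_dim)
        self.k_proj = nn.Linear(embed_dim, num_kv_heads * self.head_dim)
        self.v_proj = nn.Linear(embed_dim, num_kv_heads * self.head_dim)
        # Attention output has only H_q * head_dim channels, projected back to embed_dim.
        self.o_proj = nn.Linear(num_query_heads * self.head_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.q_proj(x).view(b, n, self.num_query_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, n, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, n, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat K/V heads so each of the H_q query heads attends within its KV group
        # (an assumed GQA-style grouping on top of the reduced query heads).
        groups = self.num_query_heads // self.num_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)  # score cost ~ O(H_q * n^2 * d_head)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.o_proj(out)

# Example: a layer sized for 8 "full" heads, but using only 4 query heads and 2 KV heads.
layer = SparseQueryAttention(embed_dim=512, num_heads=8, num_query_heads=4, num_kv_heads=2)
y = layer(torch.randn(2, 128, 512))
print(y.shape)  # torch.Size([2, 128, 512])
```

With half the query heads, the quadratic score/softmax work is roughly halved relative to a standard 8-head layer, which is the mechanism behind the reported training-time savings; the exact head counts used in the experimental models may differ.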