Article 2 Benchmarking Generative Language Models for Hungarian: Building a Foundation for Reliable Evaluation