openai/whisper-large-v3 · Why does a tiny silence at the start of my audio change Whisper’s transcription?

Hi everyone, I’m using OpenAI’s Whisper for speech recognition.
My audio says “ABC1234,” but sometimes the model outputs “AVC1234.” If I prepend a short silence (e.g., 10 ms), the output switches to the correct “ABC1234.” As I increase the silence further (20 ms, 30 ms, 40 ms, etc.), the output flips back and forth between “ABC1234” and “AVC1234.”
Even replacing silence with white noise has the same effect.
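
For anyone who wants to reproduce this, here is a minimal sketch of the padding step. It assumes the audio is already loaded as mono float samples at 16 kHz (the rate Whisper expects). The helper name `prepend_padding` and the `noise_std` and `seed` parameters are hypothetical, chosen only for illustration, and the transcription call itself is left out because it depends on the pipeline in use.

```python
import random

SAMPLE_RATE = 16_000  # assumption: audio is 16 kHz mono, as Whisper expects


def prepend_padding(samples, pad_ms, noise_std=0.0, seed=0):
    """Return a new list with pad_ms of silence (or Gaussian noise) in front.

    Hypothetical helper: noise_std=0.0 gives digital silence; a positive
    value gives white noise with that standard deviation.
    """
    n_pad = round(SAMPLE_RATE * pad_ms / 1000)  # 10 ms -> 160 samples
    if noise_std > 0.0:
        rng = random.Random(seed)  # seeded so repeated runs use the same noise
        pad = [rng.gauss(0.0, noise_std) for _ in range(n_pad)]
    else:
        pad = [0.0] * n_pad
    return pad + list(samples)


if __name__ == "__main__":
    audio = [0.1, -0.2, 0.3]  # placeholder for the real waveform
    for pad_ms in (0, 10, 20, 30, 40):
        padded = prepend_padding(audio, pad_ms)
        # Each padded variant would be transcribed and the outputs compared.
        print(pad_ms, "ms ->", len(padded) - len(audio), "extra samples")
```

Each padded variant differs from the original only by the leading samples, so any change in the transcription comes from the shift rather than from the speech content.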

Has anyone else run into this issue? Why does adding a tiny bit of audio cause such unpredictable changes in the transcription?
Any insights or suggestions would be really helpful!