Why I Refuse to Use AI Audio: Insight from Industry Veteran Edward Ray

Guest Post by Edward Ray


Edward Ray, a veteran audio lead and composer across VR and console games, offers a deeply personal critique of the role AI is starting to play in game audio and what we stand to lose if we let it go unchallenged.

Edward takes it from here…


Each week, it seems, the discourse spirals anew: Generative art, copyright lawsuits, Midjourney prompts, fears of creative obsolescence. The conversation around AI in game development is loud, reactive and unrelenting.

But amid the noise, a strange silence: Barely anyone’s talking about audio.

Sound, that invisible architecture of meaning, that cradle of invisible vines which bypass reason and strike directly at the nervous system, is being reshaped by the same automation that’s upending visual pipelines.

Voice-overs, adaptive music stems, procedural ambiences, even entire scores are now within reach of a few clicks and a vaguely worded prompt. The rise of AI-generated music in game development is no longer theoretical; it’s happening. And yet, almost no one’s asking what we stand to lose should we, as humans, fail to push back.

Collaboration Is Not a Conveyor Belt

Obedience is often the enemy of excellence.

The best creative work, the kind that lingers in the mind long after the controller has been set down and the headphones come off, rarely arrives unchallenged.

It is forged in tension, trimmed by argument, elevated through constraint; at the very least it’s formed by suggestion. 

Some of the most vital moments in game development are born not from agreement but from resistance: when someone cares enough about the outcome to say, “I don’t think this is good enough”. A writer disagrees with a level designer, a combat system clashes with pacing, a lighting artist argues for mood over clarity. That is where the real shaping occurs and where taste is defined.

Audio is no exception. AI offers none of this friction. It yields instantly, endlessly, uncritically. It has no sense of narrative weight, no context for emotional pacing, no intuition for when a piece should hold its breath or explode.

What it offers instead is obedience. In creative work, obedience is often the enemy of excellence.

The Composer Between Titans

In milestone meetings, I’m often slotted between the art team and the design team. On one side, meticulously crafted character models, stunning environments, visual ideas that immediately impress...

On the other, clever mechanics, combat refinements, or innovative systems that demand interaction.

Then comes my turn. What I’m presenting is literally invisible.

It’s air moving. It bleeds into the scene, it cues before the player realises they need it. Sound that’s been shaped not just to play, but to react, to breathe alongside the game. It’s difficult to point to. Harder still to demo.

And because of that, audio is often misunderstood as ancillary, a polish pass, a postscript; something that exists to quietly glue the more important departments together. That misunderstanding has had consequences. Subtle at first, now systemic.

The Cult of Inoffensiveness

Passable is not memorable.

In game audio for VR, where immersion depends on detail, this shift is even more dangerous. Too many composers, whether through unclear briefs or pipeline pressure, have internalised the belief that their work should disappear.

That “polished” equates to “unobtrusive”, that “ambient” means “passive”, that their music should be noninvasive, emotionally neutral or, in my estimation, emotionally neutered.

What we are served is sonic beige. Gruel for the ears. Scores that neither elevate nor disrupt, existing only to fill the space between combat and cutscene. Music written not to say something but to avoid saying the wrong thing, and often saying nothing at all.

This collective retreat into the safe, the loopable, the instantly forgettable has quietly lowered expectations around game audio across the board.

Which is precisely why AI has become such a tempting substitute. Why not let a machine generate something that loops cleanly, hits a tempo mark and avoids offence? It’s easier than ever to get a passable result.

But passable is not memorable. Safe is not the same as effective.

The Missed Opportunity

There’s a quietly damaging belief in game development that music simply needs to be serviceable, that it should support the scene without ever stealing focus, that it should stay out of the way.

On the surface, that sounds reasonable, even humble. But in practice, it has become a way of justifying missed opportunities and mediocrity.

Serviceable music is music that behaves.

It follows the rules, hits the mark, leaves no trace. Games don’t become legendary on the back of functional choices. They endure through emotional imprint. Their cultural endurance is a consequence of the fact that someone, somewhere, fought for a feeling that no one else could articulate at the time.

To reduce music to utility is to undermine what it’s capable of. It exists to summon mood, to shape time, to elevate a moment from interaction to experience. A well-placed cue can turn a mechanic into a memory. A theme can define a character more deeply than dialogue ever could.

A musical decision can say the one thing your scene couldn’t quite express, and say it with finality. Treating music as mere scaffolding doesn’t just sell the composer short, it sells the game short whilst diluting your brand.

Music Should Not Blend In, It Should Break Through

The most iconic scores in games don’t apologise for their presence; they define it. They act as narrative guides, emotional anchors, identity markers.

Everyone has an example of a favourite soundtrack that didn’t fade politely into the background… It charged in, claiming the forefront. It owned the emotional arc of the experience for itself.

That kind of impact doesn’t happen by accident. It happens when someone with taste, intuition and a lifetime of listening understands what a moment demands and makes something bolder than any prompt could imagine.

AI can generate, yes. But it cannot insist, argue, or say, “No, that’s not it. Let me show you something better.” That’s where its ceiling lies.

Sonic Identity Is Branding and Biology

Sound is ancient, preverbal, evolutionarily wired. Before we saw the predator, we heard it. Before we even named the storm, we heard and feared its approach.

Music and sound design reach us in ways visual assets cannot. They collapse the distance between thought and feeling, sharpen tension, diffuse anxiety, suggest consequence, wrapping themselves like invisible vines around our psyche.

It’s why a three-note motif can carry an entire game’s legacy. It’s why startup chimes, menu clicks, and footstep textures still matter.

It’s why great sound is not an optional flourish but a foundational tool in player perception.

You cannot brand your game visually and then forget what it sounds like. If you do, you risk forging something akin to a house on a Hollywood set.

Does it look the part? Yes. But you and I both know there’s nothing behind those windows.

The Illusion of Scalability: AI in Game Development Isn’t Creative

Good audio isn’t automatic, because feeling isn’t automatic.

There’s a dangerous assumption behind AI integration: that creative work scales cleanly, that automating five tasks means you can automate fifty, that a passable loop means a full soundtrack. In every case, that assumption is grossly mistaken.

Audio doesn’t scale like asset production and it doesn’t multiply neatly or build upon itself in logical modules. It’s not a problem of volume, it’s a problem of judgment.

A single poor sound cue can collapse tension. A mistimed music swell can undermine a scene’s emotional arc.

Judgment does not scale, it’s honed over time, through mistakes, through taste, through countless invisible decisions no one ever hears but everyone feels.

The Vanishing Ear

There was a time when game scores carried an audible fingerprint, a sense of authorship and perspective. You could hear who made them.

Before the title screen even faded, a single chord or texture could tell you this world had intent. Someone built this soundscape to mean something. 

We are now in danger of losing that, replacing voice with blur, character with convenience.

The Rise of the Great Audio Flattening

This is the quietest threat AI poses to game audio: homogenisation. 

Not failure, not collapse, just sameness, the creep of safety, the soft erosion of distinction until a thousand games sound like slight variations of each other: all serviceable, all polished, all utterly unmemorable. 

The difference between “this works” and “this moves me” is not computational, it’s curatorial.

Curation requires someone with an ear not just for tone or texture but for meaning, someone who knows when a scene needs silence, distortion, or a theme so bold it risks everything, someone who knows when to push back, when to say no, when to fight for a moment no one asked for but everyone will remember.

The Burden of Taste

“You’re not wielding a tool; you’re rolling dice.”

This will sound arrogant. And I’ll say it anyway…

If you can’t explain why a melody left you feeling morose, a harmony felt divine, or a moment of silence spoke louder than a full orchestra, what are you going to tell the machine? 

The only thing you can say: “Give me a track that sounds like ___”. And that’s all you’ll ever get: an imitation. By demanding only mimicry, you’ve reduced a crucial element of your game’s personality to a hollow pastiche, a pale tribute to something you can’t even define.

What instructions will you give it when your scene needs restraint, grief that doesn’t read as sadness, or joy that doesn’t feel cartoonish? 

What do you do when your scene asks for something human? 

AI can only give you what you ask for. If you lack the language, the instinct, or the lived experience to define the emotional shape of a moment, you’re not wielding a tool; you’re rolling dice. 

The burden of taste still falls on you. Automation doesn’t absolve anyone from having vision. 

Without that vision, the result won’t just be wrong; it’ll be nothing.

I’m Not Worried

I’m not afraid of AI destroying my livelihood.

Nothing I’ve said here is novel. These are things most people in games already know, even if they find it inconvenient to admit.

If you’re content with beige soundtracks and obedient machines, you were never the person I wanted to work with anyway.

I’m only speaking to those who value resonance. Those who believe sound, when wielded with purpose, can reach places no algorithm can touch.

And No. I Won’t Use AI to Compose.

AI evangelists say it will make life easier. Write faster, prototype quicker, automate the mundane so you can focus on art.

I don’t need it. I don’t struggle to form my ideas any more than I struggle to speak in full sentences. I know what I want and how to make it. I know when it’s wrong and why it’s failing.

I don’t need a tool to choose the right sample and I refuse to let a shortcut castrate the very thing that makes me worth hiring.

Those moments where it doesn’t come easy, that’s precisely the friction the project benefits from.

Edward Ray

Edward Ray is a composer and audio lead with credits across VR, console, and mobile titles. His work has appeared in games co-published by Meta, backed by global IPs and developed across Europe. He blends emotional force with technical control and believes game music should move something in you, or it’s just wallpaper.


View his entire portfolio here.

Want More?

Edward Ray dives deeper into these topics—and more from the world of audio engineering—in his YouTube series.
