

I don’t know why they don’t. I work in music rather than TV/film, but it infuriated me too! Give me a voice volume control! It would be technically very easy to implement as a standard, but the powers that be just haven’t come together and done it!
As an audio engineer, this suggestion makes my skin crawl.
Don’t apply any extra compression to your files like this; it will ruin them.
Modern audio streaming services and good audio players use loudness normalization to achieve consistent playback loudness. They do this by measuring the integrated loudness of each song and increasing or, in most cases, reducing its playback gain to hit an arbitrary target (e.g. Spotify has chosen -14 LUFS, which is pretty quiet when you consider most pop music is mastered to somewhere between -10 LUFS and -3 LUFS).
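To make the gain step concrete, here is a minimal sketch of the arithmetic. It assumes the integrated loudness has already been measured by some other tool (the measurement itself is not shown), and the function names are made up for illustration. The -14 LUFS default is just the Spotify figure mentioned above.

```python
def normalization_gain_db(measured_lufs, target_lufs=-14.0):
    """Gain in dB that moves a track from its measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale every sample by the same factor; the dynamics are left untouched."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# A hypothetical pop master measured at -8 LUFS gets turned down by 6 dB.
gain = normalization_gain_db(-8.0)
quieter = apply_gain([0.5, -0.25, 1.0], gain)
```

The point is that this is one static gain per track, not compression: loud and quiet passages keep their relationship to each other.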
OP should just find a better audio player or figure out how to enable loudness normalization.
It’s really weird reading your comment because it reads as if I wrote it.
What kind of audio stuff do you do?
Unfortunately no. Audio files are actually really dumb in that they’re basically just a list of 44100 (or 48000 or 96000 etc.) amplitude numbers per second, per channel.
So there’s nothing really to diff because it’s basically just a squiggly line, a set of squiggly lines or, when compressed, a mathematical expression that recreates a squiggly line when decompressed.
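If it helps to picture the "list of numbers" idea, here is a small sketch that builds one second of a 440 Hz tone by hand. The function name and the choice of a sine wave are just for illustration; real files store these values as integers or floats in a container like WAV.

```python
import math

SAMPLE_RATE = 44100  # samples per second, the CD-quality rate mentioned above


def sine_samples(frequency_hz, seconds, sample_rate=SAMPLE_RATE):
    """Return a plain list of amplitude values in the range -1.0 to 1.0."""
    count = int(sample_rate * seconds)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(count)
    ]


# One second of mono audio is nothing more than 44100 numbers.
one_second = sine_samples(440.0, 1.0)
```

Nothing in that list says "this part is dialog" or "this part is music", which is why there is no structure to diff.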
You could isolate the dialog if you got hold of a version with no dialog at all, inverted its polarity and summed it with the original, since everything the two versions share would cancel out. It’s unlikely you’ll find a version without any vocals, though.
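The polarity trick above can be sketched in a few lines. This assumes the two versions are perfectly sample-aligned and otherwise identical, which is rarely true of real releases; the function names are hypothetical.

```python
def invert_polarity(samples):
    """Flip the sign of every sample."""
    return [-s for s in samples]


def mix(a, b):
    """Sum two equally long tracks sample by sample."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same number of samples")
    return [x + y for x, y in zip(a, b)]


def isolate_dialog(full_mix, no_dialog_mix):
    """Cancel everything the two versions share, leaving only the difference."""
    return mix(full_mix, invert_polarity(no_dialog_mix))


# Toy example: the full mix is music plus dialog.
music = [0.25, -0.5, 0.125]
dialog = [0.5, 0.25, -0.25]
full = mix(music, dialog)
recovered = isolate_dialog(full, music)
```

With any timing offset, different mastering or lossy encoding between the two versions, the cancellation stops being clean and you get residue instead of silence.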
Machine learning vocal isolation tools are probably going to be the best way to go about it as a DIY approach. Ultimate Vocal Remover 5 with the Demucs v4 model is great FOSS software for extracting vocals, and you could sum the extracted vocals with the original track and adjust the gain to get louder dialogue… it would be a lot of work though…
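For the summing step, here is a rough sketch of the gain math, assuming you already have the extracted vocal stem as samples that line up exactly with the original (the extraction itself is done in the separate tool and is not shown). The function name is made up.

```python
def boost_dialogue(original, vocals, boost_db):
    """Raise the dialogue in `original` by roughly `boost_db` decibels.

    The original already contains the dialogue once, so only the extra
    amount is added on top: extra = 10^(dB/20) - 1.
    """
    if len(original) != len(vocals):
        raise ValueError("tracks must have the same number of samples")
    extra = 10 ** (boost_db / 20) - 1
    return [o + extra * v for o, v in zip(original, vocals)]


# Toy example: lift the dialogue by about 6 dB (roughly doubling its level).
original = [0.25, -0.5, 0.125]
vocals = [0.125, -0.25, 0.0]
louder = boost_dialogue(original, vocals, 6.0)
```

In practice the stem is never a perfect copy of the dialogue, so the real boost will be a bit off, and the summed result can clip, so you would likely need to turn the whole thing down afterwards.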