I'm a composer, developer and filmmaker.

YouTube has a lot of... less-than-friendly UI elements, but one big improvement in recent years has been the thumbnail previews that appear when scrubbing through a video. They make it much easier to jump to a specific place in a hurry, without knowing the timestamp. They aren't much use for audio-focused videos, though.

Navigating audio-focused content is much harder, particularly when the video consists only of album art or a single cover image. For long videos, like multi-movement classical pieces, it can be a real pain to find the spot you're looking for. Little chirps of audio while scrubbing wouldn't do much to help (and they're really irritating; Adobe Premiere does this, and I find it more annoying than helpful). Instead, I think replacing the seek bar with the audio waveform would be really helpful, much like SoundCloud and other sites do. Sure, it doesn't give hugely specific information about the audio, but it does at least provide reliable landmarks. For multi-movement pieces, or even whole albums uploaded as a single video, dips in the waveform would be a very useful guide to drop-in points.
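To give a rough idea of how little is needed, here's a minimal sketch of a waveform overview. It assumes the audio has already been decoded to mono samples between -1 and 1, and the function names and the 10% "dip" threshold are my own choices for illustration, not anything YouTube or SoundCloud actually exposes. Each slice of the seek bar just gets the peak amplitude of its stretch of audio, and unusually quiet slices are flagged as candidate drop-in points.

```python
from typing import List, Sequence


def waveform_peaks(samples: Sequence[float], num_bins: int) -> List[float]:
    """Downsample audio to one peak amplitude per seek-bar bin (hypothetical helper)."""
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    n = len(samples)
    peaks = []
    for i in range(num_bins):
        # Integer arithmetic spreads the samples as evenly as possible across bins.
        start = i * n // num_bins
        end = (i + 1) * n // num_bins
        # Loudest absolute sample in this slice of the timeline (0.0 if the slice is empty).
        peaks.append(max((abs(s) for s in samples[start:end]), default=0.0))
    return peaks


def find_dips(peaks: Sequence[float], threshold: float = 0.1) -> List[int]:
    """Return bin indices whose peak is below `threshold` times the loudest bin (assumed heuristic)."""
    loudest = max(peaks, default=0.0)
    if loudest == 0.0:
        return []
    return [i for i, p in enumerate(peaks) if p < threshold * loudest]


samples = [0.8, -0.9, 0.7, 0.02, -0.01, 0.03, 0.6, -0.5]
peaks = waveform_peaks(samples, 4)  # [0.9, 0.7, 0.03, 0.6]
print(find_dips(peaks))  # [2], the quiet gap between two "movements"
```

A bin index maps back to a timestamp as `index * duration / num_bins`, so a dip can be turned straight into a seek position.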

In fact, this would be useful for any video content. Being able to see a sudden spike in volume ahead of time would be nice; on plenty of occasions a sudden loud noise has had me scrambling to mute my headphones. I'd be quite happy to have a way to avoid that.
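Spotting those spikes ahead of time could work off the same kind of per-bin peak values. The sketch below assumes the audio has already been reduced to a list of peak amplitudes per slice of the timeline; the `window`, `ratio` and `floor` values are arbitrary guesses of mine, not tuned numbers. A bin is flagged when it's much louder than the few bins just before it, with a small floor so that a loud noise straight after silence still counts.

```python
from typing import List, Sequence


def find_spikes(
    peaks: Sequence[float],
    window: int = 3,
    ratio: float = 4.0,
    floor: float = 1e-3,
) -> List[int]:
    """Flag bins whose peak exceeds `ratio` times the recent average (hypothetical heuristic)."""
    spikes = []
    for i in range(window, len(peaks)):
        recent = peaks[i - window:i]
        # Average of the preceding bins, clamped to `floor` so silence doesn't hide a spike.
        baseline = max(sum(recent) / window, floor)
        if peaks[i] > ratio * baseline:
            spikes.append(i)
    return spikes


print(find_spikes([0.1, 0.1, 0.1, 0.9, 0.9, 0.1]))  # [3]
```

Only the first loud bin is flagged, since the bins after it are compared against a baseline that already includes the spike, which is roughly what you'd want for a warning marker on the seek bar.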

P.S. This is the video I used for the mockup.