Kartik Goel

🎧 Behind the Skip: How Audio Streaming Just Works

May 30, 2025 · 5 min read

Have you ever listened to a song or podcast online, skipped to your favorite part, and it just worked instantly? That magic moment, where you drag the audio bar and playback quietly carries on, is thanks to a beautifully orchestrated system inside your browser: streams, buffers, byte ranges, and a little-known hero called the media engine.

Let's break it down and dive deep into how all this happens under the hood.

PS: I'll be explaining all this with a live example from Rezonance, a free audio streaming platform built by me and my friends.

🧠 First: What Is Audio Streaming?

Imagine you're watching a YouTube video. You don't download the whole thing first — it starts playing right away. That's streaming — the browser is getting chunks of the file bit by bit and playing them as they arrive.

When you press "play", the browser starts downloading just the beginning of the audio.

⏩ What Happens When You Click Play?

When you click the play button, the browser's built-in media engine immediately issues a Range request for the audio URL:

Initial Range Request
GET /path/to/song.mp3
Range: bytes=0-

The header Range: bytes=0- tells the server, "Start at byte 0 and send me everything from there to the end of the file." The browser doesn't have to take it all at once, though: it can read from the connection only as fast as it needs to, which is the main essence of streaming. The server then responds with a 206 Partial Content status and a Content-Range header in the form:

Server Response
Content-Range: bytes X-Y/Z
Content-Type: audio/mp4
Network tab for the initial request when clicking play, showing Content-Range: bytes 0-9776699/9776700
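  • X is the starting byte (0 for the very beginning).
  • Y is the last byte position (inclusive) that this response covers. For an open-ended request like bytes=0-, it is usually Z-1, the final byte of the file, as in the network capture above (9776699 out of 9776700). The bytes arrive over the same open connection until everything has been sent, the connection ends by itself (network issue or abort), or the client pauses.
  • Z is the total size of the audio file in bytes.

A server may choose to answer with a smaller range, in which case Y is less than Z-1. And even when the header covers the whole file, the number of bytes actually transferred at any moment can be much less, depending on the network bandwidth and how much the browser wants to buffer.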
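To make X, Y, and Z concrete, here is a minimal Python sketch that pulls them out of a Content-Range value. The names ContentRange, parse_content_range, and covered_bytes are made up for this post, and the sketch only handles the plain "bytes X-Y/Z" form shown above (not the variants where the range or the total is "*").

```python
import re
from typing import NamedTuple


class ContentRange(NamedTuple):
    start: int  # X: first byte position in this response
    end: int    # Y: last byte position in this response (inclusive)
    total: int  # Z: total size of the file in bytes


def parse_content_range(header_value: str) -> ContentRange:
    """Parse a value such as 'bytes 0-9776699/9776700'.

    Only the fully specified form is supported in this sketch.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", header_value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {header_value!r}")
    start, end, total = (int(group) for group in match.groups())
    if not start <= end < total:
        raise ValueError(f"inconsistent Content-Range: {header_value!r}")
    return ContentRange(start, end, total)


def covered_bytes(content_range: ContentRange) -> int:
    """Number of bytes the response covers (both ends are inclusive)."""
    return content_range.end - content_range.start + 1


# Value taken from the network capture in this post.
initial = parse_content_range("bytes 0-9776699/9776700")
print(initial, covered_bytes(initial))
```

Because both ends are inclusive, a response that covers the whole file has Y equal to Z-1, which is exactly what the capture shows.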

⏩ What Happens When You Don't Seek?

You might be thinking: if you don't seek, the audio will obviously stream until the song ends. Absolutely right, Pr0grammer, but let's cover a bit of how it all happens under the hood.

Since you're not seeking, the TCP connection that's streaming the audio stays open. The browser's powerful media engine takes it from here:

  • It tracks how much data is already buffered and how far ahead it wants to stay.
  • If the amount of downloaded-but-not-yet-played data falls below a threshold (the low-water mark), the engine automatically fetches more data on the stream. [So that's the `buffer` for the "buffer" :P]
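The two points above can be sketched as a tiny decision function. This is only an illustration in Python: the threshold values, the high-water mark, and the names BufferState and should_fetch_more are assumptions made for this post, and real media engines pick their own numbers and logic.

```python
from dataclasses import dataclass


@dataclass
class BufferState:
    played_seconds: float    # current playhead position
    buffered_seconds: float  # how far the downloaded audio reaches


# Hypothetical thresholds, chosen only for illustration.
LOW_WATER_SECONDS = 10.0   # below this much buffered-ahead audio, fetch more
HIGH_WATER_SECONDS = 30.0  # at or above this much, stop pulling data


def seconds_ahead(state: BufferState) -> float:
    """Downloaded-but-not-yet-played audio, in seconds."""
    return max(0.0, state.buffered_seconds - state.played_seconds)


def should_fetch_more(state: BufferState, currently_fetching: bool) -> bool:
    """Decide whether the engine should be pulling data right now."""
    ahead = seconds_ahead(state)
    if ahead < LOW_WATER_SECONDS:
        return True   # running low: start (or keep) fetching
    if ahead >= HIGH_WATER_SECONDS:
        return False  # comfortably ahead: pause fetching
    # In between the two marks, keep doing whatever we were doing,
    # so the engine does not flip on and off around a single threshold.
    return currently_fetching


print(should_fetch_more(BufferState(played_seconds=60.0, buffered_seconds=65.0), False))
```

Using two marks instead of one is a design choice in this sketch: it avoids rapidly toggling the download when the buffer hovers around a single threshold.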

You'll notice in your Network tab that the transferred size for that single request keeps growing as audio bytes arrive in real time, each one adding to the total buffer. (The Content-Length header itself is fixed when the response starts.)

Network tab showing the initial audio request when clicking play - single 206 request with 3,342 kB transferred
Buffer growing progressively to 5,456 kB

⏩ What Happens When You Seek?

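One thing to know first: after parsing just a few header bytes of the audio file, the media engine learns the file's total duration, and from the response headers it already knows the total byte size. With both, it can map time to bytes and work out which byte corresponds to a given timestamp. (For variable-bitrate files, this mapping is an estimate or comes from an index stored inside the file.)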

So, the moment you seek, the engine calculates your new playhead position in bytes by using that same time-to-byte mapping and snapping to the nearest sync point. It then issues a new Range request for that offset:

Seek Range Request
GET /path/to/song.mp3
Range: bytes=<newByteOffset>-
New request triggered by seeking: a fresh 206 request (5,456 kB) as the browser abandons the previous stream and starts again from the seek position
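Here is a rough Python sketch of the time-to-byte step, under a big simplifying assumption: the audio is constant bitrate, so bytes are spread evenly over the duration. The function names, the header_bytes and frame_bytes parameters, and the 240-second duration in the example are all hypothetical, and the sketch snaps down to the previous frame boundary rather than to a true sync point. Real engines rely on the file's own structure for this.

```python
def seek_byte_offset(
    seek_seconds: float,
    duration_seconds: float,
    total_bytes: int,
    header_bytes: int = 0,  # assumed size of metadata before the audio data
    frame_bytes: int = 1,   # assumed fixed frame size used for snapping
) -> int:
    """Estimate the byte offset for a seek, assuming constant bitrate."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    if total_bytes <= header_bytes:
        raise ValueError("file has no audio data after the header")

    # Clamp the seek position to the range [0, duration].
    fraction = min(max(seek_seconds / duration_seconds, 0.0), 1.0)

    # Spread the audio bytes evenly over the duration.
    audio_bytes = total_bytes - header_bytes
    raw_offset = int(fraction * audio_bytes)

    # Snap down to the start of a frame so decoding can begin cleanly.
    snapped = (raw_offset // frame_bytes) * frame_bytes

    # Never point past the last byte of the file.
    return min(header_bytes + snapped, total_bytes - 1)


def range_header_value(byte_offset: int) -> str:
    """Build the open-ended Range value the seek request would carry."""
    return f"bytes={byte_offset}-"


# Seeking to the middle of a 240-second file of 9,776,700 bytes.
offset = seek_byte_offset(120.0, 240.0, 9_776_700)
print("Range:", range_header_value(offset))
```

The value returned by range_header_value is what would replace <newByteOffset> in the seek request shown above.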

Pretty cool, right? What feels like a simple "skip to the good part" moment is actually a smart system working behind the scenes to make it smooth. Now you know what's really going on when you hit play or drag that audio bar. The browser's doing a lot more than just playing sound — it's streaming magic!

In our next blog, we will dive into how the audio is streamed from the server to the client side with the magic of Node pipeline.

So, stay tuned! And do like and share if you learned something new from this blog :D