
The SoX of Silence

SoX is, by their own definition, the Swiss Army knife of audio manipulation.

And no doubt it’s full of fun: slicing and dicing, playback and recording, filtering, and effects.

But SoX is a command line tool, which means obscure syntax and parameters in order to get things done.

I’ve been trying off and on for months to understand the silence filter in SoX, which allows one to remove silence from the beginning, middle, or end of the audio. Sounds simple, doesn’t it?  Well, it should be.

Below is the man page for the silence filter:

silence [-l] above-periods [duration threshold[d|%] [below-periods duration threshold[d|%]]]

Removes silence from the beginning, middle, or end of the audio. Silence is anything below a specified threshold.

The above-periods value is used to indicate if audio should be trimmed at the beginning of the audio. A value of zero indicates no silence should be trimmed from the beginning. When specifying a non-zero above-periods, it trims audio up until it finds non-silence. Normally, when trimming silence from the beginning of audio the above-periods will be 1 but it can be increased to higher values to trim all audio up to a specific count of non-silence periods. For example, if you had an audio file with two songs that each contained 2 seconds of silence before the song, you could specify an above-period of 2 to strip out both silence periods and the first song.

When above-periods is non-zero, you must also specify a duration and threshold. Duration indicates the amount of time that non-silence must be detected before it stops trimming audio. By increasing the duration, bursts of noise can be treated as silence and trimmed off.

Threshold is used to indicate what sample value you should treat as silence. For digital audio, a value of 0 may be fine but for audio recorded from analog, you may wish to increase the value to account for background noise.

When optionally trimming silence from the end of the audio, you specify a below-periods count. In this case, below-period means to remove all audio after silence is detected. Normally, this will be a value of 1 but it can be increased to skip over periods of silence that are wanted. For example, if you have a song with 2 seconds of silence in the middle and 2 seconds at the end, you could set below-period to a value of 2 to skip over the silence in the middle of the audio.

For below-periods, duration specifies a period of silence that must exist before audio is not copied any more. By specifying a higher duration, silence that is wanted can be left in the audio. For example, if you have a song with an expected 1 second of silence in the middle and 2 seconds of silence at the end, a duration of 2 seconds could be used to skip over the middle silence.

Unfortunately, you must know the length of the silence at the end of your audio file to trim off silence reliably. A work around is to use the silence effect in combination with the reverse effect. By first reversing the audio, you can use the above-periods to reliably trim all audio from what looks like the front of the file. Then reverse the file again to get back to normal.

To remove silence from the middle of a file, specify a below-periods that is negative. This value is then treated as a positive value and is also used to indicate the effect should restart processing as specified by the above-periods, making it suitable for removing periods of silence in the middle of the audio.

The option -l indicates that below-periods duration length of audio should be left intact at the beginning of each period of silence. For example, if you want to remove long pauses between words but do not want to remove the pauses completely.

The period counts are in units of samples. Duration counts may be in the format of hh:mm:ss.frac, or the exact count of samples. Threshold numbers may be suffixed with d to indicate the value is in decibels, or % to indicate a percentage of maximum value of the sample value (0% specifies pure digital silence).

The following example shows how this effect can be used to start a recording that does not contain the delay at the start which usually occurs between ‘pressing the record button’ and the start of the performance:

rec parameters filename other-effects silence 1 5 2%

Huh?

So let’s try to clarify some of the mess from the man page.  First, a few important notes:

  • When specifying duration, use a trailing zero for whole numbers of seconds (i.e., 1.0 instead of 1 to specify 1 second). If you don’t, SoX assumes you’re specifying a number of samples.  Who on earth would want to specify samples instead of seconds? You got me. Alternatively, you can specify durations of time in the format hh:mm:ss.frac.
  • Use at least 0.1% for the audio threshold. Even though 0% is supposed to be pure digital silence, with my test file I couldn’t get silence to trim unless I used a threshold larger than 0%. If you’d like, you can specify the threshold in decibels using d (such as -96d or -55d).
  • The realistic values for the above-period parameter are 0 and 1 and values for the below-period parameter are pretty much just -1 and 1. The documentation states that values larger than 1 can be used, but it only really makes sense for files with consistent audio breaks. Just trust me, it’s weird. I’ll get into what those values actually mean in the examples.
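To get a feel for why the trailing zero matters, here is a minimal sketch that works out how long a bare sample count actually lasts. The helper name samples_to_seconds and the 44100 Hz sample rate are my own assumptions for illustration; neither is part of SoX.

```shell
# Hypothetical helper (not part of SoX): convert a sample count to seconds.
# Usage: samples_to_seconds SAMPLES SAMPLE_RATE
samples_to_seconds() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.6f\n", n / r }'
}

# A duration of "1" read as a single sample, at an assumed 44100 Hz rate:
samples_to_seconds 1 44100
# The sample count that would cover one full second at that rate:
samples_to_seconds 44100 44100
```

At that rate the first call prints 0.000023 and the second prints 1.000000, so a duration written as 1 is effectively no duration at all.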

Now onto some examples! I’ll be showing you visually what happens to a sound file when we apply the various parameters to the silence filter.

I generated a test sound file with 60 seconds of white noise and then silenced various parts of the clip, leaving me with an audio file that looks like this:

[Image: SoX Silence Example (Original File)]

Example 1: Trimming silence at the beginning

sox in.wav out1.wav silence 1 0.1 1%

The above-period parameter comes first after the silence effect name, and for the sake of this article, it should be set to 1 if you want to use the filter. This example roughly translates to: trim silence (anything less than 1% volume) until we encounter sound lasting more than 0.1 seconds in duration. This command produces the following:

[Image: out1.wav after the command above]

We’ve lopped off the silence at the beginning of the clip. For simplicity’s sake, we’ll refer to the 1% threshold as silence from now on.
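If you would rather write the threshold in decibels, the following sketch converts a decibel value to the equivalent percentage. It assumes the usual amplitude convention, dB = 20 × log10(ratio), which I believe is what SoX applies to the d suffix, but treat that as an assumption. The helper name db_to_percent is hypothetical.

```shell
# Hypothetical helper (not part of SoX): convert a decibel threshold to a
# percentage of full scale, assuming dB = 20 * log10(amplitude ratio).
# Usage: db_to_percent DECIBELS
db_to_percent() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", 100 * exp(db / 20 * log(10)) }'
}

db_to_percent -40   # the 1% threshold used in these examples
db_to_percent -60   # the 0.1% minimum suggested above
```

Under that assumption, -40d lines up with the 1% threshold and -60d with 0.1%.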

Example 2: Ignoring noise bursts

sox in.wav out2.wav silence 1 0.3 1%

By changing the duration parameter to 0.3, we tell SoX to ignore the burst of noise at the beginning of the example clip. This produces the following:

[Image: out2.wav after the command above]

We can ignore short pops and clicks in audio by adjusting this duration parameter.

Example 3: Stopping recording when no sound detected

sox in.wav out3.wav silence 1 0.3 1% 1 0.3 1%

Now we introduce the below-period parameter and its respective sub-parameters.  As with the above-period parameter, just set it to 1 and call it good.  The command above translates to: trim silence until we detect at least 0.3 seconds of noise, and then trim everything after we detect at least 0.3 seconds of silence.

[Image: out3.wav after the command above]

This returns a file with just the first 4 seconds of noise (note that we ignore that 0.25 sec burst of noise at the beginning). Where’s the rest of the clip?  Well, it’s gone. Not super practical for post-production of audio, but can be useful when recording live audio, so that SoX stops when it doesn’t encounter sound for a certain number of seconds.
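For the live-recording case, a rough sketch might look like the wrapper below. It uses the rec command that ships with SoX (the same one the man page example uses) with the Example 3 parameters. The function name record_until_quiet, the REC override variable, and the 5.0 second default are my own assumptions.

```shell
# Hypothetical wrapper: start recording at the first sound and stop once
# the input has been quiet for QUIET_SECONDS (assumed default: 5.0).
# Usage: record_until_quiet OUT [QUIET_SECONDS]
# Set REC=echo to preview the arguments without recording anything.
record_until_quiet() {
    "${REC:-rec}" "$1" silence 1 0.3 1% 1 "${2:-5.0}" 1%
}
```

Calling record_until_quiet take.wav 10.0 would wait for ten seconds of silence before stopping. Note the trailing zero on the duration.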

An aside: if you’re looking to trim silence from the beginning and the end of an audio file, you’ll need to utilize the reverse filter and a temp file like so:

sox in.wav temp.wav silence 1 0.1 1% reverse
sox temp.wav out.wav silence 1 0.1 1% reverse

Don’t forget to delete that temp.wav file when you’re done.

Jakob points out in the comments that you can trim silence from both ends in one fell swoop by chaining the effects like so:

sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse
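If you do this often, the one-liner can be wrapped in a small shell function. This is only a sketch: the name trim_ends, the SOX override variable, and the default values are assumptions of mine, while the SoX arguments are exactly those of the chained command above.

```shell
# Hypothetical wrapper around the chained trim-both-ends command.
# Usage: trim_ends IN OUT [DURATION] [THRESHOLD]
# DURATION defaults to 0.1 (seconds) and THRESHOLD to 1%.
# Set SOX=echo to preview the arguments without running SoX.
trim_ends() {
    "${SOX:-sox}" "$1" "$2" \
        silence 1 "${3:-0.1}" "${4:-1%}" reverse \
        silence 1 "${3:-0.1}" "${4:-1%}" reverse
}
```

For example, trim_ends in.wav out.wav 0.3 1% would also skip short noise bursts at either end, as in Example 2.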

Example 4: Trimming all silence

sox in.wav out4.wav silence 1 0.1 1% -1 0.1 1%

By changing the below-period parameter to -1, we can trim instances of silence in the middle of the clip, by allowing the filter to restart after it detects noise of the specified duration.

[Image: out4.wav after the command above]

In my example clip, it’s impossible to detect where the silence used to be, but with an actual podcast or other audio, it should be easier to tell.

Example 5: Ignoring short periods of silence

sox in.wav out5.wav silence 1 0.1 1% -1 0.5 1%

In similar fashion to Example 2, we can instruct SoX to ignore small moments of silence (1/2 second in this example).

[Image: out5.wav after the command above]

When trimming silence from podcasts and the like, this prevents you from removing moments when someone stops to take a breath and making the conversation sound too rushed.

Example 6: Shortening long periods of silence

sox in.wav out6.wav silence -l 1 0.1 1% -1 2.0 1%

So what if you wanted to just shorten long moments of silence rather than remove them entirely?  Well, you need to add the -l parameter, but it needs to be placed first, before the other parameters for the filter effect. The example above results in trimming all silence longer than 2 seconds down to only 2 seconds long.

[Image: out6.wav after the command above]

Note that SoX does nothing to bits of silence shorter than 2 seconds.

Example 7: Shortening long periods of silence and ignoring noise bursts

sox in.wav out7.wav silence -l 1 0.3 1% -1 2.0 1%

Finally, let’s tie it all together by trimming silence longer than 2 seconds down to 2 seconds long, but ignore noise such as pops and clicks amidst the moments of silence.

[Image: out7.wav after the command above]

As a result you’ll see that we’ve cropped out the 0.25 seconds of noise at the beginning of the clip, but left the 0.5 seconds of noise in the middle.

For actual usage, you’ll probably want to specify something shorter than 0.3 seconds for the duration if you’re just trying to filter out pops and clicks.
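To run the Example 7 settings over a whole folder of recordings, something like the loop below could work. It is a sketch under assumptions: the function name, the SOX override variable, and the directory layout are hypothetical, and the 0.3 second duration is simply copied from the example, so shorten it if you only care about pops and clicks.

```shell
# Hypothetical batch helper: apply the Example 7 settings to every .wav
# file in SRC_DIR and write the results to DST_DIR under the same names.
# Usage: shorten_silences_in_dir SRC_DIR DST_DIR
# Set SOX=echo to preview the commands without running SoX.
shorten_silences_in_dir() {
    mkdir -p "$2" || return 1
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue    # skip when the glob matches nothing
        "${SOX:-sox}" "$f" "$2/$(basename "$f")" \
            silence -l 1 0.3 1% -1 2.0 1%
    done
}
```

Writing to a separate output directory keeps the originals untouched, which is handy while you are still tuning the parameters.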

Bonus Example 8: Splitting audio based on silence

sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart

Using SoX’s newfile pseudo-effect allows us to split an audio file based on periods of silence, and then calling restart starts the effects chain over from the beginning. In this example, SoX will split audio when it detects 5 or more seconds of silence. You’ll end up with output files named out001.wav, out002.wav, and so on.
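Splitting this way can leave a tiny stub file at the end of the sequence (a reader reports this in the comments). A possible cleanup step is sketched below. The function name prune_stubs and the 1024-byte cutoff are arbitrary assumptions, and -maxdepth and -delete are GNU/BSD find extensions rather than POSIX.

```shell
# Hypothetical cleanup helper: delete split files smaller than MIN_BYTES.
# Usage: prune_stubs DIR [MIN_BYTES]
# The 1024-byte default is an arbitrary assumption; pick a cutoff that is
# safely below your shortest real clip. Deleted paths are printed.
prune_stubs() {
    find "$1" -maxdepth 1 -type f -name 'out[0-9]*.wav' \
        -size -"${2:-1024}"c -print -delete
}
```

Run it against a copy of your output directory first, since it deletes files.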

Final Thoughts

There you have it.  This is what I know about the silence filter effect in SoX.  Example 7, where we trim some but not all of the silence and ignore pops and clicks, is ultimately what I was trying to figure out when writing this article, but I figure the other examples have got to be a good reference for somebody.

The above and below-period values are still mostly a mystery to me.  I may address them in another post, but for now, I’m just going to use this as a cheat sheet in case I forget.

And don’t forget to use the trailing zero when specifying whole seconds. Even while writing this I forgot multiple times.

I welcome thoughts, ideas, comments, and corrections. Please.

(edit 11/14/10 to add names to each of the examples for clarification)
(edit 04/28/11 to add audio splitting example)
(edit 12/06/12 to add one line silence trimming) 

    • sox in.wav out.wav silence 1 0.8 1% 1 1.0 1% : newfile : restart

136 replies on “The SoX of Silence”

Hi,
Really useful tricks to remove silence.
However I have an audio file with so many beep sounds in it and I am looking for a way to split this file into multiple files by removing beep sound from it. (I do have another recording with beep sound which was used to create new file with beep sounds in it). Is there any way to achieve this?
Thanks
Sam

Hello
i want to create a demo song through SOX. I have an original song and a beep sound. I want to add this beep sound to be appended in each 10 second of the original song.

Any idea ?

You may be able to generate 10 seconds of silence and append your beep to it, then merge it all with your song, but it sounds like you might be better suited with a digital audio editing software like Audacity.

Hello !
Can any body please tell me the command that how to trim file from the end
on the bases of silence instead of noise .
Thanks

Thanks for the idea,but still I am not getting the silence at the end of file , the word is loosing at the end.
Here is the script I am using
sox input.wav output.wav reverse silence 1 0.864 1% 1 reverse

Thanks and regards.

I see lots of reversing in there and it’s a little unclear what’s going on. I don’t have the exact command to give you, but you might try breaking out the reversing into separate commands so you can get a clearer picture of what’s going on. Also, I see an extra 1 parameter at the end of your silence filter that might be causing some issues. I’m not a developer of SoX, so anything deeper might be better suited for their mailing lists: http://sourceforge.net/p/sox/mailman/

I’d pretty much given up on figuring out the silence detection until finding this blog post. Thanks!

Any idea why it generates those tiny stub files when splitting the audio? Every time the last file seems to be a few bytes in length. No great issue but it would be nice if it didn’t!

Hi!
I’m trying to extract files with silence instead of usual files(last example). And I haven’t any ideas how I may do it( Jason Navarrete, is exist any way?

Hi, first of all compliments for the great tutorial.

I would like to shorten the silence at the beginning and at the end of the track down to 0.15s, but apparently the command to shorten the silence can’t be applied at the beginning of the track. Indeed sox will try to remove all the silence at the beginning of the track.

If run sox like this:

sox in.wav out.wav silence -l 1 0.1 1% -1 0.15 1%

All the silences are shortened but the one at the beginning of the track is completely removed.

I can run sox like this:

sox in.wav out.wav silence -l 1 0.1 0% -1 0.15 1%

avoiding the silence removal at the beginning of the track if it’s non-absolute silence, but it’s still not a general solution.

Otherwise I can pad the resulting track with silence after processing it, but it would be better to maintain the non perfect silence present in the track.

Any suggestion to improve my solution?

Actually I’m wrong. Calling this:

sox in.wav out.wav silence -l 1 0.1 0% -1 0.15 1%

will have no effect because the 0% threshold affects also the second part of the script, so that the silence needs to surpass both thresholds to be removed from the middle and end of the track. There’s something I don’t understand on the logic of this filter.

Hi, thank you for these great examples. But somehow when i run the sox as described in some of them, it does not use all the parameters.
In fact,
sox in.mp3 out.mp3 silence 1 0.1 1% -1 0.1 1%
and
sox in.mp3 out.mp3 silence 1 0.1 1%
makes the same out.mp3 with only silence from the beginning of the file is cutted out. Even in cmd.exe window both commands are shown as “sox in.mp3 out.mp3 silence 1 0.1 1” without the rest of the parameters in former case. I’m using windows 7. What could be the problem?

The problem is solved. Don’t use bat file for this effect. Bat file doesn’t take into account anything after %.

Your article is really helpful.
BTW, is there any way we can identify start of the audio in a recording? i.e. if we have an audio file with some silence at the start and end of the file and then we want to identify actual point where audio starts in that audio file.
I don’t think silence filter would be useful here.

I trim an item I record from a stream which has a 2 sec pause at the start and the end. I record 10 sec extra at the beginning and the end to allow for streaming slop then use sox to find out how long to first 2 second gap
first trim the track from audio start to 2 seconds silence
first take start of file to 2 sec silence
sox -V3 recorded.wav clipped.wav silence -l 0.5 0.1% 2.0 0.1%
Then find out how long it is.
LENGTH=` sox clipped.wav -n stat 2>&1 | sed -n 's#^Length (seconds):[^0-9]*\([0-9.]*\)$#\1#p' `
Now cut that off the front of the audio
sox -V3 recorded.wav head.wav trim $LENGTH
then we clean the end
sox -V3 head.wav clapped.wav reverse silence -l 0.5 0.1% 1.8 0.1% reverse
TAIL=` sox clapped.wav -n stat 2>&1 | sed -n 's#^Length (seconds):[^0-9]*\([0-9.]*\)$#\1#p' `
Now cut that off the end of the audio
sox -V3 /home/rd/audiolab/head.wav justtheaudiowewant.wav reverse trim $TAIL reverse

Sox wont clip the tail but if you reverse it silence is silence and sox in this case finds the first instance and trims it then reverses the file so it plays head out.

Took me a while but it seems to work OK. My script has >> toa.log on each line so I can track errors.

I also run a silence check over the whole file to make sure we have continuous audio. But thats well covered here already

Hi there
Thanks for this super helpful article,in example 8 do you or anybody have any idea if it is possible to name the new files created based on when the noise was found in the recording?
so rather than having:
out001.wav
out002.wav
etc you would have:
out1:12.wav
out2:34.wav
etc if sound was found at 1 min 12 sec and 2 min 34 seconds, the format of the timestamp doesn’t mater but it would be super useful to be able to remove silence and still maintain that piece of data on when the sound occurred in the recording.

thanks!

Taylor

This is best desc about SoX’s silence for me and the best description which I ever get from web. Thank for your effort.

Hey there, thanks for this, so many years since you wrote it now.
I have this weird bug I wonder if you found? I wrote a batch to remove silence from my podcast (your script), then add intro, outtro, levelator, convert to mp3, add tags, etc.
When I run this command in command line, it works great
sox rawshow.wav x01NoSilence.wav silence 1 0.1 1% -1 0.3 1%
but when I put it in a batch, it seems to run successfully, it generates the new file, but the silences are still there. It’s weird!

I tried different methods to remove digital silence from the start and end of a file about an hour long:
WavTrim (Windows GUI program)
sox in.wav out.wav silence 1 0 0 reverse silence 1 0 0 reverse
shntool trim in.wav

SoX somehow removed about 32000 samples more than WavTrim and shntool. WavTrim removed 1 sample more than shntool. I checked with Cool Edit 2000 and the file trimmed by shntool contains nonzero samples (value -1) in at least one channel on the start and the end. Therefore WavTrim seems to be off by one sample. I think shntool is the most accurate.

Brilliant.

I’ve been trying to figure things out for myself regarding the SoX ‘silence’ filter, and here you are with examples already on-line. If you ever figure out (and I know this was written years and years ago) what the above- and below-period values are for, I’d be happy to read that, too.

Thank you for this write-up! I wanted to split audio based on silence. The audio was a set of children’s stories, each a couple of minutes long, digitally captured as a single file.

After some experimenting, what worked for me was:

sox in.wav out.wav silence 1 0.01 0.1% 1 2.5 1% : newfile : restart

I had to put the first threshold values very low (but not zero), or otherwise it would chop off the start of initial soft consonants like ‘s’ and ‘f’.

(Wow, this article is almost 10 years old! Perhaps someone another 10 years from now may benefit from this.)

Thanks for that. On windows the % character causes problems. I haven’t found how to properly escape it, all of the usual suspects fail. I workaround it by using the -Xd decibels notation.

Also worth noting that :restart:newfile won’t work it has to be : restart : newfile, with a space before and after the colon. That cost me some time that I won’t get back.

Great summation of a very confusing feature. Thanks so much!

Instead of shortening silences, can I find and extend a silent period? Audacity has a plugin that will find a silence longer than a given length the extend the silence by a percentage. I would like to duplicate that using SOX. Any ideas?

I struggle with tuning this to delete silence plus minor clicks at the beginning, but keep everything from the first actual word. In all the examples, you use generated white noise which is a consistent volume. For real world examples, I often find that that if I set the threshold volume level too low, it keeps undesirable clicks, and if I set it too high, it cuts the initial words or parts of words (extended consonant sounds like ‘sh’ and ‘ch’ sounds at the beginning of a word are particularly vulnerable). It’s trivial to get it right for any specific clip, but if the point is to automate silence removal until there is real human speech, something that’s obvious when listening manually, it can be difficult to get that right for all files.

Stranger, in watching the waveforms of what SoX keeps and cuts in Audacity, I don’t quite understand the logic. For example, if I set the volume threshold to -42dB, it cuts initial words, even though those words come through in Audacity peaking at -15dB (much louder than -42dB), but if I set to -48dB, those words survive the operation, but so do very short undesirable clicks. I suspect it has something to do with the duration (I generally use the .1s used here) and volume combined, such that SoX doesn’t recognize audio as being over that -42dB threshold if it drops below that level during the word (like between syllables) preventing it from detecting a full .1s at sufficient volume. Dropping that to 0.08s can help, but I’ve yet to find a single good solution that always works.

i noticed that sometimes, while the silence is removed, there is overlap and it makes it sound like two songs are playing at the same time. i used the syntax from example 8 to cut out the silence of a DJ’s mix show for a radio station. which parameter is it i need to allow short noise bursts and also allow for some (i dunno 5 seconds or so) of silence but not so much our silence detector is triggered
