The SoX of Silence

SoX is, by their own definition, the Swiss Army knife of audio manipulation.

And no doubt it’s full of fun, with slicing and dicing, playback, recording, filtering, and effects capabilities.

But SoX is a command line tool, which means obscure syntax and parameters in order to get things done.

I’ve been trying off and on for months to understand the silence filter in SoX, which allows one to remove silence from the beginning, middle, or end of the audio. Sounds simple, doesn’t it?  Well, it should be.

Below is the man page for the silence filter:

silence [-l] above-periods [duration threshold[d|%] [below-periods duration threshold[d|%]]]

Removes silence from the beginning, middle, or end of the audio. Silence is anything below a specified threshold.

The above-periods value is used to indicate if audio should be trimmed at the beginning of the audio. A value of zero indicates no silence should be trimmed from the beginning. When specifying a non-zero above-periods, it trims audio up until it finds non-silence. Normally, when trimming silence from the beginning of audio the above-periods will be 1 but it can be increased to higher values to trim all audio up to a specific count of non-silence periods. For example, if you had an audio file with two songs that each contained 2 seconds of silence before the song, you could specify an above-period of 2 to strip out both silence periods and the first song.

When above-periods is non-zero, you must also specify a duration and threshold. Duration indicates the amount of time that non-silence must be detected before it stops trimming audio. By increasing the duration, bursts of noise can be treated as silence and trimmed off.

Threshold is used to indicate what sample value you should treat as silence. For digital audio, a value of 0 may be fine but for audio recorded from analog, you may wish to increase the value to account for background noise.

When optionally trimming silence from the end of the audio, you specify a below-periods count. In this case, below-period means to remove all audio after silence is detected. Normally, this will be a value of 1 but it can be increased to skip over periods of silence that are wanted. For example, if you have a song with 2 seconds of silence in the middle and 2 seconds at the end, you could set below-period to a value of 2 to skip over the silence in the middle of the audio.

For below-periods, duration specifies a period of silence that must exist before audio is not copied any more. By specifying a higher duration, silence that is wanted can be left in the audio. For example, if you have a song with an expected 1 second of silence in the middle and 2 seconds of silence at the end, a duration of 2 seconds could be used to skip over the middle silence.

Unfortunately, you must know the length of the silence at the end of your audio file to trim off silence reliably. A work around is to use the silence effect in combination with the reverse effect. By first reversing the audio, you can use the above-periods to reliably trim all audio from what looks like the front of the file. Then reverse the file again to get back to normal.

To remove silence from the middle of a file, specify a below-periods that is negative. This value is then treated as a positive value and is also used to indicate the effect should restart processing as specified by the above-periods, making it suitable for removing periods of silence in the middle of the audio.

The option -l indicates that below-periods duration length of audio should be left intact at the beginning of each period of silence. For example, if you want to remove long pauses between words but do not want to remove the pauses completely.

The period counts are in units of samples. Duration counts may be in the format of hh:mm:ss.frac, or the exact count of samples. Threshold numbers may be suffixed with d to indicate the value is in decibels, or % to indicate a percentage of maximum value of the sample value (0% specifies pure digital silence).

The following example shows how this effect can be used to start a recording that does not contain the delay at the start which usually occurs between ‘pressing the record button’ and the start of the performance:

rec parameters filename other-effects silence 1 5 2%

Huh?

So let’s try to clarify some of the mess from the man page.  First, a couple of important notes:

  • When specifying duration, use a trailing zero for whole numbers of seconds (i.e., 1.0 instead of 1 to specify 1 second). If you don’t, SoX assumes you’re specifying a number of samples.  Who on earth would want to specify samples instead of seconds? You got me. Alternatively, you can specify durations of time in the format hh:mm:ss.frac.
  • Use at least 0.1% for an audio threshold. Even though 0% is supposed to be pure digital silence, with my test file I couldn’t get silence to trim unless I used a threshold larger than 0%. If you’d like, you can specify the threshold in decibels using d (such as -96d or -55d).
  • The realistic values for the above-period parameter are 0 and 1 and values for the below-period parameter are pretty much just -1 and 1. The documentation states that values larger than 1 can be used, but it only really makes sense for files with consistent audio breaks. Just trust me, it’s weird. I’ll get into what those values actually mean in the examples.
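
Because the trailing-zero rule is so easy to forget, a small shell helper can normalize durations before they reach SoX. This is only a sketch: silence_duration is a made-up name, not part of SoX, and it assumes the behavior described in the first note above (a bare whole number is read as a sample count).

```shell
#!/bin/sh
# silence_duration: hypothetical helper (not part of SoX).
# Turns a bare whole number such as "2" into "2.0" so the silence effect
# reads it as seconds rather than as a sample count. Anything that is not
# purely digits (0.3, 00:01:30.5, ...) is passed through unchanged.
silence_duration() {
    case "$1" in
        ''|*[!0-9]*) printf '%s\n' "$1" ;;    # not a bare whole number
        *)           printf '%s.0\n' "$1" ;;  # whole seconds: add ".0"
    esac
}

# Usage (assumes sox is installed and in.wav exists):
#   sox in.wav out.wav silence 1 "$(silence_duration 1)" 1%
```

Wrapping the duration this way means a script can accept "2" from a user and still hand SoX "2.0".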

Now onto some examples! I’ll be showing you visually what happens to a sound file when we apply the various parameters to the silence filter.

I generated a test sound file with 60 seconds of white noise and then silenced various parts of the clip, leaving me with an audio file that looks like this:

[Image: SoX Silence Example (Original File)]

Example 1: Trimming silence at the beginning

sox in.wav out1.wav silence 1 0.1 1%

The above-period parameter comes first after the silence effect name, and for the sake of this article, it should be set to 1 if you want to use the filter. This example roughly translates to: trim silence (anything less than 1% volume) until we encounter sound lasting more than 0.1 seconds in duration. This command produces the following:

[Image: resulting waveform for out1.wav]

We’ve lopped off the silence at the beginning of the clip. For simplicity’s sake, we’ll refer to the 1% threshold as silence from now on.
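
If you would rather write that threshold in decibels, the two forms can be converted. Assuming the percentage is a linear fraction of full-scale amplitude, the conversion is 20 × log10(percent / 100), so 1% works out to -40 dB. The helper below is a sketch: percent_to_db is a made-up name, and it relies only on awk.

```shell
#!/bin/sh
# percent_to_db: hypothetical helper (not part of SoX).
# Converts a linear amplitude percentage to decibels relative to full
# scale, assuming dB = 20 * log10(percent / 100). Requires percent > 0.
percent_to_db() {
    # awk's log() is the natural logarithm, so divide by log(10).
    awk -v p="$1" 'BEGIN { printf "%.1f\n", 20 * log(p / 100) / log(10) }'
}

# Usage: the result is the number to put in front of the "d" suffix.
#   percent_to_db 1
```

Under that assumption, a threshold of 1% and a threshold of -40d should describe the same level.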

Example 2: Ignoring noise bursts

sox in.wav out2.wav silence 1 0.3 1%

By changing the duration parameter to 0.3, we tell SoX to ignore the burst of noise at the beginning of the example clip. This produces the following:

[Image: resulting waveform for out2.wav]

We can ignore short pops and clicks in audio by adjusting this duration parameter.

Example 3: Stopping recording when no sound detected

sox in.wav out3.wav silence 1 0.3 1% 1 0.3 1%

Now we introduce the below-period parameter and its respective sub-parameters.  Just like the above-period parameter, just set it to 1 and call it good.  The command above translates to: trim silence until we detect at least 0.3 seconds of noise, and then trim everything after we detect at least 0.3 seconds of silence.

[Image: resulting waveform for out3.wav]

This returns a file with just the first 4 seconds of noise (note that we ignore that 0.25 sec burst of noise at the beginning). Where’s the rest of the clip?  Well, it’s gone. Not super practical for post-production of audio, but can be useful when recording live audio, so that SoX stops when it doesn’t encounter sound for a certain number of seconds.

So an aside: if you’re looking to trim silence from both the beginning and the end of an audio file, one way is to use the reverse filter and a temp file like so:

sox in.wav temp.wav silence 1 0.1 1% reverse
sox temp.wav out.wav silence 1 0.1 1% reverse

Don’t forget to delete that temp.wav file when you’re done.
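
As a commenter (DickN) points out below, SoX executes the effects listed on one command line in sequence, so the temp file can be avoided by chaining everything together. A minimal sketch, where trim_both_ends and the SOX override variable are names invented here (the override exists only so the command can be dry-run with echo):

```shell
#!/bin/sh
# trim_both_ends: hypothetical wrapper (not part of SoX).
# Trims leading silence, reverses, trims what is now the leading silence
# (originally the trailing silence), then reverses back to normal order.
# Set SOX=echo to print the arguments instead of running sox.
trim_both_ends() {
    src=$1
    dst=$2
    ${SOX:-sox} "$src" "$dst" \
        silence 1 0.1 1% reverse \
        silence 1 0.1 1% reverse
}

# Usage (assumes sox is installed and in.wav exists):
#   trim_both_ends in.wav out.wav
```

The effect chain is the same as the two-command version above, just without the intermediate file to clean up.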

Example 4: Trimming all silence

sox in.wav out4.wav silence 1 0.1 1% -1 0.1 1%

By changing the below-period parameter to -1, we can trim instances of silence in the middle of the clip, by allowing the filter to restart after it detects noise of the specified duration.

[Image: resulting waveform for out4.wav]

In my example clip, it’s impossible to detect where the silence used to be, but with an actual podcast or other audio, it should be easier to tell.

Example 5: Ignoring short periods of silence

sox in.wav out5.wav silence 1 0.1 1% -1 0.5 1%

In a similar fashion to Example 2, we can instruct SoX to ignore small moments of silence (1/2 second in this example).

[Image: resulting waveform for out5.wav]

When trimming silence from podcasts and the like, this prevents you from removing moments when someone stops to take a breath and making the conversation sound too rushed.

Example 6: Shortening long periods of silence

sox in.wav out6.wav silence -l 1 0.1 1% -1 2.0 1%

So what if you wanted to just shorten long moments of silence rather than remove them entirely?  Well, you need to add the -l parameter, but it needs to be placed first, before the other parameters for the filter effect. The example above results in trimming all silence longer than 2 seconds down to only 2 seconds long.

[Image: resulting waveform for out6.wav]

Note that SoX does nothing to bits of silence shorter than 2 seconds.

Example 7: Shortening long periods of silence and ignoring noise bursts

sox in.wav out7.wav silence -l 1 0.3 1% -1 2.0 1%

Finally, let’s tie it all together by trimming silence longer than 2 seconds down to 2 seconds long, but ignore noise such as pops and clicks amidst the moments of silence.

[Image: resulting waveform for out7.wav]

As a result you’ll see that we’ve cropped out the 0.25 seconds of noise at the beginning of the clip, but left the 0.5 seconds of noise in the middle.

For actual usage, you’ll probably want to specify something shorter than 0.3 seconds for the duration if you’re just trying to filter out pops and clicks.
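
To run these settings over a whole folder of recordings, a loop along the following lines should work. Treat it as a sketch under assumptions: shorten_silences, the SOX override variable, and the source/destination directory layout are illustrative names, not anything SoX provides. Writing to a separate directory keeps the originals untouched.

```shell
#!/bin/sh
# shorten_silences: hypothetical batch wrapper (not part of SoX).
# Applies the Example 7 settings to every .wav file in src_dir and writes
# the results under dst_dir, leaving the originals untouched.
# Set SOX=echo to print the arguments instead of running sox.
shorten_silences() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir" || return 1
    for f in "$src_dir"/*.wav; do
        [ -e "$f" ] || continue    # the glob matched nothing
        ${SOX:-sox} "$f" "$dst_dir/$(basename "$f")" \
            silence -l 1 0.3 1% -1 2.0 1%
    done
}

# Usage (assumes sox is installed):
#   shorten_silences ./raw ./cleaned
```

Swap the 0.3 for a shorter duration if you only want to filter out pops and clicks.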

Bonus Example 8: Splitting audio based on silence

sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart

Using SoX’s newfile pseudo-effect allows us to split an audio file based on periods of silence, and then calling restart starts the effects chain over from the beginning. In this example, SoX will split audio when it detects 5 or more seconds of silence. You’ll end up with output files named out001.wav, out002.wav, and so on.

Final Thoughts

There you have it.  This is what I know about the silence filter effect in SoX.  Example 7 (where we trim some but not all of the silence and ignore pops and clicks) is ultimately what I was trying to figure out when writing this article, but I figure the other examples have got to be a good reference for somebody like me.

The above and below-period values are still mostly a mystery to me.  I may address them in another post, but for now, I’m just going to use this as a cheat sheet in case I forget.

And don’t forget to use the trailing zero when specifying whole seconds. Even while writing this I forgot multiple times.

I welcome thoughts, ideas, comments, and corrections. Please.

(edit 11/14/10 to add names to each of the examples for clarification)
(edit 04/28/11 to add audio splitting example)

29 Comments

  1. mocha
    Posted December 2, 2009 at 1:01 am | Permalink

    This is excellent. Just what I was looking for thanks!

  2. Posted January 24, 2010 at 1:54 pm | Permalink

    This post was extremely useful. Thanks a million.

  3. John
    Posted March 20, 2010 at 12:15 pm | Permalink

    sox in.wav temp.wav silence 1 0.1 1% reverse
    sox temp.wav out.wav silence 1 0.1 1% reverse

    THANKS : works great for trimming, happy to find I was not the only one looking for an answer !

    question : how would you batch it

    • jason
      Posted March 21, 2010 at 8:18 am | Permalink

      Making a batch file would simply be replacing the file parameter with %1 like the following:

      sox %1 temp.wav silence 1 0.1 1%% reverse
      sox temp.wav trimmed-%1 silence 1 0.1 1%% reverse

      Which would turn your file.wav into trimmed-file.wav.
      (Note that in order to escape the percent sign for 1%, you’ll need to use two of them)

  4. Posted November 12, 2010 at 11:53 am | Permalink

    Good job. Thank you very much. It was hard to understand from the original documentation, your article is perfect!

    • lenik
      Posted December 23, 2010 at 9:28 pm | Permalink

      **EDIT** Your web site though it’s UTF-8, however doesn’t display Chinese character correctly, FYI.

  5. ???
    Posted December 23, 2010 at 9:26 pm | Permalink

    Great cheatsheet. I’m using this script to auto start/stop recording conference talkings.

  7. Posted February 1, 2011 at 8:37 pm | Permalink

    Thanks for helping to relax my forehead after 3 hours with the cryptic manpage. Talk about too close to the software. Any chance you could help me split a file based on silence?

    It basically works, but I need the parts to be the same length as the whole. The silence filter is stripping out chunks bigger than the threshold regardless of their length (as advertised).

    sox.exe "file.wav" "file_out.wav" silence 0 0 0.8 5% : pad 0.8 newfile : restart

    As you can see, I tried using pad, but as I’m sure you know, that will replace any sized chunk with 0.8 seconds. The goal is to take a spoken audio file and break it up roughly by sentences for captioning.

    Thanks again for the great article.

    • Posted February 1, 2011 at 10:33 pm | Permalink

      The following split my test file into seven output files, where the last file was a 4kb stub without any audio, and the first six clips being all the 3+ seconds of noise:

      sox in.wav out.wav silence 1 0.8 1% 1 1.0 1% : newfile : restart

      The 0.25 and 0.5 second bursts were ignored. I tried using pad, but then the restart chain didn’t seem to process.

      Hope that helps.

      • Posted February 2, 2011 at 3:33 pm | Permalink

        The guidance is great but still no luck. The silence gaps are being trimmed out of the files. To be correct the split files should be able to be merged back together to reproduce the original file.

        I’ve found utilities like mp3splt that do this, but I’d have to compress to mp3, split the files, then convert to wav again. There’s a major loss in fidelity and processing overhead.

        I’d appreciate any insights if inspiration strikes. Thanks again.

        • Posted February 2, 2011 at 4:37 pm | Permalink

          Ah, I misunderstood what you were trying to do. But sadly, it doesn’t look like SoX will allow you to keep the silence intact, at least not that I was able to figure out.

          One suggestion might be to encode the wav to a high bitrate mp3 or ogg, and then crank it through mp3splt, or to convert to a low res mp3 (because it’s faster), use mp3splt to detect and output the silence points, and then go back to SoX and trim what you need from the original wav.

          Granted, that’s some funky overhead, but if you can get the process down, at least it would be automated.

          Good luck.

          • Posted February 3, 2011 at 1:05 am | Permalink

            Thanks a million. I actually tried Mp3splt yesterday and had to walk away because of the processing overhead and loss of fidelity going to MP3 and back. I didn’t think to use it for scouting the break points then let SoX do the chopping. That’s really smart.

            I actually have another step in my processing chain to detect the length of the chopped wav files that I’d be able to pitch, so I’d break even with the extra step anyways. Nice. Big thanks.

  8. Stephen Talley
    Posted February 21, 2011 at 7:41 am | Permalink

    Wow, thank you for deciphering that man page. It was extremely helpful.

  9. Posted March 22, 2011 at 1:50 pm | Permalink

    Hi, I wanted to know how to set the recording device to mic (rear panel). I have a Realtek sound card.

    I tried -t waveaudio "Mic"
    but it did not detect the device and returned an error. I could try "High Definition Audio" but all recording devices including mic and stereo output have this description. So how would you do it?

    If anyone who is good at using rec (or sox) please can i have your email or msn?

    • Posted March 23, 2011 at 7:38 am | Permalink

      I believe you’ll set the recording device using the Windows mixer and then

      sox -d recfile.wav

      And it should just work…

  10. Posted April 8, 2011 at 2:20 pm | Permalink

    Thanks very much for this, jason. I took your batch file and made the following bash script:

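    #!/bin/bash

    # Trim silence from both ends of every .wav file, overwriting each file in place.
    for f in *.wav
    do
        sox "$f" "temp.wav" silence 1 0.1 1% reverse
        sox "temp.wav" "$f" silence 1 0.1 1% reverse
    done
    rm -f "temp.wav"   # clean up the scratch file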

    which simply trims the silence off both ends of all .wav files in the “$PWD”.

  11. Lasky
    Posted April 28, 2011 at 12:49 am | Permalink

    Thanks a lot for excellent explanation of that sox’s effect. Very useful.

    I thought about using the silence effect to split one huge file into a set of files. The cut point would be a period of silence longer than some parameter (e.g. 5 s).

    Do you think it is possible using sox??

    Poul

    • Posted April 28, 2011 at 8:26 am | Permalink

      sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart
      is the way to split audio using SoX. I’ve added it as an example in the article.

  12. DickN
    Posted May 16, 2011 at 6:04 pm | Permalink

    That’s exactly the information I’ve been looking for. I’m a new SOX user and I have piles of sound files I’m converting to another format. Some of them have clicks at the beginning and some have long tails of dead air. This fixes both, so I can process the whole library in one batch.

    You don’t need the temp file. SOX allows multiple effects to be listed and will execute them serially. Thus,

    sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse

    is legal and works just like your example. I added ‘norm’ before the first ‘silence’. It’s also a good idea to add ‘--no-clobber’ before the input file name, especially if you’re running this from a batch process.

  13. Joel
    Posted November 21, 2011 at 7:28 am | Permalink

    Good Job. Thanks for taking time to write and share this doc.

  14. Gerry
    Posted February 4, 2012 at 4:23 pm | Permalink

    Thanks for posting these notes – sox is a great tool.
    A question if I may… Is there a simple way to trim all silence except for say 100ms from the start of a file?

    • Posted February 5, 2012 at 9:33 am | Permalink

      My suggestion would be just to pad the output file with some silence after you’re done trimming it out. It adds an extra step, but if you can batch it out it shouldn’t matter much…
      sox infile.wav outfile.wav pad 0.1

  15. Martin
    Posted March 1, 2012 at 2:05 am | Permalink

    Thanks a lot. It was extremely helpful!!

    Greetings
    Martin

  16. Ken
    Posted May 2, 2012 at 10:22 pm | Permalink

    Extremely useful, but….

    I want to split my file.
    It starts with a sound, not silence, and I don’t want to delete the initial sound.

    This command deletes the first sound:
    sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart

    Suggestions?

    • Posted May 3, 2012 at 11:48 am | Permalink

      This command should keep the original sound if it’s more than 0.5 seconds long. You might try it with another file or a test file to make sure you’ve got all the parameters correct.

  17. capsula4
    Posted May 9, 2012 at 1:54 pm | Permalink

    I know this may sound pointless, but is there a way to actually keep the silence files? With those, I think I would be able to create a “noise” profile and then clean the audio.

    What I’m actually doing for getting a possible noise profile is extract 0.4 seconds of the beginning of an analog recording:
    sox f1.wav f2.wav trim 0 0.40
    sox f2.wav -n noiseprof noise.prof
    sox f1.wav f3.wav noisered noise.prof 0.3

    Finally I split by taking silence into account:
    sox f3.wav f4.wav silence 1 0.2 5% 1 0.1 5% : newfile : restart

    I just would like to first get the silence files instead of trimming.

    • Posted May 9, 2012 at 8:49 pm | Permalink

      Don’t think you can extract the silence using sox, at least I’m not aware of how to do it. I think you’d need some sort of specialized audio analysis utility to do that. Good luck though!
