The SoX of Silence

SoX is, by their own definition, the Swiss Army knife of audio manipulation.

And no doubt it’s full of fun with slicing and dicing and playback and recording and filtering and effects capabilities.

But SoX is a command line tool, which means obscure syntax and parameters in order to get things done.

I’ve been trying off and on for months to understand the silence filter in SoX, which allows one to remove silence from the beginning, middle, or end of the audio. Sounds simple, doesn’t it?  Well, it should be.

Below is the man page for the silence filter:

silence [-l] above-periods [duration threshold[d|%] [below-periods duration threshold[d|%]]]

Removes silence from the beginning, middle, or end of the audio. Silence is anything below a specified threshold.

The above-periods value is used to indicate if audio should be trimmed at the beginning of the audio. A value of zero indicates no silence should be trimmed from the beginning. When specifying an non-zero above-periods, it trims audio up until it finds non-silence. Normally, when trimming silence from beginning of audio the above-periods will be 1 but it can be increased to higher values to trim all audio up to a specific count of non-silence periods. For example, if you had an audio file with two songs that each contained 2 seconds of silence before the song, you could specify an above-period of 2 to strip out both silence periods and the first song.

When above-periods is non-zero, you must also specify a duration and threshold. Duration indicates the amount of time that non-silence must be detected before it stops trimming audio. By increasing the duration, bursts of noise can be treated as silence and trimmed off.

Threshold is used to indicate what sample value you should treat as silence. For digital audio, a value of 0 may be fine but for audio recorded from analog, you may wish to increase the value to account for background noise.

When optionally trimming silence from the end of the audio, you specify a below-periods count. In this case, below-period means to remove all audio after silence is detected. Normally, this will be a value of 1 but it can be increased to skip over periods of silence that are wanted. For example, if you have a song with 2 seconds of silence in the middle and 2 seconds at the end, you could set below-period to a value of 2 to skip over the silence in the middle of the audio.

For below-periods, duration specifies a period of silence that must exist before audio is not copied any more. By specifying a higher duration, silence that is wanted can be left in the audio. For example, if you have a song with an expected 1 second of silence in the middle and 2 seconds of silence at the end, a duration of 2 seconds could be used to skip over the middle silence.

Unfortunately, you must know the length of the silence at the end of your audio file to trim off silence reliably. A work around is to use the silence effect in combination with the reverse effect. By first reversing the audio, you can use the above-periods to reliably trim all audio from what looks like the front of the file. Then reverse the file again to get back to normal.

To remove silence from the middle of a file, specify a below-periods that is negative. This value is then treated as a positive value and is also used to indicate the effect should restart processing as specified by the above-periods, making it suitable for removing periods of silence in the middle of the audio.

The option -l indicates that below-periods duration length of audio should be left intact at the beginning of each period of silence. For example, if you want to remove long pauses between words but do not want to remove the pauses completely.

The period counts are in units of samples. Duration counts may be in the format of hh:mm:ss.frac, or the exact count of samples. Threshold numbers may be suffixed with d to indicate the value is in decibels, or % to indicate a percentage of maximum value of the sample value (0% specifies pure digital silence).

The following example shows how this effect can be used to start a recording that does not contain the delay at the start which usually occurs between `pressing the record button’ and the start of the performance:

rec parameters filename other-effects silence 1 5 2%

Huh?

So let’s try to clarify some of the mess from the man page.  First, a couple of important notes:

  • When specifying duration, use a trailing zero for whole numbers of seconds (i.e., 1.0 instead of 1 to specify 1 second). If you don’t, SoX assumes you’re specifying a number of samples.  Who on earth would want to specify samples instead of seconds? You got me. Alternatively, you can specify durations of time in the format hh:mm:ss.frac.
  • Use at least 0.1% for an audio threshold. Even though 0% is supposed to be pure digital silence, with my test file I couldn’t get silence to trim unless I used a threshold larger than 0%. If you’d like, you can specify the threshold in decibels using d (such as -96d or -55d).
  • The realistic values for the above-period parameter are 0 and 1 and values for the below-period parameter are pretty much just -1 and 1. The documentation states that values larger than 1 can be used, but it only really makes sense for files with consistent audio breaks. Just trust me, it’s weird. I’ll get into what those values actually mean in the examples.
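Since the trailing zero is so easy to forget, a tiny wrapper can add it for you. This is only a sketch: sox_seconds is a made-up helper name, not something that ships with SoX, and it simply leaves alone any value that already contains a decimal point or a colon.

```shell
#!/bin/sh
# Hypothetical helper (not part of SoX): make sure a duration given in
# seconds always carries a decimal point, so "1" is passed on as "1.0".
# Values that already contain "." or ":" (hh:mm:ss.frac) are left alone.
sox_seconds() {
    case "$1" in
        *.*|*:*) printf '%s\n' "$1" ;;
        *)       printf '%s.0\n' "$1" ;;
    esac
}

# Usage: sox in.wav out.wav silence 1 "$(sox_seconds 1)" 1%
```

It does no validation, so anything that is not a number gets passed through to SoX unchanged apart from the added ".0".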

Now onto some examples! I’ll be showing you visually what happens to a sound file when we apply the various parameters to the silence filter.

I generated a test sound file with 60 seconds of white noise and then silenced various parts of the clip, leaving me with an audio file that looks like this:

[Image: SoX Silence Example (Original File)]
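If you want a similar file to experiment with, something like the following should get you close. The segment lengths here are assumptions chosen for illustration, not the exact layout of the file pictured above, and make_test_clip is just a name picked for this sketch.

```shell
#!/bin/sh
# Build a test clip of white noise separated by digital silence.
# The segment lengths below are assumptions, not the article's exact file.
make_test_clip() {
    out=$1
    tmp=$(mktemp -d) || return 1
    # -n is SoX's "null" input; synth generates noise, and trim on the
    # null input yields silence of the requested length.
    sox -n -r 44100 -c 1 "$tmp/burst.wav" synth 0.25 whitenoise vol 0.5 &&
    sox -n -r 44100 -c 1 "$tmp/noise.wav" synth 4.0 whitenoise vol 0.5 &&
    sox -n -r 44100 -c 1 "$tmp/gap.wav" trim 0.0 3.0 &&
    # Several input files on one command line are concatenated in order.
    sox "$tmp/gap.wav" "$tmp/burst.wav" "$tmp/gap.wav" "$tmp/noise.wav" \
        "$tmp/gap.wav" "$tmp/noise.wav" "$tmp/gap.wav" "$out"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Usage: make_test_clip in.wav
```

The result is roughly 20 seconds long: a short burst of noise near the start, then two 4 second stretches of noise with 3 seconds of silence around each.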

Example 1: Trimming silence at the beginning

sox in.wav out1.wav silence 1 0.1 1%

The above-period parameter is first after the silence parameter, and for the sake of this article, it should be set to 1 if you want to use the filter. This example roughly translates to: trim silence (anything less than 1% volume) until we encounter sound lasting more than 0.1 seconds in duration. The output of this command produces the following:

[Image: waveform produced by sox in.wav out1.wav silence 1 0.1 1%]

We’ve lopped off the silence at the beginning of the clip. For simplicity’s sake, we’ll refer to the 1% threshold as silence from now on.
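Without a waveform viewer handy, one way to check how much was removed is to compare durations with soxi -D, which prints a file's length in seconds. A small sketch, where trimmed_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# Print how many seconds an effect chain removed, using soxi -D
# (duration in seconds). trimmed_seconds is a hypothetical helper name.
trimmed_seconds() {
    before=$(soxi -D "$1") || return 1
    after=$(soxi -D "$2") || return 1
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%.2f\n", b - a }'
}

# Usage: trimmed_seconds in.wav out1.wav
```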

Example 2: Ignoring noise bursts

sox in.wav out2.wav silence 1 0.3 1%

By changing the duration parameter to 0.3, we tell SoX to ignore the burst of noise at the beginning of the example clip. This produces the following:

[Image: waveform produced by sox in.wav out2.wav silence 1 0.3 1%]

We can ignore short pops and clicks in audio by adjusting this duration parameter.

Example 3: Stopping recording when no sound detected

sox in.wav out3.wav silence 1 0.3 1% 1 0.3 1%

Now we introduce the below-period parameter and its respective sub-parameters.  Just like the above-period parameter, set it to 1 and call it good.  The command above translates to: trim silence until we detect at least 0.3 seconds of noise, and then trim everything after we detect at least 0.3 seconds of silence.

[Image: waveform produced by sox in.wav out3.wav silence 1 0.3 1% 1 0.3 1%]

This returns a file with just the first 4 seconds of noise (note that we ignore that 0.25 sec burst of noise at the beginning). Where’s the rest of the clip?  Well, it’s gone. Not super practical for post-production of audio, but can be useful when recording live audio, so that SoX stops when it doesn’t encounter sound for a certain number of seconds.

So an aside: if you’re looking to trim silence from the beginning and the end of an audio file, you’ll need to utilize the reverse filter and a temp file like so:

sox in.wav temp.wav silence 1 0.1 1% reverse
sox temp.wav out.wav silence 1 0.1 1% reverse

Don’t forget to delete that temp.wav file when you’re done.

Jakob points out in the comments that you can trim silence from both ends in one fell swoop by chaining the effects like so:

sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse

Example 4: Trimming all silence

sox in.wav out4.wav silence 1 0.1 1% -1 0.1 1%

By changing the below-period parameter to -1, we can trim instances of silence in the middle of the clip, by allowing the filter to restart after it detects noise of the specified duration.

[Image: waveform produced by sox in.wav out4.wav silence 1 0.1 1% -1 0.1 1%]

In my example clip, it’s impossible to detect where the silence used to be, but with an actual podcast or other audio, it should be easier to tell.

Example 5: Ignoring short periods of silence

sox in.wav out5.wav silence 1 0.1 1% -1 0.5 1%

In similar fashion as Example 2, we can instruct SoX to ignore small moments of silence (1/2 second in this example).

[Image: waveform produced by sox in.wav out5.wav silence 1 0.1 1% -1 0.5 1%]

When trimming silence from podcasts and the like, this prevents you from removing moments when someone stops to take a breath and making the conversation sound too rushed.

Example 6: Shortening long periods of silence

sox in.wav out6.wav silence -l 1 0.1 1% -1 2.0 1%

So what if you wanted to just shorten long moments of silence rather than remove them entirely?  Well, you need to add the -l parameter, but it needs to be placed first, before the other parameters for the filter effect. The example above results in trimming all silence longer than 2 seconds down to only 2 seconds long.

[Image: waveform produced by sox in.wav out6.wav silence -l 1 0.1 1% -1 2.0 1%]

Note that SoX does nothing to bits of silence shorter than 2 seconds.

Example 7: Shortening long periods of silence and ignoring noise bursts

sox in.wav out7.wav silence -l 1 0.3 1% -1 2.0 1%

Finally, let’s tie it all together by trimming silence longer than 2 seconds down to 2 seconds long, but ignore noise such as pops and clicks amidst the moments of silence.

[Image: waveform produced by sox in.wav out7.wav silence -l 1 0.3 1% -1 2.0 1%]

As a result you’ll see that we’ve cropped out the 0.25 seconds of noise at the beginning of the clip, but left the 0.5 seconds of noise in the middle.

For actual usage, you’ll probably want to specify something shorter than 0.3 seconds for the duration if you’re just trying to filter out pops and clicks.

Bonus Example 8: Splitting audio based on silence

sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart

Using SoX’s newfile pseudo-effect allows us to split an audio file based on periods of silence, and then calling restart starts the effects chain over from the beginning. In this example, SoX will split audio when it detects 5 or more seconds of silence. You’ll end up with output files named out001.wav, out002.wav, and so on.

Final Thoughts

There you have it.  This is what I know about the silence filter effect in SoX.  Example 7, where we trim some but not all of the silence and ignore pops and clicks, is ultimately what I was trying to figure out when writing this article, but I figure the other examples have got to be a good reference for somebody.

The above and below-period values are still mostly a mystery to me.  I may address them in another post, but for now, I’m just going to use this as a cheat sheet in case I forget.

And don’t forget to use the trailing zero when specifying whole seconds. Even while writing this I forgot multiple times.

I welcome thoughts, ideas, comments, and corrections. Please.

(edit 11/14/10 to add names to each of the examples for clarification)
(edit 04/28/11 to add audio splitting example)
(edit 12/06/12 to add one line silence trimming) 

136 replies on “The SoX of Silence”

sox in.wav temp.wav silence 1 0.1 1% reverse
sox temp.wav out.wav silence 1 0.1 1% reverse

THANKS : works great for trimming, happy to find I was not the only one looking for an answer !

question : how would you batch it

Making a batch file would simply be replacing the file parameter with %1 like the following:

sox %1 temp.wav silence 1 0.1 1%% reverse
sox temp.wav trimmed-%1 silence 1 0.1 1%% reverse

Which would turn your file.wav into trimmed-file.wav.
(Note that in order to escape the percent sign for 1%, you’ll need to use two of them)

Thanks for helping to relax my forehead after 3 hours with the cryptic manpage. Talk about too close to the software. Any chance you could help me split a file based on silence?

It basically works, but I need the parts to add up to the same length as the whole. The silence filter is stripping out chunks bigger than the threshold regardless of their length (as advertised).

sox.exe "file.wav" "file_out.wav" silence 0 0 0.8 5% : pad 0.8 newfile : restart

As you can see, I tried using pad, but as I’m sure you know, it will replace any sized chunk with 0.8 seconds. The goal is to take a spoken audio file and break it up roughly by sentences for captioning.

Thanks again for the great article.

The following split my test file into seven output files, where the last file was a 4kb stub without any audio, and the first six clips being all the 3+ seconds of noise:

sox in.wav out.wav silence 1 0.8 1% 1 1.0 1% : newfile : restart

The 0.25 and 0.5 second bursts were ignored. I tried using pad, but then the restart chain didn’t seem to process.

Hope that helps.

The guidance is great but still no luck. The silence gaps are being trimmed out of the files. To be correct the split files should be able to be merged back together to reproduce the original file.

I’ve found utilities like mp3splt that do this, but I’d have to compress to mp3, split the files, then convert to wav again. There’s a major loss in fidelity and processing overhead.

I’d appreciate any insights if inspiration strikes. Thanks again.

Ah, I misunderstood what you were trying to do. But sadly, it doesn’t look like SoX will allow you to keep the silence intact–at least not that I was able to figure out.

One suggestion might be to encode the wav to a high bitrate mp3 or lossless ogg, and then crank it through mp3splt, or to convert to a low res mp3 (because it’s faster), use mp3splt to detect and output the silence points, and then go back to SoX and trim what you need from the original wav.

Granted, that’s some funky overhead, but if you can get the process down, at least it would be automated.

Good luck.

Thanks a million. I actually tried Mp3splt yesterday and had to walk away because of the processing overhead and loss of fidelity going to MP3 and back. I didn’t think to use it for scouting the break points then let SoX do the chopping. That’s really smart.

I actually have another step in my processing chain to detect the length of the chopped wav files that I’d be able to pitch, so I’d break even with the extra step anyways. Nice. Big thanks.

Hi, I wanted to know how to set the recording device to mic (rear panel). I have a Realtek sound card.

I tried -t waveaudio "Mic"
but it did not detect the device and returned an error. I could try “High Definition Audio” but all recording devices including mic and stereo output have this description. So how would you do it?

If anyone is good at using rec (or sox), please can I have your email or msn?

Thanks very much for this, jason. I took your batch file and made the following bash script:

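#!/bin/bash

# Trim silence from both ends of every .wav file in the current directory.
# Note: each original file is overwritten in place.
for f in *.wav
do
    sox "$f" "temp.wav" silence 1 0.1 1% reverse
    sox "temp.wav" "$f" silence 1 0.1 1% reverse
done
rm -f temp.wav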

which simply trims the silence off both ends of all .wav files in the “$PWD”.

Thanks a lot for the excellent explanation of that sox effect. Very useful.

I was thinking about using the silence effect to split one huge file into a set of files. The cut point would be a period of silence longer than some parameter (e.g. 5 s).

Do you think it is possible using sox??

Poul

sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart
is the way to split audio using SoX. I’ve added it as an example in the article.

That’s exactly the information I’ve been looking for. I’m a new SOX user and I have piles of sound files I’m converting to another format. Some of them have clicks at the beginning and some have long tails of dead air. This fixes both, so I can process the whole library in one batch.

You don’t need the temp file. SOX allows multiple effects to be listed and will execute them serially. Thus,

sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse

is legal and works just like your example. I added ‘norm’ before the first ‘silence’. It’s also a good idea to add ‘--no-clobber’ before the input file name, especially if you’re running this from a batch process.
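Put together as a batch job, that suggestion might look something like the sketch below. The trim_all name and the ./trimmed output directory are assumptions made for the example; writing to a separate directory means the originals are never touched, and --no-clobber keeps SoX from silently overwriting results from an earlier run.

```shell
#!/bin/sh
# Trim leading and trailing silence from every .wav in the current
# directory, writing results to ./trimmed (an assumed directory name)
# so the originals are never overwritten.
trim_all() {
    mkdir -p trimmed || return 1
    for f in *.wav; do
        [ -e "$f" ] || continue   # no .wav files: the glob stays literal
        # One effects chain, no temp file: normalize, trim the front,
        # reverse, trim what is now the front, reverse back.
        sox --no-clobber "$f" "trimmed/$f" \
            norm silence 1 0.1 1% reverse silence 1 0.1 1% reverse
    done
}

# Usage: cd to the directory holding the recordings, then run: trim_all
```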

Thanks for posting these notes – sox is a great tool.
A question if I may… Is there a simple way to trim all silence except for say 100ms from the start of a file?

My suggestion would be just to pad the output file with some silence after you’re done trimming it out. It adds an extra step, but if you can batch it out it shouldn’t matter much…
sox infile.wav outfile.wav pad 0.1

Extremely useful, but….

I want to split my file.
It starts with a sound, not a silence, and I don’t want to delete the initial sound.

This command deletes the first sound:
sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart

Suggestions?

This command should keep the original sound if it’s more than 0.5 seconds long. You might try it with another file or a test file to make sure you’ve got all the parameters correct.

I know this may sound pointless, but is there a way to actually keep the silence files? With those, I think I would be able to create a “noise” profile and then clean the audio.

What I’m actually doing to get a possible noise profile is extracting the first 0.4 seconds of an analog recording:
sox f1.wav f2.wav trim 0 0.40
sox f2.wav -n noiseprof noise.prof
sox f1.wav f3.wav noisered noise.prof 0.3

Finally I split by taking silence into account:
sox f3.wav f4.wav silence 1 0.2 5% 1 0.1 5% : newfile : restart

I just would like to first get the silence files instead of trimming.

Don’t think you can extract the silence using sox, at least I’m not aware of how to do it. I think you’d need some sort of specialized audio analysis utility to do that. Good luck though!

Hi guys,

Is it possible with SOX detect all silence periods <0.2sec and save to file like
mm:ss:1/100
01:51:57 <– detect silence <0.2sec
02:39:57 <– detect silence <0.2sec

Cheers,

Edin

I think he’s trying to just create files that have the timestamps of the silence instances. Is there a way to save files with more information than outX.wav?

I don’t believe you can set timestamps natively with sox, however you could use a shell script or batch file to generate the timestamp for the filename when sox starts, or use a post-process to extract the time information from the file and rename it.
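As a rough sketch of the post-process idea, the following renames split files so each name begins with a timestamp. It relies on each file's modification time, which only approximates when that clip finished being written, and on date -r FILE, which is a GNU date extension. The stamp_parts name is made up for this example.

```shell
#!/bin/sh
# Rename out001.wav, out002.wav, ... so each name starts with a timestamp.
# This uses each file's modification time, which only approximates when
# the clip ended; "date -r FILE" is a GNU date extension, not POSIX.
stamp_parts() {
    for f in "$@"; do
        [ -e "$f" ] || continue
        ts=$(date -r "$f" +%Y%m%d-%H%M%S) || return 1
        mv -- "$f" "$(dirname "$f")/${ts}_$(basename "$f")"
    done
}

# Usage: stamp_parts out[0-9][0-9][0-9].wav
```

A file named out001.wav ends up as something like 20110428-153012_out001.wav, which sorts by time and is easy to load into a database afterwards.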

Hi Jason,

Thanks very much for the useful guide! I have a question that maybe you could help with. I want to do the inverse of your “example 6:”
sox in.wav out6.wav silence -l 1 0.1 1% -1 2.0 1%

I would love to be able to lengthen the silence say from 1 second to anywhere from 7 to 30 seconds. A fixed length is OK but a random length within a range (e.g. 7-30sec) would be most ideal.

I have tried altering the command in your example to increase the time, but it did not work.

Thanks again!

A suggestion if you’re wanting to lengthen the silence might be to split the file based on silence (ie, Example 8), generate some silence using sox (sox -n -r 44100 -c 2 silence.wav trim 0.0 7.0) and then reassemble the split files, injecting the silence.
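Here is one way that suggestion could be scripted, offered as a sketch rather than a tested recipe. It assumes a pause of 1.0 second or more marks a split point, uses a fixed gap length (7.0 seconds unless you pass another value), and reads the sample rate and channel count from the input with soxi so the generated silence matches. The lengthen_gaps name and the temp file names are made up for the example.

```shell
#!/bin/sh
# Sketch: split on silence, then rejoin with a longer, fixed-length gap.
# lengthen_gaps, part.wav and gap.wav are names assumed for this example.
lengthen_gaps() {
    src=$1
    dst=$2
    gap_len=${3:-7.0}
    tmp=$(mktemp -d) || return 1
    # Split as in Example 8; here a pause of 1.0 second ends a part.
    sox "$src" "$tmp/part.wav" silence 1 0.1 1% 1 1.0 1% : newfile : restart
    # Generate silence with the same rate and channel count as the input.
    sox -n -r "$(soxi -r "$src")" -c "$(soxi -c "$src")" \
        "$tmp/gap.wav" trim 0.0 "$gap_len"
    # Build the list: part, gap, part, gap, ..., part.
    set --
    for f in "$tmp"/part*.wav; do
        # Skip empty stub files that the split can leave behind.
        [ "$(soxi -s "$f")" -gt 0 ] || continue
        if [ "$#" -gt 0 ]; then
            set -- "$@" "$tmp/gap.wav"
        fi
        set -- "$@" "$f"
    done
    # Several inputs on one command line are concatenated in order.
    sox "$@" "$dst"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Usage: lengthen_gaps in.wav out.wav 7.0
```

For a random length per gap you would generate a separate silence file inside the loop instead of reusing gap.wav.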

Out of curiosity, what are you doing that makes you need more silence instead of less?

O nice! I just saw this reply. I will give it a shot. I don’t know how it will work because I am working with long audio files 30min to 1hr long at least.

So here’s the deal: what I am trying to do is modify some chants for meditation. You know how they say that music is just punctuated silence? Well, I am trying to have more of that, and that’s why I want the duration to be unpredictable. I want to start with natural pauses in the chanting, and then lengthen them to a few seconds.

Anyway, I hope that answered your question. Thanks for your help and I’ll try your suggestion.

cheers …

Hi, thanks for your tutorial!

I’m successfully trimming silence at the beginning/end of an audio file, the only problem is that the resulting audio file is half the bitrate (from 128Kbps to 64Kbps).
Shouldn’t sox preserve the bitrate by default when applying filters?
Thanks

In order to be clearer, this is what ffprobe tells about the original file:

Input #0, mp3, from ‘fsn.mp3’:
Duration: 00:02:59.46, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, mono, s16, 128 kb/s

Then I launch:
# sox fsn.mp3 fsn-trimmed.mp3 silence 1 0.1 1%

ffprobe tells about the trimmed file:

Input #0, mp3, from ‘fsn-trimmed.mp3’:
Duration: 00:02:59.12, start: 0.000000, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, mono, s16, 64 kb/s

A tiny period of silence has been removed, but the trimmed file is half the bitrate of the original file.

I tried with other files, sometimes sox preserves the original bitrate, other times it’s reduced.
Any idea why?
Thanks

I generally operate SoX with raw wav files rather than MP3s or other lossy formats. I know that SoX can interact with the LAME libraries now, but I don’t know how well it handles all the parameters.

You may be better off re-compressing the file using LAME after you’ve trimmed, and either de-compressing first before SoX or just outputting an un-compressed wav. I don’t know what’s going on in your specific situation, but SoX may be evaluating an ABR/VBR encoded mp3 a little funky, or may just be defaulting to some parameters when it hands it back off to LAME.

Thanks Jason.

I’m experimenting a bit with mp3splt, it seems to be quite easy to use and works well with mp3 files.
The following command trims leading and trailing silence (you may specify a threshold and other parameters):

mp3splt -r in.mp3

I use sox to record my voice, trim silence at the beginning, and stop recording when silence lasts at least 0.5 seconds. It’s simple, but what can I do to leave 0.5 seconds at the beginning instead of trimming it all? For example, if I have 2 seconds of silence at the beginning, I want to trim 1.5 seconds and leave 0.5. Can I do something with the reverse option, or is there a simpler solution?
Please, help me.

I thought about my problem above, and the only solution I figured out is to reverse the file, leave 0.5 seconds of silence at the end, trim this silence, reverse again, and paste it at the beginning of the normal file. Any suggestions on how I can do the same in a different way? I would be very grateful.

This was extremely helpful in working up a SoX command line for shortening lecture recordings by removing silence, changing tempo (to 1.20) and pitch shifting down slightly (to make the tinny quality of the upper range frequencies in a lecture recording a bit less harsh. Eventually I’ll work out EQ/compander settings for that bit instead.)

For those still confused about adjusting the “remove silence from middle” settings, but familiar with audio production effect settings, I found it helpful to think of the above-period parameters as the “release” or “defeat” settings and the below-period as the “attack.” Which is to say that the second parameter tells SoX how quiet the audio must be, for how long, to start removing silence. And the first parameter tells SoX how loud the audio must become again, and for how long, to stop removing silence once it has triggered.

In order to avoid drop outs and stuttering when trying to remove silence from a lecture recording (with significant background noise) I found it helpful for the “attack” settings to kick in at a louder level than the “release,” i.e. to make it such that we stop removing silence at a lower volume (while the audio level is rising) than we start removing silence (while the audio level is falling). I also found it helpful to use a very short duration for the release. It turned out that drop outs and cutouts etc were often caused by clipping the first syllable of a word after a removed silence, because the speaker’s voice isn’t immediately loud enough to defeat the silence removal, not because the filter was triggering too aggressively at the end of a phrase. Using 0.01 was much better than 0.1 for above-period duration.

This has been working well for me:

sox $mp3file-temp.wav ./short/$mp3file-temp-short.wav silence 1 0.05 -46d -1 0.7 -40d tempo -s 1.20 pitch -107

I also obtained much better results by decompressing the source MP3 to a temporary WAV file, performing the processing on the WAV, and then re-compressing to MP3 (despite the impact of reapplying a lossy format).

Thanks for the page, helped me a lot!

One addition: You don’t need a temp file for removing leading and trailing silence, just combine the effects:
sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse

What I needed was to just remove the trailing silence:
sox in.wav out.wav reverse silence 1 0.1 0.1% reverse

1% cut off too early, so I used 0.1% instead, which gave me good results.

Hi Jason, thank you very much for these great examples and explanations! They help me a lot.

By the way, I’m having a strange problem trying to remove some noise from the beginning of a file. This file starts with 0.05 of complete silence at the beginning, then a noise burst with a 0.03 duration, then another silence of 0.02 duration, and finally the real noise which I’d like to keep.

When running sox with this line
sox sample.wav output.wav silence 1 0.15 1%

It only removes the first silence of 0.05 duration, but not the noise burst of 0.03 and the second silence of 0.02 duration, and I don’t know why. Maybe you could help me? I uploaded a sample file here: http://www.sendspace.com/file/utwa12

Thanks in advance!

Jose,

Looking at your file (http://i.imgur.com/tjFfA.png) there may not be a good way to do what you’re trying to accomplish.

SoX *should* be able to filter out that second click, but you’d need to use the below period settings as well. I fiddled with it, and it may be that since the resolution of the noise is so small (under 0.5 seconds) that it isn’t able to reliably detect noise vs silence.

I was able to manually increase the silence and it did start to work, but I don’t know where that cutoff is.

You may be able to better handle these with a pop/click filter in Audacity or another similar program.

Jason, thank you very much for taking the time to look at my example! I really appreciate that.
I think I didn’t understand your answer very well because you mention a second click. What I’m trying to accomplish is to remove the first click (between 0.05 and 0.08 in the picture), and not the second one that is in the middle of the noise (between 0.10 and 0.12). I already tried with the below parameter but it didn’t help either.

But maybe you are talking about the first click and it was only a slip 😉
Kind regards,

Jose, I still don’t have a good answer for you on that. It may have to do with the resolution of the silence, given that it’s in the hundredths of seconds. The SoX Mailing List might have some better insight in that case.

I was able to get it to trim, but I had to stretch the audio, which again leads me to suspect some sort of resolution/timing issue. I tried to adjust the ratios once i got it working with the stretched file, but that didn’t work.

Take look at how the trimming worked for me here: http://i.imgur.com/d3lRD.png

Jason you rock! Once again thanks for taking the time to make that test! And sorry for the long delay, I didn’t see your response until now.

I like your stretch solution, I think I could use something like that. But I will also ask on the sox mailing list as you suggest.

By the way, did you do that stretching with sox as well?

Best regards

Great article, thanks. I am trying to do something similar using the vox feature in sox, but I’m not sure how to do it. It is on a Linux platform with audio fed into my sound card. I want the file to be saved to disk only when vox conditions are met, and then arm and wait for the next audio activity (e.g., like a scanner, but in my case it is intercom audio). The ": newfile : restart" chain works fine for this, but the problem is the filename gets indexed like xxx.wav. I am looking for sox to be able to use the timestamp in the filename or even return the filename it wrote, so that I can modify the filename and catalog it. Basically, I want to be able to catalog these files into a DB like mysql by timestamp, so one can go and retrieve audio between times/dates… is any of this doable? I am good with linux/perl/shell programming so my issue is not there, it is what sox can return to the shell upon completion. Any help from the experts out here will be appreciated.

Even though this was published about 3 and a half years ago it is still timely and helpful today. I just wanted to add that I almost gave up on this because sox was not taking out the silence (using the example above to remove all silence from a file). In fact the output was about the same as the input. HOWEVER by experimenting with the parameters for about an hour (especially increasing thresholds over 3%)I was able to get it work very well. I am reducing hours of fire dept radio traffic to about 2 % of real time! A great way to keep up with what is going on without being shackled to a scanner radio.

Glad to hear this article has been helpful for you! It sounds like your ‘silence’ has some noise in it, such you need to set your silence threshold to 3%. I’ve seen similar issues with onboard audio. It got a whole lot better for me using USB audio or a PCI sound card.
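Rather than experimenting by hand, a short loop can show how the output length changes as the threshold goes up. This is only a sketch: sweep_thresholds is a made-up name, the list of thresholds is an arbitrary starting point, and it uses the "trim all silence" settings from Example 4.

```shell
#!/bin/sh
# Try several thresholds and report the resulting duration for each.
# The threshold list is only a starting point for experimenting.
sweep_thresholds() {
    src=$1
    tmp=$(mktemp -d) || return 1
    for t in 0.1% 0.5% 1% 3% 5%; do
        # Same settings as Example 4, with only the threshold changing.
        sox "$src" "$tmp/out.wav" silence 1 0.1 "$t" -1 0.1 "$t" &&
        printf '%s\t%s\n' "$t" "$(soxi -D "$tmp/out.wav")"
    done
    rm -rf "$tmp"
}

# Usage: sweep_thresholds in.wav
```

Each line of output is a threshold followed by the resulting length in seconds. If the length barely changes from the input until some threshold, your "silence" probably has a noise floor just below that level.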

Your instructions “broke the silence”, as it were….

I’m using it in a loop to do speech-to-text translation using this two-command shell script that records audio till the user stops speaking, and sends that data to google’s translation engine to turn it into text. The recording works fine, but the translation returns garbage. The quality of the audio seems perfectly fine to my ears, but apparently google doesn’t think so. Look at the “utterance” value from the JSON object returned:

rec new.flac silence 1 0.3 1% 1 1.0 1%

wget --post-file='new.flac' \
--no-check-certificate \
--user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header='Content-Type: audio/x-flac; rate=44100;' \
-O 'recognized.json' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=ar-QA&maxresults=10'

Never mind — I copied and pasted the http line to get to google’s api, and hadn’t noticed there was a lang parameter. I changed it to en-US and it works perfectly now.

This is a fantastic reference, thank you!

Unfortunately the main interest I have is in splitting a .wav file into pieces based on silence, and I can’t seem to make it work. I’m using this command:

sox testfile.wav outputter.wav silence 1 0.3 1% 1 2.0 1% : newfile : restart

The “testfile.wav” is a voice recording where I put 4 seconds of silence right in the middle of it to test. I expect to get “outputter001.wav” and “outputter002.wav” as the result, but instead I’m getting “outputter.wav” only. This new file is the first part of “testfile.wav”, right up until the silence just like it should be, but there is no second file with the other half of the original recording. Any idea what might be wrong?

Thank you for any help you can give me, SoX is very new to me but this would be amazing if I could make it work 🙂

Nevermind – turns out my version of SoX was ancient. After updating to the new version, this works as expected. Thanks again!

I’m trying to do something similar to “Example 6: Shortening long periods of silence”: I’d like to trim the leading silence in an audio file, but leave 0.5 seconds of it.

In your Example 6, you mention “Note that SoX does nothing to bits of silence shorter than 2 seconds.”, but it looks like it does trim the beginning silence completely, all 3 seconds of it.

Is there any way to alter the command to leave a certain amount of silence at the beginning of the file?

Thanks for the excellent tutorial!

I can’t say for certain, but you may be able to add ‘pad 0.5 0’ at the beginning or the end of the filter chain to put back the silence that was trimmed out. If you can’t do it inline, then you can just run it through sox again as part of a batch file. Let me know if that works one way or the other.
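
A minimal sketch of that suggestion, assuming a 0.1-second duration and a 1% threshold for the leading trim. The function name `trim_keep_lead` and the `SOX` override variable are made up for illustration and are not part of SoX. It trims the leading silence and then uses `pad` to put 0.5 seconds back at the start. Note that `pad` inserts digital silence, not the original background noise.

```shell
#!/bin/sh
# trim_keep_lead IN OUT [SECONDS]
# Trim leading silence from IN, then pad SECONDS (default 0.5) of silence
# back onto the start of OUT. Hypothetical helper; the 0.1 s duration and
# 1% threshold are assumptions and will need tuning for real recordings.
trim_keep_lead() {
    lead=${3:-0.5}
    # Set SOX=echo to print the arguments instead of running sox (dry run).
    "${SOX:-sox}" "$1" "$2" silence 1 0.1 1% pad "$lead" 0
}

# Example (uncomment to run):
# trim_keep_lead input.wav output.wav 0.5
```

Setting `SOX=echo` before calling the function prints the argument list without touching any files, which is a cheap way to check the chain before running it on a whole batch.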

This is such a good introduction and guide to this extremely useful program!

Thanks for taking the time to break it down for us!!

Excellent guide! Thank you very much for providing it! I’m trying to use the sox silence filter to monitor line-in audio for voice recognition purposes, but I have yet to find a solution for a specific problem. Perhaps someone could shed some light on it. I need files to be split after 0.5 secs of silence is detected. For that, Example 8 would do it:

silence 1 0.5 1% 1 0.5 1% : newfile : restart

But I also need each file to be at most 5 secs long, so the recognition app doesn’t take too long to answer. I don’t think the silence filter alone can do this, so I tried the trim filter, both before and after the silence filter, like this:

trim 0 5 silence 1 0.5 1% 1 0.5 1% : newfile : restart
silence 1 0.5 1% 1 0.5 1% trim 0 5 : newfile : restart

In the first case it returned empty 5-second audio files when there was no audio. However, if sound was detected partway through that 5-second run, the trim filter wouldn’t give me 5 seconds of audio, only the time remaining.

In the second case it wouldn’t return empty audio files, but once audio was detected it kept filling the file with silence until it reached 5 seconds in total, even if the sound itself lasted only 1 second, for example. In that case I wanted sox to cut the file after the 1 second of audio plus 0.5 secs of silence, then restart the effects chain.

I’m not sure how you’d do it with the effects chain, but perhaps you can reprocess the file with sox again after the fact. Not ideal, I know, but there’s a possibility there.

Yeah, I thought of that, but the only way of doing it was if all files had 5s. The problem is that for the recognition app to work continuously sox has to stop recording after 0.5s of silence detection, and restart effect chain, otherwise parts of audio would be lost. Perhaps this isn’t possible with sox. But thanks anyway.

Correction: just managed to make it work:

silence 1 0.5 1% trim 0 5 silence 1 0.2 2% 1 0.5 2%

Thanks once again Jason!

Can someone tell me why this is not working? It is not honouring the -1 and therefore not removing middle silence. When I try the command line on an individual file it seems fine, but when I try to run it as a batch I have no luck. It’s honouring the 1 and any other commands I’ve toyed with; it’s the “-” from the “-1” that is disappearing???

cd C:\Program Files (x86)\sox-14-4-1
for /f "tokens=1-2 delims=." %%i in ('dir /b z:\library\reviews\processing\*.wav') do (sox.exe "z:\library\reviews\processing\%%i.wav" "z:\library\reviews\%%i.wav" silence 1 0.1 0.2% -1 0.5 0.2%)

Hello Jason,

Very informative here, thank you. I just wanted to ask if anyone knows of a way to find tracks in a library that have no silence and no fade out at the end. My mate wants me to try to sort out his music library. He has many, many vinyl rips that don’t have a fade out or silence at the end of the track, and he doesn’t like it when they just suddenly stop; he would rather they fade out. I would like to find only the tracks that do not fade out at all and just abruptly stop. Using your advice here I have already adapted my script to fade out tracks, but I don’t want to do it needlessly to all tracks in his very large library. Any advice?? Thanks for any help from anyone, it’s greatly appreciated. I would like to use SoX for this if possible.

Singtoh

Hello everybody,
does somebody know how to simply shorten the silence at the beginning? My idea is to add some noise at the very beginning and then shorten the silence, but that is not an elegant solution. Thanks

Hi,

Great introduction to the sox audio tool.
I’m especially interested in the last added section (8) about splitting based on silence.

After reading and rereading your article, I’m still having trouble getting my split to work.

I have a 6-second audio file with 6 parts of speech (which consist of onomatopoeia).

I want to split the audio into 6 distinct audio files, each containing one sound.

my settings are :

sox file_in file_out silence 1 0.01 17% 6 0.27 17%

I don’t want to cut any silence at the beginning, and my threshold is around 17% (there is some noise).

I’m facing 2 problems:

First, this command line often gives me 7 outputs, the last with only silence.
Second, for some samples, I get only 5 parts (but I guess the reason is a duration that is too long).

any suggestion ?

thanks for your reply

Hi, I tried chopping my audio file using the silence filter, but I end up with the last file containing unchopped silence at the end. I tried multiple combinations, but I’m not able to get the result I want in one pass.
I end up running it twice:
sox 1.mp3 1_.wav silence 1 0.1 1% 1 0.5 1% : newfile : restart
sox 1_005.wav 1_005a.wav reverse silence 1 0.1 1% reverse

Any suggestions?
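
One way to avoid picking out the last chunk by hand is to let the shell find it. The sketch below assumes the chunks carry a three-digit suffix, as in the commands above (`1_001.wav`, `1_002.wav`, and so on). `last_chunk` and `trim_tail` are made-up helper names, the `SOX` override is only there for dry runs, and the 0.1 s / 1% values are copied from the comment. It is still two passes, just automated.

```shell
#!/bin/sh
# last_chunk PREFIX: print the highest-numbered PREFIXnnn.wav, if any.
last_chunk() {
    last=
    # Globs expand in sorted order, so the final match is the last chunk.
    for f in "$1"[0-9][0-9][0-9].wav; do
        [ -e "$f" ] && last=$f
    done
    [ -n "$last" ] && printf '%s\n' "$last"
}

# trim_tail IN OUT: strip trailing silence by reversing the audio, trimming
# the (now leading) silence, and reversing back.
trim_tail() {
    # Set SOX=echo to print the arguments instead of running sox (dry run).
    "${SOX:-sox}" "$1" "$2" reverse silence 1 0.1 1% reverse
}

# Example (uncomment to run after the split):
# f=$(last_chunk 1_) && trim_tail "$f" "${f%.wav}a.wav"
```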

This all sounds great. UNFORTUNATELY (using my version of sox which gives, in response to sox --version, the string
sox: SoX v
but I guess it’s 14.4.1 because that’s the version in MacPorts, where I got it from) sox simply does not work the way that’s described here.
You won’t see the difference if you test against white noise, maybe even against music, but it is very obvious when you test against speech.

Let’s take the command line of example 7 and run it against some clean speech, for example an audiobook from a CD, or a clean speech podcast without background music.

What SHOULD happen, according to the description?
“trimming silence longer than 2 seconds down to 2 seconds long, but ignore noise such as pops and clicks amidst the moments of silence”.
This means that if there are no long delays in the audio (and two seconds is a LONG delay for speech) there should be no change in the audio.

What ACTUALLY happens? You will notice that the resultant file is notably shorter in duration, and the speech sounds like it is too fast, with the starts of words missing. The 2 second limit that is supposed to restrict which fragments of audio are considered for dropping simply does not work the way it is meant to. You can try to change this by, e.g., changing the 1% amplitude to .1%, but you won’t get much improvement.

I have spent a LOT of time experimenting with this effect on different files, trying to get it to work, and have never succeeded. My conclusion is that it is simply buggy: the code does not conform to what the man page says, and is happy to throw away fragments of silence that are extremely short, regardless of the -l parameter. Basically it seems to behave like a combination of Example 4 (for extremely short silences, like the silence between words) and Example 7 (once silences get long enough to exceed the 2 second threshold).
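
To put a number on “notably shorter”, one option is to compare the durations before and after with `soxi -D`, which prints a file’s length in seconds. This is only a sketch: the helper names and file names below are illustrative, and it measures the symptom only; it says nothing about where in the file the audio was dropped.

```shell
#!/bin/sh
# seconds_removed BEFORE AFTER: print BEFORE - AFTER to two decimals.
seconds_removed() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

# compare_files ORIGINAL PROCESSED: report how much audio the effect removed.
# Requires soxi (installed alongside sox).
compare_files() {
    before=$(soxi -D "$1") || return 1
    after=$(soxi -D "$2") || return 1
    echo "removed $(seconds_removed "$before" "$after") seconds"
}

# Example (uncomment to run; file names are placeholders):
# compare_files speech.wav speech_trimmed.wav
```

If the source has no pauses longer than 2 seconds and the description in the article holds, the reported difference should be close to zero; a large value would support the behaviour described above.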
