
This script processes a batch of commands in parallel. It dispatches new commands up to a user-specified limit, generating each one from a template given on the command line and arguments read from STDIN. The basic combining semantics are similar to "xargs -L 1", with added support for multiple arguments per command and for running the commands in parallel.
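
To make the template idea concrete, here is a minimal sketch of how a command line might be assembled from a template and lines on STDIN. The function name fill_template is hypothetical and not part of the recipe; it only prints the commands it would build, runs nothing, and ignores the recipe's -n option.

```bash
#!/bin/bash
# Hypothetical helper (not part of the recipe): build commands from a
# template plus lines read from stdin, and print them instead of running them.
# A template word of the exact form $N is replaced by the N-th line of the
# current group; with no such word, one line is appended to the template.
fill_template() {
    local placeholder='^\$([1-9][0-9]*)$'
    local -a template=("$@") lines cmd
    local word line i n max=0

    # The largest placeholder index decides how many lines form one group.
    for word in "${template[@]}"; do
        if [[ $word =~ $placeholder ]]; then
            n=${BASH_REMATCH[1]}
            (( n > max )) && max=$n
        fi
    done
    local need=$(( max > 0 ? max : 1 ))

    while :; do
        lines=()
        for (( i = 0; i < need; ++i )); do
            IFS= read -r line || return 0   # stop at end of input
            lines[i]=$line
        done
        cmd=()
        for word in "${template[@]}"; do
            if [[ $word =~ $placeholder ]]; then
                n=${BASH_REMATCH[1]}
                cmd+=("${lines[n - 1]}")
            else
                cmd+=("$word")
            fi
        done
        (( max == 0 )) && cmd+=("${lines[0]}")
        printf '%s\n' "${cmd[*]}"
    done
}

# One line per command, appended to the end of the template:
printf 'a.txt\nb.txt\n' | fill_template mv --verbose
# Two lines per command, inserted in swapped order:
printf 'src1\ndst1\nsrc2\ndst2\n' | fill_template cp '$2' '$1'
```

The single quotes around '$1' and '$2' keep the calling shell from expanding them before the helper sees them.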

Bash, 210 lines
#! /bin/bash
#  Creator: Kevin L. Sitze
#  Created: March 24, 2010
#  Summary: Perform a command in parallel against multiple arguments.

function syntax()
{
    cat <<EOF 2>&1
usage: $(basename "$0") [-hv] [-j jobs] [-n argc] [-o target] command cmd_args...

Run a command with varying arguments in parallel.  Arguments for
each dispatched command are expected to come from stdin.  Without
any other modifiers exactly one argument per dispatched command is
taken from stdin and appended to the end of the argument list
indicated on the initial command line.

For example: issuing the command

    $(basename "$0") mv --target-directory=target --verbose < filelist

where filelist is a list of files one per line would cause all the
files to be transferred using individual mv commands as if the user
had typed the shell command

    while read filename ; do
	mv --target-directory=target --verbose "\$filename" &
    done < filelist

The operational difference between the above while statement and a
command issued via this program is the job control aspect of ensuring
that only a limited number of parallel tasks are actually active at
any particular moment.

Multiple command arguments may be provided per process using the
argument positional modifiers "\$1", "\$2", ...	 A positional modifier
indicates the relative offset of a line from stdin to insert into the
dispatched command.  When positional modifiers are used, the largest
modifier value indicates the number of lines to read from stdin for
each dispatched command (though the -n option can change this value).
Only explicitly indicated modifiers will actually be substituted.

The above mv command could thus be rewritten as follows:

    $(basename "$0") mv --verbose '\$1' target < filelist

The mv command below reads four lines at a time from STDIN and moves
the file named on the first line to the path named on the third line.
The second and fourth lines are discarded.

    $(basename "$0") -n 4 mv --verbose '\$1' '\$3' < filelist

Be sure to escape the dollar sign ('\$') from the shell.

-h	    This help text.
-j JOBS	    Total number of parallel jobs that may be issued at once.
-n ARGC	    Number of lines to read from STDIN for each command.
-v	    Verbose mode: prints a description of each job issued.
-o TARGET   Send output of subprocesses to the indicated file.
	    Output is appended to TARGET only upon completion
	    of the subprocess.

EOF
    exit 1
}

argc=1
jobs=2
verbose=false
while getopts hj:n:o:v ARG
do
    case $ARG in
    h)	syntax
	;;
    j)	jobs=$OPTARG
	;;
    n)	argc=$OPTARG
	;;
    o)	outfile=$OPTARG
	;;
    v)	verbose=true
	;;
    \?) exit 2
	;;
    esac
done
shift $(( OPTIND - 1 ))

if [ $# -lt 1 ]
then
    echo "Error: a command to execute is required" >&2
    exit 1
fi

declare -a command		# contains the command template
declare -a source		# index of line to transfer
declare -a target		# index of command argument
for param in "$@"
do
    # A placeholder must be an entire argument of the form $N (e.g. '$1').
    if [[ "$param" =~ ^\$[[:digit:]]+$ ]]
    then
	(( argc < ${param#$} )) && argc=${param#$}
	source[${#source[*]}]=$(( ${param#$} - 1 ))
	target[${#target[*]}]=${#command[*]}
    fi
    command[${#command[*]}]="$param"
done

if [[ ${#source[*]} -eq 0 ]]
then
    source[0]=0
    target[0]=${#command[*]}
fi


declare -a argv

####
#  Read $argc lines from stdin
function readlines()
{
    local i
    for (( i = 0; i < argc; ++i ))
    do
	IFS= read -r "argv[$i]" || return 1	# keep backslashes and whitespace intact
    done
    return 0
}

####
#  Generate the next command
function generator()
{
    local i
    local n=${#source[*]}
    for (( i = 0; i < n; ++i ))
    do
	command[${target[$i]}]="${argv[${source[$i]}]}"
    done
}

function append_output()
{
    if [[ x"${outfile}" != x"" ]]
    then
	if [[ -s "$2" ]]
	then
	    $verbose && echo "$1"
	    cat "$2"
	fi >> "${outfile}"
	rm -f "$2"
    fi
}

running=0
function reaper()
{
    if read -u 3 cpid
    then
        wait $cpid	# harvest the child
        (( --running ))	# and schedule next
        append_output "command line" "${fifoPath}"/cmd_"${cpid}"
        append_output "standard err" "${fifoPath}"/cmd_"${cpid}"_err
        append_output "standard out" "${fifoPath}"/cmd_"${cpid}"_out
    fi
}

####
#  Use a FIFO to determine when to harvest a subprocess.  Each
#  subprocess is evaluated such that a child PID is printed to
#  a FIFO in order to signal subprocess completion.  A
#  replacement subprocess is then dispatched.

fifoPath=${TMPDIR-/tmp}/$(basename "$0" .sh)_${LOGNAME}
fifoName="${fifoPath}"/job_control_$$
[ -d "${fifoPath}" ] || mkdir --mode=0700 "${fifoPath}"
mkfifo --mode=0700 "${fifoName}"

# open FIFO for read/write
exec 3<>"${fifoName}"
rm -f "${fifoName}"

while readlines
do  # perform job control only if a new job is pending
    while (( running >= jobs ))
    do
	reaper
    done
    generator
    $verbose && echo "${command[@]}"
    (
        if [ x"$outfile" == x"" ]
        then
	    "${command[@]}"
	else
	    echo "${command[@]}" > "${fifoPath}"/cmd_"${BASHPID}"
	    stdout="${fifoPath}"/cmd_"${BASHPID}"_out
	    # stderr is merged into the _out file; no separate _err file is written
	    "${command[@]}" 1>"${stdout}" 2>&1
	fi ; result=$?
	echo "${BASHPID}" 1>&3
	exit $result
    ) &

    (( ++running ))
done

while (( running > 0 ))
do
    reaper
done
exit 0
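
The job-control trick used above (each child writes its PID to a FIFO when it finishes, and the parent reads one PID before starting a replacement) can be hard to see inside the full script. The following is a stripped-down sketch of just that technique. The name run_limited is hypothetical, the limit is assumed to be at least 1, and opening a FIFO read/write without blocking is assumed to work as it does on Linux.

```bash
#!/bin/bash
# Hypothetical sketch (not the recipe itself): run "$@ <line>" for each line
# on stdin, keeping at most $1 jobs alive at a time.
run_limited() {
    local limit=$1; shift
    local dir item pid running=0

    dir=$(mktemp -d) || return 1
    mkfifo -m 0700 "$dir/done" || { rm -rf "$dir"; return 1; }
    exec 3<>"$dir/done"     # read/write open, so it does not block
    rm -rf "$dir"           # the open descriptor keeps the FIFO usable

    while IFS= read -r item; do
        # At the limit: block until some child reports completion.
        while (( running >= limit )); do
            read -r -u 3 pid && wait "$pid"
            running=$(( running - 1 ))
        done
        (
            "$@" "$item" < /dev/null
            status=$?
            echo "$BASHPID" >&3   # signal completion to the parent
            exit "$status"
        ) &
        running=$(( running + 1 ))
    done

    # Drain the jobs that are still running.
    while (( running > 0 )); do
        read -r -u 3 pid && wait "$pid"
        running=$(( running - 1 ))
    done
    exec 3>&-
}

# Example: at most two echo jobs at once; output order is not guaranteed.
printf '%s\n' one two three four | run_limited 2 echo finished:
```

Writing the PID only after the command has finished is what lets the parent treat each line read from the FIFO as one free slot.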

The latest machines all have multi-core CPUs. Are you using them effectively? Most programs that exist today are written with only a single thread of execution, so you may be using as little as an eighth of the processing power available to you in that quad-core, hyperthreaded i7 you bought last month.

Are you an amateur photographer or a web designer who needs to process a bunch of images in similar ways?

Are you a stock analyst who runs a preprocessing script on thousands of securities to decide which ones to focus on for the upcoming week?

Are you a developer with batch processes of any kind that you want to run in parallel?

Then this script is for you!

This is the complete script I discuss in my blog.

2 comments

Markdown link to my blog (next time I'll RTFM...)

Ole Tange 13 years, 10 months ago

parallel.sh has several problems. Output runs together:

(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel.sh traceroute

Composed commands are not supported:

seq 1 5 | parallel.sh sleep {} ';' echo {}

GNU Parallel http://www.gnu.org/software/parallel/ does not suffer from this.
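
For readers wondering why the output runs together: the jobs all write to the same terminal at the same time. One common remedy, similar in spirit to the script's -o option, is to give every job its own file and print each file only once the job has finished. The sketch below illustrates that idea only; buffered_run and two_lines are hypothetical names, and no job limit is applied.

```bash
#!/bin/bash
# Hypothetical sketch: keep each job's output contiguous by buffering it
# in a private file and printing the files once all jobs are done.
buffered_run() {
    local dir item i=0 n
    dir=$(mktemp -d) || return 1
    while IFS= read -r item; do
        # Each job writes to its own file, so concurrent output cannot mix.
        "$@" "$item" > "$dir/$i.out" 2>&1 < /dev/null &
        i=$(( i + 1 ))
    done
    wait                        # let every job finish
    for (( n = 0; n < i; ++n )); do
        cat "$dir/$n.out"       # one uninterrupted block per job
    done
    rm -rf "$dir"
}

# A stand-in job that prints two lines with a pause in between.
two_lines() { echo "$1: start"; sleep 0.1; echo "$1: end"; }

printf '%s\n' a b c | buffered_run two_lines
# Prints "a: start", "a: end", "b: start", "b: end", "c: start", "c: end",
# one per line, even though the three jobs run concurrently.
```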

Created by Kevin L. Sitze on Wed, 31 Mar 2010 (MIT)