This script processes a batch of commands in parallel. It dispatches new commands up to a user-specified limit, each generated from a template provided on the command line using arguments taken from STDIN. The basic combining semantics are similar to `xargs -n 1`, though support for multiple arguments per command and parallel execution is added.
#! /bin/bash
# Creator: Kevin L. Sitze
# Created: March 24, 2010
# Summary: Perform a command in parallel against multiple arguments.
function syntax()
{
    cat <<EOF 1>&2
usage: $(basename "$0") [-j jobs] command cmd_args...

Run a command with varying arguments in parallel.  Arguments for
each dispatched command are expected to come from stdin.  Without
any other modifiers, exactly one argument per dispatched command is
taken from stdin and appended to the end of the argument list
indicated on the initial command line.

For example, issuing the command

    $(basename "$0") mv --target-directory=target --verbose < filelist

where filelist is a list of files, one per line, would cause all the
files to be transferred using individual mv commands, as if the user
had typed the shell command

    while read filename ; do
        mv --target-directory=target --verbose "\$filename" &
    done < filelist

The operational difference between the above while statement and a
command issued via this program is the job control aspect of ensuring
that only a limited number of parallel tasks are actually active at
any particular moment.

Multiple command arguments may be provided per process using the
argument positional modifiers "\$1", "\$2", ...  A positional modifier
indicates the relative offset of a line from stdin to insert into the
dispatched command.  When positional modifiers are used, the largest
modifier value indicates the number of lines to read from stdin for
each dispatched command (though the -n option can change this value).
Only explicitly indicated modifiers will actually be substituted.
The above mv command could thus be rewritten as follows:

    $(basename "$0") mv --verbose '\$1' target < filelist

The mv command below will take the first and third lines out of every
four lines on STDIN and move the file named on the first line to the
target named on the third line.  The second and fourth lines are
discarded.

    $(basename "$0") -n 4 mv --verbose '\$1' '\$3' < filelist

Be sure to escape the dollar sign ('\$') from the shell.

    -h         This help text.
    -j JOBS    Total number of parallel jobs that may be issued at once.
    -n ARGC    Number of lines to read from STDIN for each command.
    -v         Verbose mode: prints a description of each job issued.
    -o TARGET  Send output of subprocesses to the indicated file.
               Output is appended to TARGET only upon completion
               of the subprocess.
EOF
    exit 1
}
argc=1
jobs=2
verbose=false
while getopts hj:n:o:v ARG
do
    case $ARG in
        h)  syntax
            ;;
        j)  jobs=$OPTARG
            ;;
        n)  argc=$OPTARG
            ;;
        o)  outfile=$OPTARG
            ;;
        v)  verbose=true
            ;;
        \?) exit 2
            ;;
    esac
done
shift $(( OPTIND - 1 ))
if [ $# -lt 1 ]
then
    echo "Error: a command to execute is required" 1>&2
    exit 1
fi
declare -a command # contains the command template
declare -a source # index of line to transfer
declare -a target # index of command argument
for param in "$@"
do
    if [[ "$param" =~ \$[[:digit:]]+$ ]]
    then
        (( argc < ${param#$} )) && argc=${param#$}
        source[${#source[*]}]=$(( ${param#$} - 1 ))
        target[${#target[*]}]=${#command[*]}
    fi
    command[${#command[*]}]="$param"
done
if [[ ${#source[*]} -eq 0 ]]
then
    source[0]=0
    target[0]=${#command[*]}
fi
declare -a argv
####
# Read $argc lines from stdin
function readlines()
{
    local i
    for (( i = 0; i < argc; ++i ))
    do
        read -r argv[$i] || return 1    # -r: don't mangle backslashes
    done
    return 0
}
####
# Generate the next command
function generator()
{
    local i
    local n=${#source[*]}
    for (( i = 0; i < n; ++i ))
    do
        command[${target[$i]}]="${argv[${source[$i]}]}"
    done
}
function append_output()
{
    if [[ -n "${outfile}" ]]
    then
        if [[ -s "$2" ]]
        then
            $verbose && echo "$1"
            cat "$2"
        fi >> "${outfile}"
        rm -f "$2"
    fi
}
running=0
function reaper()
{
    if read -u 3 cpid
    then
        wait $cpid          # harvest the child
        (( --running ))     # and schedule next
        append_output "command line" "${fifoPath}"/cmd_"${cpid}"
        append_output "standard err" "${fifoPath}"/cmd_"${cpid}"_err
        append_output "standard out" "${fifoPath}"/cmd_"${cpid}"_out
    fi
}
####
# Use a FIFO to determine when to harvest a subprocess. Each
# subprocess is evaluated such that a child PID is printed to
# a FIFO in order to signal subprocess completion. A
# replacement subprocess is then dispatched.
fifoPath=${TMPDIR-/tmp}/$(basename "$0" .sh)_${LOGNAME}
fifoName="${fifoPath}"/job_control_$$
[ -d "${fifoPath}" ] || mkdir --mode=0700 "${fifoPath}"
mkfifo --mode=0700 "${fifoName}"
# open FIFO for read/write
exec 3<>"${fifoName}"
rm -f "${fifoName}"
while readlines
do  # perform job control only if a new job is pending
    while (( running >= jobs ))
    do
        reaper
    done
    generator
    $verbose && echo "${command[@]}"
    (
        if [[ -z "$outfile" ]]
        then
            "${command[@]}"
        else
            echo "${command[@]}" > "${fifoPath}"/cmd_"${BASHPID}"
            stdout="${fifoPath}"/cmd_"${BASHPID}"_out
            "${command[@]}" 1>"${stdout}" 2>&1
        fi ; result=$?
        echo "${BASHPID}" 1>&3
        exit $result
    ) &
    (( ++running ))
done
while (( running > 0 ))
do
    reaper
done
exit 0
The latest machines all have multiple CPU cores. Are you using them effectively? Most programs today are written with only a single thread of execution, so you may be using less than an eighth of the processing power available in that quad-core, hyperthreaded i7 you bought last month.
Are you an amateur photographer or a web designer who needs to process a bunch of images in similar ways?
Are you a stock analyst who runs a preprocessing script over thousands of securities to determine which ones to focus on for the upcoming week?
Are you a developer with any kind of batch process you want to run in parallel?
Then this script is for you!
This is the complete script I discuss in my blog.
Markdown link to my blog (next time I'll RTFM...)
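For readers on bash 4.3 or newer: the FIFO-based job control the script implements can be sketched much more compactly with the builtin `wait -n`, which blocks until any one child exits. This is a minimal illustration of the throttling idea, not a drop-in replacement for the script (the `sleep`/`echo` compound is a stand-in for a real per-argument command):

```shell
#!/usr/bin/env bash
# Throttled parallel dispatch using `wait -n` (bash 4.3+).
# The script above achieves the same effect with a FIFO, which
# also works on older bash versions that lack `wait -n`.
jobs=2                      # parallelism limit, like the -j option
results=$(mktemp)           # collect per-job output for inspection

while read -r arg; do
    # Block until a slot frees up: count running children, reap one.
    while (( $(jobs -rp | wc -l) >= jobs )); do
        wait -n
    done
    # Stand-in for the real command dispatched per input line.
    { sleep 0.1; echo "done: $arg" >> "$results"; } &
done < <(printf '%s\n' one two three four)

wait                        # harvest any remaining children
sort "$results"
```

The `$(jobs -rp | wc -l)` idiom counts currently running background children of this shell, which is what the script tracks by hand in its `running` counter.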
parallel.sh has several problems. Output from concurrent jobs runs together, and composed commands (pipelines such as `cmd1 | cmd2`) are not supported as the command template.
GNU Parallel (http://www.gnu.org/software/parallel/) does not suffer from these problems.
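The "output runs together" problem is easy to reproduce: when several background jobs share one stdout, their lines arrive in nondeterministic order. This small self-contained demo (the `emit` function is a stand-in for any multi-line job) shows the interleaving; GNU Parallel avoids it by grouping each job's output, and parallel.sh's `-o TARGET` option mitigates it by appending output only after a job completes:

```shell
#!/usr/bin/env bash
# Two concurrent jobs writing to the same stdout: their lines
# interleave in nondeterministic order across runs.
emit() {
    local i
    for i in 1 2 3; do
        echo "job $1 line $i"
        sleep 0.02              # give the other job a chance to interleave
    done
}

# Run both jobs concurrently and capture the combined stream.
output=$( emit A & emit B & wait )
echo "$output"
```

All six lines always arrive, but lines from job A and job B are typically shuffled together rather than grouped per job.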