Running things in parallel in BASH

09 Mar 2009 00:21

Suppose you have a nice script that does its job pretty well, but you have figured out that running certain parts of it in parallel would speed things up.

This can be the case when you send a bunch of files to an Internet service that is generally fast but has a slow connection sequence, so uploading 100 files one after another makes the script sit through 100 slow connection setups just to quickly upload each file.

Another situation is a multi-core machine: you have, for example, eight processing units but use only one in your script, and you have a bunch of files to compile or to process in some CPU-expensive manner.

We'll use only BASH to smartly parallelize the tasks and speed up the slow part of your script.

First of all, you need to know how many jobs you want to run in parallel. If you have 8 cores and a CPU-expensive part of the script, having more than 8 jobs does not help; a number between 4 and 8 will probably do best in this case.

#!/bin/bash

PROC_NUM=4
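
Instead of hard-coding the number, the job count could be derived from the number of cores. A minimal sketch, assuming either nproc (GNU coreutils) or getconf _NPROCESSORS_ONLN is available on the machine; detect_cores is a hypothetical helper name, not part of the original script:

```bash
#!/bin/bash

# Hypothetical helper: print the number of online CPU cores.
detect_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                        # GNU coreutils
    else
        getconf _NPROCESSORS_ONLN    # common fallback on other systems
    fi
}

PROC_NUM=$(detect_cores)
echo "running up to $PROC_NUM jobs in parallel"
```

For CPU-bound work this gives an upper bound; for network-bound work (like the upload example) a higher number may still pay off.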

Generally, we'll ensure that no more than PROC_NUM processes are forked into the background before running another task. If there are already PROC_NUM processes running in the background, we'll wait a (fraction of a) second and check again.

#!/bin/bash

PROC_NUM=4

function run_task() {
    # task to run
    # can be more than one-line
    # can take parameters $1, $2, ...
}

function run_parallel() {
    while [ `jobs | grep Running | wc -l` -ge $PROC_NUM ]; do
        sleep 0.25
    done

    run_task "$@" &
}

run_task "$@" passes every parameter given to run_parallel on to run_task. You can use "$@" inside run_task as well, to pass all the parameters to an external command. "$@" is the best choice when parameters contain spaces, dollar signs or other special characters: each parameter is passed on as a separate word, exactly as received, with no word splitting or globbing applied (it is probably the only short way to pass all the parameters safely).
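
To see the difference in practice, here is a small sketch. The function names are made up for the illustration, and the default IFS is assumed:

```bash
#!/bin/bash

# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Three ways of forwarding the arguments of a function.
forward_quoted_at()   { count_args "$@"; }   # every argument kept intact
forward_unquoted_at() { count_args $@; }     # arguments are split on whitespace
forward_quoted_star() { count_args "$*"; }   # everything joined into one word

forward_quoted_at   "my file.txt" 'cost $5'   # prints 2
forward_unquoted_at "my file.txt" 'cost $5'   # prints 4
forward_quoted_star "my file.txt" 'cost $5'   # prints 1
```

Only the quoted "$@" form hands over exactly the two arguments that were passed in.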

There are only two things left: invoking the run_parallel and synchronizing the tasks — you need to know when ALL the tasks ended, right?

#!/bin/bash

PROC_NUM=4

function run_task() {
    # task to run
    # can be more than one-line
    # can take parameters $1, $2, ...
}

function run_parallel() {
    while [ `jobs | grep Running | wc -l` -ge $PROC_NUM ]; do
        sleep 0.25
    done

    run_task "$@" &
}

function end_parallel() {
    while [ `jobs | grep Running | wc -l` -gt 0 ]; do
        sleep 0.25
    done
}

# script content

cd /some/where/you/want

# now the parallel operations
# for example in some while

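# feed the loop through process substitution rather than a pipe:
# a piped "find | while" loop runs in a subshell, so end_parallel
# would not see the jobs started there and would return too early
while IFS= read -r file; do
    run_parallel "$file"
done < <(find .)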

# now you want to continue when ALL parallel tasks ended

end_parallel

# the linear script code again

cd /some/where/else
make something
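
One thing to keep in mind when feeding such a loop: bash runs each part of a pipeline in a subshell by default, so background jobs started inside "command | while ..." belong to that subshell and jobs in the main script does not list them. The sketch below (loop bodies and variable names are made up for illustration) compares a piped loop with one fed through process substitution:

```bash
#!/bin/bash

# Loop fed by a pipe: the loop body runs in a subshell,
# so the background jobs are invisible to the main script.
printf '%s\n' a b c | while read -r item; do
    sleep 0.5 &
done
jobs_after_pipe=$(jobs -r | wc -l)

# Loop fed by process substitution: the loop runs in the main shell,
# so its background jobs show up in the job table.
while read -r item; do
    sleep 0.5 &
done < <(printf '%s\n' a b c)
jobs_after_redirect=$(jobs -r | wc -l)

echo "running jobs visible after piped loop:      $jobs_after_pipe"
echo "running jobs visible after redirected loop: $jobs_after_redirect"

wait   # let the background jobs of this shell finish
```

With default bash settings the first count should be 0 and the second 3, which is why a polling function in the main script only works with the second form.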

That's all! Though, there is a different approach to this:

#!/bin/bash

function parallel() {
    local PROC_NUM="$1"
    local SLEEP_TIME="$2"
    shift; shift
    while [ `jobs | grep Running | wc -l` -ge $PROC_NUM ]; do
        sleep $SLEEP_TIME
    done
    "$@" &
}

This function acts as a wrapper around a non-parallel command and runs it in the background, ensuring that no more than PROC_NUM processes run at once. If there are PROC_NUM processes running in the background, the wrapper waits SLEEP_TIME seconds before re-checking the number of background jobs.

Invoking:

parallel PROC_NUM SLEEP_TIME /usr/bin/some-command arguments ...

so

parallel 4 0.5 ls -R /tmp

means: run ls -R /tmp in the background if there are no more than 3 processes already running in the background. Otherwise wait 0.5 seconds and try again. Then run ls -R /tmp if there are no more than 3 processes already running in the background. Otherwise wait 0.5 seconds and try again. Then run ls -R /tmp if …

Quite nice, isn't it?
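
A possible way to use the wrapper in a loop, sketched with made-up file names and a temporary directory. It relies on the bash built-in wait, which blocks until all background jobs of the current shell have finished, to synchronize at the end:

```bash
#!/bin/bash

# Same idea as the wrapper above: limit, poll interval, then the command.
parallel() {
    local proc_num="$1"
    local sleep_time="$2"
    shift 2
    while [ "$(jobs -r | wc -l)" -ge "$proc_num" ]; do
        sleep "$sleep_time"
    done
    "$@" &
}

work_dir=$(mktemp -d)    # scratch directory just for this demo

# At most 3 "touch" commands run at the same time.
for name in one two three four five six; do
    parallel 3 0.1 touch "$work_dir/$name"
done

wait    # continue only when ALL background jobs have ended

created=$(ls "$work_dir" | wc -l)
echo "files created: $created"
rm -rf "$work_dir"
```

The loop runs in the main shell here, so both the throttling and the final wait see the same job table.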


Nice BASH Random Implementation

05 Mar 2009 20:20

Today I wrote something like this in BASH:

echo $(($(printf '%d' "'`head -c 1 /dev/urandom | base64 | tr A D`")%4))

This prints a random number from 0 to 3. How does it work?

First, the command below prints one random byte using the special Unix /dev/urandom device:

head -c 1 /dev/urandom

Piping this to | base64 gives us the base64 representation of the byte.

Base64 is a method of encoding 8-bit data as human-readable strings using only 64 visible characters (letters, digits, + and /, with = used for padding). Whitespace is typically ignored when decoding a base64 string. The standard is really nice, because you can, for example, read a base64-encoded file aloud over the phone or send a printout of it by regular mail. Because all the characters are human-readable, the recipient can type them in and decode the original message. The format is widely used for sending emails.

What matters for us is that running base64 on a single random byte gives us 4 characters (two data characters followed by the == padding) that are all "normal" characters, i.e. they don't have any special meaning in any context (like some whitespace characters may have).
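
A quick check of what base64 produces for a single byte, using the letter A (byte 65) as a fixed input instead of a random one. The variable name is made up for the example, and the base64 command-line tool is assumed to be installed:

```bash
#!/bin/bash

# Encode exactly one byte (no trailing newline, hence printf and not echo).
encoded=$(printf 'A' | base64)

echo "$encoded"        # QQ==
echo "${#encoded}"     # 4
```

The first two characters carry the data of the byte and the two = signs are padding.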

Now comes the tricky part:

printf '%d' "'`head -c 1 /dev/urandom | base64`"

printf '%d' "'a" gives us 97, the ASCII code of the letter a. Remember that base64 hands us more than one character? No problem: in this form printf cares only about the first character.
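
The same trick wrapped in a tiny function, as a sketch for experimenting. first_char_code is a hypothetical name, and the bash built-in printf is assumed:

```bash
#!/bin/bash

# Print the character code of the first character of the argument.
# The leading single quote tells printf to treat what follows as a character.
first_char_code() {
    printf '%d\n' "'$1"
}

first_char_code a        # 97
first_char_code QQ==     # 81, the code of Q; the rest is ignored
```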

As you may notice the output of this is one of these:

  • 43
  • a number between 47 and 57
  • a number between 65 and 90
  • a number between 97 and 122

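This gives us 1 + 11 + 26 + 26 = 64 possibilities. Great! That is exactly how many values 6 bits can hold, and the first base64 character encodes the top 6 bits of our byte.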
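
The link between the byte and the first base64 character can be sketched directly: the top 6 bits of the byte are an index into the standard base64 alphabet. The names alphabet and first_base64_char are made up for this illustration:

```bash
#!/bin/bash

# The standard base64 alphabet; a character's position is its 6-bit value.
alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Print the first base64 character for a byte given as a number 0..255.
first_base64_char() {
    local byte=$1
    # Dropping the two lowest bits leaves the top 6 bits of the byte.
    echo "${alphabet:$((byte >> 2)):1}"
}

first_base64_char 0      # A
first_base64_char 65     # Q
first_base64_char 104    # a
first_base64_char 255    # /
```

Because two bits are dropped, four consecutive byte values always share the same first character.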

This is where we get to BASH arithmetic. This is how it works:

echo $((7*12))

This obviously prints 84. The only other thing you need to know is that enclosing a command in $( and ) causes BASH to run it and substitute its output (just like using the backticks: `command`).

The % means modulo in BASH arithmetic (just like in C, Java, Python, PHP, …), so this:

echo $(($(printf '%d' "'`head -c 1 /dev/urandom | base64`")%4))

prints the number we generated (one of 43, 47…57, 65…90, 97…122) modulo 4. The result is always 0, 1, 2 or 3.

Let's now check what the probability of receiving each of the digits is.

Suppose /dev/urandom prints every possible byte with equal probability. We'll now analyze each byte (possibly) generated by /dev/urandom, its base64 representation, the ASCII code of the first character of that representation, and that code modulo 4. We'll use a table for it:

Columns: /dev/urandom byte, its base64, code of the first letter of the base64, that code modulo 4
(code 0) AA== 65 1
(code 1) AQ== 65 1
(code 2) Ag== 65 1
(code 3) Aw== 65 1
(code 4) BA== 66 2
(code 5) BQ== 66 2
(code 6) Bg== 66 2
(code 7) Bw== 66 2
(code 8) CA== 67 3
(code 9) CQ== 67 3
(code 10) Cg== 67 3
(code 11) Cw== 67 3
(code 12) DA== 68 0
(code 13) DQ== 68 0
(code 14) Dg== 68 0
(code 15) Dw== 68 0
(code 16) EA== 69 1
(code 17) EQ== 69 1
(code 18) Eg== 69 1
(code 19) Ew== 69 1
(code 20) FA== 70 2
(code 21) FQ== 70 2
(code 22) Fg== 70 2
(code 23) Fw== 70 2
(code 24) GA== 71 3
(code 25) GQ== 71 3
(code 26) Gg== 71 3
(code 27) Gw== 71 3
(code 28) HA== 72 0
(code 29) HQ== 72 0
(code 30) Hg== 72 0
(code 31) Hw== 72 0
(code 32) IA== 73 1
! (code 33) IQ== 73 1
" (code 34) Ig== 73 1
# (code 35) Iw== 73 1
$ (code 36) JA== 74 2
% (code 37) JQ== 74 2
& (code 38) Jg== 74 2
' (code 39) Jw== 74 2
( (code 40) KA== 75 3
) (code 41) KQ== 75 3
* (code 42) Kg== 75 3
+ (code 43) Kw== 75 3
, (code 44) LA== 76 0
- (code 45) LQ== 76 0
. (code 46) Lg== 76 0
/ (code 47) Lw== 76 0
0 (code 48) MA== 77 1
1 (code 49) MQ== 77 1
2 (code 50) Mg== 77 1
3 (code 51) Mw== 77 1
4 (code 52) NA== 78 2
5 (code 53) NQ== 78 2
6 (code 54) Ng== 78 2
7 (code 55) Nw== 78 2
8 (code 56) OA== 79 3
9 (code 57) OQ== 79 3
: (code 58) Og== 79 3
; (code 59) Ow== 79 3
< (code 60) PA== 80 0
= (code 61) PQ== 80 0
> (code 62) Pg== 80 0
? (code 63) Pw== 80 0
@ (code 64) QA== 81 1
A (code 65) QQ== 81 1
B (code 66) Qg== 81 1
C (code 67) Qw== 81 1
D (code 68) RA== 82 2
E (code 69) RQ== 82 2
F (code 70) Rg== 82 2
G (code 71) Rw== 82 2
H (code 72) SA== 83 3
I (code 73) SQ== 83 3
J (code 74) Sg== 83 3
K (code 75) Sw== 83 3
L (code 76) TA== 84 0
M (code 77) TQ== 84 0
N (code 78) Tg== 84 0
O (code 79) Tw== 84 0
P (code 80) UA== 85 1
Q (code 81) UQ== 85 1
R (code 82) Ug== 85 1
S (code 83) Uw== 85 1
T (code 84) VA== 86 2
U (code 85) VQ== 86 2
V (code 86) Vg== 86 2
W (code 87) Vw== 86 2
X (code 88) WA== 87 3
Y (code 89) WQ== 87 3
Z (code 90) Wg== 87 3
[ (code 91) Ww== 87 3
\ (code 92) XA== 88 0
] (code 93) XQ== 88 0
^ (code 94) Xg== 88 0
_ (code 95) Xw== 88 0
` (code 96) YA== 89 1
a (code 97) YQ== 89 1
b (code 98) Yg== 89 1
c (code 99) Yw== 89 1
d (code 100) ZA== 90 2
e (code 101) ZQ== 90 2
f (code 102) Zg== 90 2
g (code 103) Zw== 90 2
h (code 104) aA== 97 1
i (code 105) aQ== 97 1
j (code 106) ag== 97 1
k (code 107) aw== 97 1
l (code 108) bA== 98 2
m (code 109) bQ== 98 2
n (code 110) bg== 98 2
o (code 111) bw== 98 2
p (code 112) cA== 99 3
q (code 113) cQ== 99 3
r (code 114) cg== 99 3
s (code 115) cw== 99 3
t (code 116) dA== 100 0
u (code 117) dQ== 100 0
v (code 118) dg== 100 0
w (code 119) dw== 100 0
x (code 120) eA== 101 1
y (code 121) eQ== 101 1
z (code 122) eg== 101 1
{ (code 123) ew== 101 1
| (code 124) fA== 102 2
} (code 125) fQ== 102 2
~ (code 126) fg== 102 2
 (code 127) fw== 102 2
� (code 128) gA== 103 3
� (code 129) gQ== 103 3
� (code 130) gg== 103 3
� (code 131) gw== 103 3
� (code 132) hA== 104 0
� (code 133) hQ== 104 0
� (code 134) hg== 104 0
� (code 135) hw== 104 0
� (code 136) iA== 105 1
� (code 137) iQ== 105 1
� (code 138) ig== 105 1
� (code 139) iw== 105 1
� (code 140) jA== 106 2
� (code 141) jQ== 106 2
(code 142) jg== 106 2
(code 143) jw== 106 2
� (code 144) kA== 107 3
� (code 145) kQ== 107 3
� (code 146) kg== 107 3
� (code 147) kw== 107 3
� (code 148) lA== 108 0
� (code 149) lQ== 108 0
� (code 150) lg== 108 0
� (code 151) lw== 108 0
� (code 152) mA== 109 1
� (code 153) mQ== 109 1
� (code 154) mg== 109 1
� (code 155) mw== 109 1
� (code 156) nA== 110 2
� (code 157) nQ== 110 2
� (code 158) ng== 110 2
� (code 159) nw== 110 2
� (code 160) oA== 111 3
� (code 161) oQ== 111 3
� (code 162) og== 111 3
� (code 163) ow== 111 3
� (code 164) pA== 112 0
� (code 165) pQ== 112 0
� (code 166) pg== 112 0
� (code 167) pw== 112 0
� (code 168) qA== 113 1
� (code 169) qQ== 113 1
� (code 170) qg== 113 1
� (code 171) qw== 113 1
� (code 172) rA== 114 2
� (code 173) rQ== 114 2
� (code 174) rg== 114 2
� (code 175) rw== 114 2
� (code 176) sA== 115 3
� (code 177) sQ== 115 3
� (code 178) sg== 115 3
� (code 179) sw== 115 3
� (code 180) tA== 116 0
� (code 181) tQ== 116 0
� (code 182) tg== 116 0
� (code 183) tw== 116 0
� (code 184) uA== 117 1
� (code 185) uQ== 117 1
� (code 186) ug== 117 1
� (code 187) uw== 117 1
� (code 188) vA== 118 2
� (code 189) vQ== 118 2
� (code 190) vg== 118 2
� (code 191) vw== 118 2
� (code 192) wA== 119 3
� (code 193) wQ== 119 3
� (code 194) wg== 119 3
� (code 195) ww== 119 3
� (code 196) xA== 120 0
� (code 197) xQ== 120 0
� (code 198) xg== 120 0
� (code 199) xw== 120 0
� (code 200) yA== 121 1
� (code 201) yQ== 121 1
� (code 202) yg== 121 1
� (code 203) yw== 121 1
� (code 204) zA== 122 2
� (code 205) zQ== 122 2
� (code 206) zg== 122 2
� (code 207) zw== 122 2
� (code 208) 0A== 48 0
� (code 209) 0Q== 48 0
� (code 210) 0g== 48 0
� (code 211) 0w== 48 0
� (code 212) 1A== 49 1
� (code 213) 1Q== 49 1
� (code 214) 1g== 49 1
� (code 215) 1w== 49 1
� (code 216) 2A== 50 2
� (code 217) 2Q== 50 2
� (code 218) 2g== 50 2
� (code 219) 2w== 50 2
� (code 220) 3A== 51 3
� (code 221) 3Q== 51 3
� (code 222) 3g== 51 3
� (code 223) 3w== 51 3
� (code 224) 4A== 52 0
� (code 225) 4Q== 52 0
� (code 226) 4g== 52 0
� (code 227) 4w== 52 0
� (code 228) 5A== 53 1
� (code 229) 5Q== 53 1
� (code 230) 5g== 53 1
� (code 231) 5w== 53 1
� (code 232) 6A== 54 2
� (code 233) 6Q== 54 2
� (code 234) 6g== 54 2
� (code 235) 6w== 54 2
� (code 236) 7A== 55 3
� (code 237) 7Q== 55 3
� (code 238) 7g== 55 3
� (code 239) 7w== 55 3
� (code 240) 8A== 56 0
� (code 241) 8Q== 56 0
� (code 242) 8g== 56 0
� (code 243) 8w== 56 0
� (code 244) 9A== 57 1
� (code 245) 9Q== 57 1
� (code 246) 9g== 57 1
� (code 247) 9w== 57 1
� (code 248) +A== 43 3
� (code 249) +Q== 43 3
� (code 250) +g== 43 3
� (code 251) +w== 43 3
� (code 252) /A== 47 3
� (code 253) /Q== 47 3
� (code 254) /g== 47 3
� (code 255) /w== 47 3

(Some of the symbols in the first column may be invisible or otherwise look strange. This is normal: many ASCII characters have (or had) some special meaning, and bytes above 127 are not ASCII characters at all.)

Let's count how many 0s, 1s, 2s and 3s we got:

0 60
1 68
2 64
3 64

This is not ideal, because statistically you get slightly more 1s than 0s (68 versus 60 out of 256), but if you don't care too much, this is all!

If you do care, however, here is a solution. We need to take 4 byte values that give 1 and make them give 0 instead:
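
The counts can also be reproduced without the big table. This sketch walks over the 64 characters of the standard base64 alphabet and adds 4 for each, since every first character is produced by 4 different byte values. The variable names are made up for the example:

```bash
#!/bin/bash

alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
counts=(0 0 0 0)

for ((i = 0; i < ${#alphabet}; i++)); do
    char=${alphabet:i:1}
    code=$(printf '%d' "'$char")    # character code of this base64 character
    ((counts[code % 4] += 4))       # 4 byte values share each first character
done

echo "0s=${counts[0]} 1s=${counts[1]} 2s=${counts[2]} 3s=${counts[3]}"
# prints: 0s=60 1s=68 2s=64 3s=64
```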

echo $(($(printf '%d' "'`head -c 1 /dev/urandom | base64 | tr A D`")%4))

Notice the tr A D. This changes each A to D in the base64 output. Thus the four rows of the table above for codes 0 to 3 behave like those for codes 12 to 15, giving 0 at the end instead of 1.

We're done. Let's enclose the procedure in a function and create a sample script that tells us how many 0s, 1s, 2s and 3s were hit within 100 shots.
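
Whether the tr A D step really evens things out can be checked by pushing all 256 byte values through the same pipeline instead of random ones. A sketch, assuming the bash built-in printf (for the \x escape) and the base64 and tr tools; the variable names are made up:

```bash
#!/bin/bash

counts=(0 0 0 0)

for ((byte = 0; byte < 256; byte++)); do
    hex=$(printf '%02x' "$byte")
    # Emit exactly this one byte, then apply the pipeline from the one-liner.
    encoded=$(printf "\\x$hex" | base64 | tr A D)
    code=$(printf '%d' "'$encoded")   # code of the first character only
    ((counts[code % 4] += 1))
done

echo "0s=${counts[0]} 1s=${counts[1]} 2s=${counts[2]} 3s=${counts[3]}"
# prints: 0s=64 1s=64 2s=64 3s=64
```

Every result now comes from exactly 64 of the 256 byte values.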

#!/bin/bash

function random() {
    echo $(($(printf '%d' "'`head -c 1 /dev/urandom | base64 | tr A D`")%4))
}

ret=`i=0; time while [ $i -lt 100 ]; do random ; i=$((i+1)); done`

echo -n '0: ' ; echo "$ret" | grep 0 | wc -l
echo -n '1: ' ; echo "$ret" | grep 1 | wc -l
echo -n '2: ' ; echo "$ret" | grep 2 | wc -l
echo -n '3: ' ; echo "$ret" | grep 3 | wc -l

The program also shows how much time it took to generate the numbers. Adjust the 100 to your needs ;).

UPDATE: having this article posted on reddit programming actually gave me much, much better ways of writing a random function (see the comments). Thank you for your replies!
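
The comments are not reproduced here. One widely known alternative, shown only as an illustration and not necessarily what the commenters proposed, is the bash built-in RANDOM variable, which yields an integer from 0 to 32767 each time it is read. Since 32768 is divisible by 4, the modulo introduces no bias. random_0_to_3 is a hypothetical name:

```bash
#!/bin/bash

# Random number from 0 to 3 using only bash itself.
random_0_to_3() {
    echo $((RANDOM % 4))
}

random_0_to_3
```

This avoids spawning head, base64 and tr for every number, though RANDOM is a simple pseudo-random generator and not suitable where /dev/urandom quality is needed.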


Wikidot Outage

04 Mar 2009 22:22

Today Wikidot encountered a small break in its operation. After over an hour, we managed to get everything back to normal. The part that took the longest was (who would guess?) the filesystem check (after 280 days without one).

Normally, starting up a machine takes a minute or two and is almost indistinguishable from a network outage or some other temporary failure. But with Wikidot holding as many user-uploaded files as it does, checking the filesystem takes a long time.

I must confess that, and not only because of today's crash, we have plans to decentralize the service and move it to a more distributed environment to make it (even) more reliable. Even including the crash we still have very high uptime, which would satisfy just about everyone. But not us. We aim at having 100% (or more ;-) ) uptime and making things totally fault-tolerant.

I must say we are really, really sorry for what happened today, but at the same time I must assure you that we really care about you, the Users, as many of you have surely noticed. I hope you still believe in us :).


Wikidot Search Launched

02 Mar 2009 21:28

After about three months of indexing all the content hosted on Wikidot (because Wikidot is BIG), the time has come to launch the new search system.

I described the system extensively in the blog post titled New search for Wikidot's gonna rock.

The new search system replaced the old Google-powered one.

The main advantages over the Google search engine that was used until now are:

  • search in public sites + those you are a member of
  • semantic search: tags and titles carry more weight when searching
  • simple to sophisticated queries
    • "blog site:community" — search for anything with "blog" in it but only on site community.wikidot.com
    • "tags:wikidot site:quake" — search for pages tagged "wikidot" on my site (quake.wikidot.com)
  • poor-quality content is filtered out

Things left to do:

  • add a site preview (thumbnail) to the search results
  • explain in simple words the search syntax
  • promote good and/or active sites (give them higher rank)
  • decrease the delay between editing a page and updating the search index (should be max 5 minutes)
  • create a custom search module that searches a bunch of (related) sites. This would be nice if you keep separate sites for different areas of a project, like a private site for project members and a public one for project users. The search is intelligent enough to filter restricted items from the results if someone is not a member of the private site the items come from.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License