I’ve used bash for a long time. Even the rather old versions have some array support, and between bash functions, bash aliases, and external commands, you can get a lot of things done in the shell. You just have to tolerate a lot of quirks. I’ve recently run into something that I can’t do with bash and it might be time to switch to zsh or something more radically different.

The Thing I Want

I’d like to generalize this loop, which iterates over the lines in the output of a command:

OLDIFS=$IFS; IFS=$'\n'
for LINE in $(echo some command; echo that generates multiline output); do
    echo someprefix $LINE
done
IFS=$OLDIFS

The generalized syntax that I’d like would be something along these lines:

forlines in $(echo a multi-line; echo command thingie; echo goes here); do
	echo blah $LINE
done

And of course, the output would look like this:

blah a multi-line
blah command thingie
blah goes here

This would use the shell’s do…done looping control syntax and do whatever I want in each iteration. BTW, we can ignore the “in” literal. I don’t strictly care about it, but I do think it makes the command more readable without making it harder to use.

The Attempts (and why they don’t work)

This would get it done, but the syntax is invalid.

alias forlines="IFS=$'\n' for LINE in"

Why is that invalid? As the bash manual says:

The environment for any simple command or function may be augmented temporarily by prefixing it with parameter assignments, as described above in PARAMETERS. These assignment statements affect only the environment seen by that command.
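To see that restriction in action, here’s a minimal sketch that runs two throwaway child shells with bash -c. The variable name FOO and the values are just placeholders I picked for illustration. The first child puts a prefix assignment in front of a simple command (env), the second puts one in front of the for keyword.

```shell
# A prefix assignment in front of a simple command is accepted:
# env sees FOO in its environment, so grep finds the exact line.
simple_status=0
bash -c 'FOO=bar env | grep -qx "FOO=bar"' || simple_status=$?

# The same kind of prefix in front of "for" is rejected by the parser.
# stderr is discarded so only the exit status is recorded.
for_status=0
bash -c 'IFS=x for LINE in a b; do echo "$LINE"; done' 2>/dev/null || for_status=$?

echo "simple command exit status: $simple_status"
echo "for loop exit status: $for_status"
```

The first status should be zero and the second non-zero, because bash never gets as far as running the loop.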

The “for” statement is not a “simple command.” What else can we try? The bash manual recommends that complex operations be implemented as shell functions instead of aliases. This is generally a good idea because aliases are normally expanded only in interactive shells (scripts have to opt in with shopt -s expand_aliases), and aliases don’t have a concept of arguments or parameters. So, what can be done? Here’s one attempt that changes the apparent “for” loop into something else…

# This goes in the shell profile:
declare -a FORLINES_lines=()
FORLINES_iteration() {
    # local -a FORLINES_lines=()
    local IFS=$'\n'
	
    # Is this the first iteration?
    if [ -z "$FORLINES_counter" ]; then
        export FORLINES_counter=-1
        FORLINES_lines=()
        for LINE in $2; do
            FORLINES_lines+=("$LINE")
        done
    fi
	
    FORLINES_counter=$((FORLINES_counter + 1))
    if [ $FORLINES_counter -lt ${#FORLINES_lines[@]} ]; then
        # update the iteration variable
        export LINE="${FORLINES_lines[$FORLINES_counter]}"
        return 0
    else
        # this is the last iteration, so we clean things up
        unset FORLINES_counter
        unset LINE
        FORLINES_lines=()
        return 1
    fi
}
alias forlines="while FORLINES_iteration"

# This is how you run it:
forlines in "$(echo 'line one'; echo 'line two')"; do echo this is $LINE; done

So we’ve changed the apparent “for” loop into a while loop that calls a shell function. For each iteration, the function sets a global variable called LINE and returns non-zero when there are no lines left. This looks really promising.

$ forlines in "$(date; sleep 1; date)"; do echo the date is now $LINE; done
the date is now Tue Nov 12 12:09:59 PST 2013
the date is now Tue Nov 12 12:10:01 PST 2013
$ 

But there’s a fatal flaw and you’ll hear it if you run this command on an OS X machine.

$ forlines in "$(say iteration; date; sleep 1; date)"; do echo the date is now $LINE; done
the date is now Tue Nov 12 12:11:50 PST 2013
the date is now Tue Nov 12 12:11:53 PST 2013
$
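If you’re not on a Mac, you can run roughly the same experiment without say. This is only a sketch: produce and take_next are hypothetical stand-ins I made up (take_next plays the role of FORLINES_iteration for a two-line input), and a temp file is used for counting because a command substitution runs in a subshell and can’t update a variable in the parent.

```shell
# Count how many times the command substitution in a while condition runs.
counter_file=$(mktemp)

produce() {
    # Leave one mark per invocation; the file survives the subshell.
    echo x >> "$counter_file"
    printf 'line one\nline two\n'
}

# Hypothetical stand-in for FORLINES_iteration:
# succeeds for two iterations, then reports that nothing is left.
calls=0
take_next() {
    calls=$((calls + 1))
    [ "$calls" -le 2 ]
}

while take_next "$(produce)"; do
    :   # the loop body does not matter for this experiment
done

runs=$(wc -l < "$counter_file" | tr -d ' ')
rm -f "$counter_file"
echo "produce ran $runs times for a two-iteration loop"
```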

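Your output will look correct, but you’ll hear the word “iteration” three times. A while loop re-evaluates its condition before every iteration, so the argument to our hidden behind-the-scenes shell function, command substitution and all, gets run again each time: once for each line of output, plus one final time to find out that no lines are left. That means if you do something like this…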

$ forlines in "$(grep -v 'tcp' /etc/services | egrep -v '^#')"; do echo service $LINE; done

…your grep commands are each being run 4,926 times (on Mac OS X v10.9). That’s a whole barrel of fail. So how can we avoid that? Here’s one ugly approach based on shell functions and global storage and multiple steps:

# This goes in the shell profile
declare -a FORLINES_buffer=()
getlines() {
    local IFS=$'\n'
    local LINE
	
    FORLINES_buffer=()
    for LINE in $1; do
        FORLINES_buffer+=("$LINE")
    done
}
alias forlines='for LINE in "${FORLINES_buffer[@]}"'

# This is how you'd use it
getlines "$(date; date)"; forlines; do echo the date is $LINE; done

That’s pretty ugly. First, I don’t like adding the “getlines” step before you can iterate over each line, and second, we’ve now got a bunch of buffered lines sitting in a global shell variable and we have to do yet another extra step to clear them.
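Spelled out without the alias, the whole sequence (including that extra cleanup step) looks roughly like this. It’s a sketch under a couple of assumptions: the input is a fixed two-line string instead of real command output, and the counters seen, left_before, and left_after are names I added just to show what’s left in the buffer.

```shell
FORLINES_buffer=()
getlines() {
    local IFS=$'\n'
    local LINE

    FORLINES_buffer=()
    for LINE in $1; do
        FORLINES_buffer+=("$LINE")
    done
}

# Step 1: fill the buffer. The command substitution runs exactly once.
getlines "$(printf 'line one\nline two\n')"

# Step 2: iterate over the buffered lines.
seen=0
for LINE in "${FORLINES_buffer[@]}"; do
    echo "the line is $LINE"
    seen=$((seen + 1))
done

# Step 3: the buffer is still populated until we clear it ourselves.
left_before=${#FORLINES_buffer[@]}
FORLINES_buffer=()
left_after=${#FORLINES_buffer[@]}
echo "buffered before cleanup: $left_before, after: $left_after"
```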

I’m going to rule out this alternative:

# This goes in your profile:
forlines() {
    local LINE
    local IFS=$'\n'
	
    for LINE in $2; do
        eval "$4"
    done
}
	
# This is how you use it:
forlines in "$(say iteration; date; sleep 1; date;)" do 'echo the date is $LINE'

This code works perfectly, but it gets hard to work with when your iteration code gets more complex. Because you have to pass the iteration code as an argument, you have to worry about escaping. You also can’t easily take advantage of readline while you are composing your iteration code.
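Here’s a small sketch of the escaping problem, using the same function with a fixed two-line input. The only thing that differs between the two calls is the quoting of the loop body. The outer LINE=outer assignment is something I added purely to make the difference visible.

```shell
forlines() {
    local LINE
    local IFS=$'\n'

    for LINE in $2; do
        eval "$4"
    done
}

# A value for LINE in the calling shell, to show when expansion happens.
LINE=outer

# Single quotes: $LINE reaches eval unexpanded and is evaluated per iteration.
single=$(forlines in "$(printf 'a\nb\n')" do 'echo "got $LINE"')

# Double quotes: $LINE is expanded by the caller before forlines ever runs,
# so every iteration echoes the caller's value instead of the current line.
double=$(forlines in "$(printf 'a\nb\n')" do "echo \"got $LINE\"")

printf 'single-quoted body:\n%s\n' "$single"
printf 'double-quoted body:\n%s\n' "$double"
```

The single-quoted body prints each line, while the double-quoted one prints the caller’s value twice, and it only gets worse once the body itself needs quotes.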

What’s Next?

If anyone has any feedback or suggestions, please send ’em! In my next entry I’ll take a look at what some other shells might offer, including zsh and even ipython.