[SHELL] Looping over files in a folder

Wile Ethelbert from the ACME company needs to compute some statistics on the values in all the CSV files in the ~/csv folder.

Wile: "That's easy! Let's start by looping over all the files."

The code he writes is:

for f in $(ls ~/csv/*.csv); do  
    echo filename: $f 
done

Running the script gives this output:

filename: /Users/wilic/csv/-file
filename: 02.csv
filename: /Users/wilic/csv/file
filename: 99.csv
filename: /Users/wilic/csv/file
filename: **100**.csv
filename: /Users/wilic/csv/file
filename: 01.csv

Wile: "Hmmm. Something is wrong. Why do I see names that don't end with .csv? Let's check the contents of the folder."

$ ls ~/csv
-file 02.csv     file ?99.csv     file **100**.csv file 01.csv

Wile: "Ah! There are files with spaces in their names! I have to double-quote the command substitution!"

for f in "$(ls ~/csv/*.csv)" ; do  
    echo filename: $f 
done

The output is:

filename: /Users/wilic/csv/-file 02.csv /Users/wilic/csv/file 99.csv /Users/wilic/csv/file **100**.csv /Users/wilic/csv/file 01.csv

Wile: "No no no... This way the whole ls output is treated as a single word, so the loop runs only once! Let's change approach."

for f in $(find ~/csv -type f -name '*.csv') ; do
    echo filename: $f
done

Wile executes it, but the output is not what he expected:

filename: /Users/wilic/csv/file
filename: 99.csv
filename: /Users/wilic/csv/file
filename: 01.csv
filename: /Users/wilic/csv/-file
filename: 02.csv
filename: /Users/wilic/csv/file
filename: **100**.csv

Wile: "Hmmm. Nothing has changed. How can I fix it? Hey Road Runner, can you help me please?"

They start working hard on the issue. The first idea they have together is to change the IFS (Internal Field Separator) value:

IFS=$'\n'
for f in $(ls ~/csv/*.csv) ; do  
    echo filename: $f 
done

They run it:

filename: /Users/wilic/csv/-file 02.csv
filename: /Users/wilic/csv/file
filename: 99.csv
filename: /Users/wilic/csv/file **100**.csv
filename: /Users/wilic/csv/file 01.csv

Wile: "Hmmm. What's happening? What is 99.csv?"
Road Runner: "I think I understand what's happening! The '?' in the 'file ?99.csv' filename is a newline!"
Wile: "What? A filename can contain a newline?"
Road Runner: "Yeah! A filename can contain ANY character except NUL and the slash, which separates path components."
Wile: "I thought this was an easy job... how can we solve this?"
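
What Wile and Road Runner ran into can be reproduced in a throwaway folder. The sketch below assumes bash; the temporary directory and the three file names are hypothetical stand-ins for the real ~/csv folder. It counts how many words the loop sees with the default IFS and with IFS set to a newline only.

```shell
#!/usr/bin/env bash
# Throwaway folder with three awkward names (two with a space, one with a newline).
dir=$(mktemp -d)
touch "$dir/-file 02.csv" "$dir/file"$'\n'"99.csv" "$dir/file 01.csv"

# Default IFS: the ls output is split on spaces, tabs and newlines.
split_default=0
for f in $(ls "$dir"/*.csv); do
    split_default=$((split_default + 1))
done

# IFS set to newline only: spaces survive, the embedded newline still splits.
old_ifs=$IFS
IFS=$'\n'
split_newline=0
for f in $(ls "$dir"/*.csv); do
    split_newline=$((split_newline + 1))
done
IFS=$old_ifs

echo "real files: 3, default IFS: $split_default, newline IFS: $split_newline"
rm -rf "$dir"
```

Neither count matches the real number of files, which is why parsing the output of ls cannot be made reliable by tuning IFS.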

Solution

There are four solutions to this issue.

First solution: use the find command with the -exec option

find ~/csv -type f -name '*.csv' -exec YOURCOMMAND {} \;

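This is very handy if you just need to run one command on each file: find passes every filename to YOURCOMMAND as a single argument, whatever characters it contains. If YOURCOMMAND accepts several files at once, end the -exec clause with + instead of \; and find will start it as few times as possible, passing many files per invocation: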

find ~/csv -type f -name '*.csv' -exec YOURCOMMAND {} +
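
To see the difference between the two forms, here is a minimal sketch that uses wc -l in place of YOURCOMMAND. The temporary folder and its file names are hypothetical, created only for the demonstration.

```shell
#!/usr/bin/env bash
# Throwaway folder with awkward, hypothetical file names.
dir=$(mktemp -d)
printf 'a,b\n1,2\n' > "$dir/file 01.csv"
printf 'a,b\n3,4\n5,6\n' > "$dir/-file 02.csv"
printf 'a,b\n' > "$dir/file"$'\n'"99.csv"

# "\;" form: wc is started once per file.
find "$dir" -type f -name '*.csv' -exec wc -l {} \;

# "+" form: wc is started with several files on its command line,
# so it also prints a "total" line.
find "$dir" -type f -name '*.csv' -exec wc -l {} +

# Same batching with cat, to get one overall line count.
total=$(find "$dir" -type f -name '*.csv' -exec cat {} + | wc -l)
echo "total lines: $((total))"

rm -rf "$dir"
```

The + form is usually faster on large folders because it starts far fewer processes, but it only fits commands that take a list of files as their last arguments.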

Second solution: using glob

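If you need to do more complex processing, or to save values in variables that are used after the loop, you can let bash expand a glob pattern directly. The shell produces one word per matching file, with no word splitting on the names: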

for f in ~/csv/*.csv ; do
    [ -e "$f" ] || continue
    echo filename: "$f" 
done

The output will be:

filename: /Users/wilic/csv/-file 02.csv
filename: /Users/wilic/csv/file
99.csv
filename: /Users/wilic/csv/file **100**.csv
filename: /Users/wilic/csv/file 01.csv

You will ask: "Why '[ -e "$f" ] || continue'?". Because if there are no .csv files in the folder, bash leaves the pattern unexpanded and the loop runs once with f set to the literal string /Users/wilic/csv/*.csv. The test simply skips the iteration when the file does not exist.
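
An alternative to the existence test, sketched here under the assumption that the script runs in bash, is the nullglob shell option: with it, a pattern that matches nothing expands to nothing instead of staying literal. The temporary folder and file names are hypothetical.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)            # empty throwaway folder (hypothetical)

shopt -s nullglob           # unmatched patterns now expand to nothing
files=("$dir"/*.csv)        # collect the matches in an array
empty_count=${#files[@]}
echo "empty folder: $empty_count file(s)"

touch "$dir/file 01.csv" "$dir/file"$'\n'"99.csv"
files=("$dir"/*.csv)
full_count=${#files[@]}
echo "after adding files: $full_count file(s)"
shopt -u nullglob           # restore the default behaviour

rm -rf "$dir"
```

nullglob changes every later glob in the script (for example, ls *.csv with no match becomes a plain ls), so it is safer to switch it off again once the list has been built.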

Third solution: using while and find

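This solution is handy if you need to recurse into subdirectories. find separates the names with NUL characters (-print0) and read -d '' splits on them, so spaces and newlines in the names are preserved: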

while IFS= read -r -d '' f; do
  echo "filename: $f"
done < <(find ~/csv -type f -name '*.csv' -print0)
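
The loop reads from a process substitution rather than from a pipe, so it runs in the current shell and the variables it sets are still there afterwards. A small sketch of that difference, assuming bash with its default options; the folder layout and the count and piped variables are hypothetical:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/file 01.csv" "$dir/sub/file"$'\n'"99.csv" "$dir/notes.txt"

# Process substitution: the loop body runs in the current shell.
count=0
while IFS= read -r -d '' f; do
  count=$((count + 1))
done < <(find "$dir" -type f -name '*.csv' -print0)
echo "csv files: $count"

# Pipe: by default bash runs the loop in a subshell, so the update is lost.
piped=0
find "$dir" -type f -name '*.csv' -print0 | while IFS= read -r -d '' f; do
  piped=$((piped + 1))
done
echo "after the pipe: $piped"

rm -rf "$dir"
```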

Fourth solution: using globstar

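This solution works only with bash 4 or newer, where the globstar option makes ** match any number of nested subdirectories. As with the plain glob, the existence test skips the literal pattern when nothing matches.

shopt -s globstar
for f in ~/csv/**/*.csv; do
  [ -e "$f" ] || continue
  echo "filename: $f"
done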
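
To check what ** actually matches, here is a minimal sketch, assuming bash 4 or newer; the nested folder layout and the matches array are hypothetical. It also enables nullglob so that an empty result yields an empty array.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir -p "$dir/a/b"
touch "$dir/top.csv" "$dir/a/mid 1.csv" "$dir/a/b/deep.csv" "$dir/a/skip.txt"

shopt -s globstar nullglob
matches=("$dir"/**/*.csv)   # "**/" also matches zero directories, so top.csv is included
for f in "${matches[@]}"; do
  echo "filename: $f"
done
match_count=${#matches[@]}
shopt -u globstar nullglob

rm -rf "$dir"
```

Unlike find, the glob cannot filter by file type, so a directory whose name ends in .csv would also be listed; add a [ -f "$f" ] test if that matters.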