ACME Corporation is creating its new CRYPTO exchange platform. They receive crypto data in CSV format where the first column is the crypto coin's short name and the second column is the value in US dollars. The separator is ;
.
For their internal needs, it is required that new output is produced where the short crypto name is replaced with the human-readable name and the separator is replaced with a space.
Wile E. is in charge of such implementation.
Below are the input files:
human-readable.csv
BTC;Bitcoin
ETH;Ethereum
USDT;Tether
USDC;USD Coin
BNB;Binance Coin
BUSD;Binance USD
ADA;Cardano
current-exchange.csv
BTC;19280
ETH;1044
USDT;1
USDC;1
BNB;220.18
BUSD;1
ADA;0.4139
Wile: "Ah.... what a boring task..." After 30 minutes, he provides the team with the following go code that satisfies the requirement:
package main
import (
"bufio"
"fmt"
"os"
"strings"
)
func check(e error) {
if e != nil {
panic(e)
}
}
func fileToStringArray(path string) ([]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
var lines []string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines = append(lines, scanner.Text())
}
return lines, scanner.Err()
}
func stringAryToMap(ary []string) map[string]string {
res := map[string]string{}
for _, s := range ary {
keyValuePair := strings.Split(s, ";")
res[keyValuePair[0]] = keyValuePair[1]
}
return res
}
// First parameter: path to the file containing the coin's human-readable name
// Second parameter: path to the file containing the coin values in US dollars
func main() {
// No sanity checks for readability
descPath := os.Args[1]
valuesPath := os.Args[2]
descriptionsAry, err := fileToStringArray(descPath)
check(err)
valuesAry, err := fileToStringArray(valuesPath)
check(err)
descMap := stringAryToMap(descriptionsAry)
for _, s := range valuesAry {
coinValuePair := strings.Split(s, ";")
fmt.Printf("%s %s\n", descMap[coinValuePair[0]], coinValuePair[1])
}
}
Wile: "Ok. I think I deserve a coffee"
Wile is going to go when he receives an email from "beep.beep@acme.org":
Subject: What a waste of time!
That code is a waste of time. With AWK I could do the same with one line
Wile gets quite mad and answers back:
Subject: Re: What a waste of time!
Ok, I challenge you to make it with a one-liner then!
After a few seconds, Wile receives another email from Beep Beep:
Subject: Re: Re: What a waste of time!
BEGIN{FS=";"} NR==FNR{a[$1]=$2;next} {$1=a[$1]}1
Wile starts laughing and sends another email to Beep Beep:
Subject: Re: Re: Re: What a waste of time!
What happened? The cat walked on your keyboard?
But an answer arrives immediately:
Subject: Re: Re: Re: Re: What a waste of time!
Ah ah... Funny. Then I guess my cat outsmarted you.
Try to believe:
awk 'BEGIN{FS=";"} NR==FNR{a[$1]=$2;next} {$1=a[$1]}1' human-readable.csv current-exchange.csv
Wile runs the provided command and doesn't believe his eyes. It works! So he asks immediately to Beep Beep to come and explain how that magic happened.
Before we can go into explaining the details we will describe the meaning of the awk
keywords we used:
awk
starts reading the input(s). Easy to guess, there is an END block too that is executed after the inputs are processed.awk
automatically splits each line of the input. The line is split using the Field Separator. The value of the Field Separator can be changed/read by setting the value of FS
awk
has processed so far in the current execution.awk
has processed so far in the current file. The difference with NR
is that if awk
receives more than one input, FNR
starts again from 0
on each new file, while NR
is never resettedcontinue
in Java/C++/Go for loop. It immediately passes to the next line of inputSo, here is how BEGIN{FS=";"} NR==FNR{a[$1]=$2;next} {$1=a[$1]}1
works:
I already hear your thoughts: "Ok, nice, but I still don't understand how it does those things".
Let's discover that step by step.
How can we accomplish this? Here NR and FNR will help us.
Since NR will never reset, while FNR will reset on any new file, we will have that, when processing the first file, NR and FNR will have the same value for each line of the input!
So, we can write some code like this:
#!/usr/bin/awk -f
BEGIN{
FS=";"
}
{
if (NR == FNR) {
print "FIRST FILE: "$0
} else {
print "SECOND FILE: "$0
}
}
How can we do that with a one-liner? AWK has the following syntax: 'condition {action}'. condition
is much similar to C++: any non zero value is true
.
By following that syntax, we can try:
$ awk 'BEGIN{FS=";"} NR==FNR { print "FIRST FILE: "$0 } { print "SECOND FILE: "$0 }' human-readable.csv current-exchange.csv
FIRST FILE: BTC;Bitcoin
SECOND FILE: BTC;Bitcoin
FIRST FILE: ETH;Ethereum
SECOND FILE: ETH;Ethereum
FIRST FILE: USDT;Tether
SECOND FILE: USDT;Tether
FIRST FILE: USDC;USD Coin
SECOND FILE: USDC;USD Coin
FIRST FILE: BNB;Binance Coin
SECOND FILE: BNB;Binance Coin
FIRST FILE: BUSD;Binance USD
SECOND FILE: BUSD;Binance USD
FIRST FILE: ADA;Cardano
SECOND FILE: ADA;Cardano
SECOND FILE: BTC;19280
SECOND FILE: ETH;1044
SECOND FILE: USDT;1
SECOND FILE: USDC;1
SECOND FILE: BNB;220.18
SECOND FILE: BUSD;1
SECOND FILE: ADA;0.4139
Ok, it is not what we were expecting. The reason is that the second block of code has no condition, so it is always executed.
We can easily solve that by adding a next
keyword so that awk will immediately jump to the next line:
$ awk 'BEGIN{FS=";"} NR==FNR { print "FIRST FILE: "$0; next } { print "SECOND FILE: "$0 }' human-readable.csv current-exchange.csv
FIRST FILE: BTC;Bitcoin
FIRST FILE: ETH;Ethereum
FIRST FILE: USDT;Tether
FIRST FILE: USDC;USD Coin
FIRST FILE: BNB;Binance Coin
FIRST FILE: BUSD;Binance USD
FIRST FILE: ADA;Cardano
SECOND FILE: BTC;19280
SECOND FILE: ETH;1044
SECOND FILE: USDT;1
SECOND FILE: USDC;1
SECOND FILE: BNB;220.18
SECOND FILE: BUSD;1
SECOND FILE: ADA;0.4139
Now it looks way better!
Let's modify our script now so that it creates the map. That's quite easy:
#!/usr/bin/awk -f
BEGIN{
FS=";"
}
{
if (NR == FNR) {
a[$1]=$2
printf("a[%s]=%s\n", $1, $2)
} else {
print "SECOND FILE: "$0
}
}
Let's do the same thing with a one-liner:
$ awk 'BEGIN{FS=";"} NR==FNR { a[$1]=$2; printf("a[%s]=%s\n", $1, $2); next } { print "SECOND FILE: "$0 }' human-readable.csv current-exchange.csv
a[BTC]=Bitcoin
a[ETH]=Ethereum
a[USDT]=Tether
a[USDC]=USD Coin
a[BNB]=Binance Coin
a[BUSD]=Binance USD
a[ADA]=Cardano
SECOND FILE: BTC;19280
SECOND FILE: ETH;1044
SECOND FILE: USDT;1
SECOND FILE: USDC;1
SECOND FILE: BNB;220.18
SECOND FILE: BUSD;1
SECOND FILE: ADA;0.4139
Outstanding! Now we have our map! We can proceed to the next step!
Again, let's start with updating the script:
#!/usr/bin/awk -f
BEGIN{
FS=";"
}
{
if (NR == FNR) {
a[$1]=$2
} else {
# replace the first column with the mapping
$1=a[$1]
print $0
}
}
And now let's do the same with a one-liner:
$ awk 'BEGIN{FS=";"} NR==FNR { a[$1]=$2; next } { $1=a[$1]; print }' human-readable.csv current-exchange.csv
Bitcoin 19280
Ethereum 1044
Tether 1
USD Coin 1
Binance Coin 220.18
Binance USD 1
Cardano 0.4139
That's it! We now have our output with just a few keystrokes!
However, the original code from Beep Beep was a bit shorter:
Our code: BEGIN{FS=";"} NR==FNR{a[$1]=$2;next} {$1=a[$1];print }
Beep Beep's: BEGIN{FS=";"} NR==FNR{a[$1]=$2;next} {$1=a[$1]}1
How does that work? If you remember at the beginning of the post, I gave a piece of very important information. AWK syntax is as below:
condition {action}
There is now another very important info to know:
condition
is not specified, it defaults to true
action
is not specified, it defaults to print $0
.So let's analyse now Beep Beep's code:
condition | action |
---|---|
NR==FNR | { a[$1]=$2; next } |
1 (implicit) | {$1=a[$1]} |
1 | print $0 (implicit) |
The magic is finally explained: by adding that 1
(always true condition) at the end of the AWK one-liner, Beep Beep forced the execution of the default action!
AWK is very powerful, but sometimes it can look cumbersome. Let's split the one-liner piece by piece and recap what is the meaning of each piece
BEGIN{FS=";"} NR==FNR { a[$1]=$2; next } { $1=a[$1]}1
a
maptrue
. Since no action for this is specified, the default action is executed (print $0
) and prints the whole lineWhen you need to manipulate text, awk
and sed
are your friends and can make you save a lot of time.
It is worth spending some time studying how they work.