parsing data in awk

awk is both a programming language and a text processor that you can use to manipulate text data in useful ways. In this guide, you’ll explore the awk command-line tool and how to use it to process text.

Linux utilities often follow the Unix philosophy of design. Tools are encouraged to be small, use plain text files for input and output, and operate in a modular manner. Because of this legacy, we have great text processing functionality with tools like sed and awk.

You can use the BEGIN and END blocks to print information about the fields you are printing, for example to transform the data from a file into a table, nicely spaced with tabs using \t.

The basic format of an awk command is:

awk '/search_pattern/ { action_to_take_on_matches; another_action; }' file_to_parse
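
To make the template concrete, here is a minimal sketch with one search pattern and two actions. The file name sample.txt and its contents are made up for this illustration and are not part of the guide.

```shell
# Hypothetical sample data, written to a scratch file for this sketch only.
printf 'alpha 1\nbeta 2\nalphabet 3\n' > sample.txt

# Pattern: lines containing "alpha".
# Actions: increment a counter, then print the counter and the whole line ($0).
awk '/alpha/ { n++; print n ": " $0 }' sample.txt
```

Only the first and third lines contain "alpha", so only those two are counted and printed.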

Create a favorite_food.txt file which lists the favorite foods of a group of friends:

echo "carrot sandy
wasabi luke
sandwich brian
salad ryan
spaghetti jessica" > favorite_food.txt

Now use the awk command to print the file to the screen:

awk '{print}' favorite_food.txt

This isn’t very useful. Let’s try out awk’s search filtering capabilities by searching through the file for the text “sand”:

awk '/sand/' favorite_food.txt

Output
carrot sandy
sandwich brian
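
The BEGIN and END blocks mentioned in the introduction can be sketched as follows. This is only an illustration, not the guide's original command: the header labels "Food" and "Name" and the row-count footer are assumptions.

```shell
# Recreate the sample file so the sketch runs on its own.
printf 'carrot sandy\nwasabi luke\nsandwich brian\nsalad ryan\nspaghetti jessica\n' > favorite_food.txt

# BEGIN runs once before any input is read, END once after the last line.
# The main block runs for every line and joins the two fields with a tab (\t).
awk 'BEGIN { print "Food\tName" }
     { print $1 "\t" $2 }
     END { print NR " rows" }' favorite_food.txt
```

NR holds the number of lines read so far, so in the END block it equals the total number of lines in the file.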

Suggestion : 2

Awk's basic syntax is:

awk [options] 'pattern {action}' file
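
As a rough illustration of the options slot, the sketch below uses two common options: -F to set the field separator and -v to pass a variable into the program. The file fruit.csv and the variable name min are hypothetical.

```shell
# Hypothetical comma-separated data for this sketch.
printf 'apple,red,4\nbanana,yellow,6\n' > fruit.csv

# -F, splits each line on commas instead of whitespace.
# -v min=5 assigns the awk variable min before the program starts.
awk -F, -v min=5 '$3 > min { print $1 }' fruit.csv
```

Only the banana row has a third field above 5, so only its name is printed.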

To get started, create this sample file and save it as colours.txt:

name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5

In awk, the print statement displays whatever you specify. There are many predefined variables you can use, but some of the most common are the field variables $1, $2, and so on, which refer to the columns of a text file by number. Try it out:

$ awk '{print $2;}' colours.txt
color
red
yellow
red
purple
green
purple
brown
brown
yellow
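
Besides the numbered field variables, awk also provides NR (the current line number) and NF (the number of fields on the current line). A small sketch, recreating colours.txt so it runs on its own:

```shell
# Recreate the sample file from this section.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# For the first three lines, print the line number, the field count,
# and the last field ($NF means "the field whose number is NF").
awk 'NR <= 3 { print NR, NF, $NF }' colours.txt
```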

Regular expressions work as well. This conditional tests whether $2 matches the letter p, followed by one or more characters of any kind, followed by another p:

$ awk '$2 ~ /p.+p/ {print $0}' colours.txt
grape purple 10
plum purple 2
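
Two related forms may be useful here: anchoring a pattern with ^, and negating a match with !~. The sketch below recreates colours.txt so it can be run by itself; the particular patterns are chosen only for illustration.

```shell
# Recreate the sample file from this section.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# ^ anchors the match to the start of the field: names beginning with p.
awk '$1 ~ /^p/ { print $1 }' colours.txt

# !~ keeps the lines where the field does NOT match: second column without an "e".
awk '$2 !~ /e/ { print $1 }' colours.txt
```

The second command also prints the header word "name", because the header's second column ("color") contains no "e" either.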

Numbers are interpreted naturally by awk. For instance, to print any row with a third column containing an integer greater than 5:

$ awk '$3>5 {print $1, $2}' colours.txt
name color
banana yellow
grape purple
apple green
potato brown
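
Note that the header row (name color) appears in the output above. The third column of that row is the word "amount", which is not a number, so awk compares it with 5 as a string and the test happens to succeed. One way to avoid this, sketched here, is to skip the first line with NR > 1. The second command, which totals the column, is an extra illustration.

```shell
# Recreate the sample file from this section.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# NR > 1 skips the header, so only numeric values are compared.
awk 'NR > 1 && $3 > 5 { print $1, $2 }' colours.txt

# Add up the third column over the data rows and print the total at the end.
awk 'NR > 1 { total += $3 } END { print total }' colours.txt
```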

Suggestion : 3

The first sed removes the curly braces. The second sed removes the double quotes. The awk command splits the input into records on the comma delimiter, then splits each record into fields on the colon delimiter; if the first field, $1, is equal to jobState, it prints the second field, $2. In short: find jobState, print the second field, and remove the double quotes.

These methods are used within the sample scripts since they rely on the native Linux tools. They typically do not require you to load extra packages or libraries onto the system.

If the results contain an array of values, then you need to loop through each set and parse out the desired value. The examples below show a single-value lookup first, then a loop over an array of results.

json='{"type":"OKResult","status":"OK","result":{"type":"Job","reference":"JOB-53","namespace":null,"name":null,"actionType":"DB_SYNC","target":"ORACLE_DB_CONTAINER-9","targetObjectType":"OracleDatabaseContainer","jobState":"RUNNING","startTime":"2016-08-12T19:58:59.811Z","updateTime":"2016-08-12T19:58:59.828Z","suspendable":true,"cancelable":true,"queued":false,"user":"USER-2","emailAddresses":null,"title":"Run SnapSync for database \"VDPXDEV1\".","percentComplete":0.0,"targetName":"Oracle_Source/VDPXDEV1","events":[{"type":"JobEvent","timestamp":"2016-08-12T19:58:59.840Z","state":null,"percentComplete":0.0,"messageCode":"event.job.started","messageDetails":"DB_SYNC job started for \"Oracle_Source/VDPXDEV1\".","messageAction":null,"messageCommandOutput":null,"diagnoses":[],"eventType":"INFO"}],"parentActionState":"WAITING","parentAction":"ACTION-238"},"job":null,"action":null}'

echo $json | sed -e 's/[{}]/''/g' | awk -v RS=',' -F: '{print $1 $2}'
"type""OKResult"
"status""OK"
"result""type"
"reference""JOB-53"
"namespace"null
"name"null
"actionType""DB_SYNC"
"target""ORACLE_DB_CONTAINER-9"
"targetObjectType""OracleDatabaseContainer"
"jobState""RUNNING"
"startTime""2016-08-12T19
"updateTime""2016-08-12T19
"suspendable"true
"cancelable"true
"queued"false
"user""USER-2"
"emailAddresses"null
"title""Run SnapSync for database \"VDPXDEV1\"."
"percentComplete"0.0
"targetName""Oracle_Source/VDPXDEV1"
"events"["type"
"timestamp""2016-08-12T19
"state"null
"percentComplete"0.0
"messageCode""event.job.started"
"messageDetails""DB_SYNC job started for \"Oracle_Source/VDPXDEV1\"."
"messageAction"null
"messageCommandOutput"null
"diagnoses"[]
"eventType""INFO"]
"parentActionState""WAITING"
"parentAction""ACTION-238"
"job"null
"action"null

echo $json | sed -e 's/[{}]/''/g' | sed s/\"//g | awk -v RS=',' -F: '$1=="jobState"{print $2}'
RUNNING
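
The timestamp values in the output above are cut off at the first colon, because -F: splits on every colon in the record. If you need such a value whole, one possible approach is to split each record at the first colon only, using awk's index and substr functions. This is a sketch, not part of the original scripts: the shortened JSON and the variable name key are assumptions, and like the commands above it still breaks on values that contain commas.

```shell
# Hypothetical, shortened JSON for this sketch.
json='{"jobState":"RUNNING","startTime":"2016-08-12T19:58:59.811Z"}'

# Records are separated by commas, as before. Instead of -F:, each record
# is split at its FIRST colon, so colons inside the value are preserved.
printf '%s\n' "$json" | awk -v RS=',' -v key='startTime' '{
    gsub(/[{}"\n]/, "")        # drop braces, double quotes and the trailing newline
    i = index($0, ":")         # position of the first colon in the record
    if (substr($0, 1, i - 1) == key)
        print substr($0, i + 1)
}'
```

With key set to startTime, this prints the full timestamp rather than the truncated 2016-08-12T19.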

json='{"type":"ListResult","status":"OK","result":[{"type":"WindowsHostEnvironment","reference":"WINDOWS_HOST_ENVIRONMENT-1","namespace":null,"name":"Window Target","description":"","primaryUser":"HOST_USER-1","enabled":false,"host":"WINDOWS_HOST-1","proxy":null},{"type":"UnixHostEnvironment","reference":"UNIX_HOST_ENVIRONMENT-3","namespace":null,"name":"Oracle Target","description":"","primaryUser":"HOST_USER-3","enabled":true,"host":"UNIX_HOST-3","aseHostEnvironmentParameters":null}],"job":null,"action":null,"total":2,"overflow":false}'

SOURCE_ENV="Oracle Target"

# Keep only the contents of the result array, then put each object on its own line
lines=`echo ${json} | cut -d "[" -f2 | cut -d "]" -f1 | awk '{gsub(/\},\{/, "}\n{"); print}'`

while read -r line
do
   #echo "Processing $line"
   #echo $line | sed -e 's/[{}]/''/g' | sed s/\"//g | awk -v RS=',' -F: '$1=="name"{print $2}'
   TMPNAME=`echo $line | sed -e 's/[{}]/''/g' | sed s/\"//g | awk -v RS=',' -F: '$1=="name"{print $2}'`
   #echo "Name: |${TMPNAME}| |${SOURCE_ENV}|"
   if [[ "${TMPNAME}" == "${SOURCE_ENV}" ]]
   then
      #echo $line | sed -e 's/[{}]/''/g' | sed s/\"//g | awk -v RS=',' -F: '$1=="primaryUser"{print $2}'
      PRI_USER=`echo $line | sed -e 's/[{}]/''/g' | sed s/\"//g | awk -v RS=',' -F: '$1=="primaryUser"{print $2}'`
      break
   fi
done <<< "$(echo -e "$lines")"

echo "primaryUser reference: ${PRI_USER}"
primaryUser reference: HOST_USER-3

$ which perl
/usr/bin/perl

$ which python
/usr/bin/python