Learning Grep – Searching for content on Linux/Mac

19 Nov, 2018
Grep, short for Global Regular Expression Print, is a tool that searches text files for lines matching a specified regular expression and writes every matching line to standard output. A regular expression, or Regex, is a symbolic notation for describing patterns in text, and it is widely used for text processing. It's time to learn some ‘grep’!

Grep on Linux and Mac

When you use the terminal, chances are that you switch back and forth between Linux and Mac. Let's start with the fact that the Grep command on Linux differs from the one on the Mac. The Mac comes with the BSD version of Grep, which you can check by typing:

$ grep -V
grep (BSD grep) 2.5.1-FreeBSD

Linux uses the GNU version of Grep, which you can check by typing:

$ docker run -it ubuntu
root@d5b72371f815:/# grep -V
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.

GNU Grep on Mac

GNU grep can be installed on the Mac with Homebrew. After brew has installed grep, you have two versions of the command: the BSD version is called grep and the GNU version is called ggrep. The examples that follow use the GNU version of grep, so Mac users should use the ggrep command.

$ brew install grep
$ ggrep -V
ggrep (GNU grep) 3.1
Packaged by Homebrew
Copyright (C) 2017 Free Software Foundation, Inc.

POSIX and PCRE Regex

The main difference between the BSD and GNU versions of Grep is the set of Regex flavors they support. Both use POSIX regular expressions by default: basic ones out of the box and extended ones with the -E option. GNU grep additionally supports Perl Compatible Regular Expressions (PCRE) through the -P option, which the BSD grep shown above does not offer. PCRE adds shorthands such as \d and lookarounds, which often makes patterns easier to write. For an overview of the differences see this cheat sheet.
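
As a rough illustration of how the Regex flavor changes what a pattern means, the sketch below runs the same pattern as a basic and as an extended regular expression, and then probes whether the local grep accepts -P. The sample words are made up for this example.

```shell
# Made-up sample input for illustration.
words=$(printf '%s\n' 'color' 'colour' 'colr')

# Basic regex (the default): '?' is an ordinary character, so no line matches.
bre=$(printf '%s\n' "$words" | grep 'colou?r' || true)

# Extended regex (-E): '?' makes the preceding 'u' optional.
ere=$(printf '%s\n' "$words" | grep -E 'colou?r')

# Perl-compatible regex (-P): a GNU grep option that not every build has,
# so probe for it instead of assuming it works.
if printf 'abc123\n' | grep -qP '\d+' 2>/dev/null; then
  pcre=yes
else
  pcre=no
fi

echo "BRE matches: ${bre:-none}"
echo "ERE matches:"
echo "$ere"
echo "PCRE available: $pcre"
```

The basic-regex run finds nothing, while the extended-regex run finds both color and colour. The last line depends on the grep that is installed.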

Using Grep

Most often grep sits at the receiving end of a pipe, where it acts as a filter, e.g.:

# take the first 3 lines from the history, filtered by grep
$ history | grep "git" | head -3
  157  git status
  158  git status
  159  git add .

By default, grep matches are case sensitive:

$ history | grep "TEST" | head -3
  399  touch TEST_1.txt
  400  touch TEST_2.txt
  401  touch TEST_3.txt

By adding the -i option, searches become case insensitive:

$ history | grep -i "TEST" | head -3
  302  find . -type f -name test*
  303  find . -type f -name test*.*
  304  find . -type f -name test1.txt
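
The history command only returns entries in an interactive shell, so to try the filter and the -i option from a script, the following sketch pipes a few made-up lines through grep instead. The variable names are chosen for this example only.

```shell
# Made-up stand-in for the output of `history`.
commands=$(printf '%s\n' 'git status' 'touch TEST_1.txt' 'find . -name "test*"' 'git add .')

# Case sensitive (the default): only the upper-case TEST line matches.
sensitive=$(printf '%s\n' "$commands" | grep "TEST")

# Case insensitive (-i): both the TEST line and the test line match.
insensitive=$(printf '%s\n' "$commands" | grep -i "TEST")

# grep as one stage of a longer pipeline: keep only the first git line.
first_git=$(printf '%s\n' "$commands" | grep "git" | head -n 1)

echo "$sensitive"
echo "$insensitive"
echo "$first_git"
```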

We can also match on exact words with the -w option:

$ history | grep -w "git status" | head -3
  157  git status
  158  git status
  279  git status

We can combine options, so -iw searches for an exact word, case insensitively:

$ history | grep -iw "TEST_1" | head -3
  399  touch TEST_1.txt
  545  history | grep -iw "TEST_1" | head -3
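
What counts as a word for -w can be surprising: word characters are letters, digits, and the underscore, so TEST_1 is a single word and a -w search for TEST alone does not match it. A small sketch with made-up input lines:

```shell
# Made-up input lines for illustration.
lines=$(printf '%s\n' 'run test now' 'testing 123' 'touch test_1.txt' 'open test.txt')

# Without -w, every line that contains "test" matches (all four lines).
plain=$(printf '%s\n' "$lines" | grep "test")

# With -w, "test" must be bounded by non-word characters:
# "testing" and "test_1" are excluded, "test.txt" still matches.
whole=$(printf '%s\n' "$lines" | grep -w "test")

# -i and -w combined: whole word, any case.
whole_nocase=$(printf '%s\n' "$lines" | grep -iw "TEST")

echo "$whole"
```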

Searching within a file

I have prepared an example project that you can use to learn grep. Grep can search through files for content. For example, to search for the text ‘boto3’ in the file Pipfile, type:

$ grep "boto3" Pipfile
"boto3" = "*"

The option -n shows the line number of the match:

$ grep -n "boto3" Pipfile
10:"boto3" = "*"
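
To try this without the example project, the sketch below writes a throwaway file and searches it. The file contents are made up to resemble a Pipfile and are not taken from the project.

```shell
# Create a throwaway file; its contents are made up to resemble a Pipfile.
tmpfile=$(mktemp)
printf '%s\n' '[packages]' '"requests" = "*"' '"boto3" = "*"' > "$tmpfile"

# Print the matching lines only.
match=$(grep "boto3" "$tmpfile")

# Prefix each matching line with its line number.
numbered=$(grep -n "boto3" "$tmpfile")

echo "$match"
echo "$numbered"

# Clean up the temporary file.
rm -f "$tmpfile"
```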

Searching through files

Grep can also be used to search through multiple files by typing:

$ grep -n "boto3" ./*
./Pipfile:10:"boto3" = "*"
grep: ./config: Is a directory
grep: ./lambdas: Is a directory
grep: ./templates: Is a directory
grep: ./tests: Is a directory

Grep can only search files, which is why we see the message ‘Is a directory’. To search all files in all folders we need the recursive -r option:

$ grep -r "handler" .
./lambdas/log_lambda.py:def handler(event, ctx):
./lambdas/cloudwatch_subscription_lambda.py:def handler(event, ctx) -> None:
./templates/cloudwatch.yaml:      Handler: index.handler
./templates/cloudwatch.yaml:          def handler(event, ctx):
./templates/cloudwatch.yaml:      Handler: index.handler
./templates/cloudwatch.yaml:          def handler(event, ctx) -> None:

Grep doesn’t have to show the matching lines; with the -l option it reports only which files contain a match:

$ grep -rl "handler" .
./lambdas/log_lambda.py
./lambdas/cloudwatch_subscription_lambda.py
./templates/cloudwatch.yaml
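
The recursive search can be reproduced on a small, made-up directory tree. The file names below loosely follow the example project, but their contents are invented for this sketch.

```shell
# Build a small made-up project tree in a temporary directory.
project=$(mktemp -d)
mkdir -p "$project/lambdas" "$project/templates"
printf 'def handler(event, ctx):\n    print(event)\n' > "$project/lambdas/log_lambda.py"
printf 'Handler: index.handler\n' > "$project/templates/cloudwatch.yaml"
printf 'nothing to see here\n' > "$project/README"

# -r descends into subdirectories; -l prints only the names of files with a match.
files=$(cd "$project" && grep -rl "handler" . | sort)

# Without -r, grep complains about the directories (hidden here with 2>/dev/null)
# and searches only the regular files the glob expands to, so nothing is found.
top_level=$(cd "$project" && grep -l "handler" ./* 2>/dev/null || true)

echo "$files"

# Clean up the temporary tree.
rm -rf "$project"
```

The output is piped through sort only to make the order predictable; grep itself lists files in the order it visits them.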

Grep can also show context around the match. Use the -A option to see lines after the match. Use -B to see lines before the match and use -C to see lines before and after the match:

$ grep -r -C 2 "handler" .
./lambdas/log_lambda.py:def handler(event, ctx):
./lambdas/log_lambda.py-    print(event)
--
--
./lambdas/cloudwatch_subscription_lambda.py-    return decode_record(event['awslogs'])
./lambdas/cloudwatch_subscription_lambda.py-
./lambdas/cloudwatch_subscription_lambda.py:def handler(event, ctx) -> None:
./lambdas/cloudwatch_subscription_lambda.py-    print(json.dumps(decode_event(event)))
--
--
./templates/cloudwatch.yaml-    Type: AWS::Lambda::Function
./templates/cloudwatch.yaml-    Properties:
./templates/cloudwatch.yaml:      Handler: index.handler
./templates/cloudwatch.yaml-      Runtime: python3.6
./templates/cloudwatch.yaml-      Role: !GetAtt 'LambdaBasicExecutionRole.Arn'
--
--
./templates/cloudwatch.yaml-      Code:
./templates/cloudwatch.yaml-        ZipFile: |-
./templates/cloudwatch.yaml:          def handler(event, ctx):
./templates/cloudwatch.yaml-              print(event)
./templates/cloudwatch.yaml-
--
--
./templates/cloudwatch.yaml-    Type: AWS::Lambda::Function
./templates/cloudwatch.yaml-    Properties:
./templates/cloudwatch.yaml:      Handler: index.handler
./templates/cloudwatch.yaml-      Runtime: python3.6
./templates/cloudwatch.yaml-      Role: !GetAtt 'LambdaBasicExecutionRole.Arn'
--
--
./templates/cloudwatch.yaml-              return decode_record(event['awslogs'])
./templates/cloudwatch.yaml-
./templates/cloudwatch.yaml:          def handler(event, ctx) -> None:
./templates/cloudwatch.yaml-              print(json.dumps(decode_event(event)))
./templates/cloudwatch.yaml-  CloudWatchSubscriptionLambdaLogGroup:
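
The three context options are easier to compare on a tiny input. A minimal sketch, using five made-up lines and one line of context instead of two:

```shell
# Made-up five-line input.
text=$(printf '%s\n' 'one' 'two' 'three' 'four' 'five')

# -A 1: the match plus one line after it.
after=$(printf '%s\n' "$text" | grep -A 1 "three")

# -B 1: the match plus one line before it.
before=$(printf '%s\n' "$text" | grep -B 1 "three")

# -C 1: the match plus one line of context on both sides.
around=$(printf '%s\n' "$text" | grep -C 1 "three")

echo "$around"
```

In the recursive output above, matching lines have a colon after the file name and context lines have a dash, while a line with only -- separates groups of lines.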

Regex search

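Grep, being the Global Regular Expression Print, can also search with a Regex. Note that Mac users must use ggrep, because the -P option that selects PCRE is not available in BSD grep. In the pattern below the parentheses form a group and [.]* matches zero or more literal dots, so it matches every line that contains decode_event: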

$ grep -rl -P "decode_event([.]*)" .
./lambdas/cloudwatch_subscription_lambda.py
./tests/.pytest_cache/v/cache/nodeids
./tests/test_cloudwatch_subscription_lambda.py
./templates/cloudwatch.yaml
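
To match only an actual call such as decode_event(event), the parentheses have to be escaped. The sketch below compares both patterns on made-up lines. It uses -P when the local grep supports it and falls back to -E otherwise, on the assumption that these two particular patterns behave the same in both flavors.

```shell
# Made-up lines resembling the example project.
code=$(printf '%s\n' 'print(decode_event(event))' 'test_decode_event' 'decode_record(x)')

# Use -P when this grep supports it, otherwise fall back to -E.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then flavor=-P; else flavor=-E; fi

# Unescaped parentheses form a group and [.]* matches zero or more literal dots,
# so this matches any line that contains "decode_event".
loose=$(printf '%s\n' "$code" | grep "$flavor" 'decode_event([.]*)')

# Escaped parentheses are literal, so only an actual call matches.
strict=$(printf '%s\n' "$code" | grep "$flavor" 'decode_event\(.*\)')

echo "loose:"
echo "$loose"
echo "strict:"
echo "$strict"
```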

Conclusion

Grep, or Global Regular Expression Print, can search file content for a word or a Regex. Grep can also filter the output of a command by means of a pipe. Grep is an indispensable tool in your toolbox!
