2. Working With Files

Learning Objectives:

  • Understanding vi

  • Creating Text Files with vi

  • Browsing Text Files with more and less

  • Viewing the Start and End of a File with head and tail

  • Displaying File Contents with cat and tac

  • Working with grep

  • Understanding Regular Expressions

  • Using Regular Expressions with grep

Understanding vi(m)

Vim in a nutshell

vimtutor => an interactive command-line tutorial for vim.

More commands:

Insert mode commands:

  • i => insert mode

  • o => open a new line below the cursor and enter insert mode

Command mode commands:

  • ESC => return to command mode first (this is important)

  • v => start visual mode to select text (V selects whole lines)

  • d => cut/delete selected lines

  • y => copy the selected lines

  • p => paste the selected lines

  • u => undo

  • :redo => redo

  • :w => saving the work without quitting the editor

  • :wq => save and quit

  • :wq! => forcefully save and quit

  • :q! => forcefully quit without saving

  • First press ESC

  • gg => jump to the first line of the file

  • G => jump to the last line of the file

Searching in vi (same as man)

  • First press ESC

  • /searchText

Searching and Replacing text in vi

  • :s/searching_word/replacing_word/ => replaces only the first occurrence on the current line.

  • :%s/searching_word/replacing_word/g => replaces every occurrence in the whole file. % means "every line" and g means "every match on each line"; without g, only the first match on each line is replaced.

Using head and tail

By default, both head and tail show 10 lines of a file (or fewer, if the file is shorter).

Use -n <number> to show a different number of lines, for example head -n 5 filename.

tail -f filename keeps the file open and prints new lines as they are appended. This is especially useful for monitoring log files in real time; press Ctrl+C to stop.
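A minimal sketch of head and tail on a throwaway 20-line file. The temporary file name is just for illustration, and the log path in the last (commented-out) line is an assumption that varies by distribution.

```shell
# Create a throwaway file containing the numbers 1 to 20, one per line
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
seq 1 20 > "$tmp/numbers.txt"

head "$tmp/numbers.txt"                    # lines 1-10 (the default is 10 lines)
tail -n 3 "$tmp/numbers.txt"               # lines 18-20
head -n 5 "$tmp/numbers.txt" | tail -n 1   # only line 5

# Follow a growing log file in real time (path varies by distribution; Ctrl+C stops it)
# tail -f /var/log/messages
```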

Displaying File Contents with 'cat' and 'tac'

  • cat has some nice options

    • -A shows all non-printable characters, such as tabs (^I) and line ends ($) (can be quite useful in troubleshooting configuration files)

    • -b numbers only the non-empty lines

    • -n numbers every line, including empty ones

    • -s squeezes repeated empty lines into a single one

  • tac is like cat, but prints the lines in reverse order (last line first).

Example 1: with -A, a $ marks the end of every line, that is, every place where Enter was pressed.

Example 2: the -b option skips empty lines when numbering.

Example 3: the -n option numbers every line, whether empty or not.

Example 4: the -s option squeezes repeated empty lines into one.

Example 5: tac prints the lines in reverse order.
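The five examples can be reproduced with a small sample file. This sketch assumes GNU coreutils (Linux), since cat -A and tac are not available in every Unix; the file name is hypothetical.

```shell
# Sample file with a tab and two consecutive empty lines
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'first\tline\n\n\nlast line\n' > "$tmp/sample.txt"

cat -A "$tmp/sample.txt"   # tab shown as ^I, end of each line shown as $
cat -b "$tmp/sample.txt"   # numbers only the two non-empty lines
cat -n "$tmp/sample.txt"   # numbers all four lines, empty ones too
cat -s "$tmp/sample.txt"   # the two empty lines are squeezed into one
tac "$tmp/sample.txt"      # last line printed first
```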

Working with grep

grep is a filtering utility: it prints only the lines that match a pattern. It is often used at the end of a pipe to filter the output of another command.
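A minimal sketch of grep reading from a pipe. The fruit list is made-up sample data; the second command is a typical real-world use whose output depends on your system.

```shell
# grep filters whatever the previous command writes into the pipe
printf 'apple\nbanana\ncherry\n' | grep 'an'   # only "banana" contains "an"

# Typical use: keep only the entries of a directory listing that mention "conf"
ls /etc | grep 'conf'
```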

Example 1: grep can find pattern occurrences in files.

  • In the example output, the text osboxes was found in all of the listed files.

Example 2: grep is case-sensitive by default.

  • grep -i search_text myfile => shows every line of myfile that contains search_text, ignoring case (-i makes the search case-insensitive).

You can also exclude lines containing a word with grep -v excluding_word, for example at the end of a pipe.

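The -r and -l options are especially useful:

  • -r searches directories recursively and shows each matching line, prefixed with the name of the file it was found in.

  • -l shows only the names of the files in which the text was found. It is not recursive by itself; combine it with -r (grep -rl) to search a whole directory tree.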
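The options above can be tried on a tiny directory tree. This is a sketch assuming GNU grep; the directory and file names are hypothetical.

```shell
# Build a small test tree: conf/a.txt and conf/sub/b.txt
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/conf/sub"
printf 'Linux rocks\nwindows\n' > "$tmp/conf/a.txt"
printf 'linux kernel\n'        > "$tmp/conf/sub/b.txt"

grep linux "$tmp/conf/a.txt"        # no output: grep is case-sensitive
grep -i linux "$tmp/conf/a.txt"     # Linux rocks
grep -v -i linux "$tmp/conf/a.txt"  # windows (the lines that do NOT match)
grep -r linux "$tmp/conf"           # <path>/sub/b.txt:linux kernel
grep -ril linux "$tmp/conf"         # only the two file names, no matching lines
```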

Regular Expressions

Regular expressions describe text patterns. They are helpful when you know what the text looks like but not exactly what it is. They are used by grep and other utilities.

  • Don't confuse regular expressions with globbing.

  • Regular expressions look like globbing patterns, but they are not the same: in globbing * means "any string", while in a regex * means "the previous atom zero or more times".

  • To avoid confusion, keep your regex in single quotes. The shell does not interpret anything inside single quotes, so the pattern reaches the tool unchanged.

  • Regular expressions are used with specific tools only (grep, vim, awk, sed).

  • See man 7 regex for details.
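A small sketch contrasting globbing with a regex. It assumes a scratch directory; the file names and the contents of the hypothetical file list are made up.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch report.txt notes.txt
printf 'report.txt\nreportXtxt\n' > list

ls *.txt                 # globbing: the shell expands *.txt to notes.txt report.txt
grep 'report.txt' list   # regex: '.' matches any character, so both lines match
grep 'report\.txt' list  # escaped dot: only the literal report.txt line matches
```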

Understanding Regular Expressions

  • Regular expressions are built around atoms; an atom specifies what text is to be matched.

    • Atoms can be single characters, a range of characters (such as [a-z]), or a dot if you don't know the character

    • Or atoms can be a class, such as [[:alpha:]], [[:upper:]] or [[:alnum:]]

  • The second element is the repetition operator, which specifies how many times the preceding atom should occur

  • The third element is the anchor, which indicates where the match should be found (for example, at the start of a line)

Anchors:

  • ^ beginning of the line

  • $ end of the line

  • \< beginning of a word

  • \> end of a word

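  • \A start of the text being matched (Perl-compatible regex only, such as grep -P)

  • \Z end of the text being matched (Perl-compatible regex only)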
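A minimal sketch of the line and word anchors, assuming GNU grep (\< and \> are GNU extensions). The four sample lines are made up.

```shell
lines='root:x:0:0
my root
rooter
chroot'

printf '%s\n' "$lines" | grep '^root'     # root:x:0:0 and rooter (root at line start)
printf '%s\n' "$lines" | grep 'root$'     # my root and chroot (root at line end)
printf '%s\n' "$lines" | grep '\<root\>'  # root:x:0:0 and my root (root as a whole word)
```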

For repetition:

  • {n} exactly n times

  • {n,} at least n times

  • {,n} at most n times (GNU extension)

  • {n,m} between n and m times

  • * zero or more times

  • + one or more times

  • ? zero or one times
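The repetition operators can be compared side by side with grep -E (extended regex, the same syntax egrep uses). This is a sketch on made-up sample words; the {,n} form assumes GNU grep.

```shell
words='ac
abc
abbc
abbbc'

printf '%s\n' "$words" | grep -E '^ab{2}c$'    # abbc
printf '%s\n' "$words" | grep -E '^ab{2,}c$'   # abbc, abbbc
printf '%s\n' "$words" | grep -E '^ab{,2}c$'   # ac, abc, abbc (GNU extension)
printf '%s\n' "$words" | grep -E '^ab+c$'      # abc, abbc, abbbc
printf '%s\n' "$words" | grep -E '^ab?c$'      # ac, abc
printf '%s\n' "$words" | grep -E '^ab*c$'      # all four lines
```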

Note: when in doubt, use egrep (equivalent to grep -E) with regular expressions, so that operators such as +, ? and {n} work without backslashes. And keep your regex in single quotes.

Examples:

  • '^' matches the pattern at the beginning of a line

  • '$' matches the pattern at the end of a line

  • '.' represents any single character, like '?' in globbing

  • an expression that looks for 'b' exactly twice

  • 'b*' matches 'b' zero or more times

Common Text Processing Utilities

  • cut: extract selected fields from each line of a text file

  • sort: sort lines of text, often used in pipes

  • tr: translate or delete characters (for example, lowercase to uppercase)

  • awk: search for patterns and process text field by field

  • sed: powerful stream editor to batch-modify text files

cut: extracts selected fields from each line of a text file. It is mainly used for cutting content from tabular data, such as CSV files.

Example 1: Imagine you want to see all usernames in the /etc/passwd file.

In /etc/passwd, the fields are separated by ':' and the username is the first field:
cut -d : -f 1 /etc/passwd
###################################
# -d : => delimiter by ':'
# -f 1 => first field from the file
###################################

Example 2: Imagine you want to extract (or filter) more than one field with and without a custom delimiter.

Filter specific fields

Example 3: For a range

Filter a range

Example 4: Output by a custom delimiter

Delimited by my crazy delimiter

Example 5: Everything except one or a few fields.
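Examples 2 to 5 can be reproduced with a sketch like this. It assumes GNU cut (--output-delimiter and --complement are GNU options), and the two passwd-style lines are made-up sample data.

```shell
data='alice:x:1001:1001:Alice:/home/alice:/bin/bash
bob:x:1002:1002:Bob:/home/bob:/bin/sh'

printf '%s\n' "$data" | cut -d : -f 1,7                           # specific fields: alice:/bin/bash
printf '%s\n' "$data" | cut -d : -f 1-3                           # a range of fields: alice:x:1001
printf '%s\n' "$data" | cut -d : -f 1,7 --output-delimiter=' | '  # custom output delimiter: alice | /bin/bash
printf '%s\n' "$data" | cut -d : -f 2 --complement                # everything except field 2
```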

More examples:

grep "/bin/bash" /etc/passwd | cut -d':' -f1,6
cut -d: -f1,7 /etc/passwd | sort | uniq -u | head

awk command

awk can do everything that cut can do, and it offers much more functionality.

Example 1: Filter some text similar to cut

awk -F : '{ print $4 }' /etc/passwd | head # Using AWK
cut -d : -f 4 /etc/passwd | head # Using CUT
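A short sketch of what awk adds beyond cut: reordering fields and filtering rows by a field value. The sample lines are made up and mimic /etc/passwd.

```shell
data='alice:x:1001:1001:Alice:/home/alice:/bin/bash
bob:x:1002:1002:Bob:/home/bob:/bin/sh'

printf '%s\n' "$data" | awk -F : '{ print $4 }'                    # same as cut -d : -f 4
printf '%s\n' "$data" | awk -F : '{ print $7, $1 }'                # reorder fields (cut keeps file order)
printf '%s\n' "$data" | awk -F : '$7 == "/bin/bash" { print $1 }'  # only users whose shell is bash
```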

By default, sort treats its input as text. To sort numbers by their value, you need to say so explicitly with the -n option; otherwise 10 sorts before 9.

Sort options

The sort command comes with useful options to order file contents. Some of them are:

  • sort -n: sort numerically.

  • sort -u: suppress lines that repeat an earlier key (removes duplicates).

  • sort -k: sort a table by column (key) number.

  • sort -t SEP: use SEP as the field separator.

  • sort -M: sort by calendar month name.

  • sort -b: ignore blanks at the start of the line.

  • sort -r: sort in reverse order.

  • sort -o FILE: write the output to FILE.
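A minimal sketch of text versus numeric sorting, plus -t and -k together. The input lines are made-up sample data.

```shell
printf '10\n9\n100\n' | sort       # text order: 10, 100, 9
printf '10\n9\n100\n' | sort -n    # numeric order: 9, 10, 100
printf '10\n9\n100\n' | sort -nr   # numeric, reversed: 100, 10, 9

# Sort ':'-separated lines numerically by the third field only
printf 'bob:x:1002\nroot:x:0\nalice:x:1001\n' | sort -t : -k 3,3 -n
```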

Example 1: Sorting and removing duplicate lines

$ cat duplicate.txt
hello
linux
lInux
Linux
raghu
world
zzz
zzz
$ sort -u duplicate.txt
hello
linux
lInux
Linux
raghu
world
zzz

Example 2: Sort by column (-k Option)

$ cat population.txt
Kids 500 India
Youth 400 England
Senior 600 USA
Junior 9000 Australia
Pensioners 650 China
$ sort -k2 population.txt
Youth 400 England
Kids 500 India
Senior 600 USA
Pensioners 650 China
Junior 9000 Australia

Learn more sort here: https://linoxide.com/linux-command/sort-command/

tr : Translate

Lower to uppercase
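A few tr sketches, including the lowercase-to-uppercase case. The input strings are arbitrary examples; tr reads standard input only, so it is always used with a pipe or redirection.

```shell
echo 'hello linux' | tr 'a-z' 'A-Z'              # HELLO LINUX
echo 'hello linux' | tr '[:lower:]' '[:upper:]'  # same result, using character classes
echo 'a,b,c' | tr ',' ':'                        # a:b:c
echo 'hello' | tr -d 'l'                         # heo (-d deletes characters)
```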

Lab

  • Use head and tail to display the fifth line of the file /etc/passwd

  • Use sed to display the fifth line of the file /etc/passwd

  • Use awk in a pipe to filter the first column out of the results of the command ps -aux

  • Use grep to show the names of the files in /etc that have lines starting with the text 'root'.

  • Use grep to show all lines from all files in /etc that contain exactly 3 characters

  • Use grep to find all files that contain the string "alex", but make sure that "alexander" is not included in the result.

Lab Solution

1. Use head and tail to display the fifth line of the file /etc/passwd

head -n 5 /etc/passwd | tail -n 1

2. Use sed to display the fifth line of the file /etc/passwd

$ sed -n 1,5p /etc/passwd # Print 1 to 5 lines
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
$ sed -n 5p /etc/passwd # Print only 5th line
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

3. Use awk in a pipe to filter the first column out of the results of the command ps -aux

ps -aux | awk -F " " '{ print $1 }'
## By default awk splits fields on runs of spaces and tabs,
## so -F " " is not needed here.
ps -aux | awk '{ print $1 }'

4. Use grep to show the names of the files in /etc that have lines starting with the text 'root'.

egrep -l '^root' * 2>/dev/null # run from /etc; -l => show only the names of files that contain the pattern; 2>/dev/null hides errors such as "Is a directory"

5. Use grep to show all lines from all files in /etc that contain exactly 3 characters.

# Either pattern matches lines of exactly three characters
egrep '^.{3}$' * 2>/dev/null
egrep '^...$' * 2>/dev/null
# Add -l to show only the names of the matching files
egrep -l '^.{3}$' * 2>/dev/null

Because '.' matches any character, spaces and tabs also count toward the three characters from grep's perspective.

[root@osboxes etc]# egrep -l '^...$' * 2>/dev/null
adjtime
favicon.png
filesystems
krb5.conf
localtime
mailcap
mtools.conf
radvd.conf
sudoers
wgetrc

6. Use grep to find all files that contain the string "alex", but make sure that "alexander" is not included in the result.

egrep '^alex$' names2 # matches only lines that consist of exactly 'alex'
egrep '\<alex\>' names2 # matches 'alex' as a whole word anywhere in the line, so 'alexander' is excluded

Resources