Dr. Mark Humphrys

School of Computing. Dublin City University.



Shell utilities



grep

grep                    search for a string

grep string file

(output of prog) | grep string | grep otherstring

-i    ignore case
-v    return all lines that do NOT match
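As a small illustration of the two flags above, here is a sketch that runs grep against a throwaway sample file. The file contents and the variable names (sample, exact, anycase, others) are invented for this demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'Apple pie\napple tart\nbanana split\n' > "$sample"

# Plain search is case-sensitive: only the lowercase line matches.
exact=$(grep apple "$sample")

# -i ignores case: both "Apple pie" and "apple tart" match.
anycase=$(grep -i apple "$sample")

# -v inverts the match: lines that do NOT contain "apple" in any case.
others=$(grep -iv apple "$sample")

printf '%s\n' "$exact" "$anycase" "$others"
rm -f "$sample"
```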


  
More complex solution to long-line problem:
  1. Use grep -E (the older name egrep is deprecated) with -o to display at most 40 chars each side of the string:
    grep -E -o ".{0,40}string.{0,40}" file
    . means any char
    .{0,n} means from 0 up to n of any char
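To see the context trick on something small, the sketch below uses a window of 5 characters instead of 40. The long line and the search word "needle" are made up for the example.

```shell
# A made-up long line; in practice this would come from a file.
line='aaaaaaaaaaaaaaaaaaaa needle bbbbbbbbbbbbbbbbbbbb'

# -o prints only the matched part of the line.
# .{0,5} allows up to 5 characters of context on each side of the word.
context=$(printf '%s\n' "$line" | grep -E -o '.{0,5}needle.{0,5}')
echo "$context"
```

Only the word and its immediate surroundings are printed, not the whole line.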



string matching / regular expressions

^                       start-of-line
$                       end-of-line
.                       any character

where "c" stands for the character:

c*			0 or more instances of c
cc*			1 or more instances of c
grep "  *"		1 or more spaces
.*                  any sequence of characters



where "c" has a special meaning, e.g. is $ or ., etc:

\c                  the character itself
grep "\."           the '.' character itself

recall the two forms of string (single-quoted and double-quoted):

grep '\$'	works (searches for the "$" char instead of end-of-line)
grep "\$"	fails (inside double quotes the shell turns \$ into $, so grep sees the end-of-line anchor)
grep "\\$"	works (the shell turns \\ into \, so grep sees \$)
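The three quoting cases can be checked by counting matching lines with grep -c. This is only a sketch: the two-line sample file is invented, and the counts are stored in variables named for this example.

```shell
# Hypothetical sample: one line contains a "$", the other does not.
sample=$(mktemp)
printf 'cost: $5\nno dollar here\n' > "$sample"

# Single quotes: grep receives \$ and looks for a literal "$".
single=$(grep -c '\$' "$sample")

# Double quotes: the shell reduces \$ to $, so grep sees the
# end-of-line anchor, which matches every line.
double=$(grep -c "\$" "$sample")

# Doubling the backslash lets \$ reach grep again.
escaped=$(grep -c "\\$" "$sample")

echo "single=$single double=$double escaped=$escaped"
rm -f "$sample"
```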



  


(Image: comic from xkcd.)



In July 2019, poor use of regular expressions in a firewall rule took down Cloudflare, taking down a big chunk of the Internet.




cut

cut     extract columns or fields of text on command-line

To extract columns  30  to end of line of the ls listing:

  ls -l | cut -c30-

Note: the output of ls -l does not actually fall on predictable columns.
Longer or shorter userids shift the column numbers.
Try:

  ls -l /
  

How to use cut to parse grep output


In grep output, extract the 1st field, with delimiter ":"

 grep string *html | cut -f1 -d':' 

Extract the 2nd to end fields, with delimiter ":"

 grep string *html | cut -f2- -d':' 
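A sketch of both cut commands applied to a single line in the "file:matched text" format that grep prints when it searches several files. The sample line is made up, and it deliberately contains a second ":" in the matched text.

```shell
# A made-up line of grep output: filename, colon, matched text.
line='index.html:<a href="http://example.com">link</a>'

# Field 1 is everything before the first colon: the filename.
file=$(printf '%s\n' "$line" | cut -f1 -d':')

# Fields 2 to the end: the matched text, colons included.
text=$(printf '%s\n' "$line" | cut -f2- -d':')

echo "$file"
echo "$text"
```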


Q. Why "-f2-" ?
Why not "-f2" ?


sed


sed     "stream editor" - find and replace text on command-line (and other things)

sed 's|oldstring|newstring|'    change first match on each line
sed 's|oldstring|newstring|g'   change all matches 

e.g. ls listing that highlights web pages:

 ls -l 	| sed "s|\.html| [Web page]|"

e.g. ls listing that changes how my username appears:

 ls -l 	| sed "s|$USER|ME|"
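The difference between the plain and the "g" form shows up only when a line has more than one match. A minimal sketch, using an invented line with two ".html" strings:

```shell
# Made-up input with two matches on one line.
line='a.html b.html'

# Without g: only the first match on the line is changed.
first=$(printf '%s\n' "$line" | sed 's|\.html| [Web page]|')

# With g: every match on the line is changed.
all=$(printf '%s\n' "$line" | sed 's|\.html| [Web page]|g')

echo "$first"
echo "$all"
```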




sed examples


To insert a new line


To put new lines in front of and after every HTML tag
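One possible way to do this, assuming GNU sed, which accepts \n in the replacement text (other sed versions may need a different way of writing the newline). In the replacement, & stands for whatever the pattern matched. The pattern <[^>]*> is a simple guess at "an HTML tag" and the input line is invented.

```shell
# Made-up single line of HTML.
html='<p>Hello</p>'

# <[^>]*> matches "<", then any run of non-">" characters, then ">".
# \n&\n puts a newline before and after each matched tag (GNU sed).
split=$(printf '%s\n' "$html" | sed 's|<[^>]*>|\n&\n|g')

echo "$split"
```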


To substitute back in the pattern we matched


# \( ... \) to mark a pattern
# \1 to reference it later

# e.g. change:
# (start of line)file.html: ...
# to:
# <a href=file.html>file.html</a>: ...

# search for:
# ^\(.*\.html\):
# change to:
# <a href=\1>\1</a>:


grep -i "$1" *html |
	sed -e "s|^\(.*\.html\):| <a href=\1>\1</a>: |g"
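The same substitution can be tried on one hand-made line of grep output, without needing any HTML files. The input line is invented for the sketch.

```shell
# A made-up line in the format "file.html: matched text".
line='file.html: some matching text'

# \( ... \) captures the filename; \1 reuses it twice in the replacement.
linked=$(printf '%s\n' "$line" | sed -e 's|^\(.*\.html\):|<a href=\1>\1</a>:|')

echo "$linked"
```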





tr

  

How to convert Windows EOL to Unix EOL
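A minimal sketch of one common approach: Windows lines end in CR LF, Unix lines end in LF, so deleting every CR with tr -d converts the file. The file names and contents below are invented. Note this removes all CR characters, not only those at line ends.

```shell
# Build a small file with Windows (CR LF) line endings.
dosfile=$(mktemp)
unixfile=$(mktemp)
printf 'line one\r\nline two\r\n' > "$dosfile"

# tr reads standard input only, so the files are passed by redirection.
tr -d '\r' < "$dosfile" > "$unixfile"

# Each of the two lines loses one byte (its CR).
before=$(wc -c < "$dosfile")
after=$(wc -c < "$unixfile")
echo "bytes before: $before, after: $after"

rm -f "$dosfile" "$unixfile"
```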

  

"testsuite" has files with odd chars

My "testsuite" collection has some files with odd characters:
cd /users/tutors/mhumphrysdculab/share/testsuite

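# find all files with Windows EOL character (CR, octal 015):
# grep itself does not interpret \015, so use bash $'...' quoting to pass a real CR
grep -l $'\015' */*html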

# find all files with characters other than the basic 7 bit characters (00 to 7E hex):
LC_ALL=C  grep -P "[^\x00-\x7E]" */*html

# if you grep a file with non 7 bit chars you may get a warning like:
grep: file.html: binary file matches


awk



dirname, basename

useful cutting and pasting with filenames
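dirname	  strips the last component of a path, leaving the directory

basename	  strips the directory part of a path, leaving the last component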

$ echo $HOME
/users/group/me
$ dirname $HOME
/users/group
$ basename $HOME
me
$ dirname `dirname $HOME`
/users
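The same commands work on any path, and basename can also strip a given suffix when it is passed as a second argument. A sketch using a hypothetical path (nothing needs to exist on disk, since both commands only manipulate the string):

```shell
# Hypothetical path, used purely as a string.
path='/users/group/me/notes/index.html'

dir=$(dirname "$path")            # directory part
name=$(basename "$path")          # last component
stem=$(basename "$path" .html)    # last component with the suffix removed

echo "$dir"
echo "$name"
echo "$stem"
```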




head and tail

head
Display the first 100 lines of the output:
grep string files | head -100 


pipe can close early

If we do this:
 grep string files | head -100 
When head has read the first 100 lines it exits, the pipe closes, and grep terminates.
grep does not run to completion first and then have its first 100 lines taken.
To see this is true, run the program "yes" (which outputs an infinite number of lines) with head, and you will see it does stop:
 yes | head -20 
  
tail
Display the last 30 lines of the logfile:
tail -30 logfile



date

date                         looks like: "Tue Feb 17 16:28:33 GMT 2009"
CURRENTDATE=`date`           remember backquotes
echo $CURRENTDATE             

date "+%b %e"                looks like: "Jan 21"  

date "+%b.%e.log"            can add things to the string 

file=`date "+%b.%e.log"`    
echo $file             


Using date to get unique filename

Say a web server needs to make a temporary file in response to a client request.
Use date to get a new filename that is unique to the current second:
timenow=`date +%H%M%S`
filename="/tmp/random.$timenow.txt"
Unique to the second and nanosecond (%N is a GNU date extension):
date "+%H.%M.%S.%N"


Alternative ways of getting unique filename
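One widely available alternative is mktemp, offered here as a suggestion rather than as the method these notes originally listed. It creates the file itself and prints the name, replacing the trailing XXXXXX with random characters. The prefix /tmp/random. and the variable names are chosen only for this sketch.

```shell
# mktemp creates an empty file with a unique name and prints that name.
tmpfile=$(mktemp /tmp/random.XXXXXX)

# Use the file like any other.
echo "working data" > "$tmpfile"
contents=$(cat "$tmpfile")
echo "$tmpfile contains: $contents"

# Remove it when finished.
rm -f "$tmpfile"
```

Because mktemp creates the file as well as naming it, two requests arriving in the same second still get different files.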




tar

  
For sending:
# bundle directory into one file
tar -cf dir.tar dir

# compress it
gzip dir.tar

# (can actually do the above two in one step)
  
When receiving:
# de-compress it 
gzip -d dir.tar.gz

# un-tar it
tar -xf dir.tar

# (can actually do the above two in one step)
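The one-step forms can be sketched as follows, assuming a tar that supports the -z (gzip) option, as GNU tar and BSD tar do. The working directory and file contents are invented so the round trip can be run safely.

```shell
# Work in a throwaway directory with a small sample "dir".
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir
echo "hello" > dir/file.txt

# Sending: bundle and compress in one step.
tar -czf dir.tar.gz dir

# Receiving: de-compress and un-tar in one step.
rm -r dir
tar -xzf dir.tar.gz

restored=$(cat dir/file.txt)
echo "$restored"
```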
  