Managing files, links and regex

These are the basic, common-use commands; for more, check the holy man pages!

File managing and text processing commands

echo string                     #Prints "string"
echo string > file1             #Overwrites or creates file1 with string 
echo string >> file1            #Creates file1 and/or appends string at the end of file1
echo $PATH                      #Prints contents of variable/environment variable
echo $(command)                 #Prints output of whatever command
echo $(date)                    #Print current date
echo *                          #Prints the name of all items in the current directory
echo */                         #Prints only the name of all directories in the current directory
echo *.txt                      #Prints all items in the current directory ending in .txt
echo \*                         #Escapes and prints literal "*"
echo -e "a \n b"                #-e makes echo interpret escapes; prints a line feed between "a" and "b"
echo -e "a \c b"                #Produces no further output after \c, so " b" is not printed
echo -e "a \t b"                #Prints a TAB between "a" and "b"
echo -e "a \v b"                #Prints a vertical TAB between "a" and "b"
echo -e "12\b3"                 #Backspace erases the character just before \b, will show "13"
commandX && echo "Done"         #Will only print "Done" if "commandX" is successful
commandX || echo "Error"        #Will print "Error" if "commandX" is unsuccessful
echo -e "\033[1;31mThis is red\033[0m"    #Prints the string after "m" in red; \033[0m resets the color
echo -e "\033[5;31mFlashing red\033[0m"   #Prints the string in red, blinking constantly
#\033 is Escape, some colors: [1;33m Yellow, [1;34m Blue, [1;35m Purple
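The redirection and chaining bits above can be sanity-checked in one short run; /tmp/echo_demo.txt is just a scratch name invented for this sketch, and bash's builtin echo is assumed (-e is what enables escape interpretation there):

```shell
# Scratch file for the redirection operators (> truncates/creates, >> appends)
printf 'first\n' > /tmp/echo_demo.txt
echo second >> /tmp/echo_demo.txt
wc -l < /tmp/echo_demo.txt            # the file now has 2 lines

# -e makes bash's echo interpret backslash escapes
echo -e "a\tb"                        # a TAB between "a" and "b"
echo -e "12\b3"                       # backspace erases the 2 on screen: shows "13"

# Exit-status chaining
true  && echo "Done"                  # prints "Done": true succeeded
false || echo "Error"                 # prints "Error": false failed
```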
#grep uses basic regular expressions (BRE); egrep or "grep -E" uses extended regular expressions (ERE)
grep abc file1             #Prints all the lines with literal "abc" in them
grep ^abc file1            #Prints all the lines that start with "abc"
grep abc$ file1            #Prints all the lines that end with "abc"
grep -i a file1            #Ignores case, will print lines with "a" or "A"
grep -v                    #inVerts the match, prints the non-matching lines
grep -v ^abc file1         #Prints all the lines that do not start with "abc"
grep -c a file1            #Counts number of lines with "a"
grep -cv a file1           #Counts the number of lines that don't contain "a"
grep -o abc file1          #Prints only each occurrence of "abc", ignores the rest of the line
grep -n a file1            #Shows the line numbers of the matches
grep ^$                    #Prints all the empty lines
grep -v ^$                 #Prints all the non-empty lines
grep ^#                    #Prints only commented lines
grep -w of file1           #Prints only lines with full word "of", will ignore "offer"
grep [a1?] file1           #Matches any single character inside []; this prints lines containing "a", "1" or "?"
grep ^[abc] file1          #Prints lines starting with either "a" "b" or "c"
grep [n-m] file1           #Matches any character in the range n to m; [0-9] matches all digits
grep -A n abc file1        #Prints n lines after matching string
grep -B n abc file1        #Prints n lines before matching string
grep -C n abc file1        #Prints n lines around matching string
grep .                     #Matches any single character, so it prints every non-empty line
grep .a.                   #Prints any line which has "a" with one character after and before  !Spaces count as characters
grep a *                   #Prints all lines in the current directory that contain "a" and tags the name of the file
grep -r a *                #Recursively prints all lines containing "a" in the current directory and its subdirectories, tagging the file name
grep -h a *                #Prints all lines in the current directory that contain "a" but ignores the name tag of the file
grep -l a *                #Prints the name of the files in the current directory that contain "a"
grep [[:upper:]] file1     #Prints all lines containing an uppercase character
#[[:lower:]] [[:alpha:]] [[:alnum:]] [[:digit:]] [[:space:]] [[:punct:]] [[:xdigit:]]
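A quick way to verify the flags above is to run them against a small throwaway file; /tmp/sample.txt and its contents are made up for this sketch:

```shell
# Build a small sample file to grep against
cat > /tmp/sample.txt <<'EOF'
abc starts here
ends with abc
# a commented line

An offer of help
EOF

grep -c abc /tmp/sample.txt      # 2: two lines contain "abc"
grep -n '^abc' /tmp/sample.txt   # 1:abc starts here
grep -cv '^$' /tmp/sample.txt    # 4: non-empty lines
grep -w of /tmp/sample.txt       # matches "of" but not "offer"
grep '^#' /tmp/sample.txt        # only the commented line
```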


#Advanced Regex
#You can escape meta characters with \ or "quote" all the search
grep -E "a{n}" file1            #Prints all the lines that contain exactly n "a", a{3} will print only "aaa"
grep -E "a{n,m}" file1          #Prints lines that contain "a" repeated in the range n to m times
grep -E "a{n,}" file1           #Prints lines that contain "a" repeated n or more times 
grep -E "a{,n}" file1           #Prints lines that contain "a" repeated maximum n times
grep -E "a|b" file1             #Prints lines that contain "a" or "b"
grep -E "Ja(s|cks)on" file1     #Prints lines that contain "Jason" or "Jackson" 
grep -o -P "Tom.{1,10}" file1   #Prints all the occurrences of "Tom" and the next 1 to 10 characters in the specified file
grep "\<abc\w*"                 #Prints lines that contain words that start with "abc" and end in whatever
grep "\<\w*abc\>"               #Prints lines that contain words that end with "abc"
grep "\<ab\w*cd\>"              #Prints lines that contain words that start with "ab" and end in "cd" with whatever in between
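These ERE forms can be exercised the same way; the file name and its contents below are invented for the sketch, and -P assumes a GNU grep built with PCRE support:

```shell
cat > /tmp/regex.txt <<'EOF'
Jason
Jackson
aaa repeated
Tommy went home
EOF

grep -E 'a{3}' /tmp/regex.txt          # only the "aaa repeated" line
grep -E 'Ja(s|cks)on' /tmp/regex.txt   # Jason and Jackson
grep -E 'son|home' /tmp/regex.txt      # alternation: three lines match
grep -oP 'Tom.{1,5}' /tmp/regex.txt    # "Tom" plus the next 1 to 5 characters
```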


#Examples
#Print all the domain paths and links of a website
curl website | tr " " "\n" | grep -oE 'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)'

Links are special files that point to other files, we distinguish 2 types:

  • Hard Link: points to the data on the disk itself, to the exact inode. If you create a hard link and check with ls -i, you will see that both names point to the same inode index; it's like opening another door to the same place. So if you delete the original file, the data is still there: the file isn't removed from the disk until all hard links are gone. Hard links can't span different filesystems (partitions), since every partition has its own inode index and the numbers would collide, so you can only create them within the same partition. Neither can they point to directories, because that could cause errors in correlation. Bear in mind that hard links also share metadata, so permissions are the same for every link, but you can create a hard link in a public directory pointing to a file in a private directory (e.g. your home, where no one else can enter), and others could then access the file without even being able to see the directory it comes from.

  • Symbolic Link (AKA Soft Link): points to the path to the data, not the data itself. Unlike hard links, a symlink gets its own inode when created; it is not the same file per se. So if you delete the original file, the symlink is left broken unless you put a file with the exact same name and path back; and if you delete the symlink, just create another, no drama. Unlike hard links, they can point to directories. They are like shortcuts on Windows: a convenient "within reach" pathway to another file/directory. A big plus is that since they only store a path, you can use them across different filesystems (partitions) or even external storage devices. They usually show lrwxrwxrwx permissions, but that's meaningless, as access is governed by the permissions of the target file and of the directory it lives in. Also, in a sticky-bit directory, only the owner of the link may use it.

Syntax
ln /route/sourceFile /route/newLink          #Creates a hard link of sourceFile
ln -s /route/sourceFile /route/newLink       #Creates a symbolic link of sourceFile

ln file1 file2 /destiny/route                #Creates multiple hard links in the destiny directory, keeps same name
ln -s /route/file1 /route/file2 /destiny/route  #Creates multiple symbolic links in the destiny directory, keeps same names
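The inode behaviour described above is easy to see for yourself; everything under /tmp/linkdemo is scratch material invented for this sketch:

```shell
mkdir -p /tmp/linkdemo && cd /tmp/linkdemo
echo data > original.txt
ln original.txt hard.txt            # hard link: same inode, same data
ln -s original.txt soft.txt         # symlink: own inode, stores only the path
ls -i original.txt hard.txt         # both lines show the same inode number

rm original.txt
cat hard.txt                        # still prints "data": the inode survives
cat soft.txt 2>/dev/null || echo "broken symlink"   # the path it stored is gone
```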

TAR & Compressing/Decompressing

When sending directories with lots of subdirectories and files to other systems, it's best practice to group them, to make life easier for you and for whoever is on the other side of the Ethernet cable. We have two types:

  • TAR: groups multiple items into a single file, usually called a "tarball"; it's just an archive, so the total size stays roughly the same.

  • Compress: AKA zipping, uses algorithms to remove redundancy in the items, making them more compact and smaller, so they are easier to move over networks or to store.

In Linux we usually do a combination of both: group all the files first, then compress the result. Best of both worlds.

Remember to add the extension ".tar.gz" or ".tgz" to the file name; if you forget, it will still work, but it can cause confusion.

Syntax
#TAR
tar -cvf tarball.tar dir        #Archives the contents of dir into a new file tarball.tar
-c                              #Creates tarball
-v                              #Verbose, shows what's going on, which files went in
-f                              #Specifies that the next argument is the tar file, always necessary
tar -rvf tarball.tar newFile    #Appends another file into the existing tar file
tar -xvf tarball.tar            #E(x)tracts the contents of a tar file
tar -xvkf tarball.tar           #Keep old files, don't replace existing files when extracting the same ones
tar -tvf tarball.tar            #Lists the contents of a tar file, verbose (similar to ls -l)
tar --delete -f tarbl.tar rdir/ #Removes the specified dir from inside the tar
tar --exclude="file1" -cvf ...  #Excludes the specified file/dir from being added, regex available
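A round trip through the flags above, on a scratch directory (all the /tmp/tardemo paths are invented for this sketch):

```shell
mkdir -p /tmp/tardemo/dir
echo hello > /tmp/tardemo/dir/file1
cd /tmp/tardemo

tar -cvf tarball.tar dir            # create the tarball
tar -tvf tarball.tar                # list what's inside, ls -l style
mkdir -p extract && cd extract
tar -xvf ../tarball.tar             # extract it here
cat dir/file1                       # hello
```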

#Compression with TAR
tar -cvzf tarball.tar.gz dir   #Also .tgz, creates a tar file then compresses it with gzip(GNU zip) algorithm
tar -cvjf tarball.tar.bz2 dir  #Also .tbz2, creates a tar file then compresses it with bzip2 algorithm 
tar -cvJf tarball.tar.xz dir   #Also .txz, creates a tar file then compresses it with the xz algorithm
tar -xv<algorithm>f tarball.x  #Decompresses then extracts the tar file
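The same round trip, gzip-compressed; -C (extract into a given directory) is a standard tar flag not listed above, and the /tmp/gzdemo paths are scratch names for this sketch:

```shell
mkdir -p /tmp/gzdemo/dir /tmp/gzdemo/out
echo hi > /tmp/gzdemo/dir/file1
cd /tmp/gzdemo

tar -czvf tarball.tar.gz dir        # tar, then gzip, in one go
tar -xzvf tarball.tar.gz -C out     # decompress and extract into out/
cat out/dir/file1                   # hi
```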

#Compression
gzip file1                     #Compress file1
gzip -d file1.gz               #Also gunzip, decompress file1
gzip -[1-9]                    #Sets compression level: -1 is faster, -9 compresses better, default is -6
gzip -k                        #Keep, don't delete original file after compressing
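A gzip round trip with -k, so the original survives compression (/tmp/g.txt is just a scratch file for this sketch):

```shell
rm -f /tmp/g.txt /tmp/g.txt.gz
echo "some text" > /tmp/g.txt
gzip -k /tmp/g.txt                  # produces /tmp/g.txt.gz, keeps /tmp/g.txt
ls /tmp/g.txt /tmp/g.txt.gz         # both exist now

rm /tmp/g.txt
gzip -d /tmp/g.txt.gz               # restores /tmp/g.txt
cat /tmp/g.txt                      # some text
```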

#The other compression algorithms use the same parameters as gzip
bzip2
xz
zstd 

For small to medium files it's not important, but if you are into algorithms or min-maxing space, the overall consensus as of 2023 looks like:

zstd(--zstd) > xz(J) > bzip2(j) > gzip(z)

But the best compression tends to be slower, and gzip is the most compatible, so don't worry too much.
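You can get a feel for the trade-off on a deliberately redundant file; ratios vary wildly with the data, and xz/zstd may need installing, so this sketch only runs them if they are present:

```shell
# Highly redundant input compresses extremely well with any algorithm
yes "the same line again" | head -n 5000 > /tmp/cmp.txt
gzip -kf /tmp/cmp.txt
command -v xz   >/dev/null && xz -kf /tmp/cmp.txt || true
command -v zstd >/dev/null && zstd -qf /tmp/cmp.txt || true
ls -l /tmp/cmp.txt*                 # compare the sizes yourself
```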
