27 Nov 2009

tar: Extract specific files from a large tarball

Unpacking single files from large tar archives that have been compressed with gzip (usually named .tar.gz or .tgz) is annoying. Say you have a backup of a big filesystem packed into a tarball and only need to restore a single file of a few kilobytes. Extracting the whole archive often takes a long time, and that is time you can save. A look at the man page shows that pulling single files out of a large tarball is easy.

tar can extract single files by path, complete directories, and even wildcard-based selections. Here is a brief overview of the ways I extract single files from large tar archives using GNU tar on Linux-based systems.

To list the contents of a .tar.gz archive, use the list switch (-t):

lami@moep:~/test$ tar -tvzf backup.tar.gz
drwxr-xr-x lami/lami         0 2009-11-26 19:46 important/
-rw-r--r-- lami/lami         0 2009-11-26 19:46 important/secret
-rw-r--r-- lami/lami         0 2009-11-26 19:46 important/secret.back
drwxr-xr-x lami/lami         0 2009-11-26 19:46 private/
-rw-r--r-- lami/lami         0 2009-11-26 19:46 private/notes.back
-rw-r--r-- lami/lami         0 2009-11-26 19:46 private/notes
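On a real backup the listing can run to many thousands of lines, so I usually filter it. The following is a minimal sketch, assuming a throwaway sample archive that mirrors the listing above; the temporary directory and the search term are only illustrative.

```shell
#!/bin/sh
set -eu

# Build a small sample archive in a scratch directory (illustrative names).
work=$(mktemp -d)
cd "$work"
mkdir important private
touch important/secret important/secret.back private/notes private/notes.back
tar -czf backup.tar.gz important private

# List member names only (no -v) and keep the lines that mention "secret".
matches=$(tar -tzf backup.tar.gz | grep 'secret')
printf '%s\n' "$matches"
```

Leaving out -v prints bare paths, which are exactly the strings tar expects when you name a member to extract.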

Once you have found the file you want, extract it by passing its name as a parameter after the archive file. The name must match the path exactly as it appears in the listing:

lami@moep:~/test$ tar -xvzf backup.tar.gz important/secret
important/secret
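When restoring from a backup you often do not want the file to land on top of the live copy. A small sketch of two variations, assuming a sample archive built on the spot: -C extracts into another directory, and -O (--to-stdout) prints the file's content instead of writing it to disk. The directory name restore and the file contents are made up for the example.

```shell
#!/bin/sh
set -eu

# Build a small sample archive in a scratch directory (illustrative names).
work=$(mktemp -d)
cd "$work"
mkdir important restore
printf 'top secret\n' > important/secret
printf 'old copy\n' > important/secret.back
tar -czf backup.tar.gz important
rm -r important

# -C changes into restore/ before extracting, so the file ends up in
# restore/important/secret rather than in the current directory.
tar -xzf backup.tar.gz -C restore important/secret

# -O writes the member's content to standard output; nothing is created on disk.
content=$(tar -xzf backup.tar.gz -O important/secret)
printf '%s\n' "$content"
```

The -O form is handy for a quick look at a config file or for piping a single member into another command.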

It is also possible to extract a directory and all of its contents from the tar archive:

lami@moep:~/test$ tar -xvzf backup.tar.gz important
important/
important/secret
important/secret.back

You can also extract several files or directories at once by matching them with globbing patterns. Quote the pattern so that the shell does not expand it before tar sees it:

lami@moep:~/test$ tar -xvzf backup.tar.gz --wildcards --no-anchored '*.back'
important/secret.back
private/notes.back

Globbing patterns can also exclude unwanted files from a wildcard selection. Here the bracket expression [^n] rules out notes.back, because the first character after the slash must not be an n:

lami@moep:~/test$ tar -xvzf backup.tar.gz --wildcards --no-anchored '*/[^n]*.back'
important/secret.back
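Bracket expressions get hard to read once the exclusion is more than a single character. An alternative I find clearer is to combine the wildcard selection with GNU tar's --exclude option. This is a sketch under the assumption of the same sample layout as above; the pattern notes* is just an example.

```shell
#!/bin/sh
set -eu

# Build a small sample archive in a scratch directory (illustrative names).
work=$(mktemp -d)
cd "$work"
mkdir important private
touch important/secret important/secret.back private/notes private/notes.back
tar -czf backup.tar.gz important private
rm -r important private

# Select every *.back member, but skip anything whose name matches notes*.
tar -xzf backup.tar.gz --wildcards --no-anchored --exclude='notes*' '*.back'

# Show what was actually written to disk.
extracted=$(find . -type f -name '*.back' | sort)
printf '%s\n' "$extracted"
```

Only important/secret.back is restored here; private/notes.back matches the selection but is filtered out by the exclude pattern.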

Some other advice:

In some cases zip archives are faster than gzipped tar archives, especially when you know that you will often need to extract single files from a big archive. The reason is the archive layout rather than the compression algorithm, since both formats normally use DEFLATE. Zip compresses every file separately and keeps a central directory of file locations at the end of the archive. Gzip compresses the tar file as a whole, which lets it remove redundancy between distinct files but means the .tar.gz has to be decompressed and scanned from the beginning up to the file's location before that file can be extracted. With a zip archive, the tool looks up the file's location in the central directory and jumps straight to it.
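If you have to stay with .tar.gz, there is one more thing worth trying. By default GNU tar keeps reading to the end of the archive even after it has found the requested file, because a later entry with the same name could still follow. The --occurrence option tells it to stop after the first match. The sketch below assumes GNU tar (other tar implementations may not know this option) and a tiny sample archive; how much time it saves on a real backup depends on how early the file sits in the archive.

```shell
#!/bin/sh
set -eu

# Build a small sample archive in a scratch directory (illustrative names).
work=$(mktemp -d)
cd "$work"
mkdir important private
printf 'top secret\n' > important/secret
touch private/notes
tar -czf backup.tar.gz important private
rm -r important private

# GNU tar only: stop processing once the first occurrence of each
# named member has been extracted, instead of scanning to the end.
tar -xzf backup.tar.gz --occurrence=1 important/secret

restored=$(cat important/secret)
printf '%s\n' "$restored"
```

This only works when you name the members explicitly, and it still has to decompress everything that comes before the file.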
