LanguagesHow to Work with Archive Files in Go

How to Work with Archive Files in Go

Go Programming ``

Go’s standard library provides excellent support for working with several compression formats such as zip or gzip. It is easy to make Go programs able to seamlessly write and read files with gzip compression if they end with a .gz suffix. Also, the library has packages which allow us to write and read .zip files and tarballs. This Go developer tutorial explores the ideas of working with compressed archive files in Go.

Archive Files in the Go Programming Language

Typically, an archive file is a collection of one or more files, along with metadata information for storage or portability, as a single unit of file. Although the main purpose is long-term retention, in practice, it may be used for any purpose. The metadata information often defines the storage quality of files as an archive. For example, the files may be compressed to reduce storage space, store directory structure, error detection and correction information, and so forth. Sometimes the files are encrypted and archived due to security reasons.

We often see archive files used in packaging of software distribution. For example, the deb files for Debian, JAR in Java files, and APK in Android are nothing but archived packages of software. Also the zip or gzip files we see are compressed formats of one or more files for archival purposes. File compression prior to archival is common. Apart from using a number of files as a single unit, compression saves space. This is particularly useful for file transactions over networks or other portable reasons.

Read: How to Use Strings in Go

Go’s Support for File Compression

The standard Go library provides necessary APIs for file compression into different formats. For example, it is quite easy to seamlessly read and write gzip compression files or files that have the .gz extension. There is also API support for reading and writing zip files or tarballs or bz2 files. Here is a quick example of how to use Go APIs to create zipped archives.

How to Create Zip Archives in Go

The simplest way we can create a zipped archive is to use the zip package. The idea is to open a file so we can write in it and then use zip.Writer to write the file into the zip archive. To put this in perspective, here is the order of things we need to do:

Create an archive file. This is nothing but a simple file we create in Go using the os.Create function from the os package. This function creates a file if it does not exist otherwise it truncates the named file.
Now when the archive file is created, we need to create a writer to write other files and directories into the archive. This can be done by the zip.NewWriter function.
After creating a writer, we can add files and directories into the archive using the zip.Writer.Create function.
We can use the copy function either from io.Copy or use io.Writer.Write. The copy function copies from source to destination until EOF is encountered on the source file or an error occurs.

Finally, we must close the zip archive.

We will demonstrate how to create zip archives in Go with the sample code below. Note that we have omitted error handling to focus only on the required points. Normal and full length code must handle errors appropriately.

package main

import (
	"archive/zip"
	"io"
	"os"
)

func main() {

	zipArchive, _ := os.Create("myproject.zip")
	defer zipArchive.Close()
	writer := zip.NewWriter(zipArchive)

	f1, _ := os.Open("pom.xml")
	defer f1.Close()
	w1, _ := writer.Create("pom.xml")
	io.Copy(w1, f1)

	f2, _ := os.Open("SampleApplication.java")
	defer f2.Close()
	w2, _ := writer.Create("src/main/java/SampleApplication.java")
	io.Copy(w2, f2)

	writer.Close()
}

Read: Understanding Mathematical Operators in Go 

Unpacking Zip Archives in Golang

Unpacking – or unzipping – is just as straightforward as zipping except that here we need to re-create the directory structure if the filenames used in the archive include paths. Here is a quick example of how to unpack zip archives in Go:

func main() {

	//...

	src := "myproject.zip"
	dest := "myproject_unpacked"
	var filenames []string
	reader, _ := zip.OpenReader(src)
	defer reader.Close()
	for _, f := range reader.File {
		fp := filepath.Join(dest, f.Name)
		strings.HasPrefix(fp, filepath.Clean(dest)+string(os.PathSeparator))
		filenames = append(filenames, fp)
		if f.FileInfo().IsDir() {
			os.MkdirAll(fp, os.ModePerm)
			continue
		}
		os.MkdirAll(filepath.Dir(fp), os.ModePerm)
		outFile, _ := os.OpenFile(fp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
		rc, _ := f.Open()
		io.Copy(outFile, rc)

		outFile.Close()
		rc.Close()
	}
}

Here, we open the zip file for reading and instead of calling the os.Open function, we invoke zip.OpenReader for the purpose. This function opens a zip file and returns zip.ReadCloser.

The important aspect of the ReadCloser function is that it contains a slice of a pointer that points to each representing file in the zip archive, which helps us to iterate over each of them. If the archive contains a directory, we then recreate the structure. The os.MkdirAll function provides the capability to create intermediate directories as per the structure.

Tarballs and gzipping in Go

Creating tarballs, or Tape Archive, is another commonly used archive type. It was introduced in the late 70s to write data to tape drives, which were a common storage device at that time. They were utilized for backing up and transporting files in Unix platforms. The technique is still in vogue even today and is quite a popular format for storing multiple files as a single unit and often used for sending files over the internet or software distribution/downloads. This can be seen in Linux and Unix systems with file extensions ending in “.tar”. Note, however, that the purpose of a tarball is to consolidate files and not compress them.

The gzip, on the other hand, is a compression technique where file size is reduced greatly. This algorithm was created by Jean-loup Gailly and Mark Adler as a free software utility for file compression in the early Unix system. The algorithm is based upon the deflate technique (lossless data compression) which is a combination of LZ77 and Huffman encoding. In short, gzip is excellent for compressing text-based files.

We often find files archived with the combination of tarball and compressed with gzip together in numerous platforms. The reason for this is that tarballs is an excellent algorithm for archiving files and gzip for file compression, especially for text files. A combination of them conforms to an efficient technique and is often used together. The archive file typically has an extension such as “.tgz” or ”tar.gz”.

Read: How to Handle Errors in Go

Combining tar and gz in Go

Here is an example to illustrate how to apply or create tar.gz files in Go. The example is similar to the one we have done with zip. Here, we supply a list of files to be archived to a function named createTarAndGz, which takes files named in a slice of a string. Like the previous example, here, also, we first create the output file and name it mytestarchive.tar.gz using the os.Create function:

package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

func main() {
	listOfFilesNames := []string{"main.tex", "main.toc", "util/file1.txt", "util/file1.txt"}
	outFile, err := os.Create("mytestarchive.tar.gz")
	if err != nil {
		panic("Error creating archive file")
	}
	defer outFile.Close()
	err = createTarAndGz(listOfFilesNames, outFile)
	if err != nil {
		panic("Error creating archive file.")
	}

	fmt.Println("Archiving and file compression completed.")
}

func createTarAndGz(fileNames []string, buffer io.Writer) error {
	gzipWriter := gzip.NewWriter(buffer)
	defer gzipWriter.Close()
	tarWriter := tar.NewWriter(gzipWriter)
	defer tarWriter.Close()
	for _, f := range fileNames {
		err := addToTar(tarWriter, f)
		if err != nil {
			return err
		}
	}
	return nil
}

func addToTar(tarWriter *tar.Writer, filename string) error {
	file, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer file.Close()
	info, err := file.Stat()
	if err != nil {
		return err
	}
	header, err := tar.FileInfoHeader(info, info.Name())
	if err != nil {
		return err
	}
	header.Name = filename
	err = tarWriter.WriteHeader(header)
	if err != nil {
		return err
	}
	_, err = io.Copy(tarWriter, file)
	if err != nil {
		return err
	}

	return nil
}

In the createTarAndGz function, we create both of the writers for tar and gz, which implements the io.Writer interface. The writes created, respectively, form a chain of command. We then iterate over the slice of file names and add them to the tar archive using the function addToTar.

The addToTar function opens the particular file, gets its metadata information, file header to generate a tar header from it, writes the header, and finally uses the io.Copy function to include the file into the archive. That’s it.

Once the archive file has been created, it is extracted with any untar and gzip decompression utility, such as tar xzvf mytestarchive.tar.gz.

Read: Intro to Database Programming in Go

Final Thought on Archiving Files in Go

Go’s standard library provides the required functionality to work with archive files. It supports different compression techniques such as zip, gzip, bz2, and so forth. All of them can be put together using the tarball. The single compressed unit of several files can then be used for storage or transportation in any media or network. We also can customize the archiving process according to our requirements. For example, we can encrypt the files, compress them, and then archive using tarball. The possibilities are endless with Go’s standard library API to suit our purpose.

Latest Posts

Related Stories