Working with Strings in Go

In programming terminology, a string is a sequence, or array, of characters. A character may be a letter, a digit, a punctuation mark, or another symbol. Back in the days when C was invented, a character in a computer was represented by a 7-bit ASCII code, so a string was a collection of 7-bit ASCII characters. However, as the use of computers grew throughout the world, the 7-bit ASCII scheme became insufficient to support the characters of other languages. Unicode was introduced as a universal character set, together with encodings such as UTF-8, UTF-16, and UTF-32 that store its code points as bytes. The Unicode FAQ is an interesting place to get more detail on these.

Different programming languages have their own character encoding scheme. For example, Java natively represents strings as sequences of 16-bit UTF-16 code units. Go, on the other hand, uses UTF-8. Both are variable-width encodings. UTF-8 was originally designed by Ken Thompson and Rob Pike, two of the three original designers of Go. Go strings are significantly different from C strings: rather than a null-terminated character array, a Go string is an immutable sequence of bytes that carries its own length. Go source code and string literals are UTF-8, so each character in a string is represented by one to four bytes. Although strings behave like an array or slice of bytes in many ways, they are a separate type with their own set of behaviors.
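To make the variable-width point concrete, here is a minimal sketch that compares the byte length of a string with its number of code points. The helper name byteAndRuneCount is an assumption made for illustration; len and utf8.RuneCountInString are standard.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (a hypothetical helper) returns the number of bytes
// and the number of Unicode code points (runes) in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "h\u00e9llo" is "héllo"; the escape keeps the accented letter unambiguous.
	for _, s := range []string{"Go", "h\u00e9llo", "日本語"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

For pure ASCII text the two counts agree; they diverge as soon as a character needs more than one byte.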

Here we’ll discuss a few points about the string type and its manipulation in Go.

Go String Functions

Go features a built-in len function that returns the length of a string in bytes (not in characters). As with an array, we can access the i-th byte, where 0 <= i < len(theString). Observe the following code snippet:

theString := "Love manifests in pleasing service"
fmt.Println(len(theString))

The following code would result in a panic at run time, because we are trying to access an index that is out of bounds:

fmt.Println(theString[0], " ", theString[len(theString)]) //panic

Instead, we would want to write the code above as follows:

fmt.Println(theString[0], " ", theString[len(theString)-1]) // 76, 101 - (L, e)

There is a shorthand for extracting a substring in Go. For example, we could write:

 
theString[i:j] (where 0 <= i <= j <= len(theString))

to yield a new string that consists of the bytes of the original string from index i up to index j-1. The resulting string contains j-i bytes. The following statements all produce the same output because we can omit i and j, which default to 0 and len(theString), respectively.

fmt.Println(theString[0:len(theString)]) // Love manifests in pleasing service
fmt.Println(theString[:len(theString)]) // Love manifests in pleasing service
fmt.Println(theString[0:]) // Love manifests in pleasing service
fmt.Println(theString[:]) // Love manifests in pleasing service

To extract a substring we may write the following code:

fmt.Println(theString[2:8]) // ve man
fmt.Println(theString[:8]) // Love man
fmt.Println(theString[6:]) // anifests in pleasing service
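Because the slice indices are byte offsets, slicing text that contains non-ASCII characters can cut a character in half. One possible workaround, sketched below, is to convert the string to a slice of runes first. The helper substringRunes is hypothetical and assumes valid indices.

```go
package main

import "fmt"

// substringRunes is a hypothetical helper: it slices s by rune positions
// rather than byte offsets. It assumes 0 <= i <= j <= number of runes in s.
func substringRunes(s string, i, j int) string {
	return string([]rune(s)[i:j])
}

func main() {
	s := "na\u00efve caf\u00e9" // "naïve café"

	// Byte-based slice: the third byte is only the first half of the
	// two-byte character ï, so the result is not valid UTF-8.
	fmt.Printf("%q\n", s[:3])

	// Rune-based slice: the first three characters.
	fmt.Println(substringRunes(s, 0, 3)) // naï
}
```

The conversion to []rune copies the whole string, so this approach trades some memory and time for correctness on multibyte text.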

Go strings are immutable: the bytes of a string value can never be changed. We can, however, assign a new value to a string variable, or concatenate onto it, without affecting the original value. For example:

str1 := "sample text"
str2 := str1 // str2 holds "sample text"
str1 += ", another sample" // str1 is now "sample text, another sample"; str2 is unchanged

However, if we try to modify a byte of the string in place, the compiler reports an error because it violates the immutability of Go strings:

str1[4] = 'A' // error!
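When a modified copy is what we actually want, one common approach is to convert the string to a byte slice, change the slice, and convert back. The sketch below assumes ASCII text and a valid index; the helper name replaceByteAt is made up for illustration.

```go
package main

import "fmt"

// replaceByteAt is a hypothetical helper that returns a copy of s with
// the byte at index i replaced by b. It assumes 0 <= i < len(s) and
// that s is ASCII, so one byte corresponds to one character.
func replaceByteAt(s string, i int, b byte) string {
	buf := []byte(s) // the conversion copies the string's bytes
	buf[i] = b       // modifying the copy is allowed
	return string(buf)
}

func main() {
	str1 := "sample text"
	str3 := replaceByteAt(str1, 4, 'A')
	fmt.Println(str1) // sample text (the original is untouched)
	fmt.Println(str3) // sampAe text
}
```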

Comparing Strings in Go

As developers, we often need to compare two strings and Go supports all the common comparison operators, including ==, !=, <, >, <=, and >=. The comparison is done byte by byte and follows natural lexical ordering in the comparison process. Here is a quick example of how to compare two strings in Go:

str1 := "I saw a saw to saw"
str2 := "I saw" + " a saw to saw"

if str1 == str2 {
 fmt.Println("str1 == str2")
}
str2 += " a tree"
if str1 != str2 {
 fmt.Println("str1 != str2")
}
if str1 < str2 {
 fmt.Println("str1 < str2")
}
str1 += "Today "
if str1 > str2 {
 fmt.Println("str1 > str2")
}
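Since the comparison operators work byte by byte, uppercase ASCII letters sort before lowercase ones, and "Go" is not equal to "GO". The following sketch shows one way to compare without regard to case. strings.EqualFold and strings.ToLower are standard; the helper lessFold is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// lessFold is a hypothetical helper that orders strings without regard
// to case by lowercasing both sides before the byte-wise comparison.
func lessFold(a, b string) bool {
	return strings.ToLower(a) < strings.ToLower(b)
}

func main() {
	a, b := "Zebra", "apple"
	fmt.Println(a < b)          // true: 'Z' (90) sorts before 'a' (97)
	fmt.Println(lessFold(a, b)) // false: "zebra" comes after "apple"

	x, y := "Go", "GO"
	fmt.Println(x == y)                  // false: the bytes differ
	fmt.Println(strings.EqualFold(x, y)) // true: equal ignoring case
}
```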

Working with the strconv Package in Go

While working with strings, we often need to convert a string into a numeric value. The strconv package contains numerous functions that convert between strings and the basic Go data types. For example, an integer value may be parsed from its string representation as follows:

strInt := "1234"
intVal, err := strconv.ParseInt(strInt, 10, 32) // 1234 as an int64; base 10, must fit in 32 bits
if err == nil {
 fmt.Println(intVal)
} else {
 fmt.Println(err)
}

The strconv.ParseInt function takes three parameters: the string to be parsed, the base, and the bit size. It always returns an int64; the bit size only determines the range the value must fit in. There are similar functions for other basic types, such as ParseFloat, ParseBool, ParseUint, and ParseComplex for floating-point, boolean, unsigned integer, and complex values, respectively.
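As a brief sketch of the other parsing functions, the example below uses ParseFloat, ParseBool, and ParseUint. The helper parsePort and its sample inputs are assumptions chosen to show how the bit size limits the accepted range.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper that parses a TCP port number.
// ParseUint with a bit size of 16 rejects anything outside 0..65535.
func parsePort(s string) (uint16, error) {
	n, err := strconv.ParseUint(s, 10, 16)
	if err != nil {
		return 0, err
	}
	return uint16(n), nil
}

func main() {
	if f, err := strconv.ParseFloat("3.1415", 64); err == nil {
		fmt.Println(f) // 3.1415
	}
	if b, err := strconv.ParseBool("true"); err == nil {
		fmt.Println(b) // true
	}
	// "70000" is out of range for 16 bits and "http" is not a number,
	// so both produce an error instead of a port.
	for _, s := range []string{"8080", "70000", "http"} {
		port, err := parsePort(s)
		if err != nil {
			fmt.Println("invalid port:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```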

We can also convert a numeric value back to a string in Go, as demonstrated in the following code example:

floatVal := 3.1415
strFloat := strconv.FormatFloat(floatVal, 'E', -1, 64) // 3.1415E+00
fmt.Println(strFloat)

The strconv package provides similar formatting functions for the other basic types, such as FormatBool, FormatInt, FormatUint, and FormatComplex. A quick alternative for converting a float value to a string in Go is fmt.Sprintf from the fmt package, as demonstrated in this code example:

f := 2.5678
strFloat := fmt.Sprintf("%f", f)
fmt.Println(strFloat)
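The other formatting functions follow the same pattern. Here is a small sketch using FormatInt, FormatFloat, FormatBool, and Itoa; the wrapper names toBase and fixed2 are hypothetical and exist only to keep the example short.

```go
package main

import (
	"fmt"
	"strconv"
)

// toBase (hypothetical) formats n in the given base, which must be
// between 2 and 36, using strconv.FormatInt.
func toBase(n int64, base int) string {
	return strconv.FormatInt(n, base)
}

// fixed2 (hypothetical) formats f with exactly two digits after the
// decimal point.
func fixed2(f float64) string {
	return strconv.FormatFloat(f, 'f', 2, 64)
}

func main() {
	fmt.Println(toBase(255, 2))           // 11111111
	fmt.Println(toBase(255, 16))          // ff
	fmt.Println(fixed2(2.5678))           // 2.57
	fmt.Println(strconv.FormatBool(true)) // true
	fmt.Println(strconv.Itoa(42))         // 42
}
```

strconv.Itoa is a shorthand for formatting an int in base 10.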

The strconv package contains many conversion functions that work with string values. Check out the Go strconv documentation for more details.

The strings Package in Go

The strings package is another utility package for string manipulation. For example, here we have a string that contains the days of the week, with each day delimited by a comma (,). We can parse the string and extract each day using the strings.Split function as follows:

str := "sun,mon,tue,wed,thu,fri,sat"
weekdays := strings.Split(str, ",")
for _, day := range weekdays {
 fmt.Println(day)
}

There are numerous such utility functions. For example, to convert a string to uppercase or lowercase letters, we may use strings.ToUpper(str) or strings.ToLower(str), respectively. Functions such as strings.Trim(s, cutset) return a slice of the string s with all leading and trailing Unicode code points contained in cutset removed. Check the Go strings documentation for more details.
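The functions just mentioned can be sketched as follows. The sample input and the helper normalize are assumptions for illustration; ToUpper, ToLower, Trim, and TrimSpace are part of the strings package.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is a hypothetical helper: it strips leading and trailing
// spaces and dashes, then lowercases what remains.
func normalize(s string) string {
	return strings.ToLower(strings.Trim(s, " -"))
}

func main() {
	raw := "  --Hello, Gophers!--  "

	// The cutset " -" removes any mix of spaces and dashes from both ends.
	fmt.Printf("%q\n", strings.Trim(raw, " -")) // "Hello, Gophers!"

	// TrimSpace removes only surrounding white space.
	fmt.Printf("%q\n", strings.TrimSpace(raw)) // "--Hello, Gophers!--"

	fmt.Println(strings.ToUpper("gophers")) // GOPHERS
	fmt.Println(normalize(raw))             // hello, gophers!
}
```

Note that the cutset is a set of characters, not a prefix or suffix; strings.TrimPrefix and strings.TrimSuffix remove a specific leading or trailing string.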

The unicode/utf8 Package in Go

The unicode/utf8 package contains functions for querying and manipulating strings and byte slices that hold UTF-8 text. It supplies functions to translate between runes and UTF-8 byte sequences. For example, utf8.DecodeRuneInString() and utf8.DecodeLastRuneInString() return the first and last rune in a string, respectively, along with the number of bytes that rune occupies. Here is a quick example:

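q := "A to Z"
fc, size1 := utf8.DecodeRuneInString(q)
fmt.Println(fc, size1) // 65 1 (the code point of 'A' and its width in bytes)
lc, size2 := utf8.DecodeLastRuneInString(q)
fmt.Println(lc, size2) // 90 1 (the code point of 'Z' and its width in bytes)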
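With ASCII input every rune is one byte wide, so the size values are not very revealing. The sketch below walks a string that mixes one-, two-, three-, and four-byte characters by calling utf8.DecodeRuneInString repeatedly. The helper runeWidths and the sample string are assumptions for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths (a hypothetical helper) returns the UTF-8 width in bytes
// of each rune in s, in order.
func runeWidths(s string) []int {
	var widths []int
	for len(s) > 0 {
		// Decode the first rune, then advance past its bytes.
		// Invalid input decodes as utf8.RuneError with width 1,
		// so the loop always makes progress.
		_, size := utf8.DecodeRuneInString(s)
		widths = append(widths, size)
		s = s[size:]
	}
	return widths
}

func main() {
	// 'a', 'é' (U+00E9), '€' (U+20AC), and an emoji (U+1F600).
	s := "a\u00e9\u20ac\U0001F600"
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 10 4
	fmt.Println(runeWidths(s))                     // [1 2 3 4]
}
```

In everyday code, a for ... range loop over a string performs the same decoding and yields each rune with its starting byte index.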

Check the Go utf8 documentation for more details.

Final Thoughts on Strings in Go

Go provides extensive support for manipulating strings. There are other packages, such as unicode, that provide functions for querying Unicode code points to determine whether they meet certain criteria. The regexp package provides functions to search and manipulate strings using regular expressions. One important thing to understand about Go strings is that what we loosely call a character is a Unicode code point, which UTF-8 encodes as a sequence of one to four bytes. In Go, a code point is represented by the type rune, which is an alias for int32. The Go packages are replete with string manipulation functions; here we have merely scratched the surface. Stay tuned, we’ll explore more.
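As a parting sketch of the unicode and regexp packages mentioned above, the following example classifies the runes of a string and extracts runs of digits. The helper countLettersAndDigits and the sample text are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// countLettersAndDigits (a hypothetical helper) classifies each rune
// of s using predicates from the unicode package.
func countLettersAndDigits(s string) (letters, digits int) {
	for _, r := range s { // r has type rune, an alias for int32
		switch {
		case unicode.IsLetter(r):
			letters++
		case unicode.IsDigit(r):
			digits++
		}
	}
	return letters, digits
}

func main() {
	text := "Go 1.22 est arriv\u00e9" // "Go 1.22 est arrivé"
	l, d := countLettersAndDigits(text)
	fmt.Println(l, d) // 11 3

	// Find every run of ASCII digits in the text.
	re := regexp.MustCompile(`[0-9]+`)
	fmt.Println(re.FindAllString(text, -1)) // [1 22]
}
```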

Curious about other Go features? Check out our article on Reflection in Go.
