Go GC and maps with pointers.(25 Jan 2018)
The other day I read a blog post titled
GC is bad and you
should feel bad by
@philpearl. It's a well
written blogpost and I would welcome you to go read it.
In it he says:
In our ML fraud scoring pipeline we have a huge (>5GB) table of pre-calculated probabilities that we
kept
in Go map[string]X in RAM.
Recently we’d seen high GC pause times, corresponding with high 95th percentile response time peaks
on our API.
Once we’d moved the allocations off-heap into a custom hash-map backed by mmap allocated memory,..,
the GC pauses practically
disappeared.
Soon after, I saw a tweet by
Damian Gryski, (if you are on
twitter and love Go, then I would highly recommend you follow Damian)
In the tweet, he suggested that a minimum perfect hash function can be created to be used in place of
the string keys to
the map thus reducing GC times.
I was curious as to how using a hash would help to relieve the pressure on the garbage collector.
So, I set out to profile the GC times; first when you have a map whose keys are strings and second when
you have a map whose
keys are strings that have been hashed into ints.
The program I used to profile that is:
package main
import (
"strconv"
"fmt"
"os"
"runtime"
"time"
)
// run this program as:
/*
#!/bin/bash
for t in 1 2; do go run gctest.go $t; done
*/
func timeGC() time.Duration {
start := time.Now()
runtime.GC()
return time.Since(start)
}
func main() {
const N = 30e6
if len(os.Args) != 2 {
fmt.Printf("usage: %s [1 2]\n(number selects the test)\n", os.Args[0])
return
}
switch os.Args[1] {
case "1":
// create a big map of strings. since strings contain pointers,
// we expect this to have more pressure on GC.
m := make(map[string]string)
for i := 0; i < N; i++ {
n := strconv.Itoa(i)
m[n] = n
}
runtime.GC()
fmt.Printf("With a map of strings, GC took: %s\n", timeGC())
_ = m["0"]
case "2":
// create a map of ints. We want to store strings in the map but unlike in case 1,
// we will hash the strings to ints (which have no pointer)
// so we expect less pressure on GC. We are using strconv.Atoi(str)
// as a stand-in for a proper hash function
m := make(map[int]int)
for i := 0; i < N; i++ {
str := strconv.Itoa(i)
// hash string to int
n,_ := strconv.Atoi(str)
m[n] = n
}
runtime.GC()
fmt.Printf("With map of strings(that are hashed to ints), GC took:: %s\n", timeGC())
_ = m[0]
}
}
On running it, the results were;
With a map of strings, GC took:: 775.280442ms
With map of strings(that are hashed to ints), GC took:: 19.086413ms
For the map whose keys are strings(ie *pointers) the Garbage collector ran for 775.2ms on my machine, while
for the other
map whose keys are strings that have been hashed into ints the GC ran for 19ms.
NB: Note that I used
strconv.Atoi(str) in place of a proper string to integer hash function. In practice, you would be
using
a proper hash implementation
That contrast is much more clear when visualized, So I used
Dave Cheney's gcvis program
to visualize the programs GC trace data.
The results are as shown below:
GC profile of a creating a large Go map whose keys are strings;
data:image/s3,"s3://crabby-images/22039/220393e00ada4959dd42adda956a2952c42f77b0" alt="map with string keys GC profile"
data:image/s3,"s3://crabby-images/78ac7/78ac7a069e72e9116ce91008ba7945bfaee4a3db" alt="map with string keys hashed to ints GC profile"