Go GC and maps with pointers.(25 Jan 2018)
The other day I read a blog post titled
GC is bad and you
should feel bad by
@philpearl. It's a well
written blogpost and I would welcome you to go read it.
In it he says:
In our ML fraud scoring pipeline we have a huge (>5GB) table of pre-calculated probabilities that we
kept
in Go map[string]X in RAM.
Recently we’d seen high GC pause times, corresponding with high 95th percentile response time peaks
on our API.
Once we’d moved the allocations off-heap into a custom hash-map backed by mmap allocated memory,..,
the GC pauses practically
disappeared.
Soon after, I saw a tweet by
Damian Gryski, (if you are on
twitter and love Go, then I would highly recommend you follow Damian)
In the tweet, he suggested that a minimum perfect hash function can be created to be used in place of
the string keys to
the map thus reducing GC times.
I was curious as to how using a hash would help to relieve the pressure on the garbage collector.
So, I set out to profile the GC times; first when you have a map whose keys are strings and second when
you have a map whose
keys are strings that have been hashed into ints.
The program I used to profile that is:
package main
import (
"strconv"
"fmt"
"os"
"runtime"
"time"
)
// run this program as:
/*
#!/bin/bash
for t in 1 2; do go run gctest.go $t; done
*/
func timeGC() time.Duration {
start := time.Now()
runtime.GC()
return time.Since(start)
}
func main() {
const N = 30e6
if len(os.Args) != 2 {
fmt.Printf("usage: %s [1 2]\n(number selects the test)\n", os.Args[0])
return
}
switch os.Args[1] {
case "1":
// create a big map of strings. since strings contain pointers,
// we expect this to have more pressure on GC.
m := make(map[string]string)
for i := 0; i < N; i++ {
n := strconv.Itoa(i)
m[n] = n
}
runtime.GC()
fmt.Printf("With a map of strings, GC took: %s\n", timeGC())
_ = m["0"]
case "2":
// create a map of ints. We want to store strings in the map but unlike in case 1,
// we will hash the strings to ints (which have no pointer)
// so we expect less pressure on GC. We are using strconv.Atoi(str)
// as a stand-in for a proper hash function
m := make(map[int]int)
for i := 0; i < N; i++ {
str := strconv.Itoa(i)
// hash string to int
n,_ := strconv.Atoi(str)
m[n] = n
}
runtime.GC()
fmt.Printf("With map of strings(that are hashed to ints), GC took:: %s\n", timeGC())
_ = m[0]
}
}
On running it, the results were;
With a map of strings, GC took:: 775.280442ms
With map of strings(that are hashed to ints), GC took:: 19.086413ms
For the map whose keys are strings(ie *pointers) the Garbage collector ran for 775.2ms on my machine, while
for the other
map whose keys are strings that have been hashed into ints the GC ran for 19ms.
NB: Note that I used
strconv.Atoi(str) in place of a proper string to integer hash function. In practice, you would be
using
a proper hash implementation
That contrast is much more clear when visualized, So I used
Dave Cheney's gcvis program
to visualize the programs GC trace data.
The results are as shown below:
GC profile of a creating a large Go map whose keys are strings;
GC profile of a creating a large Go map whose keys are ints(ie keys are strings that are hashed into
ints);
In conclusion;
(a) The Go Garbage collector is
designed to
minimise latencies.
(b) And it seems like one of the things that can cause the Garbage collector to have long pause times are
large maps that
have strings(any pointer really) as keys.
(c) And it seems like one of the ways to solve the issue in (b) above is to hash the strings into ints and
then using the
ints as keys, (You would of course have
methods
to Query the map using your earlier strings)
I'm left wondering, shouldn't the Go runtime then not do it for me/you by default? Anytime the runtime sees
a map that is
larger than X and the keys to that map contains pointers, the runtime inlines(for lack of a better term)
that map into one whose keys are ints(or whatever) and does the string->ints hashing for you without you
even noticing.
Maybe it is just my naivety...