Logging without losing money or context. (26 May 2020)
Problem In your application, you want all flows of execution to be covered with log statements. That way, if something were to go wrong, you can be able to trace at which point the flow broke at. However, this presents a problem; if your application has lots of traffic, then the amount of logs it generates are going to be gigantic. This in itself is not a problem, were it not for the fact that you are using a logging as a service provider and they all charge an arm and a leg for every log event. Some of the pricing models are so inscrutable, that the logging service providers offer calculators ( here, here, etc) to try and help you figure out what should ideally have been kindergaten arithmetic. So it seems like for you to have your cake and eat it too, you have to part with large sums of money every month(and due to elastic pricing you can't even tell in advance how much that will be.) There are two main ways that people try and solve this problem(usually at the suggestion of the said logging service providers);- Filtering logs by severity and only sending logs above a certain severity to the logging service provider. and/or
- Sampling logs so that you only send a certain percentage to the logging service provider.
func main() {
updateSocialMedia("Sending out my first social media status message")
}
func updateSocialMedia(msg string) {
traceID := "sa225Hqk" //should be randomly generated per call
logger := logrus.WithFields(logrus.Fields{"traceID": traceID})
tweet(msg, logger)
facebook(msg, logger)
linkedin(msg, logger)
}
func tweet(msg string, logger *logrus.Entry) {
logger.Info("tweet send start")
// code to call twitter API goes here
logger.Info("tweet send end.")
}
func facebook(msg string, logger *logrus.Entry) {
logger.Info("facebook send start")
err := facebookAPI(msg)
if err != nil {
logger.Errorf("facebook send failed. error=%s", err)
}
logger.Info("facebook send end.")
}
func linkedin(msg string, logger *logrus.Entry) {
logger.Info("linkedin send start")
// code to call linkedin API goes here
logger.Info("linkedin send end.")
}
If we were filtering logs and only sending logs of ERROR level to our logging service provider, then we
would lose context on how the facebook send failed error came to be. I had previosuly written
that logs are primarily used to help debug issues in
production; thus,
context and chronology of events that led to a particular issue are of importance. You do not want to
investigate a murder mystery where half the clues have been deliberately wiped out by your earlier
self.
In the same way, if we were sampling logs; the chronology leading upto the error would be missing a few INFO
log statements since those would have been sampled out.
What we want is;
If the application has not emitted any errors, no logs gets sent to our logging service provider(INFO or
otherwise.) However, if there are any errors emitted; all the logs leading to that error irrespective of
their severity/level are sent to the logging service.
Proposed Solution
I think we should be able to implement such a logging scheme. The basic idea is; whenever your application
emits log events, all the logs go into a circular buffer of size X.
Whenever an error log is emitted, the whole circular buffer is flushed and all its contents are sent to the
logging service provider.
The circular buffer can be in memory or on disk/whatever and the size is configurable.
I took a stub at implementing this using sirupsen/logrus which is a popular logging library for the
Go programming language, but the implementation
should be transferable across libraries/languages.
In sirupsen/logrus, you can declare a hook implementing the custom
behaviour that you want.
package main
import (
"io"
"github.com/sirupsen/logrus"
)
// hook to buffer logs and only send at right severity.
type hook struct {
writer io.Writer
// Note: in production, lineBuffer should use a circular buffer instead of a slice.
// otherwise you may have unbounded memory growth.
// we are just using a slice of []bytes here for brevity and blogging purposes.
lineBuffer [][]byte
}
// Fire will append all logs to a circular buffer and only 'flush'
// them when a log of sufficient severity(ERROR) is emitted.
func (h *hook) Fire(entry *logrus.Entry) error {
line, err := entry.Bytes()
if err != nil {
return err
}
h.lineBuffer = append(h.lineBuffer, line)
// if the current log event is of level ERROR or a higher severity,
// then flush the buffer; thus emitting all the logs.
if entry.Level <= logrus.ErrorLevel {
var writeError error
for _, line := range h.lineBuffer {
_, writeError = h.writer.Write(line)
}
h.lineBuffer = nil // clear the buffer
return writeError
}
return nil
}
// Levels define on which log levels this hook would trigger
func (h *hook) Levels() []logrus.Level {
return logrus.AllLevels
}
And the way to use it in your application is;
package main
import (
"errors"
"io/ioutil"
"math/rand"
"os"
"time"
"github.com/sirupsen/logrus"
)
func main() {
// send logs to nowhere by default
logrus.SetOutput(ioutil.Discard)
logrus.SetFormatter(&logrus.JSONFormatter{})
// Use our custom hook that will append logs to a circular buffer
// and ONLY flush them to stderr when errors occur.
logrus.AddHook(&hook{writer: os.Stderr})
updateSocialMedia("Sending out my first social media status message")
}
Now, if any error occurs; all the logs and chronology leading upto the error are available and are sent to
the logging service provider.
go run .
{"level":"info","msg":"tweet send start","time":"2020-05-25T21:03:36+03:00","traceID":"sa225Hqk"}
{"level":"info","msg":"tweet send end.","time":"2020-05-25T21:03:36+03:00","traceID":"sa225Hqk"}
{"level":"info","msg":"facebook send start","time":"2020-05-25T21:03:36+03:00","traceID":"sa225Hqk"}
{"level":"error","msg":"facebook send failed. error=http 500","time":"2020-05-25T21:03:36+03:00","traceID":"sa225Hqk"}
Conversely, if there are no errors been emitted by your application then
no logs are sent to your logging service provider.
go run .
# no log output
So you do not end up spending tons of money and you also do not lose context when errors occur.
Conclusion:
You can implement a logging strategy that loses you neither money nor context.
Indeed, this is how logging in the Apollo Guidance Computer worked:
Another function served by these auxiliary erasable memories was called the "Coroner,"" which recorded every instruction executed, with its inputs and results, writing over the oldest such record when it filled up. When a program "crashed," you could punch out a full record of what it was doing in most of its last second and analyze the problem at your ease. I have often wished that PCs offered such an advanced feature. - Annotations to Eldon Hall's Journey to the Moon. Chapter 13, page 155A downside of the presented solution is that it can be hard to tell if there are no logs because the application has not emitted any errors or because the logging pipeline itself has a bug. However this can be solved by emitting a heartbeat log event every Y minutes and letting this heartbeat propagate upto the logging service provider. All the code in this blogpost, including the full source code, can be found at: https://github.com/komuw/komu.engineer/tree/master/blogs/09 You can comment on this article by clicking here.