How to propagate context without cancellation. (23 Aug 2023)
Note: This blog post was initially written on Tyk's blog. It's reprinted here for posterity, but the Tyk
post is still the canonical reference.
Tyk Cloud is a fully managed service that
makes it easy for developers to create, secure, publish and maintain APIs at any scale, anywhere in the
world.
Whenever customers sign up for the service, we send them an email welcoming them onboard. We emit logs to
enable our developers to troubleshoot if any error occurs in the email-sending functionality (network
timeouts, etc.).
To improve the developer debugging experience, we added corresponding request-scoped values like request-ids
to those logs. The way you typically do that in Go is by using context.Context. Maintaining a consistent
understanding of the context is essential as data flows through various stages. This helps to ensure
accurate processing, error handling, logging, and other operational aspects.
A request's context usually comes with a timeframe within which the service must respond (after which it is
cancelled). Yet we often need to pass that context and its data on to other components or stages of a system
whose work should not be interrupted or prematurely terminated when that happens.
This post takes a deeper look at propagating context without cancellation, why it's important, and how we
found a solution.
Into the context we go
Initially, our code looked like this:
func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Create a subscription, etc
	// Send email to the customer.
	go sendEmail("accountID", "subscriptionPlan")
}
func sendEmail(accountID, subscriptionPlan string) {
	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()
	// Call a third-party email sending service.
	err := thirdPartyMailService(ctx, accountID, subscriptionPlan)
	if err != nil {
		log.Error("Failed to send email.", err)
	}
}
The OnboardAccount HTTP handler is called when someone signs up as a customer on Tyk Cloud. It does several
things synchronously, like creating a subscription and creating an organisation, and eventually sends a
welcome email to the customer asynchronously.
As mentioned, we wanted to update the code so that sendEmail takes a context.Context as a parameter.
We would then pass in the http.Request.Context when calling sendEmail; this way, the logs emitted in the
sendEmail function would be richer, since they would now contain request-scoped values (request-ids, etc.)
for each specific request.
We updated the code to:
func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Create a subscription, etc
	// Send email to the customer.
	go sendEmail(r.Context(), "accountID", "subscriptionPlan")
}
func sendEmail(ctx context.Context, accountID, subscriptionPlan string) {
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()
	// Call a third-party email sending service.
	err := thirdPartyMailService(ctx, accountID, subscriptionPlan)
	if err != nil {
		log.Error("Failed to send email.", err)
	}
}
Soon after, we started seeing these errors in our services' logs:
"RequestID=Kj24jR8LQha, Failed to send email. context canceled"
It was great to see that the logs now contained the relevant request-scoped values like RequestID, but
what's up with that "context canceled" error?
This happened for almost every call of sendEmail, which was surprising since we give the context a generous
timeout (120 seconds) when calling thirdPartyMailService. That timeout had served us very well in the past.
We also established that the third-party email SaaS systems were healthy and had experienced no downtime.
After a cup of coffee and proper scrutiny of the new code, we zeroed in on this line:
go sendEmail(r.Context(), "accountID", "subscriptionPlan")
The problem was that the context, r.Context(), is scoped to the lifetime of the HTTP request. It is
therefore cancelled as soon as the OnboardAccount HTTP handler returns. Since the sendEmail call runs in a
goroutine, it may well run after OnboardAccount has returned, by which point the context has already been
cancelled.
Here is a small stand-alone reproducer of the issue:
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	OnboardAccount()
	time.Sleep(100 * time.Millisecond) // Give the goroutine time to run before main exits.
}
func OnboardAccount() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go sendEmail(ctx)
	fmt.Println("OnboardAccount called")
}
func sendEmail(ctx context.Context) {
	fmt.Println("sendEmail called")
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()
	fmt.Println("sendEmail, ctx.Err():", ctx.Err())
}
go run -race ./...
OnboardAccount called
sendEmail called
sendEmail, ctx.Err(): context canceled
We reverted the code change and began looking for permanent solutions to the problem.
We needed to propagate a context's values without also propagating its cancellation.
Solution space
Surely, someone else in the Go community must have experienced similar issues? It turns out this was not an
uncommon problem; there was even an existing Go proposal to address it in the standard library. At the time,
that proposal had not yet been accepted, so we had to look for alternative solutions to the problem.
There are multiple [1] [2] [3] third-party packages that implement a context.Context which can be
propagated without cancellation. Most of those were Go internal packages, which we could not import.
We thus created a small library in our application that offered this functionality and updated our code to
utilise it:
import "our/pkg/xcontext"

func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Send email to the customer.
	go sendEmail(
		// Propagate context without cancellation.
		xcontext.Detach(r.Context()),
		"accountID",
		"subscriptionPlan",
	)
}
This fixed the issue.
And there's more good news: the aforementioned Go proposal has since been accepted and implemented, and it
is available in Go 1.21, which was released in early August 2023. With that release, this is how you can
use the newly added API:
import "context"

func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Send email to the customer.
	go sendEmail(
		// Propagate context without cancellation.
		context.WithoutCancel(r.Context()),
		...
	)
}
Now, a question remains: what if someone forgets to use xcontext.Detach or context.WithoutCancel?
Wouldn't it be better to have a linter for this scenario? I enquired on gophers-slack whether anyone knew of
one; nothing seemed available.
Soon after, Damian Gryski added this linter to his awesome repository. Go Damian! I sent him this small
bug fix.
So, there you have it. This repository is currently your best bet for catching cases where a context is
propagated to a goroutine together with its cancellation. If you're interested in checking out Tyk Cloud,
you can start a free trial now; you'll be ready to go in just a few minutes.