07 March 2022

Make Your Go Code More Reliable with the Retry Pattern

As a programmer, you will deal with distributed systems sooner or later. A simple web app is already an example of a distributed system. The browser is a node, and there is also a node serving the web app to the browser. The web app most likely communicates with some backend API, a node as well. And more often than not, the backend needs a form of persistent storage: a database, which is also a node in the distributed system.

In the cloud-native computing world, almost all systems are distributed in some form. A distributed system is composed of nodes that communicate with each other. A node can be a physical device such as a mobile phone or a server, but it can also be a software process such as a browser.

Distributed systems offer advantages like higher availability, efficiency and scalability. But just like life, it is not all sunshine and rainbows. The following quote by Leslie Lamport, a computer scientist well known for his work on distributed systems, summarises it perfectly:

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

There are several assumptions that most programmers make when first working with a distributed system. Peter Deutsch formulated these assumptions as The Eight Fallacies of Distributed Computing:

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn’t change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Hardening Your Code

As a programmer, one of your primary goals is to write clean code that works and to deliver reliable services. Or at least it should be, in my opinion. But how can we make sure the applications we write are reliable when they run in unreliable environments? After all, the services that your application depends on can be unreliable too.

We can use stability patterns in our code to make it more reliable. This article discusses one of those patterns: the retry pattern. The examples are written in Go, but you can apply the pattern in your favourite programming language.

Retry Pattern

Let’s go back to the web app example and use our imagination to add fictional details. Let’s say the company you work for develops a to-do list manager (yes, not very imaginative). This app has a feature where users receive an email with a PDF report summarising their completed tasks.

A third-party service handles the generation of these PDFs. Sadly the service is often plagued by transient faults. A transient fault, also known as a transient error, has an underlying cause that resolves itself. The third-party service is down now and again but always recovers quickly.

As usual, there is no time or money to switch to a less error-prone service. Yet you are tasked with making the report feature more reliable, since the users seem to be quite fond of it. Luckily, the retry pattern offers a straightforward but effective solution to your woes.

Let’s dive straight into the code and then digest it piece by piece. The retry pattern can be implemented as follows:

    type Effector func(context.Context) (string, error)

    func Retry(effector Effector, retries int, delay time.Duration) Effector {
        return func(ctx context.Context) (string, error) {
            for r := 0; ; r++ {
                response, err := effector(ctx)
                if err == nil || r >= retries {
                    // Return when there is no error or the maximum amount
                    // of retries is reached.
                    return response, err
                }

                log.Printf("Function call failed, retrying in %v", delay)

                select {
                case <-time.After(delay):
                case <-ctx.Done():
                    return "", ctx.Err()
                }
            }
        }
    }

There are two essential components to the pattern. The first is the Effector type, which defines the signature of the function that interacts with the third-party service. The Effector can take any form you want it to have. For demonstration purposes, we keep it simple.

The second part is the Retry function. This function accepts an Effector and returns an anonymous function with the same signature as the Effector. It essentially wraps the received function with the retry logic.

Retry accepts three parameters: an Effector, an integer describing how many times to retry the passed Effector, and the delay between retries. Since the first call does not count as a retry, the Effector is called at most retries + 1 times. Often with the Retry pattern, some form of backoff algorithm is implemented that increases the delay between each retry. For brevity, this is left as an exercise for the reader.
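As one possible take on that exercise, the sketch below doubles the delay after every failed attempt and caps it at a maximum. The names RetryWithBackoff and backoffDelay, the maxDelay parameter, and the flaky test function are assumptions made for illustration; they are not part of the code above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Effector has the same signature as the Effector type used by Retry.
type Effector func(context.Context) (string, error)

// backoffDelay is a hypothetical helper. It returns base doubled
// `attempt` times, never exceeding maxDelay.
func backoffDelay(base, maxDelay time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d >= maxDelay {
			return maxDelay // stop early, which also avoids overflow
		}
		d *= 2
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// RetryWithBackoff is a hypothetical variant of Retry that waits
// base, 2*base, 4*base, ... (capped at maxDelay) between attempts.
func RetryWithBackoff(effector Effector, retries int, base, maxDelay time.Duration) Effector {
	return func(ctx context.Context) (string, error) {
		for r := 0; ; r++ {
			response, err := effector(ctx)
			if err == nil || r >= retries {
				return response, err
			}

			select {
			case <-time.After(backoffDelay(base, maxDelay, r)):
			case <-ctx.Done():
				return "", ctx.Err()
			}
		}
	}
}

func main() {
	calls := 0
	// flaky is an illustrative effector that fails three times.
	flaky := func(ctx context.Context) (string, error) {
		calls++
		if calls <= 3 {
			return "", errors.New("boom")
		}
		return "ok", nil
	}

	r := RetryWithBackoff(flaky, 5, 10*time.Millisecond, 100*time.Millisecond)
	res, err := r(context.Background())
	fmt.Println(res, err, calls)
}
```

With these values the waits are 10ms, 20ms and 40ms before the fourth call succeeds. Real-world implementations often add a random jitter to each delay as well, so that many clients do not retry at the same moment.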

Anonymous Functions

The previous section mentions the term anonymous function. Let’s sidestep for a second and discuss what that is and how we can use it. In Go, named functions can only be declared at the package level; the Retry function is an example of that.

Another option to declare a function is to use a function literal. A function literal is written like a function declaration without a name following the func keyword. The value of this expression is called an anonymous function.
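To make this concrete, here is a small sketch of function literals in use. The names double and apply are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// apply is a hypothetical helper that calls f with v and returns the result.
func apply(f func(int) int, v int) int {
	return f(v)
}

func main() {
	// A function literal assigned to a variable.
	double := func(n int) int { return n * 2 }
	fmt.Println(double(21))

	// A function literal passed directly as an argument.
	fmt.Println(apply(func(n int) int { return n + 1 }, 41))

	// A function literal that is called immediately.
	fmt.Println(func() string { return "called in place" }())
}
```

In all three cases the function has no name of its own; it is just a value that can be stored, passed around, or called.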

The most important thing about anonymous functions is that they can access their entire lexical environment. In other words, the inner function can use variables of the enclosing function. For example, in the Retry function, the inner anonymous function has access to the parameters of the enclosing function. A more straightforward example is the squares function:

    // squares returns a function that returns
    // the next square number each time it is called.
    func squares() func() int {
        var x int
        return func() int {
            x++
            return x * x
        }
    }

When squares is called, it creates a local variable x and returns an anonymous function. Each time the returned function is called, it increments x and returns its square. The variable x lives on after squares has returned because the anonymous function still refers to it.

The squares example demonstrates that function values are not just code but can also have state. Function values like these are implemented using a technique called closures.
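A short usage sketch shows that state in action. The variable names f and g are chosen only for illustration; the squares function is repeated so the example runs on its own.

```go
package main

import "fmt"

// squares returns a function that returns
// the next square number each time it is called.
func squares() func() int {
	var x int
	return func() int {
		x++
		return x * x
	}
}

func main() {
	f := squares()
	fmt.Println(f(), f(), f()) // 1 4 9

	// A second call to squares creates a fresh x,
	// so g starts counting from the beginning.
	g := squares()
	fmt.Println(g()) // 1

	// f still remembers its own x.
	fmt.Println(f()) // 16
}
```

Each call to squares produces a closure with its own copy of x, which is why f and g do not interfere with each other.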

Applying the Pattern

Let’s get back to the Retry pattern. To use the pattern, we need to implement a potentially failing function. Remember that the signature of this function needs to match the Effector type. In the example below, GetPdfUrl emulates our potentially failing function:

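    // count tracks how often GetPdfUrl has been called.
    var count int

    // GetPdfUrl fails on the first three calls and succeeds afterwards.
    func GetPdfUrl(ctx context.Context) (string, error) {
        count++

        if count <= 3 {
            return "", errors.New("boom")
        }
        return "https://linktopdf.com", nil
    }

    func main() {
        // Allow up to 5 retries with 2 seconds between attempts.
        r := Retry(GetPdfUrl, 5, 2*time.Second)

        res, err := r(context.Background())

        fmt.Println(res, err)
    }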

Running this code prints the following:

    2022/03/06 13:14:40 Function call failed, retrying in 2s
    2022/03/06 13:14:42 Function call failed, retrying in 2s
    2022/03/06 13:14:44 Function call failed, retrying in 2s
    https://linktopdf.com <nil>
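Because the wait inside Retry also selects on ctx.Done(), the caller can put an upper bound on the total time spent retrying. The sketch below assumes a hypothetical effector, alwaysFails, that never succeeds, and a hypothetical helper, retryWithTimeout; the 100ms delay and 250ms timeout are arbitrary values chosen for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

type Effector func(context.Context) (string, error)

// Retry is the function from this article, repeated so the example runs on its own.
func Retry(effector Effector, retries int, delay time.Duration) Effector {
	return func(ctx context.Context) (string, error) {
		for r := 0; ; r++ {
			response, err := effector(ctx)
			if err == nil || r >= retries {
				return response, err
			}

			log.Printf("Function call failed, retrying in %v", delay)

			select {
			case <-time.After(delay):
			case <-ctx.Done():
				return "", ctx.Err()
			}
		}
	}
}

// alwaysFails is a hypothetical effector that never succeeds.
func alwaysFails(ctx context.Context) (string, error) {
	return "", errors.New("boom")
}

// retryWithTimeout is a hypothetical helper that runs the wrapped
// effector with a context that expires after the given timeout.
func retryWithTimeout(timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	r := Retry(alwaysFails, 5, 100*time.Millisecond)
	return r(ctx)
}

func main() {
	res, err := retryWithTimeout(250 * time.Millisecond)
	fmt.Printf("%q %v\n", res, err)
}
```

Here the context expires during the third wait, so the call returns the context's error (context.DeadlineExceeded) instead of using all five retries.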
