11 March 2021

Generate a Sitemap With Next.js and TypeScript

As a React developer, you have most likely heard of Next.js by now. It is an excellent React-based framework for server-side rendering and generating static websites, and it comes with numerous features out of the box. However, it does not provide sitemap functionality. Luckily, with very little code, you can add that functionality to any Next.js project. This article shows how.

What Is a Sitemap?

First things first, let's refresh our memory about what a sitemap is. A sitemap is a file where you provide information about the pages on a website. A sitemap is useful for search engines to more intelligently crawl your website.

When a website contains a large number of pages, it is appropriate to provide a sitemap. Another use case where sitemaps help is orphaned pages. An orphaned page is a page that is not linked from anywhere else on the website. Orphaned pages are typically not found by a crawler.

The crawler will go through the website and open the links it finds. If the page is not linked anywhere, it is not indexed. A sitemap tells a search engine that these pages exist, making sure those pages get found during your website's crawling.

It is good to note that using a sitemap does not guarantee that all the items in your sitemap will be crawled and indexed. The inner workings of search engines rely on complex algorithms which aren't public knowledge. However, in most cases, your site will benefit from having a sitemap, and you are never penalised for having one. You can learn more about sitemaps here.

The Structure of a Sitemap

Sitemaps can be written in several file formats, of which XML is the most commonly used. This article therefore uses XML for the implementation. The XML for a sitemap needs to adhere to a specific schema. You can find an example of a sitemap in the following code snippet.

...waiting for Gist...

Precise documentation of the protocol can be found here. Not all children of the url element are required. Only the loc element is required, which describes the URL of a webpage. The rest of the elements are optional. If you are not sure what is meant by child and element, you can read more about XML here.
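For reference, a minimal sitemap with a single entry looks roughly like the string below. It is held in a TypeScript constant purely for illustration; the example.com URL, the date, and the optional values are placeholders, not values from a real site.

```typescript
// A minimal sitemap with one entry.
// Only <loc> is required by the sitemap protocol;
// <lastmod>, <changefreq> and <priority> are optional.
// The URL and all values below are placeholders.
const exampleSitemap: string = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2021-03-11</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>`;
```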

Sitemaps in Next.js

So how can a sitemap be generated in Next.js? There are multiple ways to go about this. As with most problems, there are already existing solutions. If you want you can use an npm package such as nextjs-sitemap-generator or next-sitemap to do the job for you.

But where is the fun in that? I try to keep my project dependencies to a minimum. Not to mention that building your own implementation results in more experience with, and knowledge of, the tools you use.

The recommended way to use Next.js is static page generation. If a page uses static generation, the markup of the page is generated at build time. We can choose to build the sitemap only at build time as well. That would be an excellent solution if the website is genuinely static, and the number of pages does not change.

But let's account for the scenario of a website where occasionally pages are added, for example, a blog or an e-commerce website. In that case, it is useful to have the sitemap change dynamically whenever new pages are added.

Implement the Functionality

The sitemap will be generated dynamically on each request using server-side rendering. Before the sitemap can be rendered, the pages that reside on the website need to be known. In Next.js, pages are based on the files contained in the pages directory. Each page is associated with a route based on its file name. More about the routing of Next.js can be found here.

An option would be to traverse the pages folder using Node's file system module and parse the filenames manually. However, one might expect that Next.js stores the routes it parses during build time somewhere. This is indeed the case; the routes are stored in the build folder in the .next/server/pages-manifest.json file.

This will prove quite useful. Let's go ahead and create a file called sitemap.tsx in the pages folder. This file will be responsible for rendering the sitemap. As discussed, server-side rendering is used. Therefore this file will need to export a function called getServerSideProps, as documented here.

The documentation of this function shows that it receives a parameter called context. One of the properties of this object is res, the Node response object. This object contains functions that allow you to set the headers of the HTTP response and to write directly to the response body. These functions can be used to set the Content-Type header to XML and write an XML string to the body of the response.
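Writing XML straight to the response can be sketched as follows. This is a minimal sketch, not the article's original code: ResponseLike is a hypothetical stand-in for the few methods of Node's response object that are used, writeXmlResponse is a hypothetical helper name, and the XML string is a placeholder.

```typescript
// Hypothetical stand-in for the parts of Node's response object used here.
// In a real page, `res` comes from the context passed to getServerSideProps.
interface ResponseLike {
  setHeader(name: string, value: string): void;
  write(chunk: string): void;
  end(): void;
}

// Sets the Content-Type header, writes the XML to the body and closes the response.
function writeXmlResponse(res: ResponseLike, xml: string): void {
  res.setHeader("Content-Type", "text/xml");
  res.write(xml);
  res.end();
}

// Shape of the page-level function. Next.js expects an object with `props`
// to be returned, even though the response has already been sent.
async function getServerSideProps(context: { res: ResponseLike }) {
  // Placeholder XML: an empty sitemap.
  const xml =
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>';
  writeXmlResponse(context.res, xml);
  return { props: {} };
}
```

In an actual pages/sitemap.tsx file, getServerSideProps would be exported and the context would carry the real Node response object.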

...waiting for Gist...

In the code above, the necessary modules are imported first. These are the file system and path modules of Node, which will be used for reading the manifest files. Please note that these are server-side modules. Such modules can be safely used in the getServerSideProps function, because imports used only in that function are not included in the client bundle of Next.js.

The file system module is used in another function later on. If a function is declared in the page file but not used in getServerSideProps, it is included in the client bundle. This causes problems because the file system module itself is not available in that bundle, yet a function that uses it is. When the project is built, this absence causes a Module not found error.

...waiting for Gist...

Adding the above code to the next.config.js remedies the issue. An incredibly useful tool to examine what code ends up in the client bundle can be found here. A thorough explanation of code splitting in Next can be found here.
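One common form of such a configuration is sketched below, assuming a webpack 5 based build; it is not necessarily the exact snippet the article used. The interfaces are hypothetical stand-ins for the real webpack and Next.js types, and in next.config.js the function would be the webpack property of the exported configuration object.

```typescript
// Hypothetical stand-ins for the webpack config object and the options
// that Next.js passes to the `webpack` function in next.config.js.
interface WebpackConfigLike {
  resolve?: { fallback?: Record<string, false | string> };
}
interface NextWebpackOptions {
  isServer: boolean;
}

function webpack(
  config: WebpackConfigLike,
  { isServer }: NextWebpackOptions
): WebpackConfigLike {
  if (!isServer) {
    // webpack 5: resolve `fs` to an empty module in the client bundle.
    // With webpack 4 the equivalent was: config.node = { fs: "empty" }
    config.resolve = {
      ...config.resolve,
      fallback: { ...config.resolve?.fallback, fs: false },
    };
  }
  return config;
}
```

The server build is left untouched, because the file system module is available there.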

Next up, a type called Url is defined. This type describes the shape of an object that represents an entry in the sitemap. Then a React component called Sitemap is defined. This component is empty, because in getServerSideProps the end method on the response object is called. Calling this method signals to the server that all of the response headers and body have been sent, and that the server should consider this message complete.

Although seemingly useless, the definition of the component cannot be left out, because the getServerSideProps function needs a component to attach to. If the component is not included in the file the build process of Next.js once again throws an error: getServerSideProps can not be attached to a page's component and must be exported from the page.

Before we dive deeper into the details of the function, let's think about what steps are needed to achieve the end goal. The steps of the procedure are as follows:

  • Read the manifest file from the file system and obtain a JSON object.
  • Parse the contents of the JSON and retrieve a collection of URLs.
  • Retrieve dynamic route URLs from the build folder output files.
  • Create an XML sitemap string based on the URLs.

Each step is implemented in a separate function. Reading the manifest file is done in the function ReadManifest. This function returns the JSON from the manifest, which the function GetPathsFromManifest uses as input to build a collection of URLs. Only static URLs can be gathered from the manifest; the function GetPathsFromBuildFolder accounts for the dynamic URLs.

The function GetSitemapXml implements the last step: building an XML string that adheres to the sitemap standard, based on all the gathered URLs. The following sections of the article discuss each step.

In the code above, all the function calls for each step can already be seen. What also can be seen in the code but which is not discussed yet is the excludedRoutes array. These paths are defined and filtered on because custom error pages are also present in the manifest. If we do not exclude the custom error page for a 404 page, it ends up in the sitemap. This is not wanted since that page actually returns a 404 HTTP code.
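The filtering on excludedRoutes can be sketched as follows. The entries in the array are assumptions for illustration (a custom 404 page and the sitemap page itself), and removeExcludedRoutes is a hypothetical helper name; the actual entries depend on which pages a project defines.

```typescript
// Hypothetical entries: routes that appear in the manifest
// but should not end up in the sitemap.
const excludedRoutes: string[] = ["/sitemap", "/404"];

// Returns only the routes that are not explicitly excluded.
function removeExcludedRoutes(routes: string[]): string[] {
  return routes.filter((route) => !excludedRoutes.includes(route));
}
```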

Read the Manifest File

The manifest file location needs to be known if it is to be read from the file system. The relative path is known. However, to obtain the absolute path to the file, the Node process's working directory also needs to be known. This value is retrieved by calling the process.cwd function. Since this value is also needed later on in a different function, it is bound to the basePath variable up front.

In ReadManifest the reading of the manifest is implemented. The path and file system modules of Node are used in this function. As explained earlier, Next.js saves the routes it parses in the pages-manifest.json file. This JSON file contains a list of key-value pairs.

The key represents the route and the value represents the path to the file where the component for that route is defined. This path points to the compiled file in the build output, so if you have your components defined in .tsx files, the paths in the manifest will have .js extensions nonetheless.

First the file path to the manifest is built by combining the basePath function parameter with the path to the file.

...waiting for Gist...

The existence of the file is checked before the function tries to read it. If the file does not exist, the function returns null. If it does exist, the function reads the file synchronously and parses the content using the built-in JSON object. The JSON.parse function returns an object, which ReadManifest returns in turn.
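A minimal sketch of a function that behaves as described, assuming the manifest maps route strings to file path strings; the exact signature and return type are assumptions.

```typescript
import * as fs from "fs";
import * as path from "path";

// Reads .next/server/pages-manifest.json relative to basePath.
// Returns the parsed object, or null when the file does not exist.
function ReadManifest(basePath: string): Record<string, string> | null {
  const manifestPath = path.join(
    basePath,
    ".next",
    "server",
    "pages-manifest.json"
  );

  if (!fs.existsSync(manifestPath)) {
    return null;
  }

  // Read synchronously and parse with the built-in JSON object.
  const raw = fs.readFileSync(manifestPath, "utf8");
  return JSON.parse(raw) as Record<string, string>;
}
```

A caller would typically pass process.cwd() as basePath and handle the null case, for example by returning an empty sitemap.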

Get URLs From the Routes

The ReadManifest function returns an object. For the sitemap, a list of URLs is needed. Getting this list from the object is done in the GetPathsFromManifest function.

...waiting for Gist...

So what is going on in this function? First an empty string array is initialised which will contain the routes. Then the function loops through all the routes in the parsed manifest object. The keys of the object correspond to the routes.

Two kinds of routes in Next.js are special: dynamic routes, which are used to create dynamic pages, and routes that start with an underscore. Dynamic routes contain square brackets. Routes that start with an underscore are used to override certain default behaviour of Next.js, for example by using a custom App component.

These routes should not be included in the sitemap. The routes will be filtered using the isNextInternalUrl function. This function is implemented as follows:

...waiting for Gist...

The function uses a regular expression to find the special routes of Next.js. If you are unfamiliar with regular expressions, you can read up on them using various online resources.
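The two functions might look roughly like the sketch below. The regular expression and the host parameter are assumptions made for illustration; the original implementation may use a different pattern or signature.

```typescript
// True for routes with a segment that starts with an underscore (e.g. /_app)
// or that contain a dynamic segment in square brackets (e.g. /blog/[slug]).
// The exact expression is an assumption.
function isNextInternalUrl(route: string): boolean {
  return /\/_|\[[^\]]+\]/.test(route);
}

// Collects the static routes from the parsed manifest.
// `host` (hypothetical parameter) is prepended to each route to form a full URL.
function GetPathsFromManifest(
  manifest: Record<string, string>,
  host: string
): string[] {
  const routes: string[] = [];

  // The keys of the manifest object correspond to the routes.
  for (const route of Object.keys(manifest)) {
    if (!isNextInternalUrl(route)) {
      routes.push(host + route);
    }
  }

  return routes;
}
```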

Dynamic Routing

A great deal of what makes Next.js attractive is its ability to build pages dynamically with data from all kinds of sources. The sitemap should most definitely contain those pages as well.

The dynamically generated pages are the pages in the manifest that contain square brackets. In the previous functions, these are filtered out and not taken into consideration.

Next.js has two forms of pre-rendering: static generation and server-side rendering. This section introduces a method to include statically generated pages in the sitemap. It does not discuss dynamic routes that use server-side rendering, as there is no one size fits all solution for such cases.

For statically generated pages, the data is fetched at build time. The logic for this data fetching is defined in the getStaticProps function. When a file with a dynamic route defines this function, it also needs to define getStaticPaths, which returns all the statically pre-rendered routes.

When a statically rendered page is pre-rendered at build time, in addition to the page HTML file, Next.js generates a JSON file holding the result of the getStaticProps function. These files are stored in the following folder: .next/server/pages

For example, a blog made with Next.js has article pages defined by blog/[slug].tsx. A list of possible values for slug is defined in getStaticPaths. For each of these values, the page's properties will be gathered at build time and saved in a JSON file. If there is an article with the slug how-to-start-with-react, there will be a file: .next/server/pages/blog/how-to-start-with-react.json.

The .next/server/pages folder can be traversed to read all the filenames of the JSON files. In this fashion, all the known routes for all the statically generated pages are gathered. The next section discusses how you can implement this traversal.

Traverse Statically Generated Page Properties

The GetPathsFromBuildFolder function is responsible for traversing the saved properties of statically generated pages. These properties are stored in JSON files in the server/pages folder.

One thing to keep in mind is the depth of the possible dynamic URLs. For example, if you create a file called pages/posts/[id].js, the generated pages will reside in a folder called posts in the server/pages folder.

Yet, there can be pages that are not nested in a folder. The function needs to account for both of these possibilities. Basically, the directory needs to be traversed fully. This is an excellent case to use recursion.

We have established that GetPathsFromBuildFolder is a recursive function; let's dive into the code!

...waiting for Gist...

As seen in the code above, the function has four parameters. The parameter dir describes the directory that is being traversed in a specific function call. The urlList stores all the URLs gathered. The host describes the host part of the URL that is put into the sitemap. Last, the basePath describes the working directory of Next.js, as discussed before.

First, the function reads the contents of the directory given by the dir parameter. Then, for each item in this directory, it checks whether the item is itself a directory. If it is, the function is called recursively for that directory.

If the item is not a directory and the item extension is JSON, we regard it as a route. The extension and basePath are removed, and the route is added to the urlList.
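The traversal described above can be sketched as follows. The four parameters follow the description; how the route is derived from the file path is an assumption, and the sketch does no special handling of files such as index.json, which would end up as /index.

```typescript
import * as fs from "fs";
import * as path from "path";

// Recursively walks `dir` and adds a URL to `urlList` for every .json file found.
function GetPathsFromBuildFolder(
  dir: string,
  urlList: string[],
  host: string,
  basePath: string
): string[] {
  // Root folder of the pre-rendered pages, used to strip the file system prefix.
  const pagesRoot = path.join(basePath, ".next", "server", "pages");

  for (const entry of fs.readdirSync(dir)) {
    const fullPath = path.join(dir, entry);

    if (fs.statSync(fullPath).isDirectory()) {
      // Nested folder, e.g. server/pages/blog: traverse it as well.
      GetPathsFromBuildFolder(fullPath, urlList, host, basePath);
    } else if (path.extname(entry) === ".json") {
      // Remove the base path and the extension, and normalise separators.
      const route = path
        .relative(pagesRoot, fullPath)
        .slice(0, -".json".length)
        .split(path.sep)
        .join("/");
      urlList.push(`${host}/${route}`);
    }
  }

  return urlList;
}
```

A first call would pass the .next/server/pages folder as dir and an empty array as urlList; the same array is filled across all recursive calls.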

Build the Sitemap

At this point, all the URLs are gathered. There is only one thing left to do: generate an XML string and return this in the response. This is a trivial task compared to the actual gathering of the routes.

The GetSitemapXml function implements the functionality using template literals. All the URLs are mapped to the appropriate XML by the GetUrlElement function. They are implemented as follows:

...waiting for Gist...
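A minimal sketch of how these two functions might look, assuming the URLs are plain strings and only the required loc element is emitted. The escapeXml helper is an addition for illustration; the sitemap protocol requires characters such as & to be entity-escaped.

```typescript
// Escapes the characters that must be entity-escaped in sitemap XML.
// Hypothetical helper, not mentioned in the article.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Maps a single URL to a <url> element containing only the required <loc>.
function GetUrlElement(url: string): string {
  return `  <url>
    <loc>${escapeXml(url)}</loc>
  </url>`;
}

// Builds the complete sitemap document using template literals.
function GetSitemapXml(urls: string[]): string {
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls.map((url) => GetUrlElement(url)).join("\n")}
</urlset>`;
}
```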

Conclusion

This article has shown you how to implement sitemap functionality for a Next.js website. Thank you for reading. Many improvements can be added to the functionality, such as caching and reading the saved properties to include lastmod elements.

If you integrate the code into your project, do not forget to make your sitemap available to Google. There are multiple ways to do this; you can read more about that here.

