Decoupling CMS frameworks with REST

By Boris Gordon on 1 June, 2017.

Default REST APIs provided by content management systems shouldn’t replace important design decisions developers need to make when building decoupled solutions.

A headless CMS anti-pattern

In recent years, web CMS products and frameworks such as WordPress, Drupal and the various hosted CMS services have increasingly supported “decoupled” use by serving their data as JSON over an HTTP interface. While this helps us avoid treating content management systems as a platform for building general web applications[1], when it comes to their API design they tend to share the same approach, and the same flaws. CMSs that focus on merely supporting “headless” operation decouple clients and servers in terms of technology, but they usually require manipulation of resources in a way that is equivalent to direct manipulation of domain objects, coupling the client to the domain model in the process.

Cartoon: Our new handheld!

One way to evaluate coupling in your system is to consider how design changes on one side of the client-server boundary affect the other side. For a “headless” CMS, concerns are usually separated between mobile or JavaScript applications on the client and a RESTful API on the server. Clients are concerned with how a user interacts with and flows through various screens, and with the display logic to render those screens. An API server, on the other hand, should be able to service different clients, each with their own screen flows and display logic. It is not just another way to access your data layer; it should provide the various clients it serves with a common interface to the business rules, both application and domain specific. If we are to allow clients and servers to change at different rates, then we will want to keep these business rules out of our UI and client applications as much as possible.

Identifying business rules

In CMS applications, the bulk of the domain model[2] consists of a content model (content types, fields, validation rules) whose design is informed by the content structure and relationships. The administrative interfaces, like all UIs, should be primarily informed by the tasks and workflows of their users (i.e. content authors and administrators) and not just the content model. There are also UIs for consumers of the content and for those who provide “user generated content” such as likes, comments and bookmarks. The tasks and workflows for each kind of user on a system (authors, editors, visitors) are not always the same. So we have a content model and at least two types of UIs with different user requirements manipulating that same content model, and yet many CMS APIs simply expose the content model over HTTP as-is, no matter which tasks the client needs to perform. This both neglects to provide a useful abstraction for the common tasks that UIs will carry out and increases the coupling of clients to the specifics of the content model. Another approach is to think about the domain and application business rules in the system.

Examples of domain rules in a CMS would be:

  • Articles must always have at least 1 and no more than 3 category tags.
  • The article summary, if not provided, is automatically generated by cutting the text at the nearest word boundary that keeps it under 140 characters.
  • Authors require a full name.

Business rules on a web form illustration

These rules tend to change relatively slowly as they are informed by properties intrinsic to the content itself, or by other domain knowledge such as content availability, system integration points and so on.
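As a rough illustration, the three example domain rules could live on the server as follows. This is a minimal sketch, not any particular CMS's API: the names Article, DomainRuleViolation and generate_summary are hypothetical, the summary is assumed to be cut from the article body, and the author rule is simplified to a field on the article.

```python
from dataclasses import dataclass, field

MAX_SUMMARY_CHARS = 140  # limit taken from the example rule above


class DomainRuleViolation(ValueError):
    """Raised when an article breaks one of the example domain rules."""


def generate_summary(body: str, limit: int = MAX_SUMMARY_CHARS) -> str:
    """Cut the body at the last word boundary that keeps it under `limit` characters."""
    text = " ".join(body.split())  # collapse runs of whitespace
    if len(text) < limit:
        return text
    cut = text.rfind(" ", 0, limit)  # last space within the first `limit` characters
    # Fall back to a hard cut when the text has no usable word boundary.
    return text[:cut] if cut > 0 else text[: limit - 1]


@dataclass
class Article:
    title: str
    body: str
    author_full_name: str  # assumption: the author rule is checked on the article
    category_tags: list[str] = field(default_factory=list)
    summary: str = ""

    def __post_init__(self) -> None:
        if not self.author_full_name.strip():
            raise DomainRuleViolation("authors require a full name")
        if not 1 <= len(self.category_tags) <= 3:
            raise DomainRuleViolation("articles need between 1 and 3 category tags")
        if not self.summary:
            self.summary = generate_summary(self.body)
```

Because the rules are enforced where the article is constructed, a client never needs to know the tag limits or the summary length to stay within them.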

Examples of application-specific business rules[3] would be “authors need to request an article be reviewed by a nominated editor” and “an article must be approved by a reviewer before it can be published”. Such business rules will need to change more often, as they directly address use cases of the system, which naturally evolve at a quicker pace. While they are more likely to require a change to a client (the most obvious case being when a completely new use case is added), we shall see there are ways to expose these rules as a reified process that limits the need for a client to know all the design details.

To illustrate, we will compare two API designs: one that derives its resources from the content model and another that focuses on the use cases.

Designing from the content model

Consider a typical CMS with an out-of-the-box API that exposes the content model directly. This content model includes separate domain entities for articles and category tags, and the publishing workflow is implemented with properties and fields on the articles. Our clients need to support our publishing workflow and editor approval process according to the example business rules above. Our first step will be to create any category tags that don’t exist yet. This will usually require a request such as POST /tags/category for each of the tags in question, and possibly more requests than that if it is the client’s responsibility to check whether each tag already exists. Then a request to create the unpublished article, including references to the tags just created, will need to be made (POST /content/articles), which on success will provide a URL for the newly created article (e.g. /content/articles/42). Once we have an article resource we can apply a series of PUT or PATCH requests to it in order to make content changes as our author client edits the article. Finally we make another update request that sets some status property indicating the article is awaiting approval. After review, an appropriately authorised client can again PUT or PATCH the same resource with an approved status, allowing the author to PUT or PATCH a published status to finally publish the article.
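To make the size of that conversation concrete, the sequence can be written down as data. The sketch below only lists the requests a client would issue; the paths come from the walkthrough above, while the body fields and status values (name, tags, status, awaiting_approval and so on) are hypothetical placeholders rather than any real CMS's schema.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    method: str
    path: str
    body: Optional[dict] = None


def content_model_requests(tags: list[str], edits: int) -> list[Request]:
    """List the calls one article needs from first draft to publication."""
    # One request per category tag (existence checks would add more).
    requests = [Request("POST", "/tags/category", {"name": tag}) for tag in tags]
    # Create the unpublished article that references those tags.
    requests.append(Request("POST", "/content/articles", {"tags": tags, "status": "unpublished"}))
    article = "/content/articles/42"  # URL returned by the create request in the example
    # One update per content edit made by the author.
    requests += [Request("PATCH", article, {"body": f"revision {n}"}) for n in range(1, edits + 1)]
    # The workflow itself is driven by overwriting a status property.
    for status in ("awaiting_approval", "approved", "published"):
        requests.append(Request("PATCH", article, {"status": status}))
    return requests
```

For an article with three tags and two content edits, this comes to nine requests, and the client has to know the order in which to send them.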

So what’s wrong with this API design? It seems reasonable, especially if you are familiar with the CMS in question and its use of the various status properties on the article to implement the editorial workflow. Sure, it seems a little “chatty”, but is the chattiness inherent to HTTP and REST, or is it a property of the way these resources have been modelled?

We can start to reduce chatter by looking for places where clients have needless domain object knowledge, such as the fact that category tags are a separate domain object and need to exist prior to tagging an article. In this case we should be able to let the server manage the creation of the tags when dealing with an article resource. What’s worse, when a client tries to set the awaiting approval status for an article with a missing tag, we can easily end up with dangling tags. If validation fails (perhaps due to domain rules that have nothing to do with tags), the client will not be able to create the article, even though the categories have already been created. This inconsistent state might lead some clients to present empty category pages for these tags, unless they explicitly handle this possibility. It should not be possible for clients to create such inconsistent domain states, nor should they need to defend against them.
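One way the server could take over tag management is to validate the whole draft first and create missing tags only once validation has passed. The following is an in-memory sketch under that assumption; DraftService and its fields are hypothetical, and a real system would more likely wrap these steps in a database transaction.

```python
class ValidationError(ValueError):
    """The submitted draft breaks a domain rule."""


class DraftService:
    """In-memory stand-in for the server side of draft creation."""

    def __init__(self) -> None:
        self.tags: set[str] = set()
        self.drafts: dict[int, dict] = {}
        self._next_id = 1

    def create_draft(self, title: str, body: str, tags: list[str]) -> int:
        # Validate the whole draft before touching any domain object.
        if not title.strip():
            raise ValidationError("title is required")
        if not 1 <= len(set(tags)) <= 3:
            raise ValidationError("between 1 and 3 category tags are required")
        # Only now create missing tags, so a rejected draft leaves none behind.
        self.tags.update(tags)
        draft_id = self._next_id
        self._next_id += 1
        self.drafts[draft_id] = {"title": title, "body": body, "tags": sorted(set(tags))}
        return draft_id
```

A draft that fails validation leaves the tag set untouched, so no client can end up looking at an empty category page created by a failed submission.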

A less obvious flaw is that as new use cases are added to the system, the size of our REST resources, and hence of any API payloads, will increase. Remember that /content/articles/42 will be the resource used to get the content for our newly minted article after it is published. As we layer on more use cases, such as new approval workflows, we are not only adding new status values similar to awaiting approval and published, but also any metadata that goes along with those use cases. Think of metadata such as an approval timestamp, a reference to the approving editor, or references to review threads representing the conversation between an editor and an author during the approval process. Does a client that is simply rendering the title and summary of an article in a list need to know what date the article was approved by an editor? Some administrative interfaces used by editors might, but not all clients will. So we have the problem that our API payload size increases as use cases that deal with our domain objects are added, even when information about those use cases is not needed by existing clients. Are these bulky and ever increasing payloads also a property of the REST approach, or can they be addressed by a different API design?

Designing from the use cases

It turns out we can do better by making the resources[4] that make up the API more coarse-grained, and by representing some of the application processes as resources in their own right[5]. By doing this we protect clients from needing to know the domain rules in order to interact with the business process, resulting in smaller payloads and fewer HTTP requests at the same time. We do this by driving our resource design with the use cases[6,7] and the actual needs of the clients rather than the specifics of the content model. A design process that follows this approach might result in the following command type resources to go along with the query resource (/content/articles) we already identified:

  • POST /drafts (for creating a draft - can include category tags in the representation and let the server be in charge of managing the domain objects and avoiding inconsistent states)
  • POST /approvals (for submitting a draft for approval)
  • PUT /approvals/{APPROVAL_ID} (for approving a draft for publication - this PUT could even be avoided if we want to but we will leave that as an exercise)
  • POST /articlePublications (to schedule an article for immediate publication)
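The command resources listed above could be backed by something like the following in-memory sketch, with one method per resource. The class EditorialApi, its method names and its stored fields are hypothetical; the point is only that the two application rules from earlier (review by a nominated editor, approval before publication) are enforced on the server rather than encoded in status values the client has to set.

```python
from itertools import count


class WorkflowError(Exception):
    """A command was rejected by an application rule."""


class EditorialApi:
    def __init__(self) -> None:
        self._ids = count(1)
        self.drafts: dict[int, dict] = {}
        self.approvals: dict[int, dict] = {}
        self.publications: dict[int, dict] = {}

    def post_draft(self, author: str, title: str, tags: list[str]) -> int:
        """POST /drafts"""
        draft_id = next(self._ids)
        self.drafts[draft_id] = {"author": author, "title": title, "tags": tags}
        return draft_id

    def post_approval(self, draft_id: int, editor: str) -> int:
        """POST /approvals: ask a nominated editor to review a draft."""
        if draft_id not in self.drafts:
            raise WorkflowError("unknown draft")
        approval_id = next(self._ids)
        self.approvals[approval_id] = {"draft": draft_id, "editor": editor, "approved": False}
        return approval_id

    def put_approval(self, approval_id: int, editor: str) -> None:
        """PUT /approvals/{APPROVAL_ID}: only the nominated editor may approve."""
        approval = self.approvals.get(approval_id)
        if approval is None or approval["editor"] != editor:
            raise WorkflowError("only the nominated editor can approve")
        approval["approved"] = True

    def post_article_publication(self, draft_id: int) -> int:
        """POST /articlePublications: rejected unless the draft was approved."""
        approved = any(a["draft"] == draft_id and a["approved"] for a in self.approvals.values())
        if not approved:
            raise WorkflowError("an article must be approved before it can be published")
        publication_id = next(self._ids)
        self.publications[publication_id] = {"draft": draft_id}
        return publication_id
```

A client that tries to publish an unapproved draft simply gets a rejection; it does not need to know which status transitions are legal.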

So we are separating the resources that model the commands operating on articles, such as creating drafts, requesting approvals, and approving or publishing, from the resources that allow simple queries for the state of those articles[8]. This separation also gives us separate resources for clients to query for metadata relating to those commands. For example, the resources /approvals/12 or /author/{AUTHOR_ID}/approvals could be queried by an author’s dashboard client. By doing this we are addressing the problem of chattiness (a single POST to create a draft, request approval or publish). We are also addressing payload size creep: the workflow metadata lives in separate queryable resources that can change and grow on their own, so existing clients that just want to access the article content don’t need to deal with ever increasing metadata related to new business processes.
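The effect on payloads can be sketched with two small projection functions over the same stored record. The record layout and field names here are hypothetical; what matters is that the article representation is built from a fixed set of content fields, so workflow metadata added later only shows up under the approval resource.

```python
ARTICLE_FIELDS = ("title", "summary", "body", "tags")


def article_representation(record: dict) -> dict:
    """Body for GET /content/articles/{id}: content fields only."""
    return {name: record[name] for name in ARTICLE_FIELDS}


def approval_representation(record: dict) -> dict:
    """Body for GET /approvals/{id}: workflow metadata only."""
    approval = record["approval"]
    return {
        "article": f"/content/articles/{record['id']}",
        "editor": approval["editor"],
        "approved_at": approval["approved_at"],
    }
```

A list view that renders titles and summaries keeps receiving the same article payload no matter how much the approval metadata grows.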

A process for success and the CMS framework landscape

In future posts I will describe a simple design process that helps you identify resources and avoid a one-to-one mapping to the entities in your CMS content model. This process is useful even for command-light, query-focussed systems. Depending on your CMS framework and the tools it provides for building HTTP APIs, you might be limited in your ability to model your resources in this fashion, and you may need to build your REST API as a separate web service in front of your CMS driven web application. We will see, however, that by leveraging the JSON:API media type (a REST media type specification appropriate for many kinds of web applications), we can often get slimmer payloads and fewer HTTP requests for free. This specification has broad tooling support and has even started to appear in some CMS frameworks directly.