WordPress

Getting Gutenberg block data in the REST API

WordPress core added the block editor in version 5.0, fundamentally changing how WordPress stores and displays elements in post (and page) content. The block editor, code-named Gutenberg, is written in React and uses the WordPress REST API to interact with data in WordPress. So you may be forgiven for believing that it would be easy to interact with blocks and block data (attributes) via the REST API. It is not, and to understand why, you first need to understand how block data is stored.

Take the following example. 

<!-- wp:button {"className":"is-style-outline"} -->
<div class="wp-block-button is-style-outline"><a class="wp-block-button__link" href="#">test</a></div>
<!-- /wp:button -->

This example shows how the core/button block is stored in post content in the database. Blocks are represented as a mixture of HTML comments (containing a JSON object) and HTML tags. A block is defined by a starting comment that begins with <!-- wp:. Blocks can either have an opening and a closing comment (like the button block above) or consist of a single self-closing comment, like the following search block example.

<!-- wp:search /-->
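To make the delimiter format concrete, here is a minimal TypeScript sketch that finds opening, closing and self-closing delimiters and decodes their JSON attributes. The regular expression is my own simplification, loosely based on the pattern core's PHP parser uses, and it assumes well-formed content; parseDelimiters and BlockDelimiter are hypothetical names. As in core, a block written without a namespace (wp:button) is treated as a core block (core/button).

```typescript
// Hypothetical types for this sketch, not part of any WordPress API.
interface BlockDelimiter {
  kind: "opener" | "closer" | "void";
  name: string; // fully qualified, e.g. "core/button"
  attrs: Record<string, unknown>;
}

// Simplified sketch: assumes well-formed delimiters and valid attribute JSON.
function parseDelimiters(content: string): BlockDelimiter[] {
  // Groups: 1 = closing slash, 2 = namespace (optional), 3 = block name,
  // 4 = JSON attributes (optional), 5 = self-closing slash.
  const delimiter =
    /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*\/)?([a-z][a-z0-9_-]*)\s+(\{[\s\S]*?\}\s+)?(\/)?-->/g;
  const found: BlockDelimiter[] = [];
  let m: RegExpExecArray | null;
  while ((m = delimiter.exec(content)) !== null) {
    const [, closer, ns, name, json, selfClosing] = m;
    found.push({
      kind: closer ? "closer" : selfClosing ? "void" : "opener",
      // Blocks serialized without a namespace belong to core.
      name: (ns ? ns : "core/") + name,
      attrs: json ? JSON.parse(json) : {},
    });
  }
  return found;
}
```

For the button example, the first delimiter comes back as an opener named core/button with attrs {"className":"is-style-outline"}, and the search block comes back as a void block named core/search with empty attrs.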

Block data is used to define how the block behaves and renders. Block data can be stored in one of two ways. The first is as a JSON blob in the HTML comment; see the className in the button example above. Once parsed, this data is simple to handle in JavaScript or in any other system that can work with JSON. The second way is as part of the HTML markup inside the block. Take, for example, how the button block is defined.

{
    "name": "core/button",
    "category": "layout",
    "attributes": {
        "url": {
            "type": "string",
            "source": "attribute",
            "selector": "a",
            "attribute": "href"
        },
        "title": {
            "type": "string",
            "source": "attribute",
            "selector": "a",
            "attribute": "title"
        },
        "text": {
            "type": "string",
            "source": "html",
            "selector": "a"
        },
        "backgroundColor": {
            "type": "string"
        },
        "textColor": {
            "type": "string"
        },
        "customBackgroundColor": {
            "type": "string"
        },
        "customTextColor": {
            "type": "string"
        }
    }
}

As you can see from this block definition, attributes like url, title and text are all stored in the HTML of the block. Data can be stored in the contents of a tag, meaning everything after the end of the opening tag and before the closing tag (the "html" source, as used by text). Data can also be stored in an attribute of a tag, such as href or title (the "attribute" source). This can make working with block data extremely difficult.
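The two ways of sourcing an attribute from markup can be sketched in TypeScript. This is a deliberately simplified, regex-based stand-in for the DOM parsing the editor does in the browser: it only supports a selector that is a bare tag name such as a, takes the first matching element, expects double-quoted attribute values and does not decode HTML entities. AttributeSpec mirrors the fields in the block definition above; sourceAttributes is a hypothetical helper.

```typescript
// Mirrors the shape of one entry in a block definition's "attributes".
interface AttributeSpec {
  type: string;
  source?: "attribute" | "html";
  selector?: string;
  attribute?: string;
}

// Simplified sketch: bare tag-name selectors only, first match wins,
// double-quoted attribute values, no entity decoding.
function sourceAttributes(
  specs: Record<string, AttributeSpec>,
  innerHTML: string
): Record<string, string | undefined> {
  const result: Record<string, string | undefined> = {};
  for (const key of Object.keys(specs)) {
    const spec = specs[key];
    // Attributes without a source live in the comment JSON, not the markup.
    if (!spec.source || !spec.selector) continue;
    // Group 1 = the tag's attribute list, group 2 = its inner HTML.
    const element = new RegExp(
      `<${spec.selector}\\b([^>]*)>([\\s\\S]*?)</${spec.selector}>`
    ).exec(innerHTML);
    if (!element) {
      result[key] = undefined;
      continue;
    }
    if (spec.source === "html") {
      result[key] = element[2];
    } else if (spec.attribute) {
      // Require whitespace before the name so "title" does not match "data-title".
      const attr = new RegExp(`(?:^|\\s)${spec.attribute}="([^"]*)"`).exec(element[1]);
      result[key] = attr ? attr[1] : undefined;
    }
  }
  return result;
}
```

Running it against the button markup above yields url "#" and text "test"; title is undefined because the anchor has no title attribute, and backgroundColor is skipped because it would live in the comment JSON.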

Processing block data

When rendering blocks on the front end (theme), for example on a single post page, WordPress parses the HTML that contains the blocks (the post content) in PHP. Core does this because blocks registered with register_block_type can have server-side behaviour attached, such as a render callback for dynamic blocks. The function parse_blocks uses regular expressions to find the block delimiters and returns an array of blocks, which core then loops over and renders. Each element in the array has a field called attrs, which contains all the data defined in the HTML comment (the JSON blob). What is missing from attrs is the data stored in HTML attributes and tags. At the time of writing (WP 5.2), core doesn't parse these fields and return them as usable data. To extract data from the HTML, PHP would have to parse the block's markup and navigate the resulting document tree using CSS selectors to find the tag and field where each value is stored. This would be resource intensive and prone to false positives. Even if it weren't, at the time of writing it is not possible to process all blocks: many blocks, both in core and in plugins, are only defined in JavaScript, so PHP is completely unaware of them and their block data.

Defining block data 

As part of an ongoing project to use Gutenberg in the WordPress mobile app, the Gutenberg team has been working on an RFC to define block structure. The project aims to define the structure of a block in a platform-independent JSON file. Both JavaScript and PHP can then use this file to define each block, so that the server and the front end are both aware of all blocks. Once blocks are defined this way, the core team plans to add two new REST APIs. These new APIs are documented in this post:

Fetching the available block types through REST APIs.
Fetching block objects from posts through REST APIs.

Once these APIs exist, headless applications like the WordPress mobile app, or a headless front end written in React or a similar framework, could use block data to render content much more easily.

It will likely take some time to convert all existing blocks to this new style, and doing so will require effort from developers. It is also likely that the functions required to register blocks in PHP will not be merged until 5.3 at the earliest.

Accessing block data now

What if you need access to this block data now and can't wait? You are in luck: there is a proof-of-concept plugin called wp-rest-blocks, written to expose this data. Once installed and activated, the plugin adds two fields to the post and page REST API endpoints: ‘has_blocks’ and ‘blocks’. has_blocks is a boolean denoting whether the post content contains blocks, and blocks is an array of blocks. An example of the output is shown below.

{
    "blockName": "core/heading",
    "attrs": [],
    "innerBlocks": [],
    "innerHTML": "\n<h2><strong>Flight Deals to Tianjin</strong></h2>\n",
    "innerContent": [
        "\n<h2><strong>Flight Deals to Tianjin</strong></h2>\n"
    ],
    "rendered": "\n<h2><strong>Flight Deals to Tianjin</strong></h2>\n"
}

As detailed above, it is not currently possible to get all block data. However, this plugin does return the block data found in the HTML comment, which in many cases will be enough. There is also a "rendered" field containing the fully rendered block, which is especially useful for dynamic blocks. If a block stores a lot of its data in HTML attributes, the innerContent or rendered fields can be used to extract it. The API even supports nested inner blocks, returning the inner blocks of each block recursively as an array.
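For a headless front end consuming this output, a small TypeScript sketch can type the block shape and walk the nested innerBlocks. The RestBlock interface is my reading of the example output above, not a published schema; blockName is typed as nullable because parse_blocks returns freeform (classic) content with a null name. One gotcha worth handling: PHP's json_encode turns an empty attrs array into [] rather than {}, as in the heading example. RestBlock, attrsOf and findBlocks are hypothetical names.

```typescript
// Assumed shape of one entry in the "blocks" field, based on the example output.
interface RestBlock {
  blockName: string | null; // null for freeform (classic) content
  attrs: Record<string, unknown> | unknown[]; // PHP encodes an empty array as []
  innerBlocks: RestBlock[];
  innerHTML: string;
  innerContent: (string | null)[];
  rendered: string;
}

// Normalise attrs so callers always get an object, even when PHP sent [].
function attrsOf(block: RestBlock): Record<string, unknown> {
  return Array.isArray(block.attrs) ? {} : block.attrs;
}

// Depth-first search through innerBlocks, collecting every block with the given name.
function findBlocks(blocks: RestBlock[], name: string, out: RestBlock[] = []): RestBlock[] {
  for (const block of blocks) {
    if (block.blockName === name) out.push(block);
    findBlocks(block.innerBlocks, name, out);
  }
  return out;
}
```

In practice you might fetch a post from /wp-json/wp/v2/posts/<id> with the plugin active and pass its blocks field to findBlocks, for example to collect every core/image block regardless of how deeply it is nested.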

I have also made an effort to extract block data from core blocks. Currently I have mapped the data of the following core blocks:

  • Image
  • Gallery
  • Heading
  • Paragraph
  • Button
  • Video 
  • Audio
  • File

There is even some unit test coverage, in case core introduces breaking changes in the future. This plugin is in the very early stages, but may be a good starting point for anyone wishing to build a front end using headless WordPress.

I was a guest on the WordPress weekly podcast

I had the honour of being invited to talk with the hosts of the WordPress Weekly podcast this week (episode 356). I have been listening to the podcast for years and have been friends with JJJ (the co-host) for a while.

I talked about many topics and spoke a little about how I got into the industry. In the podcast I mention the RFCs the Gutenberg team has released: the “Add the block registration RFC” and the “Add Blocks in Widget Areas RFC”. The widget area RFC is one where I wish more developers had commented and given feedback.

Making user queries in WordPress scale

As one of the maintainers of the users component in WordPress core, how well sites with lots of registered users scale is an important question to me. Sites with hundreds of thousands of registered users become even more complex when you add multisite into the mix.

For a client project, an educational site, I built a site that required users to register and log in to get access to more lessons. The site became pretty popular, getting over 100,000 registered users in its first 3 months. But I noticed something interesting: many of the user screens in the CMS became unusable. Unfortunately, this is because of how user roles are stored in the WordPress database. User roles are stored in user meta as a serialized array. This means that to query users by role (something core does in a number of places), core has to query the users table with a join to the user meta table that does a LIKE search on an unindexed field (meta_value). As you might have guessed, that query is extremely slow, and with over 100k users, unless a massively powerful database server is running it, the query will simply time out.
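The role query described above can be sketched in TypeScript to show why no index helps. This assumes roles are stored, as core does, in the capabilities user meta row (wp_capabilities with the default wp_ table prefix; on multisite each site gets its own prefixed key, such as wp_2_capabilities) as a PHP-serialized array of role => true, and that core's role filter matches with meta_value LIKE '%"role"%'. The helper names are hypothetical, and role slugs are assumed to be ASCII so that string length matches PHP's byte count.

```typescript
// Builds the PHP-serialized value core stores for array( 'role' => true, ... ).
// Assumes ASCII role slugs, so JS string length equals PHP's byte length.
function serializeRoles(roles: string[]): string {
  const entries = roles.map((role) => `s:${role.length}:"${role}";b:1;`).join("");
  return `a:${roles.length}:{${entries}}`;
}

// Mimics meta_value LIKE '%"role"%': a substring test with a leading wildcard,
// which a B-tree index cannot use.
function likeRoleMatch(metaValue: string, role: string): boolean {
  return metaValue.indexOf(`"${role}"`) !== -1;
}

interface CapabilitiesRow {
  userId: number;
  metaValue: string;
}

// With nothing to narrow the search, every capabilities row is tested,
// so the cost grows linearly with the number of users.
function usersWithRole(rows: CapabilitiesRow[], role: string): number[] {
  return rows.filter((row) => likeRoleMatch(row.metaValue, role)).map((row) => row.userId);
}
```

For example, serializeRoles(["administrator"]) produces a:1:{s:13:"administrator";b:1;}. The quotes in the pattern stop "editor" from matching inside another role name, but they do nothing to avoid scanning every row.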

So, having tracked down the badly performing queries, what was the next step to fixing them? First, I needed to review which core functions and classes accessed the users tables and which caches were in place. After hours of reviewing core code, I realised one thing: user queries are a mess. Unlike almost every other part of core, there were many places still using raw SQL instead of the WP_User_Query class. There were many places that should have been using caches but weren't. There were even places that weren't using the user cache invalidation function.

So I set about fixing a number of these issues. These included:

There were others, and I am still working on more. Props to the other members of the core team, especially Adam Silverstein, who worked with me to get all these tickets into WordPress 5.1 and 5.2. What this means in the end is much more consistency in how user data is queried (using WP_User_Query where possible) and how it is cached. This by itself is a massive improvement, but I decided to push it a little further. I built a feature plugin for core called WP User Query Cache. The plugin uses new filters added in 5.1 to cache the results of queries, so that those expensive user queries only need to run once and are then cached. One of the biggest issues is that sites with lots of users will likely have those caches invalidated by other users updating their profiles and by new users registering. This is not easily solved; however, some level of caching is better than nothing, as it makes the CMS user edit screens accessible again.
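The caching trade-off described above can be illustrated with a generic pattern: key each cached query result by its arguments plus a generation number, and bump the generation whenever any user changes. This is a hypothetical TypeScript sketch of that general technique, not the WP User Query Cache plugin's code; the class and method names are my own. Bumping a counter suits persistent object caches, which generally cannot cheaply delete a whole group of keys, so stale entries are simply never read again.

```typescript
// Hypothetical sketch of generation-based invalidation for expensive queries.
class UserQueryCache<T> {
  private generation = 0;
  private entries: { [key: string]: T } = {};

  // The key combines the current generation with the serialized query args.
  // Note: JSON.stringify is key-order sensitive, so callers should build
  // args consistently.
  private keyFor(args: object): string {
    return `${this.generation}:${JSON.stringify(args)}`;
  }

  // Returns the cached result, running the expensive query only on a miss.
  get(args: object, runQuery: () => T): T {
    const key = this.keyFor(args);
    if (!Object.prototype.hasOwnProperty.call(this.entries, key)) {
      this.entries[key] = runQuery();
    }
    return this.entries[key];
  }

  // Call whenever any user is created, updated or deleted. Every earlier
  // entry becomes unreachable; a real object cache would let them expire.
  invalidate(): void {
    this.generation++;
  }
}
```

This makes the weakness plain: on a busy site every profile update calls invalidate, so the hit rate depends on how often users change compared with how often the admin screens run the query.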

If you run a site that has lots of users and object caching enabled, please take a look at the WP User Query Cache plugin; consider installing it and help me test and improve it. I hope to push for this to get into core, and if it doesn't, the code can always live on as a plugin.