react

Getting gutenberg block data in the REST API

WordPress core added the block editor in version 5.0.0, changing fundamentally how WordPress stores and displays elements in post (page) content. The block editor, named gutenberg, is written using react and uses the WordPress REST API to interact with data in WordPress. So you maybe forgiven to believe that it would then be easy to interact with blocks and block data (attributes) via the REST API. However it is not, but to understand why, first you must understand how block data is stored. 

Take the following example. 

<!-- wp:button {"className":"is-style-outline"} -->
<div class="wp-block-button is-style-outline"><a class="wp-block-button__link" href="#">test</a></div>
<!-- /wp:button -->

This example shows how a the core/button block is stored in post content in the database. Blocks are represented as mixture of html comments (with a json object) and html tags. A block is defined with the starting comment 

<!– wp: . Blocks can either have an opening and closing tag (like the button block above) or a single line like the following search block example. 

<!-- wp:search /-->

Block data is used to define how the block behaves and renders. Block data can be stored in one of two ways. First, as a json blob in the html comment, see the className in the core button in the above example. Handling this data, once parsed, is pretty simple in javascript or in other systems that can modify json. The second way of storing data is as part of the html markup inside the block. Take for example how the button block is defined. 

{
    "name": "core/button",
    "category": "layout",
    "attributes": {
        "url": {
            "type": "string",
            "source": "attribute",
            "selector": "a",
            "attribute": "href"
        },
        "title": {
            "type": "string",
            "source": "attribute",
            "selector": "a",
            "attribute": "title"
        },
        "text": {
            "type": "string",
            "source": "html",
            "selector": "a"
        },
        "backgroundColor": {
            "type": "string"
        },
        "textColor": {
            "type": "string"
        },
        "customBackgroundColor": {
            "type": "string"
        },
        "customTextColor": {
            "type": "string"
        }
    }
}

As you can see from this block definition, attributes like url, title and text are all stored in the html of the block. Data can be stored in the contains of the html tags, meaning everything that appears after the close of the opening tag and before the end of the tag. Or data can also be stored in attributes of the tag, such as the href or title. This can make working with block data extremely difficult. 

Processing block data

When rendering blocks in WordPress in the front end (theme) on for example a post page, the PHP will parse the html that contains blocks (post content). Core does this, as blocks can have registered script associated with them, which are defined using register_block_type. The function parse_blocks uses regex, to find the block definition and return an array of blocks. From there, it loops around each block, and enqueues the registered javascript libraries. Each element in the array, as a field called attributes, this contains all the fields and data defined in the html comment (json blob). However, what is missing from these attributes is the data found in the html attribute / tags. At the time of writing (in WP 5.2), core doesn’t parse these fields and return them as usable data. To be able to extract data from html, PHP would have to parse the html of the block and navigate the doc tree generated using css selectors, to get the tag and field where the data is stored. This would be a resource intensive process and result in false positives. Even if this process wasn’t resource intensive, at time of writing it is  not possible to process all blocks. Many blocks both core and ones defined in plugins are only defined in the javascript, meaning the PHP is completely unaware of the block and associated block data. 

Defining block data 

As part of an going project to be able to use gutenberg in the WordPress mobile app, the gutenberg team have been working on a RFC to define block structure. This project hopes to define the structure of a block as a platform independent json file. This file can then be used by both the javascript and PHP, to define each block and make so the both the server and front end are aware of all blocks. Once the blocks are defined, the core team plan to make two new REST APIs. These new apis are documented in this post

Fetching the available block types through REST APIs.
Fetching block objects from posts through REST APIs.

Once these apis exists, headerless applications like the WordPress mobile app or a headerless frontend written in react or a similiar framework, could much easier use block data to render. 

It is likely that it take some time to get all existing blocks converted to this new style and will require efforts from developers to do so. It is likely that functions required to register the block in PHP, will not be merged until 5.3 at the earliest. 

Accessing block data now

What if you need access to this block data now and you can’t wait? Well, you are in luck, there is a proof of concept plugin, called wp-rest-blocks written to expose this data. Once installed and activated, this plugin adds two field to the post / page REST API endpoints. These fields are ‘has_blocks’ and ‘blocks’. Has block is a boolean, the denote, if the post content contains blocks and the blocks field, is an array of blocks. An example of the output is displayed below. 

{
    "blockName": "core/heading",
    "attrs": [
       
    ],
    "innerBlocks": [
       
    ],
    "innerHTML": "\n<h2><strong>Flight Deals to Tianjin</strong></h2>\n",
    "innerContent": [
      "\n<h2><strong>Flight Deals to Tianjin</strong></h2>\n"
    ],
    "rendered": "\n<h2><strong>Flight Deals to Tianjin</strong></h2>\n"
  },

As detailed above, it is not possible to get all block data currently. But this plugin does get existing block data found in the html comment which in many cases would be enough. But there is also a “rendered” field, that has the fully rendered block. This is specially useful for dynamic blocks. If the block uses lots of html attibributes to store block data, the innerContent or rendered fields, can be used to extract this data. The api, even supports nested inner blocks, returning an array of the inner blocks, in each block recursively. 

I have also made an effort to try and extract block data of core blocks. Current I have mapped data of the following core blocks. 

  • Image
  • Gallery
  • Heading
  • Paragraph
  • Button
  • Video 
  • Audio
  • File

There is even some unit test coverage, in case there are some breaking changes in core in the future. This plugin is in the very early stages, but maybe a good starting point for anyone wishing to build a front end using headerless WordPress.