Did the Gutenberg block editor change the HTML in post content during import?

Which HTML is better, pre-Gutenberg or post-Gutenberg?

I imported post content from a large, old WordPress site into a fresh install with a new database: almost 1200 posts, along with their meta and related media. The XML file is 15 MB. I used the standard WordPress import/export tools along with a media export [plugin][1] for featured images. The origin site uses TinyMCE Advanced to maintain the classic editor look for the client.

Almost everything carried over, but in the new setup the HTML of the imported content changed.

Here’s how it looked on the front end. Left is the origin, right is the import.

Here is how the HTML changed. Left is the origin, right is after import.

At what point in the process did the entire content block get wrapped in `<p>` tags, with `<br>` tags and `&nbsp;` entities added and the original line breaks removed?

Apparently, something about Gutenberg is changing the basic HTML of post content:

https://wordpress.org/support/topic/gutenberg-does-not-play-nicely-with-code-editor/

https://github.com/WordPress/gutenberg/issues/11211

https://core.trac.wordpress.org/ticket/45636?cversion=0&cnum_hist=1

I was able to fix the front-end paragraph spacing issue with this CSS, courtesy of Themeisle:

    br {
        content: "A" !important;
        display: block !important;
        margin-bottom: 1em !important;
    }

Even so, I want to know how best to move forward with the HTML. Is this a bug in Gutenberg, or should I just go ahead with the CSS fix? What is the proper HTML here? If the imported HTML is wrong, is there some kind of regex magic that would fix it?
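To make the regex idea concrete, here is a minimal sketch in Python of the kind of cleanup I have in mind. It assumes the damage is limited to one `<p>` wrapped around the whole post, `<br>` tags in place of the original line breaks, and `&nbsp;` entities on otherwise empty lines. The function name `normalize_post_html` is hypothetical, and this has not been run against the real export, so treat it as a starting point rather than a verified fix.

```python
import re

# Patterns for the artifacts described above (assumed, not exhaustive).
NBSP_ENTITY = re.compile(r"&nbsp;?")
BR_TAG = re.compile(r"<br\s*/?>\s*", re.IGNORECASE)
OUTER_P = re.compile(r"^\s*<p>(.*)</p>\s*$", re.IGNORECASE | re.DOTALL)


def normalize_post_html(html: str) -> str:
    """Hypothetical cleanup: undo a single <p> wrapper, <br> tags and &nbsp; fillers."""
    # Replace entity and literal non-breaking spaces with plain spaces.
    text = NBSP_ENTITY.sub(" ", html).replace("\u00a0", " ")

    # Unwrap the outer <p> only when nothing inside starts with "<p"
    # (this conservatively also skips content containing <pre>).
    match = OUTER_P.match(text)
    if match and "<p" not in match.group(1).lower():
        text = match.group(1)

    # Turn each <br> (and the whitespace after it) back into a newline.
    text = BR_TAG.sub("\n", text)

    # Tidy each line, then cap blank-line runs at one empty line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    sample = "<p>First paragraph.<br />\n&nbsp;<br />\nSecond paragraph.</p>"
    print(normalize_post_html(sample))
```

Content that already has several `<p>` elements is left wrapped, since unwrapping it would lose paragraph boundaries. Anything like this should be tried on a copy of the database first.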

Update

In response to @tomjnowell, I ran the export with no active plugins and imported it into a fresh installation with no active plugins. Here are the results:

The XML was imported into the blank site. The post content then appears in a Gutenberg “Classic” block in “Edit as HTML” mode, with the space between paragraphs removed and `<br>` tags and `&nbsp;` entities added. The `<br>` tags do not appear in the database, but the `&nbsp;` entities do.

Here is another comparison showing, from left to right, the XML, the origin DB and the destination DB. I also noticed the following differences between the databases. Not sure if they matter.

Origin DB, post_content: type MyISAM, collation latin1_swedish_ci

Destination DB, post_content: type InnoDB, collation utf8mb4_unicode_520_ci
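One reason the character set difference could matter: a non-breaking space is stored as different bytes in latin1 and in UTF-8. The short Python sketch below only illustrates that general encoding fact. Whether it plays any part in this import is an assumption I have not confirmed, and the helper name `misread_as_latin1` is hypothetical.

```python
NBSP = "\u00a0"  # the character that the &nbsp; entity stands for


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print(NBSP.encode("latin-1"))         # one byte:  b'\xa0'
    print(NBSP.encode("utf-8"))           # two bytes: b'\xc2\xa0'
    # Reading the UTF-8 bytes as latin1 yields two characters, not one.
    print(repr(misread_as_latin1(NBSP)))  # '\xc2\xa0', i.e. "Â" plus a non-breaking space
```

If stray "Â" characters ever show up next to spaces in the imported posts, a mismatch like this would be the first thing to check.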