
Out of memory in parser #3799

Closed
oandregal opened this issue Dec 4, 2017 · 18 comments
Assignees
dmsnell
Labels
[Feature] Parsing · [Priority] High · [Type] Bug
Milestone
Merge Proposal

Comments

@oandregal
Member

oandregal commented Dec 4, 2017

When I try to load a very old and large post from my blog, I get these errors:

[screenshot: parser-line-96]

As far as I've tested, this fails randomly at line 96, 349, or 372 of the parser when I reload the page.

I've tried increasing the maximum file size allowed for the network, but it didn't solve the issue. For context, if I deactivate the Gutenberg plugin, the post loads and shows its content.

Not sure what other details I can share for debugging; just ping me and I'll provide whatever you need.

@oandregal added the [Feature] Parsing and [Type] Bug labels Dec 4, 2017
@oandregal
Member Author

cc @aduth @dmsnell @pento as you did recent work on that part of the code.

@aduth added the [Priority] High label Dec 4, 2017
@dmsnell
Member

dmsnell commented Dec 4, 2017

Can you share the source of the post?

@dmsnell self-assigned this Dec 4, 2017
@oandregal
Member Author

oandregal commented Dec 4, 2017

@dmsnell you can find it here. Expect broken links, non-existent media, and the like. I haven't touched it since I discovered the bug, so we can reproduce and debug what happened.

@dmsnell
Member

dmsnell commented Dec 4, 2017

Hm. Not that big of a document. I'll start thinking about this - thanks!

@pento
Member

pento commented Dec 5, 2017

I can reproduce similar excessive memory usage with Gutenberg on this post:

  • Hello World post, Gutenberg disabled: 4.1MB
  • Hello World post, Gutenberg enabled: 4.2MB
  • This post, Gutenberg disabled: 4.3MB
  • This post, Gutenberg enabled: 46.6MB

@dmsnell
Member

dmsnell commented Dec 5, 2017

I believe I know why it's occurring, and it's probably because of how we're handling those freeform HTML blocks: we build a list of each character in the parser and then later join them into a single string.

This was a fear of mine when converting to the nested parser with our move back to the idea of tokens (necessary), but I'm not sure yet how to fix this.
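The pattern described above can be sketched as follows. This is a hypothetical minimal illustration, not the actual Gutenberg PEG grammar, and the function names are made up. Accumulating one array element per character carries per-element overhead far beyond the text's own size, while remembering offsets and taking a single substring needs only one allocation:

```javascript
// Illustrative sketch only; these are not real Gutenberg APIs.

// What the parser effectively did: collect every character of a freeform
// HTML run into an array, then join. Memory scales with the array's
// per-element overhead, not just the length of the text.
function collectFreeformCharByChar(html) {
  const chars = [];
  for (let i = 0; i < html.length; i += 1) {
    chars.push(html[i]); // one array slot per character
  }
  return chars.join('');
}

// The cheaper alternative: track where the freeform run starts and ends,
// then take one substring at the end.
function collectFreeformBySlice(html, start, end) {
  return html.slice(start, end); // a single allocation for the whole run
}
```

Both return the same string for the same run; the difference is purely in how many intermediate allocations the engine makes while parsing a multi-megabyte post.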

@oandregal
Member Author

A follow-up to this in Gutenberg 2.0: the post loads and the memory issue is gone.

Not sure if it's related to the fix or not, but when the post loads it shows the raw HTML and the text is grey:

[screenshot: post-load]

The console shows an error:

[screenshot: post-load-console]

I've attached the error log.

When I convert this to blocks, the conversion completes, but some paragraphs and items within lists are stripped.

@mcsf
Contributor

mcsf commented Jan 15, 2018

For the record, the mismatch that fails the validation is based on &nbsp; (non-breaking space) characters around tags:

[screenshot: screen shot 2018-01-15 at 14.43.32]

In contrast, pasting the raw content works, though it's very slow (several seconds on a 3.1 GHz i7).
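A minimal sketch of this class of mismatch (the strings here are invented for illustration, not taken from the actual post): if the saved content stores the &nbsp; entity while the re-serialized block contains the literal non-breaking space character U+00A0, a byte-for-byte validation comparison fails even though both forms render identically:

```javascript
// Illustrative strings only.
const savedContent = 'Hello&nbsp;<em>world</em>'; // entity form
const reserialized = 'Hello\u00A0<em>world</em>'; // literal U+00A0

console.log(savedContent === reserialized); // false: the validator sees a diff

// Normalizing the entity to the literal character makes them compare equal.
const normalized = savedContent.replace(/&nbsp;/g, '\u00A0');
console.log(normalized === reserialized); // true
```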

@karmatosed added this to the Merge Proposal milestone Jan 25, 2018
@oandregal
Member Author

I can confirm that @aduth's #4591 fixed the issue: convert to blocks works. Thanks!

I have found some issues with a couple of lists when they are converted to Gutenblocks, but I'll report those in a separate issue. The source of the error is likely badly formatted HTML in the original post.

@mtias
Member

mtias commented Mar 1, 2018

Thanks for the follow-ups, @nosolosw. :)

@oandregal
Member Author

oandregal commented Nov 16, 2018

I've been battle-testing Gutenberg again with this post I wrote 10 years ago. I've tried to convert it to blocks using Gutenberg 4.4:

  • Open the post in Gutenberg; the whole text is embedded in the classic block.
  • Check the document stats: it's 10,605 words long.
  • Convert it to blocks.
  • Check the document stats: it's now 8,269 words long. One of the last sections was entirely gone (everything following "Una panorámica general de las herramientas").

cc @dmsnell @mcsf @mtias @aduth @youknowriad

Going to see if I can bisect when this started happening. Will post any follow-ups here.

@oandregal
Member Author

Bisected from v4.0.0-rc.1: it looks like the content loss may have started happening in 489eb79 cc @iseulde

Note that before that commit, Gutenberg reported 9,981 words after converting the old post to blocks (fewer than the 10,605 in the original), but I couldn't find any noticeable content loss.

@ellatrix
Member

@nosolosw could you report a new issue with it?

@oandregal
Member Author

@iseulde created at #12029

@mcsf
Contributor

mcsf commented Nov 19, 2018

Thanks for battle-testing and dog-fooding, @nosolosw. Out of curiosity, since this issue was originally about the parser, have you noticed any issues with calling the server-side parser on that post of yours? Because since #10463 we no longer skip the parser (as fixed in #4591) but rather use the improved parser implementation added in #8083.

@oandregal
Member Author

oandregal commented Nov 20, 2018

I haven't been able to get familiar with all of Gutenberg's components in my few weeks on the project, so I'm not sure what server-side parsing means here. I'm happy to try if I'm given some steps to reproduce.

@oandregal
Member Author

oandregal commented Nov 20, 2018

Oh, now that I've had some coffee, it occurred to me that by server-side parsing you meant checking whether the content within the classic block in Gutenberg is the same as the content in the classic editor. I've just compared both, and yes, the content is equal.

cc @mcsf

@mcsf
Contributor

mcsf commented Nov 20, 2018

Oh, now that I've had some coffee

😄

it occurred to me that by server-side parsing you meant checking whether the content within the classic block in Gutenberg is the same as the content in the classic editor. I've just compared both, and yes, the content is equal.

@nosolosw, actually I meant whether PHP's gutenberg_parse_blocks ever acts up with your largest posts. At its simplest, this means: when loading the front-end for a post, does anything ever fail? In more detail, this could mean examining the output of gutenberg_parse_blocks (called in do_blocks) for anything unexpected.


8 participants