Indexing wasn't forced anymore since middleware order has changed #729
Comments
Thanks for your detailed information, I'll see if I can find a solution for this. I will get back to you when I have more information. |
Have tested the same crawler setup in a TYPO3 10.4.14 Installation. |
Super. That's really valuable feedback. I'll need to dig more into Middleware, it's unfortunately something that I don't know well enough yet. |
Thank you, no problem, do not rush. |
I'm afraid we have a chicken/egg problem here. The AOEpeople@0f7cb6a moved the So moving this back in that position will be a regression, that introduces the bug from #642 again. Any hints on this would be welcome. |
Could we split up the middlewares into one called |
I honestly don't know. I'm still missing some basic knowledge of the middleware concept, but in general I don't see a problem in splitting it if that would solve the problem. I'm still not sure whether it will solve the problem or introduce a new one. That may be because of missing conceptual information on my side. |
We could discuss this over Slack tomorrow afternoon if you have some time. We can still summarize our talking points here. |
Btw... it is possible to define restrictions in EXT:crawler for middlewares that do not exist in an installation (as far as I can remember), e.g. "Load after TSFE initialization, but before EXT:staticfilecache middleware XYZ". staticfilecache has very "lax" middleware orderings, so we can push it into the right order (I don't know from just reading this issue where EXT:staticfilecache should be loaded and where the crawler middleware should be loaded). |
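The ordering idea in the comment above could look roughly like the following sketch of a Configuration/RequestMiddlewares.php entry. The identifier example/staticfilecache/middleware-xyz is a hypothetical placeholder for the "XYZ" middleware, not a real staticfilecache identifier. Whether TYPO3 silently ignores an ordering constraint that points at a middleware that is not installed is taken from the comment ("as far as I can remember") and should be verified.

```php
<?php
// Sketch only, not the configuration shipped with EXT:crawler.
return [
    'frontend' => [
        'aoe/crawler/initialization' => [
            'target' => \AOE\Crawler\Middleware\CrawlerInitialization::class,
            // Run after the TSFE has been initialized.
            'after' => [
                'typo3/cms-frontend/tsfe',
            ],
            'before' => [
                // Hypothetical identifier standing in for "middleware XYZ".
                // Replace it with the real staticfilecache middleware identifier.
                'example/staticfilecache/middleware-xyz',
            ],
        ],
    ],
];
```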
Do you have a link to the documentation @bmack ? |
So, we discussed it with @tomasnorre and found what we need to change. Tomas is preparing a corresponding PR. What we did is split the initialization so that one part ensures the content is always rendered before we return the result status to the crawler runner. |
This middleware splits the content from the Crawler Initialization Middleware to ensure that the content is written at the end. This ensures that middleware that expects response-object gets the content for the renders and the crawler queue gets the correct request status. Resolves: #729
@rengaw83 Could you please check if this PR fixes the problem for you? |
hi @tomasnorre, The position of the The At the
It's the same problem as #610. |
@rengaw83 Thanks for getting back to me. We thought that splitting it into the ContentFinisher would solve the problem. I don't want to break anything in the StaticFileCache with a fix in the Crawler. |
I understand, it's a bit tricky with the different dependencies, I guess. I have added a
return [
'frontend' => [
'aoe/crawler/initialization' => [
'target' => \AOE\Crawler\Middleware\CrawlerInitialization::class,
'after' => [
'typo3/cms-frontend/tsfe',
],
'before' => [
'typo3/cms-frontend/prepare-tsfe-rendering',
],
],
],
]; |
Thanks, I will have a look at it. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Since commit 8a9b896 (#837) the "real" response is returned by the The reason for the patch 0f7cb6a (#642) that broke the middleware order is gone, and we can restore the proper loading order now. |
History:
--------
Because of a problem with lochmueller/staticfilecache, crawler issue tomasnorre#642 changed the middleware loading order to execute crawler after static file cache (commit 0f7cb6a).

The source of the problem was that the crawler CrawlerInitialization middleware overwrote the HTTP response that was generated by TYPO3.

Since commit 8a9b896 (issue tomasnorre#837) the HTTP response is not destroyed/overwritten by crawler anymore but moved into an HTTP header "X-T3Crawler-Meta". The loading order does not influence compatibility with static file cache anymore.

Bug
---
The changed loading order in the bug fix led to the problem that

> indexed_search:TypoScriptFrontendHook

was executed before

> crawler:CrawlerInitialization

But CrawlerInitialization must be run before TypoScriptFrontendHook because it loads request data that are needed by indexed_search.

This led to bug tomasnorre#729 - forced reindexing by the crawler did not work anymore if the page was already in cache.

Solution
--------
Restore the HTTP middleware loading order as it was before the fix for tomasnorre#642, so that the code path is again:

1. crawler:FrontendUserAuthenticator (aoe/crawler/authentication)
2. crawler:CrawlerInitialization (aoe/crawler/initialization)
3. indexed_search:TypoScriptFrontendHook (called by typo3/cms-frontend/prepare-tsfe-rendering)

Resolves: tomasnorre#729
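The restored loading order could be expressed roughly as in the sketch below. This is an illustration, not the actual commit. The FrontendUserAuthenticator class name is assumed from the label in the commit message and the namespace used for CrawlerInitialization elsewhere in this thread. Placing aoe/crawler/authentication after typo3/cms-frontend/authentication is an assumption as well.

```php
<?php
// Configuration/RequestMiddlewares.php (sketch under the assumptions stated above)
return [
    'frontend' => [
        // 1. Crawler authentication. Target class and 'after' constraint are assumed.
        'aoe/crawler/authentication' => [
            'target' => \AOE\Crawler\Middleware\FrontendUserAuthenticator::class,
            'after' => [
                'typo3/cms-frontend/authentication',
            ],
        ],
        // 2. Crawler initialization: after the TSFE exists, but before
        //    prepare-tsfe-rendering, so that the request data is loaded
        //    before indexed_search's TypoScriptFrontendHook is called (3.).
        'aoe/crawler/initialization' => [
            'target' => \AOE\Crawler\Middleware\CrawlerInitialization::class,
            'after' => [
                'typo3/cms-frontend/tsfe',
            ],
            'before' => [
                'typo3/cms-frontend/prepare-tsfe-rendering',
            ],
        ],
    ],
];
```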
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Stalebot I hate you
--
Regards/Mit freundlichen Grüßen
Christian Weiske
|
I'm considering disabling it. |
Indexing wasn't forced anymore.
TYPO3 9.5.25
Crawler: 9.2.2
The urls are crawled, but the indexing wasn't forced.
I have debugged a lot, and in my case the problem is the current middleware registration order.
In #642 the order was changed, to fix issues with StaticFileCache extension.
See AOEpeople@0f7cb6a#diff-38337bad08776b0fe94f73f1cf471f8d184ec3f9d2e9254f7c2a275b83501e34
The middleware order (without staticfilecache installed) now looks like this:
The aoe/crawler/initialization middleware is now loaded after typo3/cms-frontend/prepare-tsfe-rendering.
The crawler can't fill $GLOBALS['TSFE']->applicationData['tx_crawler'] before the TSFE checks the headerNoCache, so indexed_search can't disable the cache output. As a result, no page is generated and no content can be indexed.
Here TYPO3 9 checks whether the page should be generated or the cache will be used:
https://github.com/TYPO3/TYPO3.CMS/blob/9.5/typo3/sysext/frontend/Classes/Controller/TypoScriptFrontendController.php#L2498-L2519
Here indexed_search checks whether the crawler is enabled and an indexing run is in progress:
https://github.com/TYPO3/TYPO3.CMS/blob/9.5/typo3/sysext/indexed_search/Classes/Hook/TypoScriptFrontendHook.php#L25-L40
I have reverted AOEpeople@0f7cb6a and the indexing works as expected again.
The crawler/initialization can't be the last in middleware chain, it has to be before the tsfe rendering!
I do not use staticfilecache, so I can't fix the issues with staticfilecache and create a PR.
Maybe this is related to #679, but I have created a new issue because I do not have staticfilecache installed like bh-teufels has.