XML parser option to forcibly use SimpleParser instead of NativeParser #48

Merged (2 commits) on Jul 13, 2017


diegosainz

Hi Sébastien,

At work we have spent a couple of busy days diagnosing and working around a problem in PHP's native xml_parse function when it parses large values in XML attributes.

The problem is that xml_parse (at least from PHP 5.5 to PHP 7.1) limits the maximum size of any XML node's attribute value to 10M (it can progressively parse XML files of any size, but not attribute values of any size; find here a script that I created to demonstrate that). This seems related to the change made to libxml2 to limit memory use while parsing. The option XML_PARSE_HUGE can be passed to the C library to overcome that limitation, but sadly PHP's xml_parse does not support it yet.

As we use soluble-japha mainly for Jasper Reports, when there is a PDF of a couple of thousand pages, the response comes back as a base64-encoded XML node attribute value that easily exceeds the xml_parse limit of 10M.
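The limit described above can be probed with a short script. This is a minimal sketch, not the script linked in the description: the function name `tryParseLargeAttribute` and the document shape are assumptions made for illustration, and it only relies on the standard ext-xml functions.

```php
<?php
/**
 * Parse a one-element document whose single attribute value is $bytes long.
 * Hypothetical helper for illustration; returns the outcome and any error text.
 */
function tryParseLargeAttribute(int $bytes): array
{
    // Build <doc value="AAAA..."/> with an attribute of the requested size.
    $xml = '<doc value="' . str_repeat('A', $bytes) . '"/>';

    $parser = xml_parser_create();
    // xml_parse() returns 1 on success and 0 on failure.
    $ok = xml_parse($parser, $xml, true) === 1;
    $error = $ok ? null : xml_error_string(xml_get_error_code($parser));

    return ['ok' => $ok, 'error' => $error];
}
```

Calling `tryParseLargeAttribute(11 * 1024 * 1024)` exercises the case reported here. Whether it fails depends on the PHP and libxml2 versions in use, so the result should be checked on the target environment rather than assumed.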

This PR adds the following features:

  • Hint the Parser class to always use SimpleParser instead of NativeParser. The SimpleParser parses huge Pjb62 without any problem.

And fixes the following issues:

  • Default options on PjbProxyClient could not be overridden.

Due to the nested singleton objects I had some trouble creating the unit tests. I finally settled on using reflection to force the re-initialization of the PjbProxyClient object so that its configuration can be unit tested.

Right now my hacky workaround is to artificially define the HHVM_VERSION constant to fool the Parser class into using SimpleParser (yeah, I know...).

*/
- public function __construct(Client $handler)
+ public function __construct(Client $handler, bool $forceSimpleParser)
Owner

Could you set a default behaviour here?

public function __construct(Client $handler, bool $forceSimpleParser=false)
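To illustrate why a default of `false` keeps existing callers working, here is a minimal sketch of how such a flag might drive parser selection. The function `chooseParser` is hypothetical and is not the library's code. The HHVM check mirrors the behaviour described in this thread, while the ext-xml check is purely an assumption.

```php
<?php
/**
 * Hypothetical helper: decide which parser implementation to use.
 * The name and the exact conditions are assumptions, not soluble-japha's code.
 */
function chooseParser(bool $forceSimpleParser = false): string
{
    // Explicit opt-in always wins.
    if ($forceSimpleParser) {
        return 'SimpleParser';
    }
    // Assumed fallback rule: use the pure-PHP parser on HHVM
    // or when the native XML extension is not available.
    if (defined('HHVM_VERSION') || !function_exists('xml_parser_create')) {
        return 'SimpleParser';
    }
    return 'NativeParser';
}
```

With the default in place, `chooseParser()` behaves as before the change, and only callers that pass `true` get the pure-PHP parser unconditionally.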

@belgattitude
Owner

Hi @diegosainz ,

Interesting, I'll check this weekend how to fix the Travis issue (unit test + code style) and release a new version.

In the meantime, you can always add your PR in composer, see an example here in case you need an urgent fix.

That said, I would not recommend sending the generated PDF through the pjb protocol. When using Jasper, I generally save the result in a file (the filename can be generated with tempnam for example) and read it back on the PHP side. It works great. I used this method to generate 700+ page product catalogs (pictures...) without problems.
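The file-based approach can be sketched as follows. This is a minimal illustration, assuming PHP and the report generator share a filesystem. `renderViaTempFile` and the `$export` callable are hypothetical stand-ins for whatever Java-side call writes the report to a given path.

```php
<?php
/**
 * Let $export write a report to a temporary file, then read it back in PHP.
 * $export is a hypothetical callable standing in for the Java-side export.
 */
function renderViaTempFile(callable $export): string
{
    // tempnam() creates an empty file with a unique name and returns its path.
    $path = tempnam(sys_get_temp_dir(), 'report_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }

    try {
        $export($path); // the exporter writes the report to $path
        $contents = file_get_contents($path);
        if ($contents === false) {
            throw new RuntimeException("Could not read $path");
        }
        return $contents;
    } finally {
        unlink($path); // always clean up the temporary file
    }
}
```

Only the file name travels over the bridge in this design, so the size of the report no longer matters to the XML parser. For very large files, streaming the file to the client instead of loading it into a string would avoid holding it all in memory.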

@diegosainz
Author

Hi @belgattitude - thank you for your quick reply!

I'll update the PR with the default constructor value for the new argument - good point.

No rush on our side - we applied the define("HHVM_VERSION", ...) workaround and it is working well, but thanks for the composer/PR tip - didn't know about that!

Indeed, as you say, a better alternative would be for Jasper to write the file and then serve it. Initially the report server was on another server and that was the quickest path. Either way you will be happy to know that we are generating 8,000-page PDFs (albeit with just a couple of barcodes and some extra info) without much trouble using japha :)

@belgattitude belgattitude merged commit 620509f into belgattitude:master Jul 13, 2017
@belgattitude
Owner

Hi @diegosainz , merged and released in 2.1.0.

Congrats! 8000 pages ;) Not sure about my own volume, but I used it for barcode generation too, for two major shippers in Belgium (track & trace)... But it is very nice to know what japha is used for, that gives me some motivation to open-source more ;)

If you intend to set up an instance on the same server, be sure to check the latest phpjavabridge server fork. It makes it much easier to produce a ready-to-deploy WAR file with all your deps (jasper, mysql...) or a standalone version with webapp-runner (see the .travis folder for an example).

Documentation is also available here: http://docs.soluble.io/soluble-japha/install_server/.

Have fun and thanks again.

Seb

@belgattitude
Owner

And if you like, don't forget to star the project ;) Might give some extra motivation, hehe

@diegosainz
Author

That was fast! Thank you, @belgattitude! :D I really appreciate the extra work required to open source something instead of just making quick changes and keeping the fork private.
I don't know why I didn't star the repo before - we use your code every day across multiple clients. It was a very welcome change to have a modern PHP/Java Bridge.

Also, I didn't know about the server fork - thanks for pointing it out! Using it instead of the old one will for sure be on our product maintenance roadmap.

@belgattitude
Owner

Great,

I was curious about the performance difference between the native and pure-PHP versions of the parser, if you're interested:

  • the bench suite with native: around 6500ms
  • the bench suite with pure-php: around 9500ms

See how

All the best :)

FYI, just documented the new option: http://docs.soluble.io/soluble-japha/bridge_connection/

@diegosainz
Author

diegosainz commented Aug 7, 2017

Thank you @belgattitude! - I wonder if under OPcache it would be closer to the native implementation.

Anyway, I didn't know about the documentation page! It's great! (and even better how you have been handling the small details with tips/summaries/etc). I will definitely take a look into MkDocs

@belgattitude
Owner

Hey @diegosainz , yes I was wondering too... So I've re-run the bench suite with the latest PHP 7.1.8 FPM and OPcache, but it does not change a lot (native is still 25-30% faster). Anyway, in real-world usage I guess this does not make any difference; the bench suite makes a lot of calls (150,000 ping pongs ;)

PS:

Native parser bench

| Benchmark name | x1 | x100 | x1000 | x10000 | Average | Memory |
|---|---|---|---|---|---|---|
| New java(java.lang.String, "One") | 0.29ms | 5.32ms | 34.76ms | 317.08ms | 0.03ms | 12.34Kb |
| New java(java.math.BigInteger, 1) | 0.06ms | 3.17ms | 32.26ms | 317.58ms | 0.03ms | 0.37Kb |
| javaClass(java.sql.DriverManager) | 0.16ms | 0.05ms | 0.56ms | 4.74ms | 0.00ms | 3.48Kb |
| Enums on javaClass | 0.15ms | 2.79ms | 29.54ms | 273.27ms | 0.03ms | 2.13Kb |
| Method call java.lang.String->length() | 0.05ms | 2.15ms | 22.30ms | 225.21ms | 0.02ms | 0.34Kb |
| Method call String->concat("hello") | 0.06ms | 2.98ms | 30.08ms | 299.74ms | 0.03ms | 0.37Kb |
| $a = ...String->concat('hello') . ' world' | 0.09ms | 5.64ms | 58.46ms | 590.79ms | 0.06ms | 0.42Kb |
| New java(java.util.HashMap, $arr) | 0.13ms | 3.95ms | 38.46ms | 389.25ms | 0.04ms | 3.05Kb |
| Method call HashMap->get('arrKey') | 0.06ms | 2.92ms | 30.06ms | 298.74ms | 0.03ms | 0.39Kb |
| Call (string) HashMap->get('arrKey')[0] | 0.09ms | 5.77ms | 60.52ms | 608.00ms | 0.06ms | 0.37Kb |
| Iterate HashMap->get('arrKey')[0] | 0.24ms | 13.41ms | 134.17ms | 1,367.12ms | 0.14ms | 2.49Kb |
| GetValues on HashMap | 0.06ms | 4.38ms | 37.63ms | 391.61ms | 0.04ms | 1.27Kb |
| New java(HashMap(array_fill(0, 100, true))) | 0.17ms | 12.38ms | 131.94ms | 1,299.80ms | 0.13ms | 0.63Kb |
| Pure PHP: call PHP strlen() method | 0.00ms | 0.00ms | 0.01ms | 0.05ms | 0.00ms | 0.37Kb |
| Pure PHP: concat '$string . "hello"' | 0.00ms | 0.01ms | 0.02ms | 0.20ms | 0.00ms | 120.37Kb |

  • Connection time: 2 ms
  • Total time : 7095 ms

Pure-PHP parser bench:

| Benchmark name | x1 | x100 | x1000 | x10000 | Average | Memory |
|---|---|---|---|---|---|---|
| New java(java.lang.String, "One") | 0.09ms | 5.14ms | 47.90ms | 467.57ms | 0.05ms | 12.34Kb |
| New java(java.math.BigInteger, 1) | 0.05ms | 4.22ms | 47.05ms | 468.34ms | 0.05ms | 0.37Kb |
| javaClass(java.sql.DriverManager) | 0.11ms | 0.04ms | 0.50ms | 4.48ms | 0.00ms | 3.48Kb |
| Enums on javaClass | 0.07ms | 4.52ms | 42.39ms | 421.94ms | 0.04ms | 2.13Kb |
| Method call java.lang.String->length() | 0.07ms | 2.81ms | 29.35ms | 277.48ms | 0.03ms | 0.34Kb |
| Method call String->concat("hello") | 0.06ms | 4.24ms | 40.97ms | 422.63ms | 0.04ms | 0.37Kb |
| $a = ...String->concat('hello') . ' world' | 0.12ms | 8.22ms | 84.86ms | 820.51ms | 0.08ms | 0.42Kb |
| New java(java.util.HashMap, $arr) | 0.13ms | 4.85ms | 54.54ms | 530.22ms | 0.05ms | 3.05Kb |
| Method call HashMap->get('arrKey') | 0.08ms | 4.55ms | 49.25ms | 490.05ms | 0.05ms | 0.39Kb |
| Call (string) HashMap->get('arrKey')[0] | 0.09ms | 7.94ms | 80.25ms | 790.83ms | 0.08ms | 0.37Kb |
| Iterate HashMap->get('arrKey')[0] | 0.29ms | 21.09ms | 209.82ms | 2,071.85ms | 0.21ms | 2.49Kb |
| GetValues on HashMap | 0.11ms | 8.08ms | 79.69ms | 786.67ms | 0.08ms | 1.27Kb |
| New java(HashMap(array_fill(0, 100, true))) | 0.18ms | 14.84ms | 148.79ms | 1,474.98ms | 0.15ms | 0.63Kb |
| Pure PHP: call PHP strlen() method | 0.00ms | 0.00ms | 0.01ms | 0.05ms | 0.00ms | 0.37Kb |
| Pure PHP: concat '$string . "hello"' | 0.00ms | 0.00ms | 0.02ms | 0.20ms | 0.00ms | 120.37Kb |

  • Connection time: 2 ms
  • Total time : 10038 ms
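
For readers who want to relate the two totals above to the "25-30% faster" remark, the arithmetic can be sketched as below. The helper names are hypothetical. The inputs are the two reported totals (7095 ms native, 10038 ms pure-PHP).

```php
<?php
/** Share of the slower run's time that the faster run saves, in percent. */
function timeSavedPercent(float $fasterMs, float $slowerMs): float
{
    return round(($slowerMs - $fasterMs) / $slowerMs * 100, 1);
}

/** Extra time the slower run needs relative to the faster run, in percent. */
function extraTimePercent(float $fasterMs, float $slowerMs): float
{
    return round(($slowerMs - $fasterMs) / $fasterMs * 100, 1);
}

// Totals reported in the two benchmark runs above.
$saved = timeSavedPercent(7095, 10038); // about 29.3: native takes ~29% less time
$extra = extraTimePercent(7095, 10038); // about 41.5: pure-PHP takes ~41% longer
```

The two figures describe the same gap from different baselines, which is why "native is about 29% faster" and "pure-PHP is about 41% slower" are both consistent with these totals.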

@diegosainz
Author

@belgattitude interesting to have the raw benchmark, thank you! And that's right: in our use case the calls are minimal (a couple of Java calls to set up the byte stream, compile the reports, etc.) and all the hard work is done by Jasper Reports. We are probably making fewer than 200 calls.

@belgattitude
Owner

@diegosainz , very good. Just out of curiosity, are you using PHP 7.1 yet?

I've recently released the 2.0 milestone and updated the requirement to PHP 7.1 (no API break)

@diegosainz
Author

@belgattitude that was a JIT question! - we are right now migrating our production servers to PHP 7.1 (from 7.0). All our Docker and test environments have already been running PHP 7.1 since last week. So far everything has been stable (but we are not using the latest version of soluble-japha yet)

@belgattitude
Owner

Great, don't forget to update to the ^2.2.0 release when ready then ;)

Keep up the good work.

@belgattitude
Owner

Hey @diegosainz ,

I just started working on a Jasper wrapper built on the bridge. You can have a look at https://github.com/belgattitude/soluble-jasper.

It's in its early stages now. I'm currently trying to stabilize the API before starting the docs... and optimizing.

I would really appreciate your comments on this (API, features, ideas, experiences...?). See also the benchmarks on the project page, I'll improve them over time.

All the best

Thanks,

Seb
