Get amount of transferred bytes #10156

ariya · 2011-07-02T00:20:46Z

There is no way to reliably get the amount of transferred bytes for a request.

bodySize is not available for responses with stage==end and the content-length header is not very reliable, especially as it seems to be unset for requests with content-encoding "gzip".

I guess the bodySize has to be summed up for each bunch of data received and made available in a response with stage==end.

Disclaimer:
This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #156.
🌟 9 people had starred this issue at the time of migration.

ariya · 2011-07-03T12:26:43Z

ariya.hi...@gmail.com commented:

Metadata Updates

Label(s) removed:
- Type-Defect
Label(s) added:
- Type-Enhancement
Milestone updated: FutureRelease (was: ---)
Status updated: Accepted

marcelduran · 2012-03-24T07:12:46Z

marceldu...@gmail.com commented:

Any reason why Content-Length header isn't available for Content-Encoding:gzip responses?

While issue 158 is still open there's no way to get both the compressed and uncompressed sizes of gzip responses, so the netsniff.js example that generates a HAR file is a bit misleading:

https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js#L51

According to HAR spec (http://www.softwareishard.com/blog/har-12-spec/#content), content.size:
"... should be equal to response.bodySize if there is no compression and bigger when the content has been compressed."

marcelduran · 2012-03-24T09:29:45Z

marceldu...@gmail.com commented:

Another gzip/raw content issue:

By running:
phantomjs netsniff.js http://search.yahoo.com

The generated HAR shows that the main HTML response headers contain Content-Encoding: gzip and the bodySize is 12726.

However, running curl with compression gives a different result:

curl search.yahoo.com -H "Accept-Encoding:gzip" | wc -c
4328

And without compression it gets a size similar to what phantomjs is returning:

curl search.yahoo.com | wc -c
12120
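
The HAR 1.2 fields discussed above can be sketched as follows. This is only an illustration: `harSizes` is a hypothetical helper, not part of netsniff.js, and it assumes the caller already knows both the on-the-wire and the decoded size (which is exactly what PhantomJS does not expose here). The numbers are the two curl results above.

```javascript
// Hypothetical helper (not part of netsniff.js): builds the size-related
// fields of a HAR 1.2 response from the compressed (on-the-wire) size and
// the decoded size of the body.
function harSizes(wireBytes, decodedBytes) {
    return {
        bodySize: wireBytes,                       // response.bodySize
        content: {
            size: decodedBytes,                    // response.content.size
            compression: decodedBytes - wireBytes  // bytes saved by compression
        }
    };
}

// Figures from the curl runs above: 4328 bytes gzipped, 12120 bytes raw.
var sizes = harSizes(4328, 12120);
console.log(JSON.stringify(sizes));
```

With no compression both arguments are equal, so content.size equals bodySize and compression is 0, which matches the wording of the spec quoted earlier.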

strtok · 2013-06-12T00:12:34Z

I see this was migrated to 'feature enhancement', but I think this should be considered a bug. Anyone using the HAR output from netsniff.js is seeing uncompressed bytes only, and is getting an inaccurate representation of actual bytes transferred.

Is this data not easily accessible from QT?

sveisvei · 2013-09-25T13:21:18Z

+1 on this, any suggestion where the extra bytes are coming from?

fwebdev · 2013-11-08T16:40:22Z

For me, all byte sizes of CSS/JS files are shown significantly smaller than they are in reality (according to Chrome Dev Tools and Firebug).
Compared to the gzipped file sizes they are also shown too small.

Image sizes are all shown correctly. Does anybody else have this kind of problem?

ariya/phantomjs#10156

zackw · 2015-04-19T18:21:11Z

Seems to still be an issue in 2.0. I get the impression Qt/Webkit changes might be needed?

djberriman · 2015-04-29T13:21:43Z

I believe that if you are talking to a chunking server, content-length is not set. Instead, the size of each chunk is passed before the data itself, and when a size of zero is returned the resource is complete. So that may explain why content-length is sometimes not present.
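
To make the chunk framing concrete, here is a minimal sketch that walks a raw chunked body and adds up the chunk sizes. It is written for Node.js (it uses `Buffer`) rather than PhantomJS, the function name `chunkedBodySize` is made up for this example, and it ignores trailers, so treat it as an illustration of the wire format rather than a full parser.

```javascript
// Sums the chunk sizes of a raw chunked HTTP body held in a Node.js Buffer.
// Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a chunk of size 0
// ends the body.
function chunkedBodySize(raw) {
    var offset = 0;
    var total = 0;
    while (offset < raw.length) {
        var lineEnd = raw.indexOf('\r\n', offset);
        if (lineEnd === -1) {
            throw new Error('Truncated chunk header');
        }
        // The size is hexadecimal; anything after ';' is a chunk extension.
        var sizeField = raw.toString('ascii', offset, lineEnd).split(';')[0].trim();
        var size = parseInt(sizeField, 16);
        if (isNaN(size)) {
            throw new Error('Invalid chunk size: ' + sizeField);
        }
        if (size === 0) {
            return total; // the zero-length chunk marks the end of the body
        }
        total += size;
        offset = lineEnd + 2 + size + 2; // skip the size line, data and trailing CRLF
    }
    throw new Error('Missing terminating zero-length chunk');
}

var sample = Buffer.from('5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n', 'ascii');
console.log(chunkedBodySize(sample)); // 11
```

Note that the total only counts body data. The size lines and CRLF separators are also transferred but are not part of the content.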

Looking at networkaccessmanager.cpp, NetworkAccessManager::handleStarted() sets the bodySize to reply->size(). NetworkAccessManager::handleFinished() does not set the bodySize, so presumably it is left as is and is the size of the content (when not chunking) or of the first chunk.

QNetworkReply has a downloadProgress signal which reports bytesReceived and bytesTotal. Perhaps that could be used.

NetworkAccessManager::handleFinished could set the bodySize to the content-length where it is available.

It's a pity there does not appear to be a signal for each chunk (unless downloadProgress provides that), as it would then be possible to determine the downloaded size correctly by simply adding each chunk size to bodySize.

djberriman · 2015-04-29T15:06:36Z

I did some more research and it appears QT must be removing the content-length header when gzip is used. I did the same request via telnet and via phantomjs, note chunking is not in use.

telnet response:-

Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
Set-Cookie: ASP.NET_SessionId=onq34pudvbwazeh04ksylpfs; path=/; HttpOnly
X-AspNetMvc-Version: 4.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
X-Frame-Options: SAMEORIGIN
Date: Wed, 29 Apr 2015 14:54:17 GMT
Content-Length: 22767

phantom response:-

Cache-Control = private
Content-Type = text/html; charset=utf-8
Content-Encoding = gzip
Vary = Accept-Encoding
Server = Microsoft-IIS/8.0
Set-Cookie = ASP.NET_SessionId=e02yvkniwvolblo31qyt42ia; path=/; HttpOnly
X-AspNetMvc-Version = 4.0
X-AspNet-Version = 4.0.30319
X-Powered-By = ASP.NET
X-Frame-Options = SAMEORIGIN
Date = Wed, 29 Apr 2015 14:51:42 GMT

It would appear QT is for some reason removing the header.

djberriman · 2015-04-30T08:42:22Z

Change networkaccessmanager.cpp from
data["bodySize"] = reply->size();
to
data["bodySize"] = reply->header(QNetworkRequest::ContentLengthHeader);
This means that when Content-Length is passed, bodySize is correct.
It won't work (but then neither does the current code) when chunking is in use, or when QT does not pass Content-Length on, such as when gzip is used. Disabling gzip works around the second case.

djberriman · 2015-04-30T08:55:59Z

From what I can see, size() is just the size of the QByteArray...

djberriman · 2015-04-30T09:10:57Z

For gzip you need to set the header yourself to accept gzip, as there is a bug in QT:

https://bugreports.qt.io/browse/QTBUG-41840

Content-Length is then returned. Unfortunately you then run across the bug described at https://forum.qt.io/topic/2308/content-encoding-gzip-with-qt-webkit/9 and the content is not decompressed.

gmetais · 2015-04-30T09:16:28Z

Great digging work @djberriman! Go on!
Thousands of people are supporting you!

👍 👍 👍

djberriman · 2015-04-30T09:31:22Z

QT does indeed specifically remove the content-length header on gzip data

void QHttpNetworkReplyPrivate::removeAutoDecompressHeader()
{
    // The header "Content-Encoding = gzip" is retained.
    // Content-Length is removed since the actual one send by the server is for compressed data
    QByteArray name("content-length");
    QList<QPair<QByteArray, QByteArray> >::Iterator it = fields.begin(),
                                                   end = fields.end();
    while (it != end) {
        if (qstricmp(name.constData(), it->first.constData()) == 0) {
            fields.erase(it);
            break;
        }
        ++it;
    }
}

djberriman · 2015-04-30T10:02:10Z

From what I can see in the QT source code, it may well be worth using the QNetworkReply downloadProgress signal, which reports bytesReceived and bytesTotal. I believe this will also mean chunked data will work correctly, as it will fire for each chunk.

djberriman · 2015-04-30T12:31:01Z

I appear to have a fix for this. I'm not sure how to submit it, so I will work on that in a moment.

Basically phantomjs is not trapping one of the emits from QT, so the size returned is that of the first read. We need to add another stage alongside 'start' and 'end', which I have called 'data'. If you cater for this in your onResourceReceived function and add up the res.bodySize returned each time it is triggered for a particular resource ('end' will return 0), then you will have the true size of the content. I believe this should work regardless of whether content-length is passed, gzip is used, or chunking is in use. Do not rely on Content-Length.

Replace handleStarted() in networkaccessmanager.cpp with the following code.

void NetworkAccessManager::handleStarted()
{
    QNetworkReply *reply = qobject_cast<QNetworkReply*>(sender());
    if (!reply)
        return;

    QVariantList headers;
    foreach (QByteArray headerName, reply->rawHeaderList()) {
        QVariantMap header;
        header["name"] = QString::fromUtf8(headerName);
        header["value"] = QString::fromUtf8(reply->rawHeader(headerName));
        headers += header;
    }

    QVariantMap data;
    // The first emit for a reply is reported as 'start', later ones as 'data'.
    if (!m_started.contains(reply)) {
        m_started += reply;
        data["stage"] = "start";
    } else {
        data["stage"] = "data";
    }
    data["id"] = m_ids.value(reply);
    data["url"] = reply->url().toEncoded().data();
    data["status"] = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute);
    data["statusText"] = reply->attribute(QNetworkRequest::HttpReasonPhraseAttribute);
    data["contentType"] = reply->header(QNetworkRequest::ContentTypeHeader);
    // Size of the data available at this emit, not of the whole body.
    data["bodySize"] = reply->size();
    data["redirectURL"] = reply->header(QNetworkRequest::LocationHeader);
    data["headers"] = headers;
    data["time"] = QDateTime::currentDateTime();

    emit resourceReceived(data);
}

sveisvei · 2015-04-30T14:05:21Z

This is awesome @djberriman

djberriman · 2015-04-30T15:19:09Z

Just be aware that the total size returned appears to be the uncompressed size, not the content-length, when gzip is being used. I ran one test allowing gzip and one not allowing gzip and got the same results.

ktilcu · 2015-06-16T18:01:19Z

@djberriman Any thoughts on getting the gzip sizes?

atwenzel · 2015-07-17T19:02:02Z

@djberriman Thanks so much for this fix, this is exactly what I need for my project.

Can anyone give a general idea of the changes that should be made to the onResourceReceived function, especially in the context of the netsniff.js example (https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js)? I've built phantomjs with this fix but I'm a little unsure how to implement it in a script. Thanks!

EDIT: I seem to have solved my issue. For anyone else with as little phantomjs experience as I have who finds this thread, in the above example, you can change

page.onResourceReceived = function (res) {
    if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
    }
    if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
    }
};

to

page.onResourceReceived = function (res) {
    if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
    }
    if (res.stage === 'data') {
        page.resources[res.id].startReply.bodySize += res.bodySize;
    }
    if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
    }
};

And it should work with @djberriman's change.

tufandevrim · 2015-12-15T22:01:48Z

@ariya @djberriman what's the resolution on this one?

djberriman · 2015-12-17T09:33:21Z

@tufandevrim Just waiting for @ariya to put it in the main line

EFF · 2016-03-11T16:20:47Z

@ariya @djberriman ... did we finally merge this one in 2.1.1? The fix looks good to me.

vargasj · 2016-03-30T18:21:30Z

Has this been solved? Thanks.

adeelraza · 2016-04-05T10:36:44Z

+1

djberriman · 2016-04-05T11:55:59Z

onResourceReceived function should read more like:-

if (res.stage == 'start') {
    urlRequestedBytes[res.id] = res.bodySize;
}
else {
    if (res.bodySize != undefined) {
        urlRequestedBytes[res.id] += res.bodySize;
    }
}

During my testing I found both 'data' and 'end' could return a size depending on whether chunking is in use and that it can also be returned as undefined. To get the correct size in all cases you need to add the value returned in bodySize in each 'start','data' and 'end'.
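
A self-contained sketch of that accumulation is below. It assumes a PhantomJS build that includes the 'data' stage from the patch above. The events are simulated with made-up sizes so the snippet also runs outside PhantomJS, and `receivedBytes` is an illustrative name.

```javascript
// Running body-size totals keyed by resource id.
var receivedBytes = {};

// Same shape as a PhantomJS page.onResourceReceived callback.
function onResourceReceived(res) {
    if (!(res.id in receivedBytes)) {
        receivedBytes[res.id] = 0;
    }
    // bodySize can be undefined on some events, so only add real numbers.
    if (typeof res.bodySize === 'number') {
        receivedBytes[res.id] += res.bodySize;
    }
}

// Simulated events for one resource (sizes are made up for illustration).
[
    { id: 1, stage: 'start', bodySize: 1024 },
    { id: 1, stage: 'data', bodySize: 2048 },
    { id: 1, stage: 'data', bodySize: 512 },
    { id: 1, stage: 'end' }
].forEach(onResourceReceived);

console.log(receivedBytes[1]); // 3584
```

In a real script the function would be assigned with page.onResourceReceived = onResourceReceived. As noted earlier in the thread, the total appears to be the uncompressed size when gzip is in use.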

djberriman · 2016-04-13T20:45:08Z

Just a quick update on content-length with encoded response (gzip). The lack of a content-length header was due to a feature of QT whereby they physically removed the header if it was compressed. Following proof of the bug/feature and some discussion the code will now be removed from QT that does this which means content-length will always be passed if returned from the server (chunking servers for instance don't return a length).

stephanebachelier · 2016-06-01T18:20:34Z

@djberriman for a gzipped response you will probably have no content-length header, as the content will be streamed. You can verify this by checking for the header "Transfer-Encoding: chunked".
If the content has already been gzipped beforehand (cache, disk, ...), the server will set the content-length header, as it will know the length of the gzip archive.
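
One way to check this from a script is to inspect the headers array that PhantomJS passes to onResourceReceived (a list of name/value objects). The sketch below is only illustrative: `headerValue` and `bodyFraming` are hypothetical helper names, and the sample headers are made up.

```javascript
// Case-insensitive lookup in a PhantomJS-style headers array
// ([{ name: ..., value: ... }, ...]); returns null when the header is absent.
function headerValue(headers, name) {
    var wanted = name.toLowerCase();
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === wanted) {
            return headers[i].value;
        }
    }
    return null;
}

// Hypothetical classifier: reports how the response body length is signalled.
function bodyFraming(headers) {
    var te = headerValue(headers, 'Transfer-Encoding');
    if (te !== null && te.toLowerCase().indexOf('chunked') !== -1) {
        return 'chunked';
    }
    var cl = headerValue(headers, 'Content-Length');
    if (cl !== null && /^\d+$/.test(cl)) {
        return 'content-length';
    }
    return 'unknown'; // e.g. Content-Length stripped for a gzip response
}

console.log(bodyFraming([
    { name: 'Content-Encoding', value: 'gzip' },
    { name: 'Transfer-Encoding', value: 'chunked' }
]));
```

A result of 'unknown' for a gzip response would be consistent with the header being removed before the script sees it, as described earlier in the thread.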

jsut · 2016-08-25T19:32:20Z

@djberriman with regards to the content length, the current version of QT will emit the content length header through 'downloadMetaData', but I'm not convinced the value of the contentLength header is really the best thing to use if you actually want the amount of bytes transferred. It omits the size of the headers, which can be significant if you have a lot of cookies, especially across all the requests required to render a web page.

It seems like using downloadProgress, which you mentioned earlier, might be a better approach, depending on what your use case is. Better yet would be if the QT library had something like reply->bytes_transferred. Based on the documentation of downloadProgress[1], though, it does seem like that is the best approach. I also think QT removing the contentLength header is kind of dumb.

[1] http://doc.qt.io/qt-5/qnetworkreply.html#downloadProgress
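
To get a feel for how much the headers add, a rough estimate can be computed from the headers array PhantomJS reports. This is only an approximation under stated assumptions: `estimateHeaderBytes` is a made-up name, it counts characters rather than encoded bytes, it assumes the "Name: value" plus CRLF wire layout of HTTP/1.x, and it leaves out the status line.

```javascript
// Rough estimate of the size of an HTTP/1.x header block, given a
// PhantomJS-style headers array ([{ name: ..., value: ... }, ...]).
function estimateHeaderBytes(headers) {
    var total = 0;
    headers.forEach(function (h) {
        // "Name: value\r\n" is name + ": " + value + CRLF
        total += h.name.length + 2 + String(h.value).length + 2;
    });
    return total + 2; // blank line (CRLF) that ends the header block
}

console.log(estimateHeaderBytes([
    { name: 'Content-Type', value: 'text/html' }
])); // 27
```

Adding this estimate to the summed bodySize gives a figure closer to the bytes transferred, although it still reflects the uncompressed body when gzip is in use.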

tomgallagher · 2017-01-26T17:04:09Z

Strangely, I'm in need of the content length header only. What's the state of play on this? Has this been resolved in a later version of Phantom? I'm using 2.1.1.

abbasharoon · 2017-02-24T18:17:15Z

+1

macbre mentioned this issue Nov 24, 2013

GZIP compression and response size reporting issues macbre/phantomas#137

Open

macbre added a commit to macbre/phantomas that referenced this issue Nov 26, 2013

Body size related metrics are not reliable

15caa3c

ariya/phantomjs#10156

soulgalore mentioned this issue Jan 29, 2014

JS and css sizes are not displaying size gzipped sitespeedio/sitespeed.io#54

Closed

cvan mentioned this issue Jul 2, 2014

Adjust responses sizes to be correct (workaround for PhantomJS/SlimerJS bug) cvan/phantomHAR#4

Closed

macbre mentioned this issue Nov 26, 2014

gzip and uncompressed seem to be showing the same value macbre/phantomas#432

Closed

zackw removed this from the FutureRelease milestone Apr 19, 2015

zackw added Improvement Qt/Webkit Need code Confirmed and removed old.Priority-Medium labels Apr 19, 2015

zackw mentioned this issue Apr 19, 2015

bodySize wrong values #10169

Closed

djberriman mentioned this issue Apr 30, 2015

Update networkaccessmanager.cpp - fix #10156 bodysize inconsistances/bytes transferred #13195

Merged

erikdubbelboer added a commit to erikdubbelboer/phantomjs that referenced this issue Dec 27, 2015

Fix bodySize in onResourceReceived end, fix ariya#10156

80ce3e5

tomgallagher mentioned this issue Jan 26, 2017

Getting Content-Length Response Headers in gzipped Network Traffic teampoltergeist/poltergeist#849

Closed

ghost closed this as completed in 545b03c Jan 8, 2018

This issue was closed.
