sitespeedio / chrome-har
Create HAR files from Chrome Debugging Protocol data
License: MIT License
Hi everybody!
First of all, thanks for this very useful tool!
I'd like to suggest an improvement to multi-page HAR output, and I'll try to provide as much detail as possible.
page1.html: loads 3 images + 1 JS file, then redirects to page2.html at DOMContentLoaded + 500 ms
page2.html: loads 2 images
Thanks in advance!
Hi, I'm not sure of the best place to ask this. I see that there are some PRs / issues around iframe support in the project, but I'm struggling to see how to hook it up.
I am using the Chrome Extension debugger API to attach to the top-level tabId. I then receive a network event for the top-level iframe loading, but I don't receive any events for scripts inside that iframe. The same thing occurs for service workers.
I think that I can manually attach to the iframe as a target, but that leaves the problem of attaching immediately and enabling network before any requests are made in the target. Target.setAutoAttach comes back successful, but I only see events after I restart the debugger, which is not very helpful. Functions like Target.setDiscoverTargets just return "not allowed", so those don't seem viable.
Any pointers here?
It would be awesome if there was an option to keep the internal properties of the HAR entries, such as __requestId. This would allow us to make extensions to the HAR, e.g. to add the raw content of the requests. Is that something you'd consider adding? If yes, I can prepare a pull request.
Do you have plans to include WebSockets in the result of harFromMessages?
Since Chrome 76, DevTools includes WebSocket messages in HAR exports.
Ref: https://developer.chrome.com/blog/new-in-devtools-76#websocket
Thanks,
Awesome project!
Germán
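As a rough illustration of what such support would need to collect, the sketch below groups WebSocket frames from a recorded event list by requestId. It relies on the CDP events Network.webSocketFrameSent and Network.webSocketFrameReceived; the function name collectWebSocketMessages and the output shape are assumptions made for this example, not something chrome-har provides.

```javascript
// Group WebSocket frames found in a list of { method, params } CDP events.
// The returned shape ({ type, time, opcode, data }) is an assumption of this
// sketch; it is not an existing chrome-har feature.
function collectWebSocketMessages(events) {
  const byRequest = new Map();
  for (const { method, params } of events) {
    let type;
    if (method === 'Network.webSocketFrameSent') type = 'send';
    else if (method === 'Network.webSocketFrameReceived') type = 'receive';
    else continue; // not a WebSocket frame event

    const list = byRequest.get(params.requestId) || [];
    list.push({
      type,
      time: params.timestamp,
      opcode: params.response.opcode,
      data: params.response.payloadData,
    });
    byRequest.set(params.requestId, list);
  }
  return byRequest;
}
```

The resulting map could then be merged into the matching HAR entries in a post-processing step, if entries keep their request id.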
While working on my application with puppeteer, I tried to use chrome-har to convert the session events into a HAR file.
Everything works great, except for a few requests (3 requests, all served from the browser cache) that seem to make chrome-har crash with the following error message:
TypeError: Cannot set property 'receive' of undefined
at harFromMessages (c:\tufin\tasks\2019_08_27_puppeteerPlayground\node_modules\chrome-har\index.js:373:29)
at Object.<anonymous> (c:\tufin\tasks\2019_08_27_puppeteerPlayground\testEventsLeadingToErrorInChromHar.js:25:13)
This is how I recorded the browser session events that I used to create the HAR file that fails:
let events = [];
const client = await page.target().createCDPSession();
await client.send('Page.enable');
await client.send('Network.enable');
observe.forEach(method => {
  client.on(method, params => {
    events.push({ method, params });
  });
});
This is an example of the problematic events (all from a single requestId) that failed:
{
"method": "Network.requestWillBeSent",
"params": {
"requestId": "1000029348.4288",
"loaderId": "86B2349DB4E665FBB7E8BADA9DF0D2F6",
"documentURL": "https://10.100.9.28/securetrack/pages/homePage/dashboard/dashboardMain.faces",
"request": {
"url": "https://10.100.9.28/securetrack/fonts/fontawesome-webfont.woff2?v=4.5.0",
"method": "GET",
"headers": {},
"mixedContentType": "none",
"initialPriority": "VeryLow",
"referrerPolicy": "no-referrer-when-downgrade"
},
"timestamp": 257791.674372,
"wallTime": 1567574376.163183,
"initiator": {
"type": "parser",
"url": "https://10.100.9.28/securetrack/pages/homePage/dashboard/dashboardMain.faces",
"lineNumber": 42
},
"type": "Font",
"frameId": "15AFFC9193B829A16FD799DE57028678",
"hasUserGesture": false
}
},
{
"method": "Network.requestServedFromCache",
"params": {
"requestId": "1000029348.4288"
}
},
{
"method": "Network.loadingFinished",
"params": {
"requestId": "1000029348.4288",
"timestamp": 257791.674396,
"encodedDataLength": 0,
"shouldReportCorbBlocking": false
}
}
When I remove all of the problematic requests (there are 3 requests, all related to this strange fontawesome-webfont URL, with 3 events each, 9 events in total), everything works great.
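The manual clean-up described above can be automated. The sketch below is a workaround under stated assumptions, not a fix in chrome-har: it drops every Network.* event whose requestId never gets a Network.responseReceived event, which is the pattern visible in the failing sample. The helper name dropRequestsWithoutResponse is hypothetical, and note that it also removes requests that failed before any response arrived.

```javascript
// Remove Network.* events for requests that never produced a
// Network.responseReceived event (for example the cached font requests
// shown above). Non-network events are always kept.
function dropRequestsWithoutResponse(events) {
  const withResponse = new Set();
  for (const { method, params } of events) {
    if (method === 'Network.responseReceived') {
      withResponse.add(params.requestId);
    }
  }
  return events.filter(({ method, params }) => {
    if (!method.startsWith('Network.')) return true;
    if (!params || params.requestId === undefined) return true;
    return withResponse.has(params.requestId);
  });
}
```

The filtered array would be passed to harFromMessages in place of the raw events.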
There are discrepancies between the HAR generated by browsertime and the one from Chrome DevTools.
Please find attached the two HAR files from a run on https://www.infosys.com
I used the command below to generate the HAR with browsertime:
browsertime https://www.infosys.com --chrome.chromedriverPath D:\programs\chromedriver_win32\89.0.4389.23\chromedriver.exe --iterations 1 --headless --browser chrome --chrome.includeResponseBodies all
My Chrome browser version is 89.0.4389.114, and the compatible chromedriver I used is version 89.0.4389.23.
The Chrome DevTools HAR has 143 HTTP requests in total, whereas the browsertime HAR has only 137 HTTP requests captured.
See the bug: sitespeedio/browsertime#522
So GoogleChrome/lighthouse#4005 implies this module can generate a HAR file from the output of the lighthouse CLI, e.g.:
$ lighthouse --save-assets http://google.com
which generates several files, the interesting one being:
www.google.com_2018-11-28_11-31-08-0.devtoolslog.json
Trying to create a HAR file from that gives errors like:
m-c02x8099jgh7:~ m0t01b6$ DEBUG=* node node_modules/chrome-har/tools/harCreator.js ~/www.google.com_2018-11-28_11-31-08-0.devtoolslog.json
chrome-har Request will be sent with requestId F64EEE5F8CA7C3C531714B9410C214AB that can't be mapped to any page at the moment. +0ms
chrome-har Couldn't find original request for redirect response: F64EEE5F8CA7C3C531714B9410C214AB +2ms
chrome-har Request will be sent with requestId F64EEE5F8CA7C3C531714B9410C214AB that can't be mapped to any page at the moment. +0ms
chrome-har Couldn't find original request for redirect response: F64EEE5F8CA7C3C531714B9410C214AB +0ms
chrome-har Request will be sent with requestId F64EEE5F8CA7C3C531714B9410C214AB that can't be mapped to any page at the moment. +0ms
Unhandled rejection TypeError: Cannot read property 'content' of undefined
at Object.harFromMessages (/Users/m0t01b6/node_modules/chrome-har/index.js:326:28)
at fs.readFileAsync.then.then.messages (/Users/m0t01b6/node_modules/chrome-har/tools/harCreator.js:22:28)
at tryCatcher (/Users/m0t01b6/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:694:18)
at Promise._fulfill (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:638:18)
at Promise._resolveCallback (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:432:57)
at Promise._settlePromiseFromHandler (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:524:17)
at Promise._settlePromise (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:694:18)
at Promise._fulfill (/Users/m0t01b6/node_modules/bluebird/js/release/promise.js:638:18)
at /Users/m0t01b6/node_modules/bluebird/js/release/nodeback.js:42:21
at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
Thoughts??? THANKS!!!
When the browser is already open, you can't get the HAR because there are no page events. Is there a way to solve this?
Hello,
I'm using [email protected] [email protected] and [email protected]
const puppeteer = require('puppeteer');
const PuppeteerHar = require('puppeteer-har');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
console.log("create page");
const har = new PuppeteerHar(page);
await har.start({ path: 'results.har' });
console.log('Har started');
await page.goto('https://********/portail');
await har.stop();
await browser.close();
})();
How can I avoid getting an empty "pageTimings", please?
{"log":{"version":"1.2","creator":{"name":"chrome-har","version":"0.11.7","comment":"https://github.com/sitespeedio/chrome-har"},"pages":[{"id":"page_1","startedDateTime":"2020-04-29T14:00:06.681Z","title":"https://********/portail","pageTimings":{}}],"entries":[{"cache":{},"startedDateTime":"2020-04-29T14:00:06.682Z","_requestId":"5380B4DD08F8636FEB1F2EC3A5914CE6","_initialPriority":"VeryHigh","_priority":"VeryHigh","pageref":"page_1","request":{"method":"GET","url":
Is there a chance we can have types for the HAR return value?
https://github.com/micmro/har-format-ts-declaration
I think we fail to include requests that time out. Check out this site:
http://results5.sitespeed.io/www.vansterpartiet.se/2018-08-29-06-12-25/pages/moderaterna.se/2.html#waterfall
The waterfall halts for a long time and we cannot see what happens. But looking at the Chrome trace log, it's one request that takes 2 minutes.
entry.response.content.text is missing, so the HAR is missing the response body.
It looks like the Chrome DevTools Protocol allows querying for the response body by requestId, so this could be an approach.
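The approach mentioned here can be sketched roughly as follows, using the CDP command Network.getResponseBody once a request has finished loading. The helper name recordResponseBodies and the shape of the stored values are assumptions for this example; `client` stands for any object with on() and send(), such as a puppeteer CDP session.

```javascript
// Fetch response bodies by requestId as requests finish loading.
// `client` is assumed to expose on(event, handler) and
// send(method, params) returning a Promise, like a puppeteer CDPSession.
function recordResponseBodies(client) {
  const bodies = new Map();
  const pending = [];
  client.on('Network.loadingFinished', ({ requestId }) => {
    pending.push(
      client
        .send('Network.getResponseBody', { requestId })
        .then(({ body, base64Encoded }) => {
          bodies.set(requestId, {
            text: body,
            encoding: base64Encoded ? 'base64' : undefined,
          });
        })
        .catch(() => {
          // The body may no longer be available (e.g. after a navigation);
          // this sketch simply skips it.
        })
    );
  });
  // done() resolves once every body request issued so far has settled.
  return { bodies, done: () => Promise.all(pending) };
}
```

After awaiting done(), the map can be used to fill entry.response.content.text for entries whose request id is known.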
When we navigate to two different pages (using Browsertime) we still create just one page (branch with example trace: https://github.com/sitespeedio/chrome-har/tree/multi-page-har).
The problem is that we use the frame_id as the unique key when we create a new page. But the frame_id stays the same between navigations, so when we get new requests, we match them with the first page.
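One way to see how many pages a trace should produce is to count top-level navigations instead of frame ids. The sketch below is only an illustration under two assumptions: that a navigation request has requestId equal to loaderId (as in the sample events on this page), and that the first Document request belongs to the main frame. The function name topLevelNavigations is hypothetical.

```javascript
// List the URLs of top-level navigations in a CDP event list.
// Assumptions of this sketch: navigation requests have
// requestId === loaderId, and the first Document request seen
// belongs to the main frame.
function topLevelNavigations(events) {
  let mainFrameId;
  const navigations = [];
  for (const { method, params } of events) {
    if (method !== 'Network.requestWillBeSent') continue;
    if (params.type !== 'Document') continue;
    if (params.requestId !== params.loaderId) continue;
    if (params.redirectResponse) continue; // redirect hop of the same navigation
    if (mainFrameId === undefined) mainFrameId = params.frameId;
    if (params.frameId === mainFrameId) navigations.push(params.request.url);
  }
  return navigations;
}
```

If this returns two URLs while the HAR contains a single page, the trace shows the behaviour described above.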
debug version 4.1.1 suffers from a ReDoS vulnerability, see debug-js/debug#797. We'd better move away from it.
Hello,
I would like to get a new release of chrome-har that includes the fix for initiator (see #9).
Do I have to make a pull request with a tag and ask you to merge it so that you can create the release?
Or do I have to open an issue to request a new release?
Thanks 😄
The value field should be a string according to https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/HAR/Overview.html#sec-object-types-params, but instead it is an object.
This doesn't happen all the time, only sometimes. So if the value is JSON, I think it should be serialized to a string somehow, so that it doesn't break the spec.
In the example below I just replaced the title and URL.
The spec says: value [string, optional] - value of a posted parameter or content of a posted file.
{
"request": {
"method": "GET",
"postData": {
"params": [
{
"name": "feedUrls",
"value": {
"Title": "xxTitlexx",
"Url": "xxUrlxx",
"ImageHandling": 1
}
}
]
}
}
}
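Until this is handled inside the library, a HAR can be normalised after the fact. The sketch below serialises every non-string postData.params value with JSON.stringify; the function name stringifyPostParams is an assumption for this example, and it mutates the HAR object it is given.

```javascript
// Make every postData.params[].value a string, as the HAR spec requires.
// Non-string values (objects, numbers, ...) are serialised with JSON.stringify.
function stringifyPostParams(har) {
  for (const entry of har.log.entries) {
    const params = entry.request.postData && entry.request.postData.params;
    if (!Array.isArray(params)) continue;
    for (const param of params) {
      if (param.value !== undefined && typeof param.value !== 'string') {
        param.value = JSON.stringify(param.value);
      }
    }
  }
  return har;
}
```

Applied to the example above, the value of feedUrls becomes the JSON text of the object instead of the object itself.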
Original issue (sitespeedio/sitespeed.io#1467)
==========
Hello there,
I've been trying to run sitespeed on a page behind login (I believe that's all good now) but I am seeing different results (on requests numbers for example) between doing it "manually" in my Chrome browser (and looking in the Network tab) and when doing it via the sitespeed command.
I've set up a test page with a login / password for you to have a look at here: http://lumsites-sandbox.appspot.com/a/lumapps/home/sitespeed-homepage-test-private (see the credentials in the preScript login script I use).
Here's the command I use:
docker-compose run sitespeed.io --preScript sitespeed-login.js http://lumsites-sandbox.appspot.com/a/lumapps/home/sitespeed-homepage-test-private -n 3 -b chrome --browsertime.chrome.args no-sandbox --browsertime.chrome.args disk-cache-dir=/dev/null --browsertime.chrome.args disk-cache-size=1 --browsertime.chrome.args media-cache-size=1 --browsertime.pageCompleteCheck 'return (function() {try { return (Date.now() - window.performance.timing.loadEventEnd) > 15000;} catch(e) {} return true;})()' --graphite.host=graphite --summary-detail
These are the versions it uses on my machine:
Versions OS: linux 4.9.8-moby nodejs: v6.9.1 sitespeed.io: 4.4.2 browsertime: 1.0.0-beta.25 coach: 0.31.0
Which gives me the following results:
Total requests 31
Image requests 3
CSS requests 4
Javascript requests 7
Font requests 0
When I do the same process manually (logging in, then going to the test page and looking up the numbers in Chrome > Network with cache disabled) I get this:
Total requests 51
Image requests 4
CSS requests 4
Javascript requests 21
Font requests 5
I suspect it has something to do with the 'warm cache' due to the login page visited during the preScript step, but if that's the case I can't see how to clear that cache completely before running the tests on my target page.
Thanks in advance for your help or any hint on what I may have done wrong.
For info, I get different results depending on the version of the image of sitespeed I specify in my docker-compose file. The 'Transfer size' values look especially different.
==========
sitespeed 4.0.3
Score / Metric Median
✗ Overall score 67
✗ Performance score 65
✗ Accessibility score 65
✗ Best Practice score 76
√ Fast Render advice 95
! Avoid scaling images advice 90
√ Compress assets advice 100
! Optimal CSS size advice 90
✗ Total size (transfer) 1.9 MB
√ Image size (transfer) 642.6 KB
✗ Javascript size (transfer) 1.1 MB
✗ CSS size (transfer) 97.0 KB
Total requests 43
Image requests 13
CSS requests 1
Javascript requests 11
Font requests 0
200 responses 43
Domains per page 7
Cache time 10 minutes
Time since last modification -1 second
RUM Speed Index 5121
First Paint 4840 ms
Backend Time 1530 ms
Frontend Time 7155 ms
==========
sitespeed 4.5.1
Score / Metric Median
✗ Overall score 69
✗ Performance score 68
✗ Accessibility score 65
✗ Best Practice score 76
√ Fast Render advice 95
! Avoid scaling images advice 90
✗ Compress assets advice 80
! Optimal CSS size advice 90
√ Total size (transfer) 987.1 KB
√ Image size (transfer) 664.9 KB
✗ Javascript size (transfer) 126.5 KB
✗ CSS size (transfer) 113.3 KB
Total requests 46
Image requests 17
CSS requests 4
Javascript requests 7
Font requests 0
200 responses 46
Domains per page 8
Cache time 45 minutes
Time since last modification -1 second
RUM Speed Index 4440
First Paint 2015 ms
Backend Time 1389 ms
Frontend Time 6617 ms
Hi,
We are using chrome-har in our production environment. If we enable cached entries in the code, the receive time for cached entries seems to be wrong.
If we export the HAR file manually from Chrome, the receive time is always within a few milliseconds. I'm not sure what the issue is. Does anyone have the same issue?
The example below is based on code found here: https://michaljanaszek.com/blog/generate-har-with-puppeteer
The page section shows the first resource loaded and not the base URL.
Example with en.wikipedia.org
Consequently, I can't find the initial page or resource in the HAR.
chrome-har : 0.3.1
puppeteer : 1.2.0 (with chromium V 67.0.3372.0)
Currently in the outputted HAR, the response.content object does not include the actual content, which is how the HAR typically looks when exported from the browser. Instead there is a compression value and no text value. Is there a way to enable this or implement it in the code?
"response": {
"httpVersion": "h2",
"redirectURL": "",
"status": 200,
"statusText": "",
"content": {
"mimeType": "text/html",
"size": 1135094,
"compression": 979339
},
Running Browsertime master in Docker with Chrome 66, it seems like we miss the base URL and its request. Running locally on my machine with Chrome 65 looks OK. Download the JSON:
http://webpagereplay-wikimedia.s3-website-us-east-1.amazonaws.com/?prefix=enwiki-test/mobile/chrome/200/Sweden/2018-03-27-13-46/
It seems requestIntercepted has been deprecated. Is there any way for this library to work with responseReceived alone? It contains both response and request information. All you would need for capture is:
client.on('Network.responseReceived', async (params: any) => { let method = "Network.responseReceived"; harEvents.push({ method, params }); });
However, when harFromMessages is called with these events, the result has no entries and no pages; it is basically empty. Thoughts?
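When the result comes back empty, a quick first check is which event types were actually recorded. The sketch below compares a recorded event list against the event types observed in the puppeteer test script elsewhere on this page; whether chrome-har strictly needs each of them is an assumption, and the helper name unseenEventTypes is hypothetical.

```javascript
// Event types observed in the puppeteer test script on this page.
// Treating all of them as expected input is an assumption of this sketch.
const EXPECTED_EVENT_TYPES = [
  'Page.loadEventFired',
  'Page.domContentEventFired',
  'Page.frameStartedLoading',
  'Page.frameAttached',
  'Network.requestWillBeSent',
  'Network.requestServedFromCache',
  'Network.dataReceived',
  'Network.responseReceived',
  'Network.resourceChangedPriority',
  'Network.loadingFinished',
  'Network.loadingFailed',
];

// Return the expected event types that never occur in `events`.
function unseenEventTypes(events, expected = EXPECTED_EVENT_TYPES) {
  const seen = new Set(events.map((event) => event.method));
  return expected.filter((method) => !seen.has(method));
}
```

A capture that only listens to Network.responseReceived would report every other type as unseen, including Network.requestWillBeSent and the Page.* events.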
This issue is copied from sitespeedio/sitespeed.io#3132
Hi, my target website includes some pictures that are blocked by the firewall, so there are errors like "_error": "net::ERR_TIMED_OUT" or "_error": "net::ERR_CONNECTION_RESET" in the HAR file generated by Chrome DevTools.
But I cannot find any such errors in the HAR file generated by sitespeed.io for the same pages, although this error is logged:
ERROR: Failed waiting on page to finished loading, timed out after 300000 ms BrowserError: Running page complete check
Is it possible to make sitespeed.io include the "_error" in the HAR file?
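The information needed for such an "_error" field is present in the CDP event stream: Network.loadingFailed carries an errorText such as net::ERR_TIMED_OUT. The sketch below only extracts it from a recorded event list; the function name failedRequests and the output shape are assumptions for this example, and it does not change what chrome-har or sitespeed.io write to the HAR.

```javascript
// Collect failed requests from a list of { method, params } CDP events.
// Each result pairs the request URL with the errorText Chrome reported.
function failedRequests(events) {
  const urls = new Map();
  const failures = [];
  for (const { method, params } of events) {
    if (method === 'Network.requestWillBeSent') {
      urls.set(params.requestId, params.request.url);
    } else if (method === 'Network.loadingFailed') {
      failures.push({
        requestId: params.requestId,
        url: urls.get(params.requestId),
        error: params.errorText,
        canceled: params.canceled === true,
      });
    }
  }
  return failures;
}
```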
On version 0.13.2, getting this error fairly consistently:
RangeError: Invalid time value
at Date.toISOString ()
at M.m.toISOString (/app/node_modules/chrome-har/node_modules/dayjs/dayjs.min.js:1:6270)
at module.exports (/app/node_modules/chrome-har/lib/entryFromResponse.js:185:53)
at harFromMessages (/app/node_modules/chrome-har/index.js:410:15)
My netsniffPrototype.js:
await this.page.goto(url, { timeout: 60000 });
let cssSelectorconsent = '#consent-notice-agree-button';
let cookieAcceptBtn = await this.page.waitForSelector(cssSelectorconsent, { timeout: 60000, visible: true });
await cookieAcceptBtn.click();
await this.page.setCacheEnabled(false);
await this.harStart();
// the option is spelled waitUntil and takes a string value
await this.page.reload({ waitUntil: 'networkidle2', timeout: 60000 });
await this.harStop();
I added console.log calls in chrome-har/lib/entryFromResponse.js:
const entrySecs =
  page.__wallTime + (timing.requestTime - page.__timestamp);
console.log('page.__wallTime = ' + page.__wallTime + ' timing.requestTime = ' + timing.requestTime + ' page.__timestamp = ' + page.__timestamp)
entry.startedDateTime = dayjs.unix(entrySecs).toISOString();
console.log('dayjs.unix(entrySecs).toISOString() = ' + dayjs.unix(entrySecs).toISOString())
Stdout results (Python call nodejs script using subprocess.run() ) :
2020-02-24:12:39:28,092 INFO [netsniffPrototype.js:296] <Thread(Thread-1, started daemon 140015018862336)> Stop Har events
page.__wallTime = 1582544363.28644
timing.requestTime = 6971.141358
page.__timestamp 6971.138063
dayjs.unix(entrySecs).toISOString() = 2020-02-24T11:39:23.289Z
page.__wallTime = undefined
timing.requestTime = 6971.149286
page.__timestamp = undefined
2020-02-24:12:39:28,101 ERROR [netsniffPrototype.js:303] <Thread(Thread-1, started daemon 140015018862336)> Process exitHarNotStopped event Har not stopped properly !
e.stack => RangeError: Invalid time value
at Date.toISOString ()
at h.d.toISOString (/home/pptruser/node_modules/dayjs/dayjs.min.js:1:6372)
at module.exports (/home/pptruser/node_modules/chrome-har/lib/entryFromResponse.js:161:53)
at harFromMessages (/home/pptruser/node_modules/chrome-har/index.js:310:15)
at PuppeteerHar.stop (/home/pptruser/node_modules/puppeteer-har/lib/PuppeteerHar.js:109:21)
at runMicrotasks ()
at processTicksAndRejections (internal/process/task_queues.js:97:5)
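The log output above shows that the crash happens when page.__wallTime and page.__timestamp are undefined, which makes entrySecs NaN. A guard along the following lines would avoid the RangeError; it is a sketch using the built-in Date instead of dayjs, and the function name startedDateTime and the null return value are assumptions, not the fix adopted by chrome-har.

```javascript
// Compute an entry's startedDateTime from the page's wall-clock anchor.
// Returns null instead of throwing when the page has no usable
// __wallTime / __timestamp (the case shown in the log above).
function startedDateTime(page, timing) {
  const entrySecs = page.__wallTime + (timing.requestTime - page.__timestamp);
  if (!Number.isFinite(entrySecs)) return null;
  return new Date(entrySecs * 1000).toISOString();
}
```

A caller could skip the entry, or fall back to another timestamp, when null is returned.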
Hello,
I am using chrome-har as an npm package to extract a HAR from my puppeteer scenarios.
However, I can't see the requests served from cache when I reload a page.
Here is a quick check-list :
I tested with a simple scenario : a load of https://google.fr as a first step, and then a reload of the page.
You can find attached the waterfalls/har of both steps :
Google_test_waterfall_cache.zip
For example the google logo is not present in the second capture
(https://www.gstatic.com/images/branding/googlelogo/1x/googlelogo_color_92x36dp.png)
Here are some log details that could be useful :
chrome-har_cache.zip
Perhaps I misunderstood and chrome-har cannot handle requests served from cache?
The code seems to handle Network.requestServedFromCache, and I'm pretty sure I've already recorded a HAR that displayed cached resources!
Don't hesitate to ask if you want a full repro demo with puppeteer.
Thanks a lot, have a good day.
It looks like when a HAR is created from the messages, any entry that is a responseFailed is filtered out of the entry list. For my use case I would like to know when a request that was fired returns, say, a 403. Is this possible in some way?
I'm trying to automate a test with puppeteer where I generate a HAR file after each interaction step.
Since it is an SPA, I do not get Page navigation events. Because of that I do not get any results except for the initial page, where I do have a page event. During debugging I could see that the events are captured but the number of pages is 0.
It also does not seem to be related to puppeteer, because I checked the events when manually navigating the page.
Any hints on what I can do?
I could maybe clone the code and throw out all page event handling, because effectively I do not need those events.
Is there a better option?
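A lighter alternative to patching the library might be to prepend a synthetic page event to the recorded events before conversion. The sketch below does that with a Page.frameStartedLoading event built from the frameId of the first request. That chrome-har opens a page on this event is an assumption not confirmed in this thread, and the helper name withSyntheticPage is hypothetical.

```javascript
// If no Page.frameStartedLoading event was recorded, prepend one that
// uses the frameId of the first request. Whether chrome-har creates a
// page from this event is an assumption of this sketch.
function withSyntheticPage(events) {
  const hasPageEvent = events.some(
    (event) => event.method === 'Page.frameStartedLoading'
  );
  if (hasPageEvent) return events;

  const firstRequest = events.find(
    (event) =>
      event.method === 'Network.requestWillBeSent' && event.params.frameId
  );
  if (!firstRequest) return events;

  const synthetic = {
    method: 'Page.frameStartedLoading',
    params: { frameId: firstRequest.params.frameId },
  };
  return [synthetic, ...events];
}
```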
Hi everyone,
I already described the situation in this thread:
sitespeedio/browsertime#1664 (comment)
To summarize the problem: I want to visit a page for 9 minutes. On this page there is a video served by an Nginx server and played with a DASH player.
Everything seems to work well in every scenario except HTTP/3 with tc (Linux traffic control) constraints of 10 Mbit up/down.
In particular, on the Nginx server side I can see that browsertime correctly executes almost 150 requests, but in the HAR file I can only see the first 60.
I don't know why browsertime seems to stop recording requests after the first 60.
The pcap (not attached because of its size) also looks fine; the only problem is the missing data in the .har file.
To conclude, I attach the result obtained in one of these cases. The zip folder contains both the .har and Chrome's perfLogs.
I would be really grateful if anyone can help me.
Really thanks,
Gianluca
On a page with verified early hints requests we can see requests initiated by early hints:
Running that page with npx browsertime --chrome.collectPerfLog -n 1 $URL and inspecting the HAR we can see:
.css requests
.css stylesheets
Inspecting the devtools log I can see that the request made to master-cb9e9afd3d.css (request ID: 87395.8) has the expected Network.requestWillBeSent, Network.responseReceived, Network.dataReceived, Network.loadingFinished events.
Curiously, I noticed that fromDiskCache and fromEarlyHints are both true on the Network.responseReceived event:
In checking entryFromResponse.js we can see that fromDiskCache and __servedFromCache are used to gate functionality:
chrome-har/lib/entryFromResponse.js, Line 93 in 2006c4d
chrome-har/lib/entryFromResponse.js, Line 180 in 2006c4d
I suspect we'll want to check request entries to see if they've been loaded by early hints.
In case we need it, there's a Network.responseReceivedEarlyHints event.
I'm using chrome-har in my Chrome Extension. All I want is to record a piece of network activity and generate a HAR file.
chrome.debugger.sendCommand({ tabId }, 'Network.enable', {});
chrome.debugger.sendCommand({ tabId }, 'Page.enable', {});
As shown above, I enabled the "Network" and "Page" debugger events, which as far as I could find out are required to get HAR data.
Then I add a listener for the events on the page:
chrome.debugger.onEvent.addListener(debuggerEventListener);
The listener is defined like this:
const debuggerEventListener = (debuggeeId, method, params) => {
  if (recording && debuggeeId.tabId === tabId) {
    if (method !== 'Network.responseReceived') {
      networkEvents.push({ method, params });
    } else {
      const requestId = params.requestId;
      if (params.response) {
        chrome.debugger.sendCommand({ tabId: tabId }, 'Network.getResponseBody', { requestId: requestId }, function (responseBody) {
          if (responseBody) {
            if (responseBody.body) {
              params.response.body = responseBody.body;
            }
            networkEvents.push({ method, params });
          } else {
            // chrome.runtime.lastError holds the reason when the command fails
            console.error('Error fetching response body:', chrome.runtime.lastError);
            networkEvents.push({ method, params });
          }
        });
      } else {
        networkEvents.push({ method, params });
      }
    }
  }
};
I collect all the events in an array and pass the array to the harFromMessages function:
const har = harFromMessages(eventsArrayData, {includeTextFromResponseBody: true, includeResourcesFromDiskCache: true});
The problem is that I get pages: [] and entries: [] in the resulting HAR. Why?
I printed eventsArrayData in my console and there are obviously many events collected, but harFromMessages gives me two empty arrays in the result. It seems that harFromMessages does not use the events I pass in. I'm very confused about this.
Looking forward to your answer 🙏
This is the test code. It loads http://localhost:8080/ and then clicks on the only link on the page. The index.html page has one IMG tag, one IFRAME tag and one A tag.
const fs = require('fs');
const { promisify } = require('util');
const puppeteer = require('puppeteer');
const { harFromMessages } = require('chrome-har');
// list of events for converting to HAR
const events = [];
// event types to observe
const observe = [
  'Page.loadEventFired',
  'Page.domContentEventFired',
  'Page.frameStartedLoading',
  'Page.frameAttached',
  'Network.requestWillBeSent',
  'Network.requestServedFromCache',
  'Network.dataReceived',
  'Network.responseReceived',
  'Network.resourceChangedPriority',
  'Network.loadingFinished',
  'Network.loadingFailed',
];
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
(async () => {
  const launchOptions = {
    executablePath: "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
  };
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();
  // register event listeners
  const client = await page.target().createCDPSession();
  await client.send('Page.enable');
  await client.send('Network.enable');
  observe.forEach(method => {
    client.on(method, params => {
      events.push({ method, params });
    });
  });
  // perform tests
  await page.goto('http://localhost:8080/');
  const link = await page.waitForSelector('a', { visible: true });
  // start waiting for the navigation before clicking, so it cannot be missed
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    link.click(),
  ]);
  await wait(2000);
  await browser.close();
  // convert events to HAR file
  const har = harFromMessages(events);
  await promisify(fs.writeFile)('localhost.har', JSON.stringify(har));
})();
<p><img src="One_black_Pixel.png?index">
<p><iframe src="iframe.html"></iframe>
<p><a href="index2.html">Next</a>
<img src="One_black_Pixel.png?iframe">
<p><img src="One_black_Pixel.png?index2">
<p><iframe src="iframe2.html"></iframe>
<img src="One_black_Pixel.png?iframe2">
Copied from here, https://en.wikipedia.org/wiki/File:One_black_Pixel.png.
The HAR captured by chrome-har shows all the collected requests as a single page. But there should be two pages, one for index.html and one for index2.html.
The HAR captured from Chrome devtools after performing the test manually correctly shows two pages (albeit in the wrong order here).
INFO - b'[2017-06-29 21:33:28] INFO: Testing url https://www.macys.com/ run 1\n'
INFO - b'[2017-06-29 21:33:39] ERROR: SyntaxError: Unexpected token \x1f in JSON at position 0\n'
INFO - b'SyntaxError: Unexpected token \x1f in JSON at position 0\n'
INFO - b' at Object.parse (native)\n'
INFO - b' at parsePostData (/usr/src/app/node_modules/chrome-har/index.js:570:37)\n'
I've come across an issue where requests from an iframe inside an iframe are being omitted.
I've created a reproducible case where an image (img_girl.jpg) is missing from the generated HAR, even though it is in the devtools logs.
Going through the devtools log you can see that there are no Page.frameAttached events for the second iframe, despite a Network.requestWillBeSent event referring to the frameId 25A2EF16DA63D669E56F8639E5EFE72C.
{
"method": "Network.requestWillBeSent",
"params": {
"requestId": "A96E1426CD51C4AC5E90C43EE09F3C23",
"loaderId": "A96E1426CD51C4AC5E90C43EE09F3C23",
"documentURL": "https://zfh61.csb.app/",
"request": {
"url": "https://zfh61.csb.app/",
"method": "GET",
"headers": {
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
"Referer": "https://deploy-preview-26--michaeldijkstra-netlify-test.netlify.app/"
},
"mixedContentType": "none",
"initialPriority": "VeryHigh",
"referrerPolicy": "no-referrer-when-downgrade"
},
"timestamp": 162823.851228,
"wallTime": 1590380010.679388,
"initiator": {
"type": "parser",
"url": "https://deploy-preview-26--michaeldijkstra-netlify-test.netlify.app/",
"lineNumber": 6
},
"type": "Document",
"frameId": "25A2EF16DA63D669E56F8639E5EFE72C",
"hasUserGesture": false
},
"source": {
"targetId": "B6C52BA4191283DB2B514A0AA6507C5A",
"sessionId": "1F88229A0F0DA12B6ADECDE9D3134F15"
}
}
This could well be a bug in Chromium: when I was trying to reproduce the issue, I noticed that if the first iframe is on the same domain, Chrome emits the Page.frameAttached event, but when it's on a different domain it doesn't.
I still think it'd be good to show these requests in the HAR, as they are happening.
The block of code that ultimately removes the entry runs while handling Network.responseReceived, when the response can't be matched to a page:
const frameId = rootFrameMappings.get(params.frameId) || params.frameId;
const page = pages.find((page) => page.__frameId === frameId);
if (!page) {
debug(
`Received network response for requestId ${params.requestId} that can't be mapped to any page.`
);
continue;
}
I noticed that when processing Network.requestWillBeSent the last page is used, which was added to support multi-page HARs in PR #30:
const page = pages[pages.length - 1];
Do you think we can do something similar to this? Where we look for the page but fallback to the last page?
const page = pages.find((page) => page.__frameId === frameId) || pages[pages.length - 1];
I'm not sure of the reasoning behind filtering the events out or what the repercussions of doing this as a fallback would be.
Again, happy to work on a PR for this if we think there's a good way forward!
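For discussion, the proposed fallback can be written as a small standalone function. This is a sketch of the idea only: pages, rootFrameMappings and __frameId mirror the names in the snippets above, while the function name findPage and the behaviour with an empty page list (it returns undefined) are assumptions.

```javascript
// Look up the page for a frame; fall back to the most recent page when the
// frame cannot be mapped (e.g. a cross-origin iframe without a
// Page.frameAttached event). Returns undefined when there are no pages.
function findPage(pages, rootFrameMappings, frameId) {
  const rootFrameId = rootFrameMappings.get(frameId) || frameId;
  return (
    pages.find((page) => page.__frameId === rootFrameId) ||
    pages[pages.length - 1]
  );
}
```

One consequence to weigh: with several pages in one trace, an unmapped late response would be attributed to the newest page even if it belongs to an earlier one.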
Hi guys,
First of all, thanks for your amazing work! That's huge !
I would like to determine from HAR generated file which request was preloaded (or not) during page loading.
Is there a way to determine it easily ?
Thanks for your help.
Best.
I have sent Page.enable and Network.enable, and when I dump all the events that I received while loading example.com, they look like this:
[
{
"method": "Network.requestWillBeSent",
"params": {
"requestId": "B008822534A6FDC6F47E9345E71955AC",
"loaderId": "B008822534A6FDC6F47E9345E71955AC",
"documentURL": "https://example.com/",
"request": {
"url": "https://example.com/",
"method": "GET",
"headers": {
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-User": "?1"
},
"mixedContentType": "none",
"initialPriority": "VeryHigh",
"referrerPolicy": "no-referrer-when-downgrade"
},
"timestamp": 178604.513679,
"wallTime": 1573906240.497734,
"initiator": {
"type": "other"
},
"type": "Document",
"frameId": "9D57A9C83BE59C00C8A87B300ADA30B8",
"hasUserGesture": false
}
},
{
"method": "Network.responseReceivedExtraInfo",
"params": {
"requestId": "B008822534A6FDC6F47E9345E71955AC",
"blockedCookies": [],
"headers": {
"status": "200",
"content-encoding": "gzip",
"accept-ranges": "bytes",
"cache-control": "max-age=604800",
"content-type": "text/html; charset=UTF-8",
"date": "Sat, 16 Nov 2019 12:05:00 GMT",
"etag": "\"3147526947\"",
"expires": "Sat, 23 Nov 2019 12:05:00 GMT",
"last-modified": "Thu, 17 Oct 2019 07:18:26 GMT",
"server": "ECS (bsa/EB17)",
"vary": "Accept-Encoding",
"x-cache": "HIT",
"content-length": "648"
}
}
},
{
"method": "Network.responseReceived",
"params": {
"requestId": "B008822534A6FDC6F47E9345E71955AC",
"loaderId": "B008822534A6FDC6F47E9345E71955AC",
"timestamp": 178604.518433,
"type": "Document",
"response": {
"url": "https://example.com/",
"status": 200,
"statusText": "",
"headers": {
"status": "200",
"content-encoding": "gzip",
"accept-ranges": "bytes",
"cache-control": "max-age=604800",
"content-type": "text/html; charset=UTF-8",
"date": "Sat, 16 Nov 2019 12:05:00 GMT",
"etag": "\"3147526947\"",
"expires": "Sat, 23 Nov 2019 12:05:00 GMT",
"last-modified": "Thu, 17 Oct 2019 07:18:26 GMT",
"server": "ECS (bsa/EB17)",
"vary": "Accept-Encoding",
"x-cache": "HIT",
"content-length": "648"
},
"mimeType": "text/html",
"connectionReused": false,
"connectionId": 0,
"remoteIPAddress": "93.184.216.34",
"remotePort": 443,
"fromDiskCache": true,
"fromServiceWorker": false,
"fromPrefetchCache": false,
"encodedDataLength": 0,
"timing": {
"requestTime": 178604.514472,
"proxyStart": -1,
"proxyEnd": -1,
"dnsStart": -1,
"dnsEnd": -1,
"connectStart": -1,
"connectEnd": -1,
"sslStart": -1,
"sslEnd": -1,
"workerStart": -1,
"workerReady": -1,
"sendStart": 0.179,
"sendEnd": 0.179,
"pushStart": 0,
"pushEnd": 0,
"receiveHeadersEnd": 0.824
},
"protocol": "h2",
"securityState": "secure",
"securityDetails": {
"protocol": "TLS 1.3",
"keyExchange": "",
"keyExchangeGroup": "P-256",
"cipher": "AES_256_GCM",
"certificateId": 0,
"subjectName": "www.example.org",
"sanList": [
"www.example.org",
"example.com",
"example.edu",
"example.net",
"example.org",
"www.example.com",
"www.example.edu",
"www.example.net"
],
"issuer": "DigiCert SHA2 Secure Server CA",
"validFrom": 1543363200,
"validTo": 1606910400,
"signedCertificateTimestampList": [],
"certificateTransparencyCompliance": "unknown"
}
},
"frameId": "9D57A9C83BE59C00C8A87B300ADA30B8"
}
},
{
"method": "Page.frameStartedLoading",
"params": {
"frameId": "9D57A9C83BE59C00C8A87B300ADA30B8"
}
},
{
"method": "Page.frameNavigated",
"params": {
"frame": {
"id": "9D57A9C83BE59C00C8A87B300ADA30B8",
"loaderId": "B008822534A6FDC6F47E9345E71955AC",
"url": "https://example.com/",
"securityOrigin": "https://example.com",
"mimeType": "text/html"
}
}
},
{
"method": "Network.dataReceived",
"params": {
"requestId": "B008822534A6FDC6F47E9345E71955AC",
"timestamp": 178604.550809,
"dataLength": 1256,
"encodedDataLength": 0
}
},
{
"method": "Network.loadingFinished",
"params": {
"requestId": "B008822534A6FDC6F47E9345E71955AC",
"timestamp": 178604.516411,
"encodedDataLength": 0,
"shouldReportCorbBlocking": false
}
},
{
"method": "Page.domContentEventFired",
"params": {
"timestamp": 178604.55525
}
},
{
"method": "Page.loadEventFired",
"params": {
"timestamp": 178604.557057
}
},
{
"method": "Page.frameStoppedLoading",
"params": {
"frameId": "9D57A9C83BE59C00C8A87B300ADA30B8"
}
}
]
However, the resulting HAR contains no pages and no entries:
{
"log": {
"version": "1.2",
"creator": {
"name": "chrome-har",
"version": "0.11.4",
"comment": "https://github.com/sitespeedio/chrome-har"
},
"pages": [],
"entries": []
}
}
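One thing that may be worth checking in a dump like the one above is the order of the events. In it, Network.requestWillBeSent arrives before Page.frameStartedLoading. The sketch below assumes, without confirmation from this thread, that chrome-har only attaches requests to a page it has already seen start loading. The helper names (firstIndexOf, describeEventOrder) are hypothetical and not part of chrome-har.

```javascript
// Hypothetical diagnostic, not part of chrome-har: report where the first
// request and the first page-start event sit in a captured event list.
function firstIndexOf(events, methods) {
  return events.findIndex((e) => methods.includes(e.method));
}

function describeEventOrder(events) {
  const firstRequest = firstIndexOf(events, ['Network.requestWillBeSent']);
  const firstPageStart = firstIndexOf(events, ['Page.frameStartedLoading']);
  return {
    firstRequest,
    firstPageStart,
    // Assumption: a request seen before any page start may be dropped.
    requestBeforePageStart:
      firstRequest !== -1 &&
      (firstPageStart === -1 || firstRequest < firstPageStart),
  };
}

// Trimmed version of the dump above; only the method names matter here.
const sampleEvents = [
  { method: 'Network.requestWillBeSent' },
  { method: 'Network.responseReceivedExtraInfo' },
  { method: 'Network.responseReceived' },
  { method: 'Page.frameStartedLoading' },
  { method: 'Page.frameNavigated' },
  { method: 'Network.dataReceived' },
  { method: 'Network.loadingFinished' },
];

console.log(describeEventOrder(sampleEvents));
// { firstRequest: 0, firstPageStart: 3, requestBeforePageStart: true }
```

If requestBeforePageStart is true for a capture, enabling the Page domain before navigating and comparing the result might help narrow the problem down.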
I've come across an issue where headers are missing from requests and responses despite being sent by the browser.
A Chrome update introduced new protocol events that carry request information from the network service. This change added the Network.requestWillBeSentExtraInfo and Network.responseReceivedExtraInfo events to the DevTools log.
Comparing the HAR generated by the Chrome browser with one from chrome-har, you can see that Chrome groups the extra information into the request.
Using the request for a CSS file as an example, the DevTools log contains both a Network.requestWillBeSentExtraInfo and a Network.responseReceivedExtraInfo event, but this extra information is missing from the HAR generated by chrome-har, while it is present in the HAR generated by the Chrome web inspector.
I'm happy to put together a PR to include this information in chrome-har the same way the Chrome browser includes it.
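To illustrate the idea, here is a minimal sketch of how the ExtraInfo headers could be folded into the matching request and response by requestId. It is not chrome-har's implementation; mergeExtraInfoHeaders and the sample events are hypothetical. It also assumes one request per requestId, which does not hold for redirects, and it does not normalise header-name casing.

```javascript
// Hypothetical helper, not chrome-har's API: merge headers from the
// *ExtraInfo events into the request/response with the same requestId.
function mergeExtraInfoHeaders(events) {
  const byRequest = new Map();
  const slot = (id) => {
    if (!byRequest.has(id)) {
      byRequest.set(id, {
        request: {}, requestExtra: {}, response: {}, responseExtra: {},
      });
    }
    return byRequest.get(id);
  };

  // Collect base and extra headers separately, since the ExtraInfo event
  // can arrive before or after the event it complements.
  for (const { method, params } of events) {
    if (method === 'Network.requestWillBeSent') {
      slot(params.requestId).request = params.request.headers;
    } else if (method === 'Network.requestWillBeSentExtraInfo') {
      slot(params.requestId).requestExtra = params.headers;
    } else if (method === 'Network.responseReceived') {
      slot(params.requestId).response = params.response.headers;
    } else if (method === 'Network.responseReceivedExtraInfo') {
      slot(params.requestId).responseExtra = params.headers;
    }
  }

  // Extra headers take precedence when the same name appears in both.
  const merged = new Map();
  for (const [id, s] of byRequest) {
    merged.set(id, {
      requestHeaders: { ...s.request, ...s.requestExtra },
      responseHeaders: { ...s.response, ...s.responseExtra },
    });
  }
  return merged;
}

// Made-up events for illustration only.
const sample = [
  { method: 'Network.requestWillBeSentExtraInfo',
    params: { requestId: '1', headers: { Cookie: 'a=b' } } },
  { method: 'Network.requestWillBeSent',
    params: { requestId: '1', request: { headers: { 'User-Agent': 'demo' } } } },
  { method: 'Network.responseReceived',
    params: { requestId: '1', response: { headers: { 'content-type': 'text/css' } } } },
  { method: 'Network.responseReceivedExtraInfo',
    params: { requestId: '1', headers: { 'set-cookie': 'c=d' } } },
];

const mergedHeaders = mergeExtraInfoHeaders(sample);
console.log(mergedHeaders.get('1'));
```

Keeping the base and extra headers apart until the end means the result does not depend on event order.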
When I include cached items in my HAR, I get very high receive times for anything cached. I think the receive time is somehow measured from when the request was first cached until now, because it increases every time I do a capture.
I fixed it in my project like so:
const timings = entry.timings || {};
if (entry.cache.beforeRequest) {
  timings.receive = 0;
} else {
  timings.receive = formatMillis(
    (params.timestamp - entry._requestTime) * 1000 -
      entry.__receiveHeadersEnd
  );
}
I am happy to PR this, although I am not familiar enough with this project to know if it's the right thing to do or not.
P.S. This project is amazing. Doing this by hand would have been such a nightmare, after seeing all of the intricacies here!
vulnerability GHSA-72xf-g2v4-qvf3
Just curious.
When I try
const puppeteer = require('puppeteer');
const PuppeteerHar = require('puppeteer-har');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const har = new PuppeteerHar(page);
  await har.start({ path: 'results.har' });
  await page.goto('https://www.mbcreation.net/har/index.html');
  await har.stop();
  await browser.close();
})();
the response is
{"log":{"version":"1.2","creator":{"name":"chrome-har","version":"0.2.3","comment":"https://github.com/sitespeedio/chrome-har"},"pages":[],"entries":[]}}
The destination page is actually empty, but I thought I would get a load time for it anyway?
<title> Har </title>
Thanks for your answer / help