Puppeteer Frame

Updated 2020-06-29 14:18

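class: Frame v0.9.0

At any point in time, a page exposes its current frame tree through the page.mainFrame() and frame.childFrames() methods. The lifecycle of a Frame object is controlled by three events, all dispatched on the page object:

  • 'frameattached' - fired when the frame is attached to the page. A frame can be attached only once.
  • 'framenavigated' - fired when the frame navigates to a different URL.
  • 'framedetached' - fired when the frame is detached from the page. A frame can be detached only once.

An example of dumping the frame tree: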

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
    const page = await browser.newPage();
    await page.goto('https://www.google.com/chrome/browser/canary.html');
    dumpFrameTree(page.mainFrame(), '');
    await browser.close();

    // Print the URL of each frame, indenting children under their parent.
    function dumpFrameTree(frame, indent) {
        console.log(indent + frame.url());
        for (let child of frame.childFrames()) dumpFrameTree(child, indent + '  ');
    }
});
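The three lifecycle events described above can be observed with a small listener. The following is a minimal sketch, not part of Puppeteer: trackFrames is a hypothetical helper name, and it assumes only that the object passed in has an on(eventName, handler) method, as a Puppeteer page does.

```javascript
// Sketch: keep a live set of attached frames by listening to the three
// lifecycle events. `trackFrames` is a hypothetical helper, not a Puppeteer API.
function trackFrames(page) {
  const attached = new Set(); // frames currently attached to the page
  const log = [];             // human-readable history of events

  page.on('frameattached', frame => {
    attached.add(frame);
    log.push('attached');
  });
  page.on('framenavigated', frame => {
    log.push('navigated: ' + frame.url());
  });
  page.on('framedetached', frame => {
    attached.delete(frame);
    log.push('detached');
  });

  return { attached, log };
}
```

With a real page, the helper would be called before page.goto(...) so that no event is missed; the returned attached set then reflects the frames that have not been detached yet.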

An example of getting text from an iframe element:

const frame = page.frames().find(frame => frame.name() === 'myframe');
const text = await frame.$eval('.selector', element => element.textContent);
console.log(text);

Methods

  • frame.$(selector)v0.9.0
  • frame.$$(selector)v0.9.0
  • frame.$$eval(selector, pageFunction[, ...args])v0.9.0
  • frame.$eval(selector, pageFunction[, ...args])v0.9.0
  • frame.$x(expression)v0.9.0
  • frame.addScriptTag(options)v0.9.0
  • frame.addStyleTag(options)v0.9.0
  • frame.childFrames()v0.9.0
  • frame.click(selector[, options])v0.9.0
  • frame.content()v0.9.0
  • frame.evaluate(pageFunction, ...args)v0.9.0
  • frame.evaluateHandle(pageFunction, ...args)v0.9.0
  • frame.executionContext()v0.9.0
  • frame.focus(selector)v0.9.0
  • frame.goto(url, options)v0.9.0
  • frame.hover(selector)v0.9.0
  • frame.isDetached()v0.9.0
  • frame.name()v0.9.0
  • frame.parentFrame()v0.9.0
  • frame.select(selector, ...values)v0.9.0
  • frame.setContent(html)v0.9.0
  • frame.tap(selector)v0.9.0
  • frame.title()v0.9.0
  • frame.type(selector, text[, options])v0.9.0
  • frame.url()v0.9.0
  • frame.waitFor(selectorOrFunctionOrTimeout[, options[, ...args]])v0.9.0
  • frame.waitForFunction(pageFunction[, options[, ...args]])v0.9.0
  • frame.waitForNavigation(options)v0.9.0
  • frame.waitForSelector(selector[, options])v0.9.0
  • frame.waitForXPath(xpath[, options])v0.9.0

Methods

frame.$(selector)v0.9.0

  • selector <string> Selector to query frame for
  • returns: <Promise<?ElementHandle>> Promise which resolves to ElementHandle pointing to the frame element.

This method queries the frame for the selector. If no element in the frame matches the selector, the return value resolves to null.

frame.$$(selector)v0.9.0

  • selector <string> Selector to query frame for
  • returns: <Promise<Array<ElementHandle>>> Promise which resolves to ElementHandles pointing to the frame elements.

This method runs document.querySelectorAll within the frame. If no elements match the selector, the return value resolves to [].

frame.$$eval(selector, pageFunction[, ...args])v0.9.0

  • selector <string> A selector to query frame for
  • pageFunction <function> Function to be evaluated in browser context
  • ...args <...Serializable|JSHandle> Arguments to pass to pageFunction
  • returns: <Promise<Serializable>> Promise which resolves to the return value of pageFunction

This method runs Array.from(document.querySelectorAll(selector)) within the frame and passes the result as the first argument to pageFunction.

If pageFunction returns a Promise, then frame.$$eval waits for the promise to resolve and returns its value.

Example:

const divsCounts = await frame.$$eval('div', divs => divs.length);

frame.$eval(selector, pageFunction[, ...args])v0.9.0

  • selector <string> A selector to query frame for
  • pageFunction <function> Function to be evaluated in browser context
  • ...args <...Serializable|JSHandle> Arguments to pass to pageFunction
  • returns: <Promise<Serializable>> Promise which resolves to the return value of pageFunction

This method runs document.querySelector within the frame and passes the result as the first argument to pageFunction. If no element matches the selector, the method throws an error.

If pageFunction returns a Promise, then frame.$eval waits for the promise to resolve and returns its value.

Examples:

const searchValue = await frame.$eval('#search', el => el.value);
const preloadHref = await frame.$eval('link[rel=preload]', el => el.href);
const html = await frame.$eval('.main-container', e => e.outerHTML);

frame.$x(expression)v0.9.0

  • expression <string> Expression to evaluate.
  • returns: <Promise<Array<ElementHandle>>>

This method evaluates the XPath expression.

frame.addScriptTag(options)v0.9.0

  • options <Object>
    • url <string> URL of a script to be added.
    • path <string> Path to the JavaScript file to be injected into frame. If path is a relative path, then it is resolved relative to current working directory.
    • content <string> Raw JavaScript content to be injected into frame.
    • type <string> Script type. Use 'module' in order to load a JavaScript ES6 module. See script for more details.
  • returns: <Promise<ElementHandle>> which resolves to the added tag when the script's onload fires or when the script content was injected into frame.

Adds a <script> tag to the frame with the given url or content.

frame.addStyleTag(options)v0.9.0

  • options <Object>
    • url <string> URL of the <link> tag.
    • path <string> Path to the CSS file to be injected into frame. If path is a relative path, then it is resolved relative to current working directory.
    • content <string> Raw CSS content to be injected into frame.
  • returns: <Promise<ElementHandle>> which resolves to the added tag when the stylesheet's onload fires or when the CSS content was injected into frame.

Adds a <link rel="stylesheet"> tag with the given url, or a <style type="text/css"> tag with the given content, to the page.

frame.childFrames()v0.9.0

  • returns: <Array<Frame>>

frame.click(selector[, options])v0.9.0

  • selector <string> A selector to search for element to click. If there are multiple elements satisfying the selector, the first will be clicked.
  • options <Object>
    • button <string> left, right, or middle, defaults to left.
    • clickCount <number> defaults to 1. See UIEvent.detail.
    • delay <number> Time to wait between mousedown and mouseup in milliseconds. Defaults to 0.
  • returns: <Promise> Promise which resolves when the element matching selector is successfully clicked. The Promise will be rejected if there is no element matching selector

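This method fetches an element with selector, scrolls it into view if needed, and then uses page.mouse to click in the center of the element. If there's no element matching selector, the method throws an error.

Bear in mind that if click() triggers a navigation event and there's a separate page.waitForNavigation() promise to be resolved, you may end up with a race condition. The correct pattern for click and wait for navigation is the following:

const [response] = await Promise.all([
  page.waitForNavigation(waitOptions),
  frame.click(selector, clickOptions),
]);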
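The click-then-navigate pattern above can be wrapped in a reusable function. This is a sketch under assumptions: clickAndWaitForNavigation is a hypothetical helper name, and it uses frame.waitForNavigation (documented later in this section) rather than the page-level method.

```javascript
// Sketch: start waiting for the navigation before the click is issued,
// so the navigation cannot finish unobserved.
// `clickAndWaitForNavigation` is a hypothetical helper, not a Puppeteer API.
async function clickAndWaitForNavigation(frame, selector, waitOptions = {}) {
  const [response] = await Promise.all([
    frame.waitForNavigation(waitOptions), // registered first
    frame.click(selector),                // then the click that triggers it
  ]);
  return response; // main resource response, or null for History API / anchor navigations
}
```

Both promises are created inside the same Promise.all call, so the wait is already registered by the time the click is dispatched.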

frame.content()v0.9.0

  • returns: <Promise<string>> Gets the full HTML contents of the frame, including the doctype.

frame.evaluate(pageFunction, ...args)v0.9.0

  • pageFunction <function|string> Function to be evaluated in browser context
  • ...args <...Serializable|JSHandle> Arguments to pass to pageFunction
  • returns: <Promise<Serializable>> Promise which resolves to the return value of pageFunction

If the function passed to frame.evaluate returns a Promise, then frame.evaluate waits for the promise to resolve and returns its value.

If the function passed to frame.evaluate returns a non-Serializable value, then frame.evaluate resolves to undefined.

const result = await frame.evaluate(() => {
  return Promise.resolve(8 * 7);
});
console.log(result); // prints "56"

A string can also be passed in instead of a function:

console.log(await frame.evaluate('1 + 2')); // prints "3"

ElementHandle instances can be passed as arguments to frame.evaluate:

const bodyHandle = await frame.$('body');
const html = await frame.evaluate(body => body.innerHTML, bodyHandle);
await bodyHandle.dispose();

frame.evaluateHandle(pageFunction, ...args)v0.9.0

  • pageFunction <function|string> Function to be evaluated in the page context
  • ...args <...Serializable|JSHandle> Arguments to pass to pageFunction
  • returns: <Promise<JSHandle>> Promise which resolves to the return value of pageFunction as in-page object (JSHandle)

The only difference between frame.evaluate and frame.evaluateHandle is that frame.evaluateHandle returns an in-page object (JSHandle).

If the function passed to frame.evaluateHandle returns a Promise, then frame.evaluateHandle waits for the promise to resolve and returns its value.

const aWindowHandle = await frame.evaluateHandle(() => Promise.resolve(window));
aWindowHandle; // Handle for the window object.

A string can also be passed in instead of a function:

const aHandle = await frame.evaluateHandle('document'); // Handle for the 'document'.

JSHandle instances can be passed as arguments to frame.evaluateHandle:

const aHandle = await frame.evaluateHandle(() => document.body);
const resultHandle = await frame.evaluateHandle(body => body.innerHTML, aHandle);
console.log(await resultHandle.jsonValue());
await resultHandle.dispose();
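Handles returned by frame.evaluateHandle stay alive until they are disposed, so it can help to scope them. The sketch below is one possible pattern, not something Puppeteer provides: withHandle is a hypothetical helper that disposes the handle even when the callback throws.

```javascript
// Sketch: obtain a JSHandle, run a callback with it, and always dispose it.
// `withHandle` is a hypothetical helper, not a Puppeteer API.
async function withHandle(frame, pageFunction, use) {
  const handle = await frame.evaluateHandle(pageFunction);
  try {
    return await use(handle);
  } finally {
    await handle.dispose(); // runs on success and on error
  }
}
```

A call might look like withHandle(frame, () => document.body, body => frame.evaluate(b => b.innerHTML, body)), which resolves to the inner HTML and releases the body handle afterwards.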

frame.executionContext()v0.9.0

  • returns: <Promise<ExecutionContext>>

Returns a promise that resolves to the frame's default execution context.

frame.focus(selector)v0.9.0

  • selector <string> A selector of an element to focus. If there are multiple elements satisfying the selector, the first will be focused.
  • returns: <Promise> Promise which resolves when the element matching selector is successfully focused. The promise will be rejected if there is no element matching selector.

This method fetches an element with selector and focuses it. If there's no element matching selector, the method throws an error.

frame.goto(url, options)v0.9.0

  • url <string> URL to navigate frame to. The url should include scheme, e.g. https://.
  • options <Object> Navigation parameters which might have the following properties:
    • timeout <number> Maximum navigation time in milliseconds, defaults to 30 seconds, pass 0 to disable timeout. The default value can be changed by using the page.setDefaultNavigationTimeout(timeout) method.
    • waitUntil <string|Array<string>> When to consider navigation succeeded, defaults to load. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
      • load - consider navigation to be finished when the load event is fired.
      • domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
      • networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
      • networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
    • referer <string> Referer header value. If provided it will take preference over the referer header value set by page.setExtraHTTPHeaders().
  • returns: <Promise<?Response>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect.

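frame.goto will throw an error if:

  • there's an SSL error (e.g. in case of self-signed certificates).
  • the target URL is invalid.
  • the timeout is exceeded during navigation.
  • the main resource failed to load.

NOTE frame.goto either throws or returns a main resource response. The only exceptions are navigation to about:blank or navigation to the same URL with a different hash, which succeed and return null.

NOTE Headless mode doesn't support navigation to a PDF document. See the upstream issue.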
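The outcomes listed above (an error, a response, or null) can be folded into one result object. The following is a minimal sketch: safeGoto and the shape of its return value are hypothetical, and it assumes the resolved response exposes ok() and status(), as Puppeteer's Response class does.

```javascript
// Sketch: normalize the three possible outcomes of frame.goto.
// `safeGoto` and its result shape are hypothetical, not a Puppeteer API.
async function safeGoto(frame, url, options = {}) {
  try {
    const response = await frame.goto(url, options);
    if (response === null) {
      // about:blank, or the same URL with a different hash
      return { ok: true, status: null, error: null };
    }
    return { ok: response.ok(), status: response.status(), error: null };
  } catch (error) {
    // SSL error, invalid URL, timeout, or main resource failed to load
    return { ok: false, status: null, error: error.message };
  }
}
```

Callers can then branch on result.ok instead of wrapping every navigation in its own try/catch.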

frame.hover(selector)v0.9.0

  • selector <string> A selector to search for element to hover. If there are multiple elements satisfying the selector, the first will be hovered.
  • returns: <Promise> Promise which resolves when the element matching selector is successfully hovered. Promise gets rejected if there's no element matching selector

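This method fetches an element with selector, scrolls it into view if needed, and then uses page.mouse to hover over the center of the element. If there's no element matching selector, the method throws an error.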

frame.isDetached()v0.9.0

  • returns: <boolean> Returns true if the frame has been detached, or false otherwise.

frame.name()v0.9.0

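  • returns: <string> Returns the frame's name attribute as specified in the tag. If the name is empty, returns the id attribute instead.

NOTE This value is calculated once when the frame is created, and will not update if the attribute is changed later.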

frame.parentFrame()v0.9.0

  • returns: <?Frame> Returns parent frame, if any. Detached frames and main frames return null.
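Because frame.parentFrame() returns null at the top of the tree, walking it upward gives the chain of ancestors of any frame. The sketch below illustrates this; framePath and the '(main)' and '(unnamed)' labels are hypothetical choices, not Puppeteer output.

```javascript
// Sketch: build a readable path from the top-level frame down to `frame`
// by following parentFrame() until it returns null.
// `framePath` is a hypothetical helper, not a Puppeteer API.
function framePath(frame) {
  const parts = [];
  for (let current = frame; current; current = current.parentFrame()) {
    const isTop = current.parentFrame() === null;
    // Note: a detached frame also has no parent, so it is labeled like a main frame.
    parts.unshift(current.name() || (isTop ? '(main)' : '(unnamed)'));
  }
  return parts.join(' > ');
}
```

For a frame named 'ads' nested directly in the main frame, the helper would produce a string such as '(main) > ads', assuming the main frame has an empty name.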

frame.select(selector, ...values)v0.9.0

  • selector <string> A selector to query frame for
  • ...values <...string> Values of options to select. If the <select> has the multiple attribute, all values are considered, otherwise only the first one is taken into account.
  • returns: <Promise<Array<string>>> Returns an array of option values that have been successfully selected.

Triggers a change and input event once all the provided options have been selected. If there's no <select> element matching selector, the method throws an error.

frame.select('select#colors', 'blue'); // single selection
frame.select('select#colors', 'red', 'green', 'blue'); // multiple selections

frame.setContent(html)v0.9.0

  • html <string> HTML markup to assign to the page.
  • returns: <Promise>

frame.tap(selector)v0.9.0

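  • selector <string> A selector to search for element to tap. If there are multiple elements satisfying the selector, the first will be tapped.
  • returns: <Promise>

This method fetches an element with selector, scrolls it into view if needed, and then uses page.touchscreen to tap in the center of the element. If there's no element matching selector, the method throws an error.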

frame.title()v0.9.0

  • returns: <Promise<string>> Returns page's title.

frame.type(selector, text[, options])v0.9.0

  • selector <string> A selector of an element to type into. If there are multiple elements satisfying the selector, the first will be used.
  • text <string> A text to type into a focused element.
  • options <Object>
    • delay <number> Time to wait between key presses in milliseconds. Defaults to 0.
  • returns: <Promise>

Sends a keydown, keypress/input, and keyup event for each character in the text.

To press a special key, like Control or ArrowDown, use keyboard.press.

frame.type('#mytextarea', 'Hello'); // Types instantly
frame.type('#mytextarea', 'World', {delay: 100}); // Types slower, like a user



frame.url()v0.9.0

  • returns: <string> Returns the frame's url.

frame.waitFor(selectorOrFunctionOrTimeout[, options[, ...args]])v0.9.0

  • selectorOrFunctionOrTimeout <string|number|function> A selector, predicate or timeout to wait for
  • options <Object> Optional waiting parameters
  • ...args <...Serializable|JSHandle> Arguments to pass to pageFunction
  • returns: <Promise<JSHandle>> Promise which resolves to a JSHandle of the success value

This method behaves differently depending on the type of the first parameter:

  • if selectorOrFunctionOrTimeout is a string, it is treated as a selector or an xpath, depending on whether or not it starts with '//', and the method is a shortcut for frame.waitForSelector or frame.waitForXPath.
  • if selectorOrFunctionOrTimeout is a function, it is treated as a predicate to wait for, and the method is a shortcut for frame.waitForFunction().
  • if selectorOrFunctionOrTimeout is a number, it is treated as a timeout in milliseconds, and the method returns a promise which resolves after the timeout.
  • otherwise, an exception is thrown.

// wait for selector
await page.waitFor('.foo');
// wait for 1 second
await page.waitFor(1000);
// wait for predicate
await page.waitFor(() => !!document.querySelector('.foo'));

To pass arguments from Node.js to the predicate of page.waitFor:

const selector = '.foo';
await page.waitFor(selector => !!document.querySelector(selector), {}, selector);
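The type-based dispatch described above can be written out as a small function. This sketch only mirrors the rules as listed in this section; it is not Puppeteer's source, and classifyWaitTarget and its return labels are hypothetical.

```javascript
// Sketch of the dispatch rules for the first argument of waitFor.
// `classifyWaitTarget` is a hypothetical helper, not a Puppeteer API.
function classifyWaitTarget(target) {
  if (typeof target === 'string') {
    // Strings starting with '//' are treated as XPath, everything else as a selector.
    return target.startsWith('//') ? 'xpath' : 'selector';
  }
  if (typeof target === 'number') return 'timeout';    // milliseconds
  if (typeof target === 'function') return 'function'; // predicate
  throw new Error('Unsupported target type: ' + typeof target);
}
```

One consequence of the '//' rule is that an XPath expression that does not start with two slashes, such as '(//img)[1]', would be classified as a selector under these rules; calling frame.waitForXPath directly avoids that ambiguity.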

frame.waitForFunction(pageFunction[, options[, ...args]])v0.9.0

  • pageFunction <function|string> Function to be evaluated in browser context
  • options <Object> Optional waiting parameters
    • polling <string|number> An interval at which the pageFunction is executed, defaults to raf. If polling is a number, then it is treated as an interval in milliseconds at which the function would be executed. If polling is a string, then it can be one of the following values:
      • raf - to constantly execute pageFunction in requestAnimationFrame callback. This is the tightest polling mode which is suitable to observe styling changes.
      • mutation - to execute pageFunction on every DOM mutation.
    • timeout <number> maximum time to wait for in milliseconds. Defaults to 30000 (30 seconds). Pass 0 to disable timeout.
  • ...args <...Serializable|JSHandle> Arguments to pass to pageFunction
  • returns: <Promise<JSHandle>> Promise which resolves when the pageFunction returns a truthy value. It resolves to a JSHandle of the truthy value.

waitForFunction can be used to observe a viewport size change:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  const watchDog = page.mainFrame().waitForFunction('window.innerWidth < 100');
  page.setViewport({width: 50, height: 50});
  await watchDog;
  await browser.close();
});

To pass arguments from Node.js to the predicate of page.waitForFunction:

const selector = '.foo';
await page.waitForFunction(selector => !!document.querySelector(selector), {}, selector);

frame.waitForNavigation(options)v0.9.0

  • options <Object> Navigation parameters which might have the following properties:
    • timeout <number> Maximum navigation time in milliseconds, defaults to 30 seconds, pass 0 to disable timeout. The default value can be changed by using the page.setDefaultNavigationTimeout(timeout) method.
    • waitUntil <string|Array<string>> When to consider navigation succeeded, defaults to load. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
      • load - consider navigation to be finished when the load event is fired.
      • domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
      • networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
      • networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
  • returns: <Promise<?Response>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect. In case of navigation to a different anchor or navigation due to History API usage, the navigation will resolve with null.

For example:

const [response] = await Promise.all([
  frame.waitForNavigation(), // The navigation promise resolves after navigation has finished
  frame.click('a.my-link'), // Clicking the link will indirectly cause a navigation
]);

NOTE Usage of the History API to change the URL is considered a navigation.

frame.waitForSelector(selector[, options])v0.9.0

  • selector <string> A selector of an element to wait for
  • options <Object> Optional waiting parameters
    • visible <boolean> wait for element to be present in DOM and to be visible, i.e. to not have display: none or visibility: hidden CSS properties. Defaults to false.
    • hidden <boolean> wait for element to not be found in the DOM or to be hidden, i.e. have display: none or visibility: hidden CSS properties. Defaults to false.
    • timeout <number> maximum time to wait for in milliseconds. Defaults to 30000 (30 seconds). Pass 0 to disable timeout.
  • returns: <Promise<ElementHandle>> Promise which resolves when element specified by selector string is added to DOM.

Waits for the selector to appear in the page. If the selector already exists at the moment the method is called, the method returns immediately. If the selector doesn't appear within the timeout, the method throws an error.

This method works across navigations:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
    const page = await browser.newPage();
    let currentURL;
    page.mainFrame()
        .waitForSelector('img')
        .then(() => console.log('First URL with image: ' + currentURL));
    for (currentURL of ['https://example.com', 'https://google.com', 'https://bbc.com'])
        await page.goto(currentURL);
    await browser.close();
});

frame.waitForXPath(xpath[, options])v0.9.0

  • xpath <string> A xpath of an element to wait for
  • options <Object> Optional waiting parameters
    • visible <boolean> wait for element to be present in DOM and to be visible, i.e. to not have display: none or visibility: hidden CSS properties. Defaults to false.
    • hidden <boolean> wait for element to not be found in the DOM or to be hidden, i.e. have display: none or visibility: hidden CSS properties. Defaults to false.
    • timeout <number> maximum time to wait for in milliseconds. Defaults to 30000 (30 seconds). Pass 0 to disable timeout.
  • returns: <Promise<ElementHandle>> Promise which resolves when element specified by xpath string is added to DOM.

Waits for the xpath to appear in the page. If the xpath already exists at the moment the method is called, the method returns immediately. If the xpath doesn't appear within the timeout, the method throws an error.

This method works across navigations:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
    const page = await browser.newPage();
    let currentURL;
    page.mainFrame()
        .waitForXPath('//img')
        .then(() => console.log('First URL with image: ' + currentURL));
    for (currentURL of ['https://example.com', 'https://google.com', 'https://bbc.com'])
        await page.goto(currentURL);
    await browser.close();
});
