Scrapy 2.3: Take a screenshot of an item

Updated 2021-06-08 15:22

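This example demonstrates how to use coroutine syntax in the process_item() method.

This item pipeline sends a request to a locally running instance of Splash to render a screenshot of the item URL. Once the response has been downloaded, the pipeline saves the screenshot to a file and adds the filename to the item.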

import hashlib
from urllib.parse import quote

import scrapy
from itemadapter import ItemAdapter

class ScreenshotPipeline:
    """Pipeline that uses Splash to render screenshot of
    every Scrapy item."""

    SPLASH_URL = "http://localhost:8050/render.png?url={}"

    async def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        encoded_item_url = quote(adapter["url"])
        screenshot_url = self.SPLASH_URL.format(encoded_item_url)
        request = scrapy.Request(screenshot_url)
        response = await spider.crawler.engine.download(request, spider)

        if response.status != 200:
            # Error happened, return item.
            return item

        # Save screenshot to file, filename will be hash of url.
        url = adapter["url"]
        url_hash = hashlib.md5(url.encode("utf8")).hexdigest()
        filename = f"{url_hash}.png"
        with open(filename, "wb") as f:
            f.write(response.body)

        # Store filename in item.
        adapter["screenshot_filename"] = filename
        return item
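
The two pure steps in the pipeline above, building the Splash request URL and deriving the filename from the item URL, can be tried without Scrapy or a running Splash instance. The following is a minimal sketch under that assumption; the helper names build_screenshot_url and screenshot_filename are hypothetical and are not part of Scrapy or Splash.

```python
import hashlib
from urllib.parse import quote

# Same endpoint template as the pipeline's SPLASH_URL attribute.
SPLASH_URL = "http://localhost:8050/render.png?url={}"


def build_screenshot_url(item_url: str) -> str:
    """Return the Splash render.png URL for item_url (hypothetical helper)."""
    # quote() percent-encodes characters such as ":" and "?",
    # but leaves "/" untouched by default.
    return SPLASH_URL.format(quote(item_url))


def screenshot_filename(item_url: str) -> str:
    """Return '<MD5 hex digest of the URL>.png' (hypothetical helper)."""
    url_hash = hashlib.md5(item_url.encode("utf8")).hexdigest()
    return f"{url_hash}.png"


if __name__ == "__main__":
    url = "http://example.com/page?id=1"
    print(build_screenshot_url(url))
    print(screenshot_filename(url))
```

Because the filename is a hash of the URL, the same item URL always maps to the same file, so a repeated crawl overwrites the earlier screenshot instead of creating a new one. To run the pipeline itself, it has to be enabled through the ITEM_PIPELINES setting, and Splash must be listening on localhost:8050.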
