Scrapy 2.3: Invoking the shell from spiders to inspect responses

Updated 2021-06-09 10:08

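Sometimes you want to inspect the responses that are being processed at a certain point in your spider, if only to check that the response you expect is getting there.

This can be achieved by using the scrapy.shell.inspect_response function.

Here's an example of how you would call it from your spider: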

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)

        # Rest of parsing code.

When you run the spider, you will get something similar to this:

2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
...

>>> response.url
'http://example.org'

Then, you can check whether the extraction code is working:

>>> response.xpath('//h1[@class="fn"]')
[]

Nope, it doesn't. So you can open the response in your web browser and see if it's the response you were expecting:

>>> view(response)
True

Finally, press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume crawling:

>>> ^D
2014-01-23 17:50:03-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
...

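Note that you can't use the fetch shortcut here, since the Scrapy engine is blocked by the shell. However, after you leave the shell, the spider will continue crawling where it stopped, as shown above.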
