Python - Scrapy: get date picker values [Selenium or Scrapy-Splash]


Disclaimer: I have searched and tried the working examples I found on SO, but have been unable to achieve the result I seek.

I am trying to scrape some values from newspaperarchive.com. Among these values are the dates (year, month and day) on which a paper was published. newspaperarchive uses a date picker UI and loads its content through JavaScript/AJAX calls (I am not entirely sure).

I am trying to get these dates. newspaperarchive provides a date picker that loads and marks each date on which a paper was published.

What I want to find out, and possibly understand, is:

  1. Whether this can be achieved with scrapy-splash.
  2. How I can achieve it with Selenium if scrapy-splash wouldn't work for my use case.
  3. Sample code that I can learn from for future cases would be most helpful.

Here is an example page on newspaperarchive.com: http://newspaperarchive.com/us/hawaii/honolulu/hawaiian-gazette/

The values are: year = 1895, month = February, days = 1, 5, 8, 12, 15, 19, 22, 26. The scrape should then continue looping through the remaining dates of that year and through the other years available in the date picker for the newspaper.
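To make the target output concrete, the values listed above could be held in a nested mapping (year, then month, then days) and flattened into real dates. This is only a sketch: the name `expand_dates` and the dictionary layout are assumptions chosen for illustration, not something the site provides.

```python
from datetime import date

# Values from the example page's date picker: year 1895, February.
SAMPLE = {1895: {2: [1, 5, 8, 12, 15, 19, 22, 26]}}


def expand_dates(picker_values):
    """Flatten {year: {month: [days]}} into a sorted list of date objects."""
    dates = []
    for year, months in picker_values.items():
        for month, days in months.items():
            dates.extend(date(year, month, day) for day in days)
    return sorted(dates)


if __name__ == "__main__":
    for d in expand_dates(SAMPLE):
        print(d.isoformat())
```

Further years and months scraped from the picker would simply be added as more keys in the same mapping.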

    import logging

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    # NewspaperArchiveItem comes from the project's items module (import not shown).


    class NewspaperArchiveSpider(CrawlSpider):
        name = "newspaperarchive"
        allowed_domains = ["newspaperarchive.com"]

        paper_link = [
            "http://newspaperarchive.com/us/alabama/rainsville/"
        ]
        start_urls = [paper for paper in paper_link]

        rules = (
            # parse page and grab data
            Rule(LinkExtractor(restrict_xpaths=(
                '//li[@class="blurlink"]/a[@href]')),
                callback='parse_page', follow=True),
        )

        def parse_page(self, response):
            self.log('parsing data from page %s' % (response.url), logging.INFO)

            item = NewspaperArchiveItem()
            item['paper_name'] = response.xpath(
                '//div[@class="newbrc"]//li[6]/text()').extract()

            item['paper_state'] = response.xpath(
                '//div[@class="newbrc"]//li[4]/a/text()').extract()

            item['paper_city'] = response.xpath(
                '//div[@class="newbrc"]//li[5]/a/text()').extract()

            item['paper_dates'] = ' '.join(response.xpath(
                '//div[@class="span7 banner-img-txt"]//h1/text()'
                ).extract()).strip()

            return item
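A minimal sketch of the Selenium route, assuming the picker marks publication days with a CSS class on the day cells. The selector `td.has-issue` and the helper names `marked_days` and `scrape_marked_days` are hypothetical: the real class has to be read from the rendered page with the browser's inspector.

```python
import re

CSS = "css selector"  # the same string as selenium's By.CSS_SELECTOR

# Hypothetical selector: replace it with whatever class the real date
# picker puts on days that have a published issue.
MARKED_DAY_SELECTOR = "td.has-issue"


def marked_days(driver, selector=MARKED_DAY_SELECTOR):
    """Return the sorted day numbers shown in the cells matched by `selector`."""
    days = []
    for cell in driver.find_elements(CSS, selector):
        match = re.search(r"\d+", cell.text)
        if match:
            days.append(int(match.group()))
    return sorted(set(days))


def scrape_marked_days(url, selector=MARKED_DAY_SELECTOR, timeout=10):
    """Open `url` in a real browser and read the marked days once they render."""
    # Imported here so marked_days() can be tried without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until the JavaScript-built picker has at least one marked cell;
        # an empty list is falsy, so until() keeps polling until the timeout.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(CSS, selector)
        )
        return marked_days(driver, selector)
    finally:
        driver.quit()
```

Calling `scrape_marked_days` on the example page would at best return the days of the month currently displayed. Stepping through the other months and years would mean clicking the picker's own controls, whose selectors are not identified here.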

Thanks for taking the time to read this; it is appreciated. Note: I am open to other methods I can use to achieve this task.

