Anonymous ID: c0089e Oct. 11, 2018, 2:50 p.m. No.3443046   >>5013

Learning Python and starting to use Scrapy. (Not Scapy, the packet-crafting tool.)

One goal: copy as many QResearch breads as possible so they are viewable offline with everything in place, minus vids (links for those instead).

 

Can do:

scrapy shell 'https://8ch.net/qresearch/res/2352371.html'

or, inside an already running shell:

>fetch('https://8ch.net/qresearch/res/2352371.html')

then

>view(response)

 

These clones work fine as long as I have an internet connection, but offline they don't look the same because the CSS isn't copied, and the images aren't included either. For the CSS, I could copy the file by hand and edit the HTML to point to the local copy, but I'd like to understand Scrapy and Python better while learning to automate the boring stuff.

 

What I would like is a basic Scrapy spider that copies the bread including images, puts links in place of the vid images, names the file after the bread, includes the CSS, and saves it all to a file. Maybe: create a dir, copy the CSS, modify the HTML to point to the new local CSS, and if the CSS already exists, skip that part of the chain.
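
A rough sketch of the save step described above, in plain Python so it can be tried without Scrapy. It assumes the page HTML and the CSS text have already been downloaded (in a spider they would come from something like response.text), and that the page uses a single stylesheet. The function names, the "breads" folder and the "style.css" / "index.html" file names are all made up for illustration.

```python
import re
from pathlib import Path


def safe_filename(title, default="bread"):
    """Turn a bread title into a filesystem-friendly name."""
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("_")
    return name or default


def point_css_to_local(html, local_css="style.css"):
    """Rewrite the href of every <link rel="stylesheet"> tag to a local file."""
    def fix(match):
        tag = match.group(0)
        if "stylesheet" not in tag.lower():
            return tag  # leave icons and other <link> tags alone
        return re.sub(r'href=(["\']).*?\1', f'href="{local_css}"', tag, count=1)

    return re.sub(r"<link\b[^>]*>", fix, html, flags=re.IGNORECASE)


def save_bread(html, title, css_text, out_root="breads"):
    """Create a dir named after the bread, copy the CSS once, save the page."""
    folder = Path(out_root) / safe_filename(title)
    folder.mkdir(parents=True, exist_ok=True)

    css_path = folder / "style.css"
    if not css_path.exists():  # if the CSS already exists, skip copying it
        css_path.write_text(css_text, encoding="utf-8")

    page_path = folder / "index.html"
    page_path.write_text(point_css_to_local(html), encoding="utf-8")
    return page_path
```

Images are not handled here; they would need their own download step and the same kind of src rewriting.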

Is it possible to include the CSS inline in the HTML with Scrapy?

Which is simpler for a Py n00b: the local CSS file or inline CSS?
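
As far as I can tell, Scrapy only does the fetching, and the inlining itself would be ordinary Python string handling on the downloaded page. A minimal sketch of that idea, assuming the stylesheet text has already been fetched and stored in a dict keyed by the href exactly as it appears in the page (inline_css and css_by_href are made-up names):

```python
import re


def inline_css(html, css_by_href):
    """Replace <link rel="stylesheet"> tags with <style> blocks.

    css_by_href maps each stylesheet href, as written in the page, to the
    CSS text already downloaded for it. Tags with an unknown href are kept.
    """
    def swap(match):
        tag = match.group(0)
        href = re.search(r'href=(["\'])(.*?)\1', tag)
        if "stylesheet" not in tag.lower() or href is None:
            return tag  # not a stylesheet link, leave it untouched
        css = css_by_href.get(href.group(2))
        if css is None:
            return tag  # CSS for this href was not downloaded
        return f"<style>\n{css}\n</style>"

    return re.sub(r"<link\b[^>]*>", swap, html, flags=re.IGNORECASE)
```

One file per bread is easier to move around, but every copy then carries the full CSS, and any url(...) references inside the CSS would still point at the live site.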

 

A more advanced question: how do I leave shill posts out of the bread copy?
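
One simple approach might be a blocklist of poster IDs (the ID shown in each post header). The sketch below assumes the posts have already been pulled out of the page into dicts; the "poster_id" and "body" keys, the function name and the blocklist itself are hypothetical. In a spider the IDs would come from a selector on the post markup, which I haven't worked out yet.

```python
def drop_blocked_posts(posts, blocked_ids):
    """Return only the posts whose poster ID is not on the blocklist."""
    blocked = set(blocked_ids)
    return [post for post in posts if post.get("poster_id") not in blocked]


# Hypothetical example data: two posters, one of them blocked.
posts = [
    {"poster_id": "c0089e", "body": "Learning Python and Scrapy."},
    {"poster_id": "ffffff", "body": "spam spam spam"},
]
kept = drop_blocked_posts(posts, blocked_ids=["ffffff"])
```

Deciding which IDs go on the list is still a manual job; this only handles the filtering once the list exists.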

 

I've tried modifying the existing examples from Scrapy.org, but they stop working after my mods.

Any links to lists of Scrapy spider examples with descriptions of what they do?