Python Web Crawlers (1)

Web crawlers are programs or scripts that automatically fetch information from the network according to certain rules. The World Wide Web is like a huge spider web, and our crawler is a spider on it, constantly grabbing the information we need.


A crawler generally works in three steps, sketched in code below:

1. Crawl: download pages from the web

2. Analyze: parse the downloaded content and extract the target data

3. Store: save the extracted data for later use
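
These three steps map directly onto code: fetch a page, extract the pieces you need, and write them somewhere. A minimal end-to-end sketch (the regular expression and the output file name are only placeholders for illustration):

import re
import urllib.request

# 1. Crawl: download the raw HTML of a page
html = urllib.request.urlopen('https://mywaythere-ictsing.lofter.com').read().decode('utf-8')

# 2. Analyze: extract the data we need -- here, the contents of the <title> tag
titles = re.findall(r'<title>(.*?)</title>', html, re.S)

# 3. Store: save the extracted data to a file
with open('titles.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(titles))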


Basic crawling operations

1. urllib

In Python 2.x we could use either urllib or urllib2 for web requests, but in Python 3.x urllib2 was removed and its functionality merged into the urllib package, so requests are made through urllib.request.


import urllib.request

# send a GET request and print the decoded page body
response = urllib.request.urlopen('https://mywaythere-ictsing.lofter.com')
print(response.read().decode('utf-8'))
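
Some sites reject requests that carry urllib's default user agent, so it is common to wrap the URL in a urllib.request.Request and attach headers. A minimal sketch (the User-Agent value here is just an example):

import urllib.request

# wrap the URL in a Request object so headers can be attached
req = urllib.request.Request('https://mywaythere-ictsing.lofter.com',
                             headers={'User-Agent': 'Mozilla/5.0'})
response = urllib.request.urlopen(req)
print(response.status)  # 200 on success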


urllib with parameters


from urllib.parse import urlencode

url = 'https://mywaythere-ictsing.lofter.com'

# build the query string safely: ...?key1=value1&key2=value2
url = url + '?' + urlencode({'key1': 'value1', 'key2': 'value2'})
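
Note that parameter values containing spaces or non-ASCII characters must be percent-encoded before they go into a URL; urllib.parse handles this:

from urllib.parse import quote, urlencode

print(quote('web crawler'))                        # 'web%20crawler' -- spaces become %20
print(urlencode({'q': 'web crawler', 'page': 1}))  # 'q=web+crawler&page=1'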



2. requests

The requests library is a very practical HTTP client library and the one most commonly used for crawling; it covers most everyday crawling needs with less code than urllib.


import requests

# a plain GET request; response.text holds the decoded body
response = requests.get(url='https://mywaythere-ictsing.lofter.com')
print(response.text)

# requests builds and encodes the query string from the params dict automatically
response = requests.get(url='https://mywaythere-ictsing.lofter.com', params={'key1': 'value1', 'key2': 'value2'})
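
Whether the parameters were attached correctly can be checked on the response object. A short sketch (the timeout value is an illustrative choice, not required by the library):

import requests

response = requests.get('https://mywaythere-ictsing.lofter.com',
                        params={'key1': 'value1', 'key2': 'value2'},
                        timeout=10)  # avoid hanging forever on an unresponsive server
print(response.status_code)  # 200 on success
print(response.url)          # the final URL, with ?key1=value1&key2=value2 appended
print(response.encoding)     # the encoding used to decode response.text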

