Web crawlers are programs or scripts that automatically gather information from the web according to certain rules. The World Wide Web is like a huge spider web, and a crawler is a spider on it, constantly grabbing the information we need. A crawler's work breaks down into three steps:
Crawl
Analysis
Storage
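The three steps above can be sketched with the standard library alone. This is a minimal illustration, not a production crawler: the HTML string below stands in for a fetched page (in a real crawl it would come from urllib.request.urlopen), the analysis step extracts every link with html.parser, and the storage step writes the results to a file.

```python
from html.parser import HTMLParser

# Crawl: a stand-in for a fetched page; in a real crawler this would be
# urllib.request.urlopen(url).read().decode('utf-8')
html = '<html><body><a href="/page1">one</a> <a href="/page2">two</a></body></html>'

# Analysis: collect the href attribute of every <a> tag
class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

parser = LinkParser()
parser.feed(html)

# Storage: persist the extracted links, one per line
with open('links.txt', 'w') as f:
    f.write('\n'.join(parser.links))

print(parser.links)  # ['/page1', '/page2']
```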
Basic crawling operations
1. urllib
In Python 2.x, we could use either urllib or urllib2 for web crawling, but in Python 3.x urllib2 was removed and its functionality folded into urllib, so urllib is the only option.
import urllib.request

# Fetch the page and print its decoded body
response = urllib.request.urlopen('https://mywaythere-ictsing.lofter.com')
print(response.read().decode('utf-8'))
Urllib with parameters
To pass query parameters, append them to the URL as a query string (key1/value1 and key2/value2 are placeholder variables holding the parameter names and values):
url = 'https://mywaythere-ictsing.lofter.com'
url = url + '?' + key1 + '=' + value1 + '&' + key2 + '=' + value2
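Concatenating parameters by hand breaks as soon as a value contains spaces or non-ASCII characters. The standard library's urllib.parse.urlencode builds a correctly escaped query string; the parameter names and values here are illustrative placeholders.

```python
from urllib.parse import urlencode

# Placeholder parameters for illustration; urlencode percent-escapes
# each name and value and joins them with '&'
params = {'key1': 'value1', 'key2': 'value2'}
url = 'https://mywaythere-ictsing.lofter.com' + '?' + urlencode(params)
print(url)  # https://mywaythere-ictsing.lofter.com?key1=value1&key2=value2
```

The resulting URL can then be passed to urllib.request.urlopen exactly as in the example above.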
2. requests
The requests library is a very practical HTTP client library and the one most commonly used for crawling; it covers the vast majority of crawling needs.
import requests
response = requests.get(url='https://mywaythere-ictsing.lofter.com')
print(response.text)
response = requests.get(url='https://mywaythere-ictsing.lofter.com', params={'key1':'value1', 'key2':'value2'})
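With requests you never build the query string yourself: the params argument is URL-encoded for you. A small sketch, assuming the requests package is installed; it first prepares the request without sending it so the encoded URL can be inspected, then fetches it with a timeout and a status check (both good habits the one-liner above omits).

```python
import requests

# Prepare the request without sending it, to see how requests
# encodes the params dict into the URL (values are illustrative)
prepared = requests.Request(
    'GET',
    'https://mywaythere-ictsing.lofter.com',
    params={'key1': 'value1', 'key2': 'value2'},
).prepare()
print(prepared.url)  # the URL with the encoded query string appended

# When actually fetching, set a timeout and check the status code
response = requests.get(prepared.url, timeout=10)
if response.status_code == 200:
    print(response.text[:100])
```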