Python Web Crawlers (1)

Web crawlers are programs or scripts that automatically fetch information from the network according to certain rules. The World Wide Web is like a huge spider web, and our crawler is a spider on it, constantly grabbing the information we need.


A crawler generally works in three steps, sketched in code below:

1. Crawl: download pages from the web

2. Analyze: parse the downloaded content and extract the target data

3. Store: save the extracted data for later use
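
These three steps map directly onto code: fetch a page, extract the pieces you need, and write them somewhere. A minimal end-to-end sketch (the regular expression and the output file name are only placeholders for illustration):

import re
import urllib.request

# 1. Crawl: download the raw HTML of a page
html = urllib.request.urlopen('https://mywaythere-ictsing.lofter.com').read().decode('utf-8')

# 2. Analyze: extract the data we need -- here, the contents of the <title> tag
titles = re.findall(r'<title>(.*?)</title>', html, re.S)

# 3. Store: save the extracted data to a file
with open('titles.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(titles))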


Basic crawling operations

1. urllib

In Python 2.x we could use either urllib or urllib2 for web requests, but in Python 3.x urllib2 was removed and its functionality merged into the urllib package, so requests are made through urllib.request.


import urllib.request

# send a GET request and print the decoded page body
response = urllib.request.urlopen('https://mywaythere-ictsing.lofter.com')
print(response.read().decode('utf-8'))
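
Some sites reject requests that carry urllib's default user agent, so it is common to wrap the URL in a urllib.request.Request and attach headers. A minimal sketch (the User-Agent value here is just an example):

import urllib.request

# wrap the URL in a Request object so headers can be attached
req = urllib.request.Request('https://mywaythere-ictsing.lofter.com',
                             headers={'User-Agent': 'Mozilla/5.0'})
response = urllib.request.urlopen(req)
print(response.status)  # 200 on success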


urllib with parameters


from urllib.parse import urlencode

url = 'https://mywaythere-ictsing.lofter.com'

# build the query string safely: ...?key1=value1&key2=value2
url = url + '?' + urlencode({'key1': 'value1', 'key2': 'value2'})
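
Note that parameter values containing spaces or non-ASCII characters must be percent-encoded before they go into a URL; urllib.parse handles this:

from urllib.parse import quote, urlencode

print(quote('web crawler'))                        # 'web%20crawler' -- spaces become %20
print(urlencode({'q': 'web crawler', 'page': 1}))  # 'q=web+crawler&page=1'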



2. requests

The requests library is a very practical HTTP client library and the one most commonly used for crawling; it covers most everyday crawling needs with less code than urllib.


import requests

# a plain GET request; response.text holds the decoded body
response = requests.get(url='https://mywaythere-ictsing.lofter.com')
print(response.text)

# requests builds and encodes the query string from the params dict automatically
response = requests.get(url='https://mywaythere-ictsing.lofter.com', params={'key1': 'value1', 'key2': 'value2'})
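
Whether the parameters were attached correctly can be checked on the response object. A short sketch (the timeout value is an illustrative choice, not required by the library):

import requests

response = requests.get('https://mywaythere-ictsing.lofter.com',
                        params={'key1': 'value1', 'key2': 'value2'},
                        timeout=10)  # avoid hanging forever on an unresponsive server
print(response.status_code)  # 200 on success
print(response.url)          # the final URL, with ?key1=value1&key2=value2 appended
print(response.encoding)     # the encoding used to decode response.text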

