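Basic Information
Source name: Source code accompanying the book 《用python写网络爬虫》 (Web Scraping with Python)
Source size: 3.57 MB
File format: .zip
Language: Python
Updated: 2019-07-26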
Source Code Description



Contents
Chapter 1 Introduction to Web Scraping
1.1 When Is Web Scraping Useful?
1.2 Is Web Scraping Legal?
1.3 Background Research
1.3.1 Checking robots.txt
1.3.2 Checking the Sitemap
1.3.3 Estimating the Size of a Website
1.3.4 Identifying the Technology Used by a Website
1.3.5 Finding the Owner of a Website
1.4 Writing Your First Web Crawler
1.4.1 Downloading a Web Page
1.4.2 Sitemap Crawler
1.4.3 ID Iteration Crawler
1.4.4 Link Crawler
1.5 Chapter Summary
Chapter 2 Scraping the Data
2.1 Analyzing a Web Page
2.2 Three Approaches to Scraping a Web Page
2.2.1 Regular Expressions
2.2.2 Beautiful Soup
2.2.3 Lxml
2.2.4 Comparing Performance
2.2.5 Conclusions
2.2.6 Adding a Scrape Callback to the Link Crawler
2.3 Chapter Summary
Chapter 3 Caching Downloads
3.1 Adding Cache Support to the Link Crawler
3.2 Disk Cache
3.2.1 Implementation
3.2.2 Testing the Cache
3.2.3 Saving Disk Space
3.2.4 Expiring Stale Data
3.2.5 Drawbacks
3.3 Database Cache
3.3.1 What Is NoSQL?
3.3.2 Installing MongoDB
3.3.3 Overview of MongoDB
3.3.4 MongoDB Cache Implementation
3.3.5 Compression
3.3.6 Testing the Cache
3.4 Chapter Summary
Chapter 4 Concurrent Downloading
4.1 One Million Web Pages
4.2 Sequential Crawler
4.3 Threaded Crawler
4.3.1 How Threads and Processes Work
4.3.2 Implementation
4.3.3 Multiprocess Crawler
4.4 Performance
4.5 Chapter Summary
Chapter 5 Dynamic Content
5.1 An Example Dynamic Web Page
5.2 Reverse Engineering a Dynamic Web Page
5.3 Rendering a Dynamic Web Page
5.3.1 PyQt or PySide
5.3.2 Executing JavaScript
5.3.3 Interacting with a Website Using WebKit
5.3.4 Selenium
5.4 Chapter Summary
Chapter 6 Interacting with Forms
6.1 The Login Form
6.2 Extending the Login Script to Update Content
6.3 Automating Form Handling with the Mechanize Module
6.4 Chapter Summary
Chapter 7 Solving CAPTCHAs
7.1 Registering an Account
7.2 Optical Character Recognition
7.3 Solving Complex CAPTCHAs
7.3.1 Using a CAPTCHA Solving Service
7.3.2 Getting Started with 9kw
7.3.3 Integrating with Registration
7.4 Chapter Summary
Chapter 8 Scrapy
8.1 Installation
8.2 Starting a Project
8.2.1 Defining a Model
8.2.2 Creating a Spider
8.2.3 Scraping with the shell Command
8.2.4 Checking Results
8.2.5 Interrupting and Resuming a Crawl
8.3 Visual Scraping with Portia
8.3.1 Installation
8.3.2 Annotation
8.3.3 Tuning the Spider
8.3.4 Checking Results
8.4 Automated Scraping with Scrapely
8.5 Chapter Summary
Chapter 9 Summary