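Basic Information
Source name: Companion source code for the book 《用Python写网络爬虫》 (Web Scraping with Python)
Source size: 3.57 MB
File format: .zip
Programming language: Python
Last updated: 2019-07-26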
Note: no registration or account top-up is required; the download link is provided after sponsoring.
Source Code Introduction
Contents

Chapter 1  Introduction to Web Crawlers
1.1 When web crawlers are useful
1.2 Whether web crawling is legal
1.3 Background research
    1.3.1 Checking robots.txt
    1.3.2 Checking the sitemap
    1.3.3 Estimating website size
    1.3.4 Identifying the technologies a website uses
    1.3.5 Finding the website owner
1.4 Writing your first web crawler
    1.4.1 Downloading a web page
    1.4.2 Sitemap crawler
    1.4.3 ID iteration crawler
    1.4.4 Link crawler
1.5 Chapter summary

Chapter 2  Data Scraping
2.1 Analyzing a web page
2.2 Three ways to scrape a web page
    2.2.1 Regular expressions
    2.2.2 Beautiful Soup
    2.2.3 Lxml
    2.2.4 Performance comparison
    2.2.5 Conclusion
    2.2.6 Adding a scrape callback to the link crawler
2.3 Chapter summary

Chapter 3  Download Caching
3.1 Adding cache support to the link crawler
3.2 Disk cache
    3.2.1 Implementation
    3.2.2 Testing the cache
    3.2.3 Saving disk space
    3.2.4 Clearing expired data
    3.2.5 Drawbacks
3.3 Database cache
    3.3.1 What is NoSQL
    3.3.2 Installing MongoDB
    3.3.3 MongoDB overview
    3.3.4 MongoDB cache implementation
    3.3.5 Compression
    3.3.6 Testing the cache
3.4 Chapter summary

Chapter 4  Concurrent Downloading
4.1 One million web pages
4.2 Sequential crawler
4.3 Multithreaded crawler
    4.3.1 How threads and processes work
    4.3.2 Implementation
    4.3.3 Multiprocess crawler
4.4 Performance
4.5 Chapter summary

Chapter 5  Dynamic Content
5.1 Dynamic web page example
5.2 Reverse engineering a dynamic web page
5.3 Rendering a dynamic web page
    5.3.1 PyQt or PySide
    5.3.2 Executing JavaScript
    5.3.3 Interacting with a website using WebKit
    5.3.4 Selenium
5.4 Chapter summary

Chapter 6  Interacting with Forms
6.1 Login form
6.2 Extending the login script to support content updates
6.3 Automating form handling with the Mechanize module
6.4 Chapter summary

Chapter 7  CAPTCHA Handling
7.1 Registering an account
7.2 Optical character recognition
7.3 Handling complex CAPTCHAs
    7.3.1 Using a CAPTCHA solving service
    7.3.2 Getting started with 9kw
    7.3.3 Integrating with the registration feature
7.4 Chapter summary

Chapter 8  Scrapy
8.1 Installation
8.2 Starting a project
    8.2.1 Defining a model
    8.2.2 Creating a spider
    8.2.3 Scraping with the shell command
    8.2.4 Checking results
    8.2.5 Interrupting and resuming a crawl
8.3 Writing visual crawlers with Portia
    8.3.1 Installation
    8.3.2 Annotation
    8.3.3 Optimizing the spider
    8.3.4 Checking results
8.4 Automated scraping with Scrapely
8.5 Chapter summary

Chapter 9  Summary