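1. Crawl pages restricted to a specific topic.
2. Produce a plain-text log file with the format: timestamp, URL.
3. Open at most 2 connections when fetching a given URL (note: the number of local threads used for page parsing is not limited).
4. Follow polite-spider rules: check robots.txt and meta tags for restrictions; after a thread finishes fetching a page, it must sleep for 2 seconds.
5. Parse HTML pages and extract link URLs; detect whether an extracted URL has already been processed, and do not re-parse pages that have already been crawled.
6. Allow basic spider/crawler parameters to be configured, including crawl depth and seed URLs.
7. Use the User-agent header to identify the crawler to the server.
8. Produce crawl statistics, including crawl speed, time taken to complete the crawl, and total number of pages crawled; comment important variables and all classes and methods.
9. Follow coding conventions, such as naming conventions for classes, methods, and files.
10. Optional: a GUI or web interface for managing the spider/crawler, including starting and stopping it and adding or removing URLs.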
Tags: log
Upload time: 2013-12-22
Uploaded by: wang5829
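The requirements above (robots.txt and meta-tag checks, link extraction, duplicate detection, depth limit, a timestamp/URL log, and the 2-second pause) can be sketched roughly as follows. This is a minimal single-threaded sketch, not the uploaded program: the `ExampleSpider/0.1` agent name, the tab-separated log format, and the caller-supplied `fetch` function are all assumptions, and the 2-connection limit, topic filtering, and statistics are left out.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleSpider/0.1"  # hypothetical name, sent to identify the crawler


class LinkParser(HTMLParser):
    """Collects link targets and the content of a <meta name="robots"> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.robots_meta = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links and drop #fragments so duplicates compare equal.
            url, _ = urldefrag(urljoin(self.base_url, attrs["href"]))
            self.links.append(url)
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = (attrs.get("content") or "").lower()


def crawl(seeds, max_depth, fetch, robots, log_lines, delay=2.0, sleep=time.sleep):
    """Breadth-first crawl; `fetch(url)` must return the page HTML as a string."""
    visited = set()   # every URL already handled, so nothing is parsed twice
    fetched = []      # URLs that were actually downloaded, in order
    frontier = deque((url, 0) for url in seeds)
    while frontier:
        url, depth = frontier.popleft()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        if not robots.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt
        html = fetch(url)
        fetched.append(url)
        log_lines.append(f"{int(time.time())}\t{url}")  # assumed format: timestamp<TAB>URL
        parser = LinkParser(url)
        parser.feed(html)
        if "nofollow" not in parser.robots_meta:
            frontier.extend((link, depth + 1) for link in parser.links)
        sleep(delay)  # polite pause after each page
    return fetched
```

Passing `fetch` and `sleep` in as parameters keeps the sketch testable without network access; a real crawler would plug in an HTTP client that sends `USER_AGENT` and would load robots.txt with `RobotFileParser.set_url` and `read`.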
This is a GBA (Game Boy Advance) animation sample. It displays 45 BMPs on the GBA screen in sequence, forward and then in reverse, so that they look like an animation.
Tags: GBA animation continue Advance
Upload time: 2013-12-12
Uploaded by: change0329
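One way to read "forward and then in reverse" is a ping-pong frame order. The sketch below only computes the order in which frame indices would be shown; it is written in Python for illustration rather than as GBA code, and `ping_pong_indices` is a hypothetical helper, not part of the uploaded sample.

```python
from itertools import cycle, islice


def ping_pong_indices(frame_count):
    """Return an endless iterator over 0..n-1, then n-2..1, repeating.

    The first and last frames are not repeated at the turning points,
    so every frame stays on screen for the same amount of time.
    """
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    forward = range(frame_count)
    backward = range(frame_count - 2, 0, -1)
    return cycle([*forward, *backward])


# Example: take the first few indices for a 45-frame animation.
first_frames = list(islice(ping_pong_indices(45), 50))
```

With 45 frames, one full cycle under this scheme is 88 steps long (45 forward plus 43 backward) before the sequence returns to frame 0.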
How well do you really know Java? Are you a code sleuth? Have you ever spent days chasing a bug caused by a trap or pitfall in Java or its libraries? Do you like brainteasers? Then this is the book for you!
Upload time: 2013-11-25
Uploaded by: 王庆才
A student status management system is a typical management information system (MIS). Its development has two main aspects: creating and maintaining the back-end database, and developing the front-end application. The former requires a database with strong data consistency and integrity and good data security. The latter requires an application that is functionally complete and easy to use.
Tags: management development information Student
Upload time: 2015-11-01
Uploaded by: 1101055045
<%@ LANGUAGE="VBSCRIPT" %>
<!--#include file="conn.asp" -->
<%
ProductClass_2 = request("ProductClass_2")
set rs = server.createobject("adodb.recordset")
' Note: request values are concatenated into the SQL text unescaped (SQL injection risk).
sqltext = "select * from Product"
if request("Product_Name") <> "" then
    sqltext = sqltext & " where Product_Name like '%" & request("Product_Name") & "%' "
else
    sqltext = sqltext & " where Product_Name like '%%' "
end if
if request("Product_Class") <> "" then
    sqltext = sqltext & " and Class_1 like '%" & request("Product_Class") & "%' "
end if
Tags: ProductClass LANGUAGE VBSCRIPT
Upload time: 2013-11-25
Uploaded by: wl9454
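The product search in this entry builds its `LIKE` filter by joining request values into the SQL string. A parameterized version of the same filter might look like the sketch below. It uses Python's `sqlite3` for illustration instead of ASP/ADO; the `Product` table and the `Product_Name` and `Class_1` columns come from the snippet, while `build_product_query` is a hypothetical helper.

```python
import sqlite3


def build_product_query(product_name="", product_class=""):
    """Return (sql, params) for the product search, using ? placeholders.

    User input never becomes part of the SQL text; it is only passed
    as parameter values, so quotes in the input cannot alter the query.
    """
    sql = "SELECT * FROM Product WHERE Product_Name LIKE ?"
    params = [f"%{product_name}%"]
    if product_class:
        sql += " AND Class_1 LIKE ?"
        params.append(f"%{product_class}%")
    return sql, params


# Example usage against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Product_Name TEXT, Class_1 TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("Red Widget", "Tools"), ("Blue Gadget", "Toys")],
)
sql, params = build_product_query("Widget")
rows = conn.execute(sql, params).fetchall()
```

Note that `%` and `_` typed by the user still act as wildcards inside the `LIKE` pattern; escaping them would need an extra `ESCAPE` clause, which this sketch leaves out.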
A library management system. It implements many functions, and I hope you will appreciate it.
Tags: system appreciate management implement
Upload time: 2015-11-06
Uploaded by: 努力努力再努力
This is a USB core stack for the built-in USB device of LPC214x microcontrollers. It handles the hardware interface and USB enumeration/configuration. Also included are application examples such as a USB joystick HID and USB serial port emulation.
Tags: microcontrollers USB the built-in
Upload time: 2015-11-14
Uploaded by: talenthn
Windows open-source code. Microsoft Windows is a complex operating system. It offers so many features and does so much that it's impossible for any one person to fully understand the entire system. This complexity also makes it difficult for someone to decide where to start concentrating the learning effort. Well, I always like to start at the lowest level by gaining a solid
Tags: Microsoft operating features windows
Upload time: 2015-11-24
Uploaded by: zhuyibin
This is an implementation of structural SVM for training complex alignment models for protein sequence alignment, especially for homology modeling. The structural SVM algorithm can incorporate many relevant features, such as secondary structure, relative exposed surface area, profiles, and their various interactions, into the alignment model. It was developed under Linux, compiles with gcc, and is built upon the svm^light software by Thorsten Joachims.
Tags: implementation structural for alignment
Upload time: 2014-01-11
Uploaded by: chenbhdt
Abstract: By using gateway systems on large 32-bit platforms, networks of small, 8- and 16-bit microcontrollers can be monitored and controlled over the Internet. With embedded Linux, these gateways are easily moved from full-blown host PCs to embedded platforms like the PC104. In this class you will learn about hardware platforms that support embedded Linux, Linux kernel configuration, feature selection, installation, booting and tuning.
Tags: bit platforms Abstract networks
Upload time: 2014-01-05
Uploaded by: kytqcool