xhSong's Blog

Baidu Astar 2012-2013: A Python Script for Scraping Results

January 10, 2013 10:28

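I missed out on Baidu Astar again this time. I'm getting old and can't keep up anymore. In the second round I got one problem accepted. The other one was a DP problem, and I think my approach was right, but I just couldn't debug it into passing. Looks like I really have gotten worse.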

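I won't rant about this year's Astar. Checking results was a hassle: I paged through the result list until my hand cramped. So I wrote a Python script and posted the scraped results on Baidu Tieba, only to have the post deleted. When I posted the script, that got deleted too. Frustrating!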

So I'll just post the program here.

#!/usr/bin/python
# coding=utf8
import sys
import urllib2
from re import sub

problemurl="http://astar.baidu.com/index.php?r=home/detail&id=10"


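def analysisPage(html, csvfile):
	# Collapse all whitespace so the regexes below work on a single line
	html = sub(r'[\s]+', ' ', html)
	# Keep only the contents of <tbody>, i.e. the rows of the result table
	tbody = sub(r'(^.*<tbody>)|(</tbody>.*$)', "", html)
	# Replace every HTML tag with a space, then split into cell values
	# (assumes no cell value contains a space)
	items = sub(r'[\s]+', ' ', sub(r'<[^<>]*>', ' ', tbody))
	items = items.strip().split(' ')
	# Each row has 5 columns: ID, username, language, filename, score
	for i in range(len(items) // 5):
		record = ",".join(items[i*5:i*5+5])
		print record
		csvfile.write(record + "\n")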
	
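def getResult(problemurl):
	# Write everything to result.csv; the header means
	# "ID, username, language, filename, score"
	with open("result.csv", "w") as csvfile:
		csvfile.write('编号,用户名,语言,文件名,得分\n')
		pageid = 0
		while True:
			pageid += 1
			url = "%s&BccSubmitLogs_page=%d&ajax=projects-submit-logs" % (problemurl, pageid)
			html = urllib2.urlopen(url).read()
			analysisPage(html, csvfile)
			# Stop when the "下一页" (next page) link is missing or hidden.
			# html is a byte string, so search for a byte string (this file is
			# utf8, and the page is assumed to be utf8 too); searching for
			# u"下一页" forces an implicit ASCII decode and raises UnicodeDecodeError.
			if html.find("下一页") == -1 or html.find('class="next hidden"') != -1:
				break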

if __name__ == '__main__':
	if len(sys.argv) != 2:
		print "Usage: astar2012.py problem_url"
		print "e.g.   astar2012.py '%s'" % problemurl
		sys.exit(1)
	getResult(sys.argv[1])
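The script above is Python 2 (`urllib2`, `print` statements). A minimal Python 3 sketch of the same three steps follows: fetch a page, pull the five columns out of `<tbody>`, and stop once the "下一页" (next page) link is missing or hidden. The steps are split into separate functions so the parsing and paging can be tried offline. It assumes the pages are UTF-8 and keeps the original's query parameters and its one-token-per-cell assumption. The names `fetch`, `parse_rows`, `has_next_page`, `scrape` and `PAGE_QUERY` are illustrative, not from the original script.

```python
import re
import urllib.request

# Query parameters copied from the original script.
PAGE_QUERY = '%s&BccSubmitLogs_page=%d&ajax=projects-submit-logs'


def fetch(url):
    """Download a page; assumes the server sends UTF-8."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


def parse_rows(html, ncols=5):
    """Return the cells of the first <tbody> grouped into rows of ncols."""
    html = re.sub(r'\s+', ' ', html)
    match = re.search(r'<tbody>(.*?)</tbody>', html)
    if match is None:
        return []
    # Turn every tag into a space, then split; like the original,
    # this breaks if a cell itself contains a space.
    tokens = re.sub(r'<[^<>]*>', ' ', match.group(1)).split()
    # An incomplete trailing group is dropped, as in analysisPage.
    return [tokens[i:i + ncols]
            for i in range(0, len(tokens) - ncols + 1, ncols)]


def has_next_page(html):
    """Same stop test as getResult: a visible "下一页" (next page) link."""
    return '下一页' in html and 'class="next hidden"' not in html


def scrape(problem_url, get_page=fetch):
    """Walk pages 1, 2, ... until there is no next page; return all rows."""
    rows = []
    page = 0
    while True:
        page += 1
        html = get_page(PAGE_QUERY % (problem_url, page))
        rows.extend(parse_rows(html))
        if not has_next_page(html):
            return rows
```

With a real URL, `scrape(url)` does what `getResult` did, minus writing result.csv. Passing a different `get_page` (for example, a function that returns saved HTML strings) lets the parsing and the stop condition be checked without any network access. `re.search` with a non-greedy group replaces the original's two-sided substitution; for a page with a single `<tbody>` both give the same text.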

To sum up, these are the Python libraries I've worked with so far:

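PIL (Python Imaging Library)/Image: image processing
cv/cv2: computer vision (OpenCV's Python bindings)
numpy: NumPy is the fundamental package for scientific computing with Python
math: mathematics
csv: CSV file handling
MySQLdb: connecting to MySQL databases
mlpy: machine learning
matplotlib: "It provides both a very quick way to visualize data from Python and publication-quality figures in many formats." Like MATLAB, it can draw really nice-looking plots
M2Crypto, Crypto, pyecc: cryptography (hashing, symmetric encryption, asymmetric encryption, signatures and authentication, etc.)
webpy: for building a lightweight website, "A minimalist web framework written in Python"
urllib/urllib2: URL access, page fetching, etc.
re: regular expressions
os, sys: as the names suggest, operating-system and file operations
ConfigParser: configuration file parsing
Tkinter: GUI toolkit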
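Since `csv` is on the list: the script above builds each row with `",".join(...)`, which produces a malformed line as soon as a field contains a comma or a double quote. Here is a minimal Python 3 sketch (not part of the original script) of the same kind of output written through `csv.writer`, which quotes such fields. The English header and the sample rows are made up for illustration.

```python
import csv
import io

HEADER = ['id', 'user', 'language', 'filename', 'score']


def rows_to_csv(rows, header=HEADER):
    """Serialize rows to CSV text with the csv module.

    The default QUOTE_MINIMAL setting quotes only fields containing the
    delimiter, the quote character, or a character from the line terminator.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = [['1', 'alice', 'C++', 'a.cpp', '100'],
        ['2', 'bob, jr', 'Java', 'Main.java', '60']]
print(rows_to_csv(rows), end='')
# id,user,language,filename,score
# 1,alice,C++,a.cpp,100
# 2,"bob, jr",Java,Main.java,60
```

To write straight to result.csv instead, open the file with `newline=''` (as the csv documentation recommends, so the module controls line endings) and pass the file object to `csv.writer`.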

Python's libraries really are powerful, o(∩∩)o... haha

Posted by xhSong | Filed in python | Tagged: python, libraries, Baidu Astar, results | Comments[3]
