How to Get Product Prices and Ratings from JD.com

xhSong posted @ December 15, 2012 22:50 in python with tags JD.com 360buy product-price image product-rating, 3888 views

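For a course assignment, we needed to scrape product information from JD.com. Scraping the price, the number of reviews, and the rating caused some trouble. Here is my solution.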

1. Product price:

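Product prices on JD.com are rendered as images. Fortunately, every digit uses the same font, size, and color. To keep things simple, I convert the 2-D image to grayscale, invert it, map it onto a 1-D structure (the sum of each pixel column), and then segment and match that profile. The matching score function is a simple one: it always gives the right result as long as the rendering of the digits does not change, but if the font, color, or size changes, it will probably fail. The code is as follows: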

#!/usr/bin/python
# coding=utf-8
import sys
from PIL import Image

class PriceReco:
    img_data = []
    size_x, size_y = 0, 0
    def __init__(self, filename): # load the image, convert it to grayscale and invert it
        try:
            img = Image.open(filename)
        except:
            print filename, "load error"
            return 
        self.size_x, self.size_y = img.size
        self.img_data = list(img.convert('L').getdata())
        for i in range(0, len(self.img_data)):
            self.img_data[i] = 255 - self.img_data[i]
        #print filename, "load success, image size is", self.size_x, self.size_y
        #print self.img_data
        
    def getone(self, single): # recognize a single character from its column sums
        table_value = [
                [189, 378, 945, 1512, 2079, 1701, 1701, 1134, 945, 378, 189], #¥
                [567, 567], # .
                [1323, 1701, 756, 378, 378, 378, 756, 1701, 1323], # 0
                [378, 378, 2079, 2079, 189, 189], # 1
                [567, 945, 756, 756, 756, 756, 945, 945, 567], # 2
                [756, 1134, 378, 567, 567, 567, 1323, 1512, 756], # 3
                [378, 378, 378, 378, 378, 378, 2079, 2079, 189, 189], # 4
                [378, 1512, 1134, 567, 567, 567, 945, 1134, 756], # 5
                [1134, 1512, 945, 756, 567, 567, 945, 1134, 567], # 6
                [189, 189, 378, 756, 945, 945, 945, 756, 378], # 7
                [756, 1512, 1323, 567, 567, 567, 1323, 1512, 756], # 8
                [567, 1134, 945, 567, 567, 756, 945, 1512, 1134], # 9
                ]
        table_key = ['¥', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
        key_id, min_value = 0, 100000
        #print single
        for k in range(0, len(table_key)):
            #print len(table_value[k]), len(single)
            value = 10 * (len(table_value[k]) - len(single)) ** 2
            #print value
            for i in range(0, min(len(table_value[k]), len(single))):
                value += (table_value[k][i] - single[i]) ** 2
            #print value
            if value < min_value:
                key_id, min_value = k, value
                #print "updata: ", key_id, min_value
        #print min_value
        if min_value > 100:
            return 'N'
        else:
            return table_key[key_id]
            
    def recognita(self): # segment the image into characters and recognize each one
        cnt = [0] * self.size_x
        for x in range(0, self.size_x):
            for y in range(0, self.size_y):
                index = y * self.size_x + x
                cnt[x] += self.img_data[index]
        #print cnt
        x = 0
        number = ""
        while x < self.size_x:
            if cnt[x]:
                single = []
                while x < self.size_x and cnt[x]:
                    single.append(cnt[x])
                    x += 1
                number += self.getone(single)
            x += 1
        return number
        
if __name__ == '__main__':
    if len(sys.argv) != 2:
        print "Usage: price_reco image"
        sys.exit(1)  # stop here instead of failing on sys.argv[1] below
    price = PriceReco(sys.argv[1])
    print price.recognita()
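
To make the three steps of the script easier to follow (column sums, splitting on empty columns, scoring against templates), here is a minimal Python 3 sketch that runs on a hand-built pixel list instead of a real price image. The function names (column_profile, split_runs, distance, classify, recognize) are hypothetical and not part of the original script; the two templates are copied from table_value above. Every entry in that table is a multiple of 189, which is consistent with each ink pixel contributing 189 after inversion, so the toy image uses that value.

```python
# Python 3 sketch of the projection / segmentation / matching steps.
# All names here are illustrative; only the template numbers and the
# scoring formula come from the script above.

def column_profile(pixels, width, height):
    """Sum the (already inverted) gray values of each column of a row-major list."""
    return [sum(pixels[y * width + x] for y in range(height)) for x in range(width)]

def split_runs(profile):
    """Split the profile into runs of consecutive non-zero columns, one per glyph."""
    runs, current = [], []
    for value in profile:
        if value:
            current.append(value)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

def distance(template, run):
    """Score used by getone(): width penalty plus squared differences over the overlap."""
    score = 10 * (len(template) - len(run)) ** 2
    for t, r in zip(template, run):
        score += (t - r) ** 2
    return score

# Two of the twelve templates from table_value, enough for the demonstration.
TEMPLATES = {
    '.': [567, 567],
    '1': [378, 378, 2079, 2079, 189, 189],
}

def classify(run, threshold=100):
    """Return the best-matching key, or 'N' when even the best score exceeds the threshold."""
    best = min(TEMPLATES, key=lambda key: distance(TEMPLATES[key], run))
    return best if distance(TEMPLATES[best], run) <= threshold else 'N'

def recognize(pixels, width, height):
    """Run the whole pipeline on a row-major list of inverted gray values."""
    profile = column_profile(pixels, width, height)
    return ''.join(classify(run) for run in split_runs(profile))

if __name__ == '__main__':
    # A 4x3 toy image: blank column, two columns of three ink pixels, blank column.
    toy = [0, 189, 189, 0] * 3
    print(column_profile(toy, 4, 3))
    print(recognize(toy, 4, 3))
```

Because the score is a plain sum of squared differences with a threshold of 100, a single column that is off by one ink pixel (189 squared) already pushes a glyph to 'N', which is why the approach only holds while the rendering stays exactly the same.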

2. Number of reviews and product rating

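This one is much simpler. Although JD.com redirects the request (HTTP 3xx) n times, a little analysis shows that requesting the following URL directly returns both the number of reviews and the product rating: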

http://club.360buy.com/ProductPageService.aspx?method=GetCommentSummaryBySkuId&referenceId=$id&callback=GetCommentSummaryBySkuId

Here $id is the product ID, which is easy to obtain.
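
As a rough Python 3 sketch, the URL can be built from a product ID and the reply unwrapped as follows. This assumes the endpoint answers in JSONP form, that is, JSON wrapped in a call to the function named by the callback parameter; the post does not show the response, so that format, the helper names summary_url and unwrap_jsonp, the sample ID, and the placeholder payload are all assumptions. The URL itself is the one given above, which dates from 2012 and may no longer respond, so the sketch does not perform the request.

```python
import json
import re

# URL pattern taken from the post; {sku} stands in for $id.
URL_TEMPLATE = ("http://club.360buy.com/ProductPageService.aspx"
                "?method=GetCommentSummaryBySkuId&referenceId={sku}"
                "&callback=GetCommentSummaryBySkuId")

def summary_url(sku_id):
    """Build the review-summary URL for one product ID."""
    return URL_TEMPLATE.format(sku=sku_id)

def unwrap_jsonp(text):
    """Strip a callback(...) wrapper and parse the JSON inside (assumed format)."""
    match = re.match(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))

if __name__ == '__main__':
    print(summary_url(123456))  # 123456 is a made-up product ID
    # Placeholder payload: the real field names are not given in the post.
    sample = 'GetCommentSummaryBySkuId({"placeholder": 1});'
    print(unwrap_jsonp(sample))
```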

 

And with that, JD.com was successfully cracked, o(∩∩)o... haha!

 

