<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>鬼の领地 &#187; My Lab</title>
	<atom:link href="http://blog.upsuper.org/tag/%e6%88%91%e7%9a%84%e5%ae%9e%e9%aa%8c%e5%ae%a4/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.upsuper.org</link>
	<description>the place where there are some ghost appearing...</description>
	<lastBuildDate>Fri, 06 Aug 2010 12:57:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>A Small Tool for Intersecting Friend Lists on Renren</title>
		<link>http://blog.upsuper.org/a-tool-to-get-intersection-of-friends-in-renren/</link>
		<comments>http://blog.upsuper.org/a-tool-to-get-intersection-of-friends-in-renren/#comments</comments>
		<pubDate>Sat, 09 Jan 2010 05:06:30 +0000</pubDate>
		<dc:creator>upsuper</dc:creator>
				<category><![CDATA[Small Programs]]></category>
		<category><![CDATA[CGI]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Renren]]></category>
		<category><![CDATA[My Lab]]></category>

		<guid isPermaLink="false">http://blog.upsuper.org/?p=952</guid>
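		<description><![CDATA[Last night I spent a few hours writing a small online tool that computes the intersection of the friend lists of any two users on Renren (to be honest, I still prefer calling it Xiaonei). You can try it in my lab: Renren Friend Intersection. I could write this tool mainly because my hosting provider switched the system to FreeBSD, which can run Python, and CGI is not restricted either, so it worked~ Because the data has to be fetched from Renren rather than locally, and from user pages rather than an API, it cannot be fast. That is why I added the notice: this process may be very slow. In practice, though, testing on my site's server showed quite impressive speed, with results usually arriving in 2 to 5 seconds! On my own machine it takes 7 s at best, and at worst nothing comes back at all... Fetching with a single thread would take forever, so I use 5 concurrent threads, each maintaining its own connection. The tool also connects to the mobile version of Renren, which means less data to transfer, faster parsing, and simpler processing. Finally, before you read the code, note that it is released under AGPLv3. Under this license, if you modify the code and use your modified version to provide a service to others, you must also publish your modifications under AGPLv3. Since anyone can get hold of the front-end interface, I won't include it; here is the back-end handler: [...]]]></description>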
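			<content:encoded><![CDATA[<p>Last night I spent a few hours writing a small online tool that computes the intersection of the friend lists of any two users on Renren (to be honest, I still prefer calling it Xiaonei). You can try it in <a href="http://lab.upsuper.org">my lab</a>: <a href="http://lab.upsuper.org/urenren/">Renren Friend Intersection</a>.</p>
<p>I could write this tool mainly because my hosting provider switched the system to FreeBSD, which can run Python, and CGI is not restricted either, so it worked~<br />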
<span id="more-952"></span><br />
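Because the data has to be fetched from Renren rather than locally, and from user pages rather than an API, it cannot be fast. That is why I added the notice: this process may be very slow. In practice, though, testing on my site's server showed quite impressive speed, with results usually arriving in 2 to 5 seconds! On my own machine it takes 7 s at best, and at worst nothing comes back at all...</p>
<p>Fetching with a single thread would take forever, so I use 5 concurrent threads, each maintaining its own connection. The tool also connects to the mobile version of Renren, which means less data to transfer, faster parsing, and simpler processing.</p>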
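<p>The five-thread approach can be sketched roughly as follows. This is a minimal Python 3 sketch, not the script itself (which targets Python 2): <code>fetch_all_pages</code> and <code>fake_fetch_page</code> are hypothetical names, and the fake fetcher stands in for a real HTTP request over each thread's own connection. Workers claim page numbers from a shared counter under a lock and stop once any page comes back empty.</p>

```python
from threading import Thread, Lock


def fetch_all_pages(fetch_page, num_threads=5):
    """Call fetch_page(0), fetch_page(1), ... in parallel until a page is empty."""
    lock = Lock()
    state = {'next_page': 0, 'stop': False}
    results = {}

    def worker():
        while True:
            # Claim the next page number under the lock.
            with lock:
                if state['stop']:
                    return
                page = state['next_page']
                state['next_page'] += 1
            items = fetch_page(page)
            with lock:
                if items:
                    results.update(items)
                else:
                    # An empty page means we have run past the last page.
                    state['stop'] = True

    threads = [Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_fetch_page(page):
    # Hypothetical stand-in for an HTTP request: 3 pages of 2 friends each.
    if page >= 3:
        return {}
    return {page * 2 + i: 'friend%d' % (page * 2 + i) for i in range(2)}


friends = fetch_all_pages(fake_fetch_page)
print(len(friends))  # prints 6
```

<p>Because page numbers are handed out in order, every non-empty page is claimed before the first empty one, so no friend is lost when the stop flag is raised.</p>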
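<p>Finally, before you read the code, note that it is released under <a href="http://www.fsf.org/licensing/licenses/agpl-3.0.html">AGPLv3</a>. Under this license, if you modify the code and use your modified version to provide a service to others, you must also publish your modifications under AGPLv3.</p>
<p>Since anyone can get hold of the front-end interface, I won't include it; here is the back-end handler:</p>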

<div class="wp_codebox"><table><tr id="p9522"><td class="code" id="p952code2"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python</span>
<span style="color: #808080; font-style: italic;"># -*- coding: UTF-8 -*-</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Copyright (C) 2010 Upsuper &lt;quanxunzhen@gmail.com&gt;</span>
<span style="color: #808080; font-style: italic;"># License: AGPLv3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread, Lock
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">httplib</span> <span style="color: #ff7700;font-weight:bold;">import</span> HTTPConnection
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">cgi</span>, json
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>, <span style="color: #dc143c;">re</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Constants</span>
THREADS_NUM = <span style="color: #ff4500;">5</span>             <span style="color: #808080; font-style: italic;"># Maximum number of threads</span>
RENREN_USER = <span style="color: #483d8b;">''</span>            <span style="color: #808080; font-style: italic;"># Renren user name</span>
RENREN_PWD  = <span style="color: #483d8b;">''</span>            <span style="color: #808080; font-style: italic;"># Renren password</span>
USER_AGENT  = <span style="color: #483d8b;">'urenren 0.1'</span> <span style="color: #808080; font-style: italic;"># User-Agent sent to Renren</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># JSON output helper (writes to stdout)</span>
json_dump = <span style="color: #ff7700;font-weight:bold;">lambda</span> v: json.<span style="color: black;">dump</span><span style="color: black;">&#40;</span>v, <span style="color: #dc143c;">sys</span>.<span style="color: black;">stdout</span><span style="color: black;">&#41;</span>
<span style="color: #808080; font-style: italic;"># Precompiled regular expression matching one friend entry (id, avatar URL, name)</span>
parse_re = <span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span>r<span style="color: #483d8b;">'&lt;td&gt;&lt;p&gt;&lt;a href=&quot;[^<span style="color: #000099; font-weight: bold;">\?</span>]+<span style="color: #000099; font-weight: bold;">\?</span>id=(<span style="color: #000099; font-weight: bold;">\d</span>+)[^&quot;]*&quot;&gt;'</span>
                      r<span style="color: #483d8b;">'&lt;img src=&quot;([^&quot;]+)&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;/p&gt;'</span>
                      r<span style="color: #483d8b;">'&lt;a href=&quot;[^&quot;]+&quot;&gt;([^&lt;]+)&lt;/a&gt;&lt;/td&gt;'</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> RequestThread<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">id</span>, sid, friends<span style="color: black;">&#41;</span>:
    Thread.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
    <span style="color: #008000;">self</span>.__page = <span style="color: #483d8b;">'/getfriends.do?curpage=%%d&amp;id=%d&amp;sid=%s'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">id</span>, sid<span style="color: black;">&#41;</span>
    <span style="color: #008000;">self</span>.__friends = friends
    <span style="color: #008000;">self</span>.__conn = HTTPConnection<span style="color: black;">&#40;</span><span style="color: #483d8b;">'3g.renren.com'</span><span style="color: black;">&#41;</span>
&nbsp;
  <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">global</span> curpage, stop_sign
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #ff7700;font-weight:bold;">not</span> stop_sign:
      <span style="color: #808080; font-style: italic;"># Claim the next page number under the lock</span>
      curpage_lock.<span style="color: black;">acquire</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
      page = curpage
      curpage += <span style="color: #ff4500;">1</span>
      curpage_lock.<span style="color: black;">release</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
      <span style="color: #808080; font-style: italic;"># Request the page over this thread's connection</span>
      conn = <span style="color: #008000;">self</span>.__conn
      conn.<span style="color: black;">request</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'GET'</span>, <span style="color: #008000;">self</span>.__page <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>page, <span style="color: black;">&#41;</span>, <span style="color: #008000;">None</span>, <span style="color: black;">&#123;</span>
        <span style="color: #483d8b;">'User-Agent'</span>: USER_AGENT
      <span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
      data = conn.<span style="color: black;">getresponse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
      <span style="color: #808080; font-style: italic;"># Parse the fetched data; a page with no matches means we are past the end</span>
      friend_iter = parse_re.<span style="color: black;">finditer</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span>
      t_stop_sign = <span style="color: #008000;">True</span>
      <span style="color: #ff7700;font-weight:bold;">for</span> f <span style="color: #ff7700;font-weight:bold;">in</span> friend_iter:
        t_stop_sign = <span style="color: #008000;">False</span>
        <span style="color: #008000;">id</span> = <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>f.<span style="color: black;">group</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.__friends<span style="color: black;">&#91;</span><span style="color: #008000;">id</span><span style="color: black;">&#93;</span> = <span style="color: black;">&#40;</span>f.<span style="color: black;">group</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#41;</span>, f.<span style="color: black;">group</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
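      <span style="color: #808080; font-style: italic;"># Only ever set the stop flag, so a slower thread cannot clear it again</span>
      <span style="color: #ff7700;font-weight:bold;">if</span> t_stop_sign:
        stop_sign = <span style="color: #008000;">True</span>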
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> readFriends<span style="color: black;">&#40;</span><span style="color: #008000;">id</span><span style="color: black;">&#41;</span>:
  <span style="color: #808080; font-style: italic;"># Initialize the state shared by the threads</span>
  <span style="color: #ff7700;font-weight:bold;">global</span> curpage, curpage_lock, stop_sign
  curpage = <span style="color: #ff4500;">0</span>
  curpage_lock = Lock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  stop_sign = <span style="color: #008000;">False</span>
  threads = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
  friends = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
&nbsp;
  <span style="color: #808080; font-style: italic;"># Create the threads</span>
  <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">xrange</span><span style="color: black;">&#40;</span>THREADS_NUM<span style="color: black;">&#41;</span>:
    threads.<span style="color: black;">append</span><span style="color: black;">&#40;</span>RequestThread<span style="color: black;">&#40;</span><span style="color: #008000;">id</span>, sid, friends<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
  <span style="color: #808080; font-style: italic;"># Start the threads</span>
  <span style="color: #ff7700;font-weight:bold;">for</span> t <span style="color: #ff7700;font-weight:bold;">in</span> threads:
    t.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  <span style="color: #808080; font-style: italic;"># Wait for the threads to finish</span>
  <span style="color: #ff7700;font-weight:bold;">for</span> t <span style="color: #ff7700;font-weight:bold;">in</span> threads:
    t.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
  <span style="color: #ff7700;font-weight:bold;">return</span> friends
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
  <span style="color: #808080; font-style: italic;"># Initialize the CGI output (header, then a blank line)</span>
  <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Content-Type: text/plain'</span>
  <span style="color: #ff7700;font-weight:bold;">print</span>
&nbsp;
  <span style="color: #808080; font-style: italic;"># Read the two user IDs from the request</span>
  form = <span style="color: #dc143c;">cgi</span>.<span style="color: black;">FieldStorage</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  id1, id2 = <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">0</span>
  <span style="color: #ff7700;font-weight:bold;">if</span> form.<span style="color: black;">has_key</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'id1'</span><span style="color: black;">&#41;</span>: id1 = <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>form<span style="color: black;">&#91;</span><span style="color: #483d8b;">'id1'</span><span style="color: black;">&#93;</span>.<span style="color: black;">value</span><span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">if</span> form.<span style="color: black;">has_key</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'id2'</span><span style="color: black;">&#41;</span>: id2 = <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>form<span style="color: black;">&#91;</span><span style="color: #483d8b;">'id2'</span><span style="color: black;">&#93;</span>.<span style="color: black;">value</span><span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: black;">&#40;</span>id1 <span style="color: #ff7700;font-weight:bold;">and</span> id2<span style="color: black;">&#41;</span>:
    json_dump<span style="color: black;">&#40;</span><span style="color: black;">&#123;</span><span style="color: #483d8b;">'error'</span>: <span style="color: #008000;">True</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span>
&nbsp;
  <span style="color: #808080; font-style: italic;"># Log in to Renren and extract the session ID (SID) from the response</span>
  conn = HTTPConnection<span style="color: black;">&#40;</span><span style="color: #483d8b;">'3g.renren.com'</span><span style="color: black;">&#41;</span>
  conn.<span style="color: black;">request</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'POST'</span>, <span style="color: #483d8b;">'/login.do'</span>,
      <span style="color: #dc143c;">urllib</span>.<span style="color: black;">urlencode</span><span style="color: black;">&#40;</span><span style="color: black;">&#123;</span><span style="color: #483d8b;">'email'</span>: RENREN_USER, <span style="color: #483d8b;">'password'</span>: RENREN_PWD<span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>,
      <span style="color: black;">&#123;</span>
        <span style="color: #483d8b;">'Content-Type'</span>: <span style="color: #483d8b;">'application/x-www-form-urlencoded'</span>,
        <span style="color: #483d8b;">'User-Agent'</span>: USER_AGENT,
      <span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
  response = conn.<span style="color: black;">getresponse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  data = response.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  conn.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  match = <span style="color: #dc143c;">re</span>.<span style="color: black;">search</span><span style="color: black;">&#40;</span>r<span style="color: #483d8b;">'sid=([0-9a-f]+)'</span>, data, <span style="color: #dc143c;">re</span>.<span style="color: black;">I</span><span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">global</span> sid
  sid = match.<span style="color: black;">group</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
&nbsp;
  <span style="color: #808080; font-style: italic;"># Read both friend lists; iterate over the smaller one below</span>
  friends1 = readFriends<span style="color: black;">&#40;</span>id1<span style="color: black;">&#41;</span>
  friends2 = readFriends<span style="color: black;">&#40;</span>id2<span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>friends1<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>friends2<span style="color: black;">&#41;</span>:
    friends1, friends2 = friends2, friends1
&nbsp;
  <span style="color: #808080; font-style: italic;"># Compute the intersection</span>
  intersection = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
  <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> friends1.<span style="color: black;">iterkeys</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #ff7700;font-weight:bold;">in</span> friends2:
      intersection.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>i, friends2<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, friends2<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
  json_dump<span style="color: black;">&#40;</span><span style="color: black;">&#123;</span><span style="color: #483d8b;">'error'</span>: <span style="color: #008000;">False</span>, <span style="color: #483d8b;">'count'</span>: <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>intersection<span style="color: black;">&#41;</span>, <span style="color: #483d8b;">'data'</span>: intersection<span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
  <span style="color: #ff7700;font-weight:bold;">try</span>:
    main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">except</span>:
    json_dump<span style="color: black;">&#40;</span><span style="color: black;">&#123;</span><span style="color: #483d8b;">'error'</span>: <span style="color: #008000;">True</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>
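
<p>The intersection step at the end of the script can also be written with set operations. The following is a minimal Python 3 sketch under the same data layout as above (a dict mapping friend ID to a name and avatar pair); <code>common_friends</code> and the sample data are hypothetical, and the sorting by ID is an addition for predictable output that the script itself does not do.</p>

```python
def common_friends(friends1, friends2):
    """Return (id, name, avatar) tuples for the IDs present in both dicts."""
    shared_ids = set(friends1).intersection(friends2)
    # Sort by ID so the output order is predictable.
    return [(i,) + tuple(friends2[i]) for i in sorted(shared_ids)]


# Hypothetical sample data: friend ID -> (name, avatar URL)
friends_a = {1: ('Alice', 'a.jpg'), 2: ('Bob', 'b.jpg'), 3: ('Carol', 'c.jpg')}
friends_b = {2: ('Bob', 'b.jpg'), 3: ('Carol', 'c.jpg'), 4: ('Dave', 'd.jpg')}

print(common_friends(friends_a, friends_b))
# prints [(2, 'Bob', 'b.jpg'), (3, 'Carol', 'c.jpg')]
```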

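<p>The configuration constant RENREN_USER must be the account name or email address of an existing Renren account, in other words whatever you use to log in to Renren, and RENREN_PWD must be the password for that account. Renren does not let users who are not logged in view most pages (probably to keep search engines from being used to dig up people's personal information... although there are in fact ways to get search engines to index that content~ hehe, that suddenly gives me some wicked ideas~), so the only option is to log in with an account.</p>
<p>To accommodate mobile browsers that do not support cookies, the mobile version of Renren passes a session ID (SID) in the URL instead, so I can happily ignore cookies altogether~ This program was quite easy to write, and only Python makes it possible to put a program together this quickly~</p>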
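<p>The SID handling can be sketched on its own. This is a minimal Python 3 sketch: the regular expression and the <code>/getfriends.do</code> path with its <code>curpage</code>, <code>id</code> and <code>sid</code> parameters come from the script above, while <code>extract_sid</code>, <code>friends_path</code> and the sample response fragment are hypothetical.</p>

```python
import re
from urllib.parse import urlencode

# Same pattern the script uses to find the session ID in the login response.
SID_RE = re.compile(r'sid=([0-9a-f]+)', re.I)


def extract_sid(body):
    """Return the first session ID found in a response body, or None."""
    match = SID_RE.search(body)
    return match.group(1) if match else None


def friends_path(user_id, sid, page):
    """Build a friend-list path that carries the session ID in the query string."""
    return '/getfriends.do?' + urlencode({'curpage': page, 'id': user_id, 'sid': sid})


# Hypothetical fragment of a login response containing a link with the SID.
sample_body = 'href="/home.do?sid=0123abcd"'
sid = extract_sid(sample_body)
print(friends_path(42, sid, 0))
# prints /getfriends.do?curpage=0&id=42&sid=0123abcd
```

<p>Returning None when no SID is found lets the caller report a failed login explicitly instead of relying on an exception.</p>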
]]></content:encoded>
			<wfw:commentRss>http://blog.upsuper.org/a-tool-to-get-intersection-of-friends-in-renren/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
