Friday, December 16, 2011

Python Gnutella Crawler

Update: I created a gist for this: https://gist.github.com/ptony82/5092551

At one point I was bored, so I wrote this Gnutella crawler. It's pretty slow unless you run several instances in parallel. Maybe one day I will run it on Google App Engine and write a paper about how Gnutella is still alive and well even after the shutdown of LimeWire. If it's P2P, it's forever.


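import socket
import sys

def main():

    # Seed the crawl with the host and port given on the command line.
    peers = [(sys.argv[1], int(sys.argv[2]))]

    count = 0

    # makecall() appends newly discovered peers to this list while we
    # iterate over it, so the loop walks outward from the seed and stops
    # after contacting three peers.
    for peer in peers:
        makecall(peer, peers)
        count += 1
        if count == 3:
            break

def makecall(endpoint, peers):

    # Gnutella 0.6 handshake that identifies this connection as a crawler.
    request = ("GNUTELLA CONNECT/0.6\r\n"
               "User-Agent: LimeWire (crawl)\r\n"
               "X-Ultrapeer: False\r\n"
               "Query-Routing: 0.1\r\n"
               "Crawler: 0.1\r\n\r\n")

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    try:
        sock.connect(endpoint)
        # Sockets carry bytes, so encode the request and decode the reply.
        sock.sendall(request.encode('ascii'))
        response = sock.recv(1024).decode('latin-1')
    except OSError:
        # Unreachable or misbehaving peer: skip it.
        return
    finally:
        sock.close()

    print(response)

    # The reply lists neighbours as "Peers: host:port, host:port, ...".
    for line in response.split('\n'):
        if line.startswith('Peers:'):
            for peer in line[len('Peers:'):].split(','):
                host, sep, port = peer.strip().rpartition(':')
                # Skip empty or malformed entries instead of crashing.
                if sep and port.isdigit():
                    peers.append((host, int(port)))

if __name__ == "__main__":
    main()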
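
Since the crawler is slow when it contacts one peer at a time, here is a rough sketch of how the "run them in parallel" idea might look inside a single process, using a thread pool. This is not what the script above does. The names `parse_peers`, `crawl`, `fetch`, `max_peers` and `workers` are made up for this illustration, and `fetch` is assumed to be a function you supply that takes a `(host, port)` tuple and returns the handshake reply as text (or an empty string on failure).

```python
from concurrent.futures import ThreadPoolExecutor


def parse_peers(response):
    """Return (host, port) tuples found in the 'Peers:' header of a reply."""
    found = []
    for line in response.splitlines():
        name, sep, value = line.partition(':')
        if not sep or name.strip().lower() != 'peers':
            continue
        for entry in value.split(','):
            host, colon, port = entry.strip().rpartition(':')
            # Ignore empty or malformed entries.
            if colon and host and port.isdigit():
                found.append((host, int(port)))
    return found


def crawl(seeds, fetch, max_peers=50, workers=8):
    """Crawl outward from seeds, contacting each wave of peers concurrently.

    fetch is a caller-supplied (hypothetical) function: endpoint -> reply text.
    Returns the set of every endpoint seen, contacted or not.
    """
    seen = set(seeds)
    frontier = list(seeds)
    visited = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and visited < max_peers:
            # Never contact more than max_peers endpoints in total.
            batch = frontier[:max_peers - visited]
            visited += len(batch)
            frontier = []
            # pool.map runs fetch on the whole batch in parallel threads.
            for reply in pool.map(fetch, batch):
                for peer in parse_peers(reply):
                    if peer not in seen:
                        seen.add(peer)
                        frontier.append(peer)
    return seen
```

A real `fetch` would do what `makecall` does (connect, send the handshake, read the reply) and return the decoded text instead of printing it. Threads fit here because the work is almost entirely waiting on the network.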
