Sunday, August 29, 2010

HTML Parsing using HTMLParser

A simple script to get the video download links from here; the videos are hosted at archive.org.
This script is adapted from here.

HTMLParser usage, from the Python documentation:

Usage:
    p = HTMLParser()
    p.feed(data)
    ...
    p.close()

Start tags are handled by calling handle_starttag() or
handle_startendtag(); end tags by handle_endtag().  The
data between tags is passed from the parser to the derived class
by calling handle_data() with the data as argument (the data
may be split up in arbitrary chunks).  Entity references are
passed by calling handle_entityref() with the entity
reference as the argument.  Numeric character references are
passed to handle_charref() with the string containing the
reference as the argument.
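
To make the callback flow described above concrete, here is a minimal sketch that records which handler fires for each piece of a small document. It assumes Python 3, where the class lives in html.parser; EventLogger and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser  # Python 3 location (assumption)


class EventLogger(HTMLParser):
    """Hypothetical helper: records every callback the parser fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(('start', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        # text between tags; may arrive in more than one chunk
        self.events.append(('data', data))


p = EventLogger()
p.feed('<p>Hello <a href="x.html">world</a></p>')
p.close()

for event in p.events:
    print(event)
```

Start tags, text and end tags arrive in document order, which is why a subclass only needs to override the handlers it cares about.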

import sys
import urllib
import HTMLParser
import re

class GetLinks(HTMLParser.HTMLParser):
    def handle_starttag(self,tag,attrs):
        if tag == 'a':
            for name,value in attrs:
                if name == 'href':
                    if re.search('ArabicLanguageCourseVideos',value):
                        print(value)
                    
gl = GetLinks()
url = 'http://www.lqtoronto.com/videodl.html'

urlconn = urllib.urlopen(url)

# read the downloaded HTML into urlcontents
urlcontents = urlconn.read()

# feed the downloaded HTML to the parser's feed() method
# for parsing
gl.feed(urlcontents)
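
The script above targets Python 2. As a rough sketch only, the same filtering idea might look like this on Python 3, collecting the matches in a list instead of printing them and parsing an inline string so it runs without a network connection. LinkCollector and the sample HTML (including its example.org URL) are invented for illustration.

```python
import re
from html.parser import HTMLParser  # Python 3 location (assumption)


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values that match a pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for an attribute written without a value,
            # so check it before handing it to the regex
            if name == 'href' and value and self.pattern.search(value):
                self.links.append(value)


# made-up sample page, not the real site
sample = '''
<a href="http://example.org/ArabicLanguageCourseVideos/lesson1.avi">1</a>
<a href="about.html">About</a>
<a href>broken</a>
'''

collector = LinkCollector('ArabicLanguageCourseVideos')
collector.feed(sample)
collector.close()
print(collector.links)
```

Returning a list rather than printing makes it easier to pass the links on to a downloader later.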

Friday, August 27, 2010

Socket Exception Handlers

Based on chapter 2 of the book 'Foundations of Python Network Programming'.

According to the Python documentation, there are four socket exceptions (error, herror, gaierror, timeout). Here I use only socket.error, which covers general I/O and communication problems.
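
A rough sketch of how the four exceptions relate, assuming Python 3 (describe is a hypothetical helper, not part of the socket module): herror, gaierror and timeout all derive from socket.error, so a handler for socket.error also catches them, and any more specific handling has to be checked first.

```python
import socket


def describe(exc):
    """Hypothetical helper: return a short label for an exception."""
    # Specific subclasses first; socket.error would match all of them.
    if isinstance(exc, socket.timeout):
        return 'timeout'
    if isinstance(exc, socket.gaierror):
        return 'address lookup failed'
    if isinstance(exc, socket.herror):
        return 'host lookup failed'
    if isinstance(exc, socket.error):
        return 'general socket error'
    return 'not a socket error'


# exceptions constructed by hand, so nothing touches the network
for exc in (socket.timeout(), socket.gaierror(1, 'demo'),
            socket.herror(1, 'demo'), socket.error(), ValueError()):
    print(type(exc).__name__, '->', describe(exc))
```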

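For an illustration, save the script under a name other than socket.py (that name would shadow the standard socket module when the script runs 'import socket'), for example sockclient.py, and try the following from the command line:
> python.exe sockclient.py google.com 80 /index.html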

import socket
import sys

# command-line arguments
host = sys.argv[1]
port = sys.argv[2]
filename = sys.argv[3]

# error handler: print the message and exit with a non-zero status
def errorHandler(message, e):
    print('{0} {1}'.format(message, e))
    sys.exit(1)

# create socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error as e:
    errorHandler('Socket creation error: ', e)

# convert the port argument to an integer
try:
    port = int(port)
except ValueError as e:
    errorHandler('Invalid port number: ', e)

# connection initiation
try:
    s.connect((host, port))
except socket.error as e:
    errorHandler('Connection error: ', e)

# sending HTTP request
try:
    s.sendall("GET %s HTTP/1.0\r\n\r\n" % filename)
except socket.error as e:
    errorHandler('Error sending HTTP request: ', e)

# receiving data from server until it closes the connection
while True:
    try:
        buf = s.recv(2048)
    except socket.error as e:
        errorHandler('Error receiving data: ', e)
    if not buf:
        break
    sys.stdout.write(buf)

s.close()
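
The script above is written for Python 2. On Python 3, sendall() takes bytes and recv() returns bytes, so the request has to be encoded first. Here is a minimal sketch of the same flow under that assumption; build_request, fetch and the 10-second timeout are my own additions for illustration, not taken from the book.

```python
import socket


def build_request(path):
    """Return the HTTP/1.0 request line as bytes, as sendall() expects."""
    return ('GET %s HTTP/1.0\r\n\r\n' % path).encode('ascii')


def fetch(host, port, path, timeout=10.0):
    """Hypothetical helper: send the request and return the raw reply.

    Any socket.error (including lookup failures and timeouts) is left
    for the caller to handle.
    """
    chunks = []
    s = socket.create_connection((host, port), timeout=timeout)
    try:
        s.sendall(build_request(path))
        while True:
            buf = s.recv(2048)
            if not buf:  # empty bytes: the server closed the connection
                break
            chunks.append(buf)
    finally:
        s.close()
    return b''.join(chunks)


print(build_request('/index.html'))
```

A caller could wrap fetch('google.com', 80, '/index.html') in a try/except for socket.error and print the error message, much like errorHandler does above.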

Monday, August 23, 2010

Install zope.interface for Twisted

Twisted uses Zope Interface to define and document its APIs.
  1. Download it from here. It comes in .egg format. Choose the build that matches your Python version.
  2. To install an .egg we need 'Easy Install', which is part of setuptools, so setuptools has to be installed first. Download it from here, again choosing the build that matches your Python version. Since the aim is to be able to install .egg files, choose setuptools in .exe format.
  3. Run the installer.
  4. It installs a new executable called 'easy_install.exe' under Python's 'Scripts' folder.
  5. To install zope.interface from the .egg, I ran the following on the command line:
c:\Python26\Scripts\easy_install.exe c:\zope.interface-3.6.1-py2.6-win32.egg
Now zope.interface is available for Twisted to use.
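
One quick way to confirm the install worked is to try importing the package. The sketch below is only an illustration; is_importable is a hypothetical helper name, and the final line prints False on a machine where zope.interface is not installed.

```python
def is_importable(module_name):
    """Hypothetical helper: True if the named module can be imported."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


print('json:', is_importable('json'))
print('zope.interface:', is_importable('zope.interface'))
```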

Thursday, August 19, 2010

SyntaxHighlighter

SyntaxHighlighter is a fully functional, self-contained code syntax highlighter developed in JavaScript. In short, it beautifies the code in your posts, something like this for Python code:
class SimpleDescriptor(object):

    def __get__(self, instance, owner):
        # Check if the value has been set
        if (not hasattr(self, "_value")):
            raise AttributeError
        print "Getting value: %s" % self._value
        return self._value

    def __set__(self, instance, value):
        print "Setting to %s" % value
        self._value = value

    def __delete__(self, instance):
        del(self._value)
This article provides easy steps for setting up SyntaxHighlighter.

Checking Input Type

The idea:
check whether the input is numeric or not. First, try to convert the input into a float object. If that fails, then it is not numeric for sure :p
# Python 3: input() returns a string (on Python 2, use raw_input() instead)
while True:
    try:
        radius = float(input('enter the radius of the circle: '))
        break
    except ValueError:
        print('enter a number!')
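
The same idea can be wrapped in a small function so it can be tried without typing at a prompt. This is a minimal sketch; is_numeric is a hypothetical name, and note that float() also accepts strings such as '1e3', 'inf' and 'nan', which may or may not be what you want to count as numeric.

```python
def is_numeric(text):
    """Return True if float() can convert the string, else False."""
    try:
        float(text)
    except ValueError:
        return False
    return True


for sample in ['3.14', '-2', '1e3', 'abc', '']:
    print(repr(sample), is_numeric(sample))
```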