
Just a snippet. I needed to grab the data between <LI> and </LI> list item tags, and it was highly annoying that I couldn't find this anywhere else.

Python, 4 lines
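import re

# `input` must already hold the HTML source as a string (note: the name shadows the builtin input()).
output = re.compile(r'<li>(.*?)</li>', re.DOTALL | re.IGNORECASE).findall(input)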

This is useful for grabbing the data you need if it's in an HTML page and you don't want to bother learning the INSANELY badly documented HTML or SGML parsers in Python.
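
For anyone who wants to try it end to end, here is a minimal sketch of the same regex wrapped in a function and run on a small page. The sample markup and the names `sample_html`, `LI_PATTERN` and `extract_list_items` are made up for illustration and are not part of the recipe.

```python
import re

# Hypothetical sample page; any HTML string would do here.
sample_html = """
<ul>
  <LI>first item</LI>
  <li>second
item</li>
</ul>
"""

# Same pattern as the recipe: non-greedy match between <li> and </li>.
# DOTALL lets "." span newlines; IGNORECASE accepts <LI> as well as <li>.
LI_PATTERN = re.compile(r'<li>(.*?)</li>', re.DOTALL | re.IGNORECASE)


def extract_list_items(html):
    """Return the raw text between each <li>...</li> pair."""
    return LI_PATTERN.findall(html)


items = extract_list_items(sample_html)
print(items)  # ['first item', 'second\nitem']
```

Note that the pattern only matches a bare `<li>` tag, so an item written with attributes (for example a class) would be skipped.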

2 comments

Harvey Thomas 20 years, 5 months ago

Very simplistic. Unfortunately this will only work in the simplest of cases where the contents of a is just text. Mixed content is allowed in so, you will get markup as well. The worst case is if you have nested lists, when you won't be pairing the correct tags.

Harvey Thomas 20 years, 5 months ago

Sorry. The comment above should read:

Unfortunately this will only work in the simplest of cases where the contents of a li element is just text. Mixed content is allowed in li elements, so you will get markup as well. The worst case is if you have nested lists, when you won't be pairing the correct tags.
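
To make the point above concrete, here is a rough sketch that runs the recipe's regex on a nested list and compares it with a small parser built on the standard library's `html.parser.HTMLParser`. The input `nested_html` and the class `ListItemParser` are hypothetical names invented for this example, and the parser assumes every li element has an explicit closing tag.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: mixed content (<b>) plus a nested list.
nested_html = "<ul><li>a <b>bold</b><ul><li>inner</li></ul></li><li>b</li></ul>"

# The recipe's regex pairs the outer <li> with the inner </li>
# and returns markup along with the text.
regex_items = re.findall(r'<li>(.*?)</li>', nested_html,
                         re.DOTALL | re.IGNORECASE)
print(regex_items)  # ['a <b>bold</b><ul><li>inner', 'b']


class ListItemParser(HTMLParser):
    """Collect the text of each <li>, keeping nested items separate.

    Assumption: every <li> is closed explicitly with </li>.
    """

    def __init__(self):
        super().__init__()
        self.items = []   # one entry per <li>, in document order
        self._open = []   # stack of (index into items, text pieces)

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Reserve a slot now so results stay in document order.
            self.items.append(None)
            self._open.append((len(self.items) - 1, []))

    def handle_endtag(self, tag):
        if tag == "li" and self._open:
            index, pieces = self._open.pop()
            self.items[index] = "".join(pieces).strip()

    def handle_data(self, data):
        # Text belongs to the innermost open <li> only.
        if self._open:
            self._open[-1][1].append(data)


parser = ListItemParser()
parser.feed(nested_html)
parser.close()
print(parser.items)  # ['a bold', 'inner', 'b']
```

The stack is what keeps the tags paired correctly: each closing tag finishes the most recently opened item, which a single non-greedy regex cannot do.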

Created by Brendan Barry on Wed, 20 Aug 2003 (PSF)