Just a snippet. I needed to grab the data between <LI> list item </LI> tags. Highly annoying that I couldn't find it anywhere else.
import re
output = re.compile('<li>(.*?)</li>', re.DOTALL | re.IGNORECASE).findall(input)
This is useful for grabbing the data you need if it's in an HTML page and you don't want to bother learning the INSANELY badly documented HTML or SGML parsers in Python.
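For a quick idea of what the snippet returns, here is a minimal sketch run against a made-up page. The `page` string and the `LI_PATTERN` and `items` names are assumptions for illustration only; the regex itself is the one from the snippet.

```python
import re

# Hypothetical sample input; any HTML string would do here.
page = """<ul>
<LI>first item</LI>
<li>second
item</li>
</ul>"""

# Same pattern as the snippet: non-greedy capture between <li> and </li>.
# DOTALL lets an item span several lines, IGNORECASE accepts <LI> too.
LI_PATTERN = re.compile('<li>(.*?)</li>', re.DOTALL | re.IGNORECASE)

items = LI_PATTERN.findall(page)
print(items)  # ['first item', 'second\nitem']
```

Note that the pattern only matches a bare `<li>` opening tag. An item written as `<li class="x">` would be skipped, so the opening part would need loosening (for example to something like `<li\b[^>]*>`) if the page uses attributes.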
Tags: web
Very simplistic. Unfortunately this will only work in the simplest of cases, where the contents of an li element are just text. Mixed content is allowed in li elements, so you will get markup as well. The worst case is nested lists, where you won't be pairing the correct tags.
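The nested-list problem from the comment is easy to reproduce. The sketch below first shows the regex pairing the outer `<li>` with the inner `</li>`, then shows one possible way around it using `html.parser` from the Python 3 standard library. This is a swapped-in alternative, not something the original post uses; `ListItemCollector` and the `nested` sample string are hypothetical names made up for this example.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample with a list nested inside a list item.
nested = '<ul><li>outer <ul><li>inner</li></ul> tail</li></ul>'

# The regex stops at the FIRST </li>, which belongs to the inner item.
regex_items = re.compile('<li>(.*?)</li>', re.DOTALL | re.IGNORECASE).findall(nested)
print(regex_items)  # ['outer <ul><li>inner']


class ListItemCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <li>, nested or not."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished items, in the order their </li> is seen
        self._open = []   # stack of text buffers, one per currently open <li>

    def handle_starttag(self, tag, attrs):
        if tag == 'li':   # HTMLParser lowercases tag names for us
            self._open.append([])

    def handle_endtag(self, tag):
        if tag == 'li' and self._open:
            text = ''.join(self._open.pop())
            self.items.append(' '.join(text.split()))  # collapse whitespace

    def handle_data(self, data):
        if self._open:
            self._open[-1].append(data)  # text goes to the innermost open <li>


parser = ListItemCollector()
parser.feed(nested)
parser.close()
print(parser.items)  # ['inner', 'outer tail']
```

Because the parser keeps a stack, each `</li>` closes the innermost open item, so inner items are reported before the outer item that contains them. Text inside a nested item is credited only to that nested item here; whether the outer item should also include it is a design choice this sketch does not try to settle.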