
Grabbing text between HTML tags (Python recipe) by Brendan Barry
ActiveState Code (http://code.activestate.com/recipes/217019/)

Just a snippet. I needed to grab the data between <LI> list item </LI> tags. Highly annoying that I couldn't find it anywhere else.

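import re

# html_text is assumed to hold the page source as a string
# (the original called it `input`, which shadows the builtin).
li_pattern = re.compile(r'<li>(.*?)</li>', re.DOTALL | re.IGNORECASE)
output = li_pattern.findall(html_text)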

This is useful for grabbing the data you need if it's in an HTML page and you don't want to bother learning the INSANELY badly documented HTML or SGML parsers in Python.
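A minimal usage sketch of the same pattern, assuming the page source is already in a string. The sample markup and the names html_text, LI_PATTERN and items are made up for illustration; they are not part of the recipe.

```python
import re

# Hypothetical sample input; any HTML string works here.
html_text = """<ul>
  <LI>first item</LI>
  <li>second
item</li>
</ul>"""

# re.IGNORECASE matches <LI> as well as <li>;
# re.DOTALL lets "." span the line break inside the second item.
LI_PATTERN = re.compile(r'<li>(.*?)</li>', re.DOTALL | re.IGNORECASE)

items = LI_PATTERN.findall(html_text)
print(items)  # ['first item', 'second\nitem']
```

Note that the pattern only matches a bare opening tag, so an item written as <li class="x"> would be skipped.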

Tags: web

3 comments

Harvey Thomas 20 years, 6 months ago # | flag

Very simplistic. Unfortunately this will only work in the simplest of cases, where the content of an li element is just text. Mixed content is allowed in li elements, so you will get markup as well. The worst case is nested lists, where the pattern pairs the outer opening tag with the first inner closing tag instead of its own.
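The nested-list problem described above can be reproduced in a few lines. The sketch below also shows one possible alternative built on html.parser from the Python 3 standard library; that is an assumption on my part, since the recipe dates from 2003 and names no parser. The class ListItemText and the sample string are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: a list item that contains a nested list.
nested = "<ul><li>outer <ul><li>inner</li></ul> tail</li></ul>"

# The regex pairs the outer <li> with the first inner </li>.
regex_result = re.findall(r'<li>(.*?)</li>', nested, re.DOTALL | re.IGNORECASE)
print(regex_result)  # ['outer <ul><li>inner']


class ListItemText(HTMLParser):
    """Collect the text of each <li>, using a stack to track nesting."""

    def __init__(self):
        super().__init__()
        self.open_items = []  # one text buffer per currently open <li>
        self.items = []       # finished items, in the order they close

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <LI> arrives as "li".
        if tag == "li":
            self.open_items.append([])

    def handle_endtag(self, tag):
        if tag == "li" and self.open_items:
            text = "".join(self.open_items.pop())
            self.items.append(" ".join(text.split()))  # collapse whitespace

    def handle_data(self, data):
        # Text is credited to the innermost open <li> only.
        if self.open_items:
            self.open_items[-1].append(data)


parser = ListItemText()
parser.feed(nested)
parser.close()
print(parser.items)  # ['inner', 'outer tail']
```

Inner items appear first because they close first. Whether nested text should also count toward the outer item is a design choice; this sketch keeps the two separate.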

Created by Brendan Barry on Wed, 20 Aug 2003 (PSF)

Licensed under the PSF License