Beautiful Soup (HTML parser)

Beautiful Soup
Original author(s)	Leonard Richardson

Stable release	4.5.1 / August 2, 2016
Repository	code.launchpad.net/beautifulsoup/
Written in	Python
Platform	Python
Type	HTML parser library, Web scraping
License	Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)^[1]
Website	www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name refers to tag soup). It creates a parse tree for each parsed page that can be used to extract data from HTML, which is useful for web scraping.^[2]
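
As a minimal sketch of the idea above, the snippet below feeds a small, deliberately malformed fragment to Beautiful Soup and reads data back out of the resulting parse tree. The sample markup and the variable names are made up for illustration, and Python's built-in html.parser backend is assumed; other backends may repair broken markup differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the <p> and <b> tags are never closed.
broken_html = "<html><body><p>Hello, <b>tag soup<a href='/about'>About</a>"

# Build a parse tree despite the missing closing tags.
soup = BeautifulSoup(broken_html, "html.parser")

# The tree can still be searched and navigated.
first_link = soup.find("a")
link_target = first_link["href"]     # attribute lookup on a tag
link_text = first_link.get_text()    # text inside the <a> tag
page_text = soup.get_text()          # all text in the document

print(link_target)
print(link_text)
print(page_text)
```

The parser closes the open tags when it reaches the end of the input, so the link remains reachable through find() even though its enclosing tags were never closed.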

It is available for Python 2.6+ and Python 3.

Code example

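# Extract the href of every anchor (<a>) tag in an HTML document
from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen

from bs4 import BeautifulSoup

webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
for anchor in soup.find_all('a'):
    # Fall back to '/' when the anchor has no href attribute
    print(anchor.get('href', '/'))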
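
The example above needs network access. A sketch of the same anchor extraction on an in-memory string may make the role of the default value passed to get() easier to see. The sample markup and the helper name extract_hrefs are hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def extract_hrefs(html, default="/"):
    """Return the href of every <a> tag, or `default` where href is missing."""
    soup = BeautifulSoup(html, "html.parser")
    return [anchor.get("href", default) for anchor in soup.find_all("a")]


# Hypothetical sample: the second anchor has no href attribute.
sample = (
    '<p><a href="/wiki/Python">Python</a> '
    '<a name="top">Top</a> '
    '<a href="#refs">Refs</a></p>'
)

hrefs = extract_hrefs(sample)
print(hrefs)
```

Here the second anchor yields the fallback value because it has no href attribute. Passing default=None instead would make such anchors easy to filter out.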

References

[1] "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."
[2] "Beautiful Soup website". Retrieved 18 April 2012.

This article is issued from Wikipedia - version of the 12/2/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
