How about this?
from bs4 import BeautifulSoup

soup = BeautifulSoup(str(response), 'html.parser')
wrapper = soup.new_tag('div', **{"data-role": "content"})
# Snapshot the children first: appending a node moves it, which would
# change soup.body.children while we iterate over it.
body_children = list(soup.body.children)
soup.body.clear()
soup.body.append(wrapper)
for child in body_children:
    wrapper.append(child)
I recently hit upon this same situation, and I'm not content with any of the other answers here. Iterating through a massive list and rebuilding the DOM doesn't seem acceptable to me performance-wise, and the other solution wraps the body, not the body's contents. Here's my solution:
soup.body.wrap(soup.new_tag("div", **{"data-role": "content"})).wrap(soup.new_tag("body"))
soup.body.body.unwrap()
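To make the two steps easier to follow, here is a small self-contained sketch of the same wrap-then-unwrap idea. The sample HTML and the variable names are made up for illustration, and it assumes the built-in 'html.parser'.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = "<html><head><title>t</title></head><body><p>test1</p><p>test2</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

wrapper = soup.new_tag("div", **{"data-role": "content"})

# Step 1: wrap() returns the wrapping tag, so the calls can be chained.
# After this line the tree is: new body > div > original body > contents.
soup.body.wrap(wrapper).wrap(soup.new_tag("body"))

# Step 2: soup.body is now the new outer body and soup.body.body is the
# original one. unwrap() removes that tag but keeps its children in place.
soup.body.body.unwrap()

result = str(soup.body)
print(result)
```

After the second step only one body tag should remain, with the div as its single child.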
A perfect use case for BeautifulSoup's wrap():
from bs4 import BeautifulSoup

response = """
<body>
<p>test1</p>
<p>test2</p>
</body>
"""
soup = BeautifulSoup(response, 'html.parser')
wrapper = soup.new_tag('div', **{"data-role": "content"})
# wrap() puts the new tag around the body element itself.
soup.body.wrap(wrapper)
print(soup.prettify())
prints:
<div data-role="content">
 <body>
  <p>
   test1
  </p>
  <p>
   test2
  </p>
 </body>
</div>
UPD:
from bs4 import BeautifulSoup
response = """<html>
<head>
<title>test</title>
</head>
<body>
<p>test</p>
</body>
</html>
"""
soup = BeautifulSoup(response, 'html.parser')
wrapper = soup.new_tag('div', **{"data-role": "content"})
soup.body.wrap(wrapper)
print(soup.prettify())
How can I wrap <div data-role="content"></div> around the contents of the HTML body with Beautiful Soup?
A note on the wrap()/unwrap() answer above: very simply, that approach just wraps the body twice, first with the new tag, then with another body. BeautifulSoup's unwrap method then deletes the original body while keeping its contents.
The following appends the wrapper to the body, but doesn't wrap the body contents like I need:
from bs4 import BeautifulSoup

soup = BeautifulSoup(u"%s" % response)
wrapper = soup.new_tag('div', **{"data-role": "content"})
soup.body.append(wrapper)
for content in soup.body.contents:
    wrapper.append(content)
I’ve gotten to here, but now I end up with duplicate body elements like this <body><div data-role="content"><body>content here</body></div></body>
from bs4 import BeautifulSoup

soup = BeautifulSoup(u"%s" % response)
wrapper = soup.new_tag('div', **{"data-role": "content"})
new_body = soup.new_tag('body')
contents = soup.body.replace_with(new_body)
wrapper.append(contents)
new_body.append(wrapper)
Last Updated: 03 Mar, 2021
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
The default is formatter="minimal". Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.
Tag, NavigableString, and BeautifulSoup cover almost everything you'll see in an HTML or XML file, but there are a few leftover bits. The only one you'll probably ever need to worry about is the comment.
The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string:
html_doc = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>
soup.title
# <title>The Dormouse's story</title>
soup.title.name
# 'title'
soup.title.string
# "The Dormouse's story"
soup.title.parent.name
# 'head'
soup.p
# <p class="title"><b>The Dormouse's story</b></p>
soup.p['class']
# ['title']
soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...
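The formatter setting mentioned earlier can be seen in a short sketch. It assumes the formatter keyword accepted by decode() (prettify() takes the same keyword) and uses a made-up one-line document. With "minimal", only the characters that would break the markup are escaped. With "html", Beautiful Soup also converts characters to named HTML entities where it can.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet containing an ampersand, angle brackets and a
# non-ASCII character.
soup = BeautifulSoup("<p>AT&amp;T caf\u00e9 &lt;b&gt;</p>", "html.parser")

# Default behaviour: escape only what is needed for valid HTML/XML.
minimal_out = soup.p.decode(formatter="minimal")

# "html" formatter: additionally use named entities such as &eacute;.
html_out = soup.p.decode(formatter="html")

print(minimal_out)
print(html_out)
```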
from bs4 import BeautifulSoup

# Parse from an open file handle...
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# ...or directly from a string.
soup = BeautifulSoup("<html>data</html>", 'html.parser')
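As a rough illustration of the comment object mentioned earlier, the sketch below parses a made-up fragment and checks the type of the string inside it. Comment is imported from bs4.element. Filtering with find_all(string=...) is one possible way to collect comments, not the only one.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical fragment whose only content is an HTML comment.
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")

# The comment comes back as a special kind of string.
comment = soup.b.string
is_comment = isinstance(comment, Comment)

# Collect every comment in the document by filtering on the string type.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

print(is_comment, comment)
```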
We simply want to extract the item name and price from each entry and display them as a list. So step one is to examine the source of the page to determine how we can search the HTML. It looks like we have some Bootstrap classes we can search on, among other things.
When the code above runs, the quotes variable gets assigned a list of all the elements in the HTML document that are span tags with a class of text. Printing out that quotes variable gives us the output we see below. The entire HTML tag is captured along with its inner contents.
As you can see from the above markup, there is a lot of data that looks mashed together. The purpose of web scraping is to access just the parts of the web page that we are interested in. Many software developers employ regular expressions for this task, and that is a viable option. The Python Beautiful Soup library is a much more user-friendly way to extract the information we want.
Finally, we'll add some code to fetch all the tags for each quote as well. This one is a little trickier because we first need to fetch the outer wrapping div of each collection of tags. If we skipped this first step, we could fetch all the tags but we wouldn't know how to associate them with a quote and author pair. Once the outer div is captured, we can drill down further by using the find_all() function again on *that* subset. From there we add an inner loop to the first loop to complete the process.
scraper.py
import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
print(soup)
This gives us a nice output with just the quotes we are interested in.
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
“Try not to become a man of success. Rather become a man of value.”
“It is better to be hated for what you are than to be loved for what you are not.”
“I have not failed. I've just found 10,000 ways that won't work.”
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
“A day without sunshine is like, you know, night.”
Process finished with exit code 0
Now we get the quotes and each associated author when the script is run.
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” --Albert Einstein
“It is our choices, Harry, that show what we truly are, far more than our abilities.” --J.K. Rowling
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.” --Albert Einstein
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.” --Jane Austen
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.” --Marilyn Monroe
“Try not to become a man of success. Rather become a man of value.” --Albert Einstein
“It is better to be hated for what you are than to be loved for what you are not.” --André Gide
“I have not failed. I've just found 10,000 ways that won't work.” --Thomas A. Edison
“A woman is like a tea bag; you never know how strong it is until it's in hot water.” --Eleanor Roosevelt
“A day without sunshine is like, you know, night.” --Steve Martin
Process finished with exit code 0
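The nested-loop step described earlier (capture each outer div, then call find_all() again on that subset) might look roughly like the sketch below. It runs against a small inline document rather than the live site. Apart from the span with class text, the markup, the class names (quote, author, tags, tag), and the function name extract_quotes are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the "text" class on the span comes from the
# description above; the other class names are assumed.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First quote.</span>
  <small class="author">Author One</small>
  <div class="tags">
    <a class="tag">alpha</a>
    <a class="tag">beta</a>
  </div>
</div>
<div class="quote">
  <span class="text">Second quote.</span>
  <small class="author">Author Two</small>
  <div class="tags">
    <a class="tag">gamma</a>
  </div>
</div>
"""


def extract_quotes(html):
    """Return a list of dicts with the text, author and tags of each quote."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Outer loop: one iteration per quote block, so that text, author and
    # tags stay associated with each other.
    for block in soup.find_all("div", class_="quote"):
        text = block.find("span", class_="text").get_text(strip=True)
        author = block.find("small", class_="author").get_text(strip=True)
        tags = []
        # Inner loop: search again, but only inside this block's tags div.
        tags_div = block.find("div", class_="tags")
        for link in tags_div.find_all("a", class_="tag"):
            tags.append(link.get_text(strip=True))
        results.append({"text": text, "author": author, "tags": tags})
    return results


quotes = extract_quotes(SAMPLE_HTML)
for quote in quotes:
    print(quote["text"], "--", quote["author"], quote["tags"])
```

Searching inside each block, rather than across the whole document, is what keeps every tag list attached to the right quote and author.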