How do you remove escape characters from scraped website data?

Author: nijia  Published: November 8, 2024

Escape characters such as “” and “ ” that appear in the question can be removed with either of the following methods:

1. Use regular expressions:

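import re

# Single quotes outside so the inner double quotes do not end the string
html = r'<p style="width: 100%;">(.*)</p>'
# Matches any HTML tag; the flag is re.S (uppercase), which lets "." span newlines
dr = re.compile(r'<[^>]+>', re.S)

# findcontant1, findcontant2 and item come from the asker's scraper and are not shown here
contant = re.findall(findcontant1, item)
if len(contant) <= 0:
    contant = re.findall(findcontant2, item)
# Strip whatever tags remain in the matched text
contant = dr.sub('', str(contant))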
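The regex approach can be sketched end to end as follows. This is a minimal example under stated assumptions, not the asker's actual scraper: the sample `item`, the names `find_paragraph`, `tag_pattern` and `clean_item` are all hypothetical, and the call to the standard library's `html.unescape` is an addition that assumes the unwanted "escape characters" are HTML entities such as `&nbsp;` or `&#8230;`.

```python
import re
from html import unescape

# Hypothetical sample standing in for one scraped item
item = '<p style="width: 100%;">Hello&nbsp;<b>world</b> &#8230;</p>'

# Capture the paragraph body; re.S lets "." match across newlines
find_paragraph = re.compile(r'<p style="width: 100%;">(.*?)</p>', re.S)
# Matches any remaining HTML tag
tag_pattern = re.compile(r'<[^>]+>')


def clean_item(raw):
    """Return the paragraph text with tags stripped and entities decoded."""
    matches = find_paragraph.findall(raw)
    # Fall back to the whole input when the paragraph pattern is not found
    body = matches[0] if matches else raw
    text = tag_pattern.sub('', body)
    # html.unescape turns entities such as &#8230; into real characters
    text = unescape(text)
    # Collapse whitespace, including the non-breaking space from &nbsp;
    return ' '.join(text.split())


print(clean_item(item))
```

Working on the first match directly, rather than calling `str()` on the list returned by `re.findall`, avoids leaving list brackets and quote marks in the cleaned text.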

2. Parse with BeautifulSoup:

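from bs4 import BeautifulSoup

# Single quotes outside so the inner double quotes do not end the string
html = '<p style="width: 100%;">(.*)</p>'
# Parse with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")
# .text returns the paragraph content without tags
contant = soup.find('p').text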
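A slightly fuller sketch of the BeautifulSoup route, assuming the `bs4` package is installed. The sample markup and the names `paragraph`, `raw_text` and `clean_text` are hypothetical; real scraped pages will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical scraped fragment; the real markup will differ
html = '<p style="width: 100%;">Line one<br/>Line&nbsp;two &amp; more</p>'

soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find("p")

# get_text() drops the tags; entities were already decoded while parsing
raw_text = paragraph.get_text(separator=" ") if paragraph is not None else ""
# Normalise whitespace, including non-breaking spaces
clean_text = " ".join(raw_text.split())

print(clean_text)
```

Checking for `None` before reading the text guards against pages where the expected `<p>` element is missing, which would otherwise raise an `AttributeError`.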

After the processing above, the escape characters are removed and you are left with clean text.

This article comes from the internet and does not represent the views of 甲倪知识. Please credit the source when reposting: http://www.spjiani.cn/wp/4660.html