Issue
I need to read a table from a Confluence page and store it in a JSON or CSV file. When I run my code and print the Confluence page content, I get the response below. However, when I try to parse the table and print its rows, I get an error.
Why is BeautifulSoup not able to find the table?
Error:
C:\User>testScript_confulenec
None
Traceback (most recent call last):
  File "C:\User\testScript_confulenec.py", line 34, in <module>
    for response_data in table.find_all('tbody'):
AttributeError: 'NoneType' object has no attribute 'find_all'
Confluence page content from Python
{'results': [{'id': '2533457945', 'type': 'page', 'status': 'current', 'title': 'checkTable', 'space': {'id': 11111111, 'key': 'demo', 'name': 'Monitoring', 'type': 'global', 'status': 'current', '_expandable': {'settings': '/rest/api/space/demo/settings', 'metadata': '', 'operations': '', 'lookAndFeel': '/rest/api/settings/lookandfeel?spaceKey=demo', 'identifiers': '', 'permissions': '', 'icon': '', 'description': '', 'theme': '/rest/api/space/demo/theme', 'history': '', 'homepage': '/rest/api/content/11111111'}, '_links': {'webui': '/spaces/demo', 'self': 'https://test.net/wiki/rest/api/space/demo'}}, 'macroRenderedOutput': {},
'body': {'view': {'value': '<div class="table-wrap">
<table data-layout="wide" data-local-id="5f6d5d7f-00ee-4788-8999-e16daab2ba6c" class="confluenceTable"><tbody>
<tr><td class="confluenceTd"><p>id</p></td>
<td class="confluenceTd"><p>Id</p></td>
<td class="confluenceTd"><p>Name</p></td>
<td class="confluenceTd"><p>severity</p></td>
<td class="confluenceTd"><p>timeAvailable</p></td>
<td class="confluenceTd"><p>timeProcessing</p></td>
<td class="confluenceTd"><p>timeDelivering</p></td>
<td class="confluenceTd"><p>Group</p></td>
</tr><tr><td class="confluenceTd"><p>1</p></td>
<td class="confluenceTd"><p>1</p></td><td
class="confluenceTd"><p>test</p></td>
<td class="confluenceTd"><p>P1</p></td>
<td class="confluenceTd"><p>10</p></td>
<td class="confluenceTd"><p>10</p></td>
<td class="confluenceTd"><p>10</p></td>
<td class="confluenceTd"><p>test_group</p></td>
</tr><tr><td class="confluenceTd"><p>1</p></td>
<td class="confluenceTd"><p>2</p></td><td
class="confluenceTd"><p>test2</p></td>
<td class="confluenceTd"><p>P1</p></td>
<td class="confluenceTd"><p>10</p></td>
<td class="confluenceTd"><p>10</p></td>
<td class="confluenceTd"><p>10</p></td>
<td class="confluenceTd"><p>test2_group</p></td>
testScript_confulenec.py
# This code sample uses the 'requests' library:
# http://docs.python-requests.org
import requests
from requests.auth import HTTPBasicAuth
import json
from bs4 import BeautifulSoup
url = "https://test.net/wiki/rest/api/content?spaceKey=demo&title=checkTable&expand=space,body.view"
auth = HTTPBasicAuth("[email protected]", "********")
headers = {
    "Accept": "application/json"
}
response = requests.request(
    "GET",
    url,
    headers=headers,
    auth=auth
)
#print(json.dumps(json.loads(response.text), sort_keys=True, indent=4, separators=(",", ": ")))
#print(response.json())
#Parsing the HTML file
soup = BeautifulSoup(response.text, 'html.parser')
#selecting the table
table = soup.find('table', class_ = 'confluenceTable')
print(table)
#storing all rows into one variable
for response_data in table.find_all('tbody'):
    rows = response_data.find_all('tr')
    print(rows)
Solution
As mentioned, the response is JSON, not valid HTML, so you first have to extract the HTML string from ....['body']['view']['value']:
soup = BeautifulSoup(response.json()['results'][0]['body']['view']['value'], 'html.parser')
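To get from there to the CSV or JSON file asked about in the question, something along these lines might work. This is only a sketch: sample_payload is a hypothetical, trimmed-down stand-in for response.json() so the example runs without network access, and the helper names (table_rows, rows_to_csv, rows_to_records) are made up for illustration. It also assumes the first table row holds the column names.

```python
import csv
import io
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.json(); only the keys used below are kept.
sample_payload = {
    "results": [
        {
            "title": "checkTable",
            "body": {
                "view": {
                    "value": (
                        '<div class="table-wrap">'
                        '<table class="confluenceTable"><tbody>'
                        '<tr><td class="confluenceTd"><p>Id</p></td>'
                        '<td class="confluenceTd"><p>Name</p></td>'
                        '<td class="confluenceTd"><p>severity</p></td>'
                        '<td class="confluenceTd"><p>Group</p></td></tr>'
                        '<tr><td class="confluenceTd"><p>1</p></td>'
                        '<td class="confluenceTd"><p>test</p></td>'
                        '<td class="confluenceTd"><p>P1</p></td>'
                        '<td class="confluenceTd"><p>test_group</p></td></tr>'
                        '<tr><td class="confluenceTd"><p>2</p></td>'
                        '<td class="confluenceTd"><p>test2</p></td>'
                        '<td class="confluenceTd"><p>P1</p></td>'
                        '<td class="confluenceTd"><p>test2_group</p></td></tr>'
                        '</tbody></table></div>'
                    )
                }
            },
        }
    ]
}


def table_rows(payload):
    """Return the table as a list of rows, each row a list of cell texts."""
    results = payload.get("results", [])
    if not results:
        return []
    # Parse only the embedded HTML string, not the whole JSON document.
    html = results[0]["body"]["view"]["value"]
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="confluenceTable")
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def rows_to_records(rows):
    """Turn rows into a list of dicts, treating the first row as the header."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


rows = table_rows(sample_payload)
csv_text = rows_to_csv(rows)
json_text = json.dumps(rows_to_records(rows), indent=2)
print(csv_text)
print(json_text)
```

With the real response you would pass response.json() instead of sample_payload and write csv_text or json_text to a file. Returning an empty list when the page or table is missing avoids the AttributeError on None from the question.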
Answered By - HedgeHog