Saturday, October 8, 2022

[FIXED] read_excel: read hyperlinks from a column that also contains NaN values

October 08, 2022 excel, hyperlink, pandas, python-3.x

Issue

I found instructions for reading the links in a column that contains hyperlinks.

What I have is an Excel file with a column that holds two kinds of values: NaN, or a link consisting of display text and a hyperlink. Example:

NaN
NaN
NaN
Link_text_1:www.something...
Link_text_2:www.something_else...
NaN

When I load the Excel file, only the text from the column is returned; the hyperlink is lost.

I tried those instructions, but they did not work.

My code is basically:

import pandas as pd

path = r'path_to_file'
df = pd.read_excel(path, sheet_name='docs')
link_column = df[["Unnamed: 18"]]
print(link_column)

This prints either NaN or the link text.

I tried:

df_2 = pd.read_excel(path, sheet_name='Jobs', converters={"Unnamed: 18": lambda x: str(x.value) + "|"+ str(x.hyperlink.target)})

But it returns an error:

df_2 = pd.read_excel(path, sheet_name='Jobs', converters={"Unnamed: 18": lambda x: str(x.value) + "|"+ str(x.hyperlink.target)}) # read sheet Jobs from excel file
AttributeError: 'str' object has no attribute 'value'

I tried googling this error, but I could not find anything that works for me.

Solution

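As far as I know, pandas cannot parse Excel hyperlinks: read_excel returns only cell values, and each function passed in converters receives that value (here the display text, a plain str), not an openpyxl Cell object. That is why x.value raises AttributeError: 'str' object has no attribute 'value'.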
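To see what a converter actually receives, this sketch writes a tiny workbook with one hyperlinked cell and records the type of every value pandas passes to the converter. The file name converter_demo.xlsx, the header ColA and the URL are placeholders chosen for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook

# Write a tiny workbook with one hyperlinked cell (placeholder file name and URL).
path = Path(tempfile.mkdtemp()) / "converter_demo.xlsx"
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "ColA"
ws["A2"] = "Link_text_1"
ws["A2"].hyperlink = "https://example.com/1"
wb.save(str(path))

# Record the type of every value pandas hands to the converter.
seen_types = []


def record_type(x):
    seen_types.append(type(x))
    return x


df = pd.read_excel(path, sheet_name="Sheet1", converters={"ColA": record_type})
print(seen_types)  # every entry is <class 'str'>: the display text, not an openpyxl Cell
print(df)
```

Because the converter only ever sees the string, there is no .value or .hyperlink to read, and the link target has to come from openpyxl directly.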

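As a workaround, you can use openpyxl to read each cell's hyperlink (a cell's hyperlink attribute holds an openpyxl.worksheet.hyperlink.Hyperlink object, or None when the cell has no link), collect the targets for the column into a list, and then add that list to your DataFrame as a new column.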

Try this:

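from openpyxl import load_workbook
import numpy as np
import pandas as pd

wb = load_workbook('test.xlsx')
ws = wb['Sheet1']

# Collect the hyperlink target of each data cell in column A (row 1 is the header).
list_of_links = []
for i in range(2, ws.max_row + 1):
    link = ws.cell(row=i, column=1).hyperlink  # None if the cell has no hyperlink
    list_of_links.append(link.target if link is not None else np.nan)

# pandas reads only the display text; add the targets as a separate column.
df = pd.read_excel('test.xlsx')
df['Links'] = list_of_links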

# Output:

print(df)

          ColA                          Links
0          NaN                            NaN
1          NaN                            NaN
2          NaN                            NaN
3  Link_text_1       http://www.something.../
4  Link_text_2  http://www.something_else.../
5          NaN                            NaN
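The loop above assumes the links sit in column A of 'Sheet1' and that ws.max_row equals the number of DataFrame rows, which can differ (for example when the sheet has formatted but empty rows at the bottom). A minimal sketch adapting the same idea to a named column such as "Unnamed: 18" follows; read_with_links is a hypothetical helper, and it assumes the table starts at cell A1 with the header on row 1.

```python
import numpy as np
import pandas as pd
from openpyxl import load_workbook


def read_with_links(path, sheet_name, column_label):
    """Read a sheet with pandas and add a '<label>_link' column of hyperlink targets.

    Assumes the table starts at cell A1 with its header on row 1, so the
    pandas column at position i is openpyxl column i + 1 and DataFrame
    row r is worksheet row r + 2.
    """
    df = pd.read_excel(path, sheet_name=sheet_name)
    ws = load_workbook(path)[sheet_name]
    col_idx = df.columns.get_loc(column_label) + 1  # openpyxl columns are 1-based

    links = []
    # Loop over the DataFrame's rows rather than ws.max_row, so the list
    # length always matches len(df).
    for r in range(len(df)):
        link = ws.cell(row=r + 2, column=col_idx).hyperlink  # None if no link
        links.append(link.target if link is not None else np.nan)

    df[f"{column_label}_link"] = links
    return df
```

For the question's sheet this would be called as read_with_links(path, 'Jobs', 'Unnamed: 18'). Looking the column up by its pandas label avoids hard-coding the openpyxl column number.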

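To try the answer's code without the original worksheet, this sketch writes a hypothetical test.xlsx with a 'Sheet1' sheet shaped like the example column: the header ColA, three empty rows, then two hyperlinked cells. The link targets are placeholder URLs, not the ones from the original sheet, and write_sample_workbook is a hypothetical helper.

```python
from openpyxl import Workbook


def write_sample_workbook(path="test.xlsx"):
    """Write a stand-in worksheet: header 'ColA', rows 2-4 empty, two linked cells.

    The URLs are placeholders, not the targets from the original sheet.
    """
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws["A1"] = "ColA"
    # Rows 2-4 stay empty, so pandas reads them as NaN.
    samples = [
        ("Link_text_1", "https://example.com/1"),
        ("Link_text_2", "https://example.com/2"),
    ]
    for row, (text, url) in enumerate(samples, start=5):
        cell = ws.cell(row=row, column=1, value=text)
        cell.hyperlink = url  # stored as a Hyperlink whose .target is url
    wb.save(path)
    return path
```

Calling write_sample_workbook() creates test.xlsx in the current directory; running the answer's code against it should then put the placeholder URLs in the Links column.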

Answered By - M92_

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
