Issue
I am using (the very underrated) xlsx2html to convert Excel files to HTML files that I can then manipulate with Beautiful Soup for other processes. An issue I'm running into is that xlsx2html does not add line breaks to cells with multiple lines as shown in the image below.
This results in an output like the below where all of the lines are on a single line:
I need an output where each line is on its own line:
I know I need to add line breaks within the cells that contain multiple lines, but I cannot figure out how to do that programmatically. My thought process is to search each tag and count the number of lines of text it contains. If a tag contains more than one line, add a line break (or preferably a paragraph tag). However, I cannot figure out how to count the number of lines.
I wish I had code to share showing what I've tried, but I'm drawing a complete blank. Any ideas? Thank you!
Solution
This works for me: just add <br> to the string in spots where a line doesn't end with >\n but still ends with \n. I suck at regex, so I just used replace in a hacky way.
Input:
Code:
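from xlsx2html import xlsx2html
import random

# Called without an output path, xlsx2html returns the HTML as a stream
in_file = xlsx2html("/path/to/your/Untitled spreadsheet.xlsx")
with open("/path/to/output_html.html", "w") as f:
    in_file.seek(0)
    v = in_file.read()
    # I suck at regex, but you can do re.sub("pattern", "<br>", v)
    # Where "pattern" selects lines that don't end with ">\n"
    # Random placeholder that protects the newlines following a tag
    hsh = str(random.getrandbits(128))
    v = v.replace(">\n", hsh)
    # Any newline still left sits inside a cell's text
    v = v.replace("\n", "<br>")
    # Restore the protected tag-ending newlines
    v = v.replace(hsh, ">\n")
    print(v)
    f.write(v)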
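For anyone who prefers the re.sub route mentioned in the comment, here is a minimal sketch of what that pattern could look like. It is not part of the original answer: the function name add_line_breaks and the sample string are made up for illustration. The pattern uses a negative lookbehind so that only newlines not directly preceded by ">" are replaced, which should behave the same as the three replace calls above.

```python
import re


def add_line_breaks(html: str) -> str:
    """Replace every newline that does not directly follow '>' with <br>.

    Hypothetical helper: mirrors the replace-based trick with one regex.
    """
    # (?<!>) is a negative lookbehind: match "\n" only when the
    # character immediately before it is not ">"
    return re.sub(r"(?<!>)\n", "<br>", html)


# Made-up sample: the first cell holds two lines, the second holds one
sample = "<td>first line\nsecond line</td>\n<td>single</td>\n"
print(add_line_breaks(sample))
```

You would call add_line_breaks(v) in place of the three replace lines. Like the replace version, it assumes that a newline right after ">" always belongs to the markup, so a cell line whose text itself ends in ">" would not get a <br>.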
Output:
Answered By - cosmic_inquiry