Issue
I have been trying to write a short script to run directly in a Jupyter notebook. It simply steps through texts (400 words on average) in a pandas DataFrame and asks the user for a label.
I am struggling to find an elegant solution that highlights every occurrence of the substring 'eu' in the printed text.
In another thread, I found this printmd function, which I use to highlight the "eu" substring. However, it only works for the first occurrence, and it breaks the lines as well.
import sys
from IPython.display import clear_output
from IPython.display import Markdown, display

def printmd(string):
    display(Markdown(string))

printmd('**bold**')

labels = []
for i in range(0, len(SampleDf)):
    clear_output()  # clear the output before displaying another article
    print(SampleDf.loc[i]['article_title'])
    lc = SampleDf.loc[i]['article_body'].lower()  # lowercase so the search is case-insensitive
    pos = lc.find('eu')  # position of the first 'eu' only
    print(SampleDf.loc[i]['article_body'][:pos])
    printmd('**eu**')
    print(SampleDf.loc[i]['article_body'][pos+2:])
    var = input("press y if the text is irrelevant")
    if var == 'y':
        label = 0  # 0 for trash
    else:
        label = 1  # 1 for relevant
    labels.append(label)
I would love to get rid of the line breaks introduced by the separate print statements and to highlight every mention of "eu".
Solution
Treat this as a string-processing problem, not an output problem: mark up the whole text first, then display it with a single printmd call. If I understand your needs correctly, a simple replace is enough:
new_text = old_text.replace("eu", "**eu**")
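Note that str.replace is case-sensitive, so "EU" or "Eu" would be left unmarked, whereas the question lowercases the text before searching. A minimal sketch of a case-insensitive variant follows, assuming the standard re module is acceptable; the helper name highlight and its term parameter are hypothetical, not part of the original answer.

```python
import re

def highlight(text, term="eu"):
    """Wrap every case-insensitive occurrence of `term` in Markdown bold markers.

    The matched text is reused in the replacement, so the original
    capitalisation ("EU", "Eu", "eu") is preserved.
    """
    pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)

sample = "The EU and the eu member states met."
highlighted = highlight(sample)
print(highlighted)
```

In the notebook, the marked-up string would then be passed to printmd once per article, for example printmd(highlight(SampleDf.loc[i]['article_body'])), so the body is rendered in a single output without extra line breaks. Like the question's own search, this matches the substring anywhere, including inside words such as "Europe".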
If you still need your single-token mode, you can suppress the line feed with print's end parameter:
print('**eu**', end='')
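As a rough sketch of how the end parameter could be applied to the three-part output from the question, the hypothetical helper below prints the text before the match, the marked match, and the remainder on one line. Keep in mind that plain print does not render Markdown, so the asterisks appear literally; this only addresses the line breaks.

```python
def print_with_marker(body, term="eu"):
    """Print `body` on one line with the first occurrence of `term` marked."""
    pos = body.lower().find(term)
    if pos == -1:
        # No match: print the text unchanged.
        print(body)
        return
    print(body[:pos], end="")                          # text before the match, no newline
    print(f"**{body[pos:pos + len(term)]}**", end="")  # the match itself, no newline
    print(body[pos + len(term):])                      # the rest, ending the line

print_with_marker("Leaders of the eu met today.")
```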
Answered By - Prune