Issue
I have a small program; here is the relevant code:
import numpy as np
import pandas as pd
from docx import Document
#### Setup the file names, also make provisions for having the user select the file ####
SHRD_filename = "SHRD - SVN 12485.docx"
SHDD_filename = "SHDD - SVN 12485.doc"
#SHRD_name = PCB_utility.get_file('Select SHRD file')
#SHDD_name = PCB_utility.get_file('Select SHDD file')
data = []
keys = {}
document_SHRD = Document(SHRD_filename)
tables_SHRD = document_SHRD.tables[30]
for i, row in enumerate(tables_SHRD.rows):
    text = (cell.text for cell in row.cells)
    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)
df_SHRD = pd.DataFrame.from_dict(data)
#cols = df_SHRD.columns.tolist()
print(df_SHRD.tail(20))
s = df_SHRD['HLR Trace Tag'].str.split(' ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1)
s.name = 'HLR Tags'
del df_SHRD['HLR Trace Tag']
df_SHRD.join(s)
When I initially make the dataframe, it looks like this:
300 HLR-0000094 HLR-0000095 HLR-0000340 LRU-0000440
301 HLR-0000094 HLR-0000095 HLR-0000341 LRU-0000441
302 HLR-0000094 HLR-0000095 HLR-0000342 LRU-0000442
303 HLR-0000675 LRU-0000745
304 HLR-0000676 LRU-0000746
305 HLR-0000677 LRU-0000747
306 HLR-0000678 LRU-0000748
307 HLR-0000679 LRU-0000749
308 HLR-0000680 LRU-0000750
I need to split the HLR tags into individual rows. At the end of my program it comes back as this:
300 LRU-0000440
301 LRU-0000441
302 LRU-0000442
303 LRU-0000745
304 LRU-0000746
305 LRU-0000747
306 LRU-0000748
307 LRU-0000749
308 LRU-0000750
But when I retype:
In [25]:df_SHRD.join(s)
Out[25]:
300 LRU-0000440 HLR-0000094
300 LRU-0000440 HLR-0000095
300 LRU-0000440 HLR-0000340
301 LRU-0000441 HLR-0000094
301 LRU-0000441 HLR-0000095
301 LRU-0000441 HLR-0000341
302 LRU-0000442 HLR-0000094
302 LRU-0000442 HLR-0000095
302 LRU-0000442 HLR-0000342
303 LRU-0000745 HLR-0000675
304 LRU-0000746 HLR-0000676
305 LRU-0000747 HLR-0000677
306 LRU-0000748 HLR-0000678
307 LRU-0000749 HLR-0000679
308 LRU-0000750 HLR-0000680
[457 rows x 2 columns]
Any help would be appreciated on why the command works in the IPython window but not in the script.
Solution
DataFrame.join(other, ...)
Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
Returns: joined : DataFrame
join is not an in-place operation. It returns a result that must be assigned to a variable if you want to keep it:
df = df_SHRD.join(s)
IPython displays the value of an expression without a print call, while a script does not. This is because IPython is a REPL. In either case, you must assign the result back. Try evaluating df_SHRD.join(s) followed by df_SHRD in IPython, and you'll see that df_SHRD itself is unchanged.
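As a minimal sketch of the point above, the snippet below builds a tiny stand-in for the table and shows that calling join without assignment leaves the original frame untouched. The sample rows, the column name "LRU Tag", and the variable names are assumptions made for illustration, since the question does not show the second column's name. It also uses Series.explode (available in pandas 0.25 and later) as a shorter substitute for the split/apply/stack step in the question, not the original code.

```python
import pandas as pd

# Hypothetical stand-in for the table read from the .docx file.
df = pd.DataFrame(
    {
        "HLR Trace Tag": ["HLR-0000094 HLR-0000095 HLR-0000340", "HLR-0000675"],
        "LRU Tag": ["LRU-0000440", "LRU-0000745"],  # assumed column name
    },
    index=[300, 303],
)

# One tag per row; the original row label is repeated in the index.
s = df["HLR Trace Tag"].str.split(" ").explode().rename("HLR Tags")

del df["HLR Trace Tag"]

# The joined frame is built here and then thrown away: df is not modified.
df.join(s)
shape_after_discarded_join = df.shape

# Assigning the return value is what keeps the result.
joined = df.join(s)
print(joined)
```

In a script only the explicit print produces output, whereas IPython would also have echoed the value of the bare df.join(s) line, which is why the interactive session appeared to work.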
Answered By - cs95