Issue
I need to find the character position of each variable's values in a data frame. For example, take this data frame:
import pandas as pd
# Create the DataFrame
data = {
    'ol': ['H_KXKnn1_01_p_lk0', 'H_KXKnn1_02_p_lk0', 'H_KXKnn1_03_p_lk0'],
    'nl': [12.01, 89.01, 25.01],
    'nol': ['Xn', 'Ln', 'Rn'],
    'nolp': [68, 70, 72],
    'nolxx': [0.0, 1.0, 5.0]
}
df = pd.DataFrame(data)
I then save this data frame as a tab-separated .dat file:
df.to_csv('your_file.dat', sep='\t', index=False)
When I count the positions where each value starts and ends in the .dat file, I expect:
variable position (start,end)
ol (0,17)
nl (18,23)
nol (24,26)
nolp (27,29)
nolxx (30,33)
I am counting "_", ".", and spaces as characters as well. However, when I run this code, which iterates over each column:
for col in df.columns:
    col_length = df[col].astype(str).apply(len).max() + df[col].astype(str).apply(lambda x: x.count('_') + x.count('.')).max()
    positions[col] = (current_pos, current_pos + col_length - 1)
    current_pos += col_length + 1
positions_df = pd.DataFrame(list(positions.items()), columns=['Variable', 'Position'])
it returns the following values:
Variable Position
ol (0, 20)
nl (22, 27)
nol (29, 30)
nolp (32, 33)
nolxx (35, 38)
I am not sure why it returns different positions. Any help with how to do this is very welcome! Thank you!!
Solution
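Every string in the first column has length 17, and len() already counts the '_' and '.' characters. Your code then adds the term below on top of that, so those characters are counted twice. Each string contains 4 '_' characters, so col_length becomes 17 + 4 = 21 and the end position becomes 0 + 21 - 1 = 20.
df[col].astype(str).apply(lambda x: x.count('_') + x.count('.')).max()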
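The double counting can be checked on a single value. This is only a small sketch in plain Python using the first 'ol' value from the question; the variable names `value`, `length`, and `extra` are made up for illustration.

```python
# First value of the 'ol' column from the question.
value = 'H_KXKnn1_01_p_lk0'

# len() counts every character, including '_' and '.'.
length = len(value)

# The extra term from the question's code counts '_' and '.' a second time.
extra = value.count('_') + value.count('.')

print(length)          # 17
print(extra)           # 4
print(length + extra)  # 21, so the reported end is 0 + 21 - 1 = 20
```

Dropping the extra term leaves the plain length, which is what the expected positions are based on.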
Here is a modified version of your code that produces the positions you expect:
positions = {}
current_pos = 0
for col in df.columns:
    col_length = df[col].astype(str).apply(len).max()
    positions[col] = (current_pos, current_pos + col_length)
    current_pos += col_length + 1
positions_df = pd.DataFrame(list(positions.items()), columns=['Variable', 'Position'])
And here is the output:
Variable Position
0 ol (0, 17)
1 nl (18, 23)
2 nol (24, 26)
3 nolp (27, 29)
4 nolxx (30, 33)
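One way to sanity-check these positions is to slice a data row with them. The sketch below mirrors the loop in plain Python rather than pandas, so it runs without a file. The `columns` dict is a hypothetical stand-in for the DataFrame with values already converted to strings, and the hand-built `first_row` is assumed to match the first data line that to_csv writes with sep='\t'. It also relies on every value in a column having the same width, which holds for this sample but not for tab-separated files in general.

```python
# Hypothetical stand-in for the DataFrame: column name -> values as strings.
columns = {
    'ol': ['H_KXKnn1_01_p_lk0', 'H_KXKnn1_02_p_lk0', 'H_KXKnn1_03_p_lk0'],
    'nl': ['12.01', '89.01', '25.01'],
    'nol': ['Xn', 'Ln', 'Rn'],
    'nolp': ['68', '70', '72'],
    'nolxx': ['0.0', '1.0', '5.0'],
}

positions = {}
current_pos = 0
for name, values in columns.items():
    width = max(len(v) for v in values)
    # The end index is exclusive: it points at the tab after the value.
    positions[name] = (current_pos, current_pos + width)
    # Skip the one-character tab separator.
    current_pos += width + 1

# First data row joined with tabs (assumed to match the .dat file's first data line).
first_row = '\t'.join(values[0] for values in columns.values())

# Slicing the row with each (start, end) pair should return the original value.
sliced = {name: first_row[start:end] for name, (start, end) in positions.items()}
print(positions)
print(sliced)
```

If the slices match the original values, the (start, end) pairs line up with the file, with end treated as an exclusive index.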
Answered By - Mohsen_Fatemi