Issue
I have a sample data (Data_sample_truncated.txt
) which I truncated from a big data. It has 3 fields - "Index", "Time" and "RxIn.Density[**x**, ::]"
Here I used x as integer as x
can vary for any range. In this data it is 0-15
. The combination of the 3 column fields is unique. For different "Index"
field the "Time" and "RxIn.Density[**x**, ::]"
can be same or different. For each new "Index"
value the data has a blank line
and almost similar column headers except for "RxIn.Density[**x**, ::]"
where x
is increasing when new "Index"
value is reached. The data which I export from ADS
(circuit simulation software) gives me like this format while exporting.
Now I want to format the data so that all the data are merged together under 3 unique column fields - "Index", "Time" and "RxIn.Density"
. You can see I want to remove the strings [**x**, ::]
in the new dataframe of the 3rd column. Here is the sample final data file that I want after formatting (Data-format_I_want_after_formatting.txt
). So I want the following -
- The blank lines (or rows) to be removed
- All the other header lines to be removed keeping the top header only and changing the 3rd column header to
"RxIn.Density"
- Keeping all the data merged under the unique column fields -
"Index", "Time" and "RxIn.Density"
, even if the data values are duplicate.
My MATLAB code is in the below:
import pandas as pd
#create DataFrame from csv with columns f and v
df = pd.read_csv('Data_sample_truncated.txt', sep="\s+", names=['index','time','v'])
#boolean mask for identify columns of new df
m = df['v'].str.contains('RxIn')
#new column by replace NaNs by forward filling
df['g'] = df['v'].where(m).ffill()
#get original ordering for new columns
#cols = df['g'].unique()
#remove rows with same values in v and g columns
#df = df[df['v'] != df['g']]
df = df.drop_duplicates(subset=['index', 'time'], keep=False)
df.to_csv('target.txt', index=False, sep='\t')
The generated target.txt
file is not what I wanted. You can check it here. Can anyone help what is wrong with my code and what to do to fix it so that I wan my intended formatting?
I am using Spyder 3.2.6 (Anaconda) where python 3.6.4 64-bit is embedded.
Solution
You can just filter out rows, that you do not want(check this):
import pandas as pd
df = pd.read_csv('Data_sample_truncated.txt', sep="\s+")
df.columns = ["index","time","RxIn.Density","1"]
del df["1"]
df = df[~df["RxIn.Density"].str.contains("Rx")].reset_index(drop=True)
df.to_csv('target.txt', index=False, sep='\t')
Answered By - Dmitriy Kisil
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.