Issue
We have a Panda data frame as below.
Unnamed:0 T1 T2 T3 ... T120
0 cheetah Running x1 x2 x1 ... x3
1 Running Jaguar x1 x10 x3 ... x7
2 Cougar Running x1 x2 x1 ... x3
3 Bengal Tiger Running x5 x2 x4 ... x17
4 Sleeping Bali Tiger x55 x61 x11 ... x31
5 Javan Leopard Sleeping x42 x67 x17 ... x34
6 Leopard Running x2 x5 x2 ... x3
7 Bengal Tiger Running x5 x2 x4 ... x17
.. ... ... ... ... ... ...
199 Florida Panther Eating x71 x80 x101 ... x94
200 Running Eastern Cougar x5 x1 x2 ... x3
201 Congo Lion Sleeping x57 x61 x14 ... x38
We are trying to reorganize this data frame as below. In the above data frame, the first column is unnamed. We try to check for known common behaviors such as "Running", "Sleeping", etc, in that unnamed column and try to rearrange the data frame as below.
Unnamed:0 T1 T2 T3 ... T120
0 cheetah Running x1 x2 x1 ... x3
1 Running Jaguar x1 x10 x3 ... x7
2 Cougar Running x1 x2 x1 ... x3
3 Bengal Tiger Running x5 x2 x4 ... x17
4 Running Eastern Cougar x5 x1 x2 ... x3
5 Bengal Tiger Running x5 x2 x4 ... x17
6 Leopard Running x2 x5 x2 ... x3
4 Sleeping Bali Tiger x55 x61 x11 ... x31
5 Javan Leopard Sleeping x42 x67 x17 ... x34
6 Congo Lion Sleeping x57 x61 x14 ... x38
.. ... ... ... ... ... ...
201 Florida Panther Eating x71 x80 x101 ... x94
I attempted it in the following way but I added a name to the column. I tried df[df.columns.str.contains('^Unnamed')]
but no luck.
import pandas as pd
df = pd.read_csv('a_behav_cat.csv')
df_new = pd.DataFrame()
df_new = df_new.append(df[df["name"].str.contains("Running")])
df_new = df_new.append(df[df["name"].str.contains("Sleeping")])
print(df_new.to_string())
QUESTION 1:
I think there should be a better or Pythonic way to do this. Appreciate your kind consideration on this. Also this checks for the exact match of the string which is not ideal as the dataset may have simple "running" and simple "sleeping" :), etc. I tried .lower()
function which didn't work.
PURPOSE: The purpose of this is to identify for a single observation how many different x categories are there. Here T1, T2, T3, ... T120 are observations. We need to identify how many common values are there for each observation. i.e. for T1, for "Running" there are 3, 'x1's and 3 'x5's and 1 'x2'.
To do this first we thought of rearranging the data frame as described above.
Yet, we are not sure whether this rearrangement is required for the purpose. Also, the output looks multi-dimensional. That is for T1, for "Running" how many x1, x3, x5s are there need to be stored. Similarly, this needs to be applied to other behaviors such as "eating", "sleeping", etc.
QUESTION2: What is the best approach to achieve this? Any data structure that is suitable for this purpose? Are there a better way of achieving the purpose stated above without rearranging the dataframe?
Here's a sample csv if you wanna do a test.
,T1,T2,T3,T4
cheetah Running,x1,x2,x1,x3
Running Jaguar,x1,x10,x3,x7
Cougar Running,x1,x2,x1,x3
Bengal Tiger Running,x5,x2,x4,x17
Sleeping Bali Tiger,x55,x61,x11,x31
Javan Leopard Sleeping,x42,x67,x17,x34
Leopard Running,x2,x5,x2,x3
Bengal Tiger Running,x5,x2,x4,x17
Florida Panther Eating,x71,x80,x101,x94
Running Eastern Cougar,x5,x1,x2,x3
Congo Lion Sleeping,x57,x61,x14,x38
Solution
IIUC, you can use a dict mapping to set the right category to the row:
# Your list of patterns
MAPPING = {'S': ['sleep', 'sleeping'],
'R': ['run', 'running'],
'E': ['eat', 'eating']}
# Reverse the mapping (swap keys and values)
rev = {v: k for k, l in MAPPING.items() for v in l}
# Create the regex pattern
pat = fr"\b({'|'.join(rev)})\b"
# Extract from data
df['CAT'] = df['Unnamed: 0'].str.lower().str.extract(pat, expand=False).map(rev)
Output:
>>> df
Unnamed: 0 T1 T2 T3 T4 CAT
0 cheetah Running x1 x2 x1 x3 R
1 Running Jaguar x1 x10 x3 x7 R
2 Cougar Running x1 x2 x1 x3 R
3 Bengal Tiger Running x5 x2 x4 x17 R
4 Sleeping Bali Tiger x55 x61 x11 x31 S
5 Javan Leopard Sleeping x42 x67 x17 x34 S
6 Leopard Running x2 x5 x2 x3 R
7 Bengal Tiger Running x5 x2 x4 x17 R
8 Florida Panther Eating x71 x80 x101 x94 E
9 Running Eastern Cougar x5 x1 x2 x3 R
10 Congo Lion Sleeping x57 x61 x14 x38 S
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.