Issue
I am trying to drop the rows for dates that have only one class. Isolating the single-class dates runs without errors, but the deletion step is where I'm having trouble. Briefly: I collected the dates that have only one class into a to_drop_dates list, then looped through the dataframe to find those dates and drop them. However, when I look up a date that should not have been deleted, the result comes back empty.
from google.colab import drive
drive.mount('/content/gdrive')
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
data = pd.read_csv('gdrive/My Drive/Colab_Notebooks/classproject/classdata.csv', parse_dates=['visit_date'], index_col='visit_date')
doctor_id = data['doctor_id']
visit_date = data.index.date
count = 0
data['count'] = [1 for _ in range(len(data))]
index = 1
for row in range(1, len(data)):
    if doctor_id[index] == doctor_id[index - 1] and visit_date[index] == visit_date[index - 1]:
        # set the count for this index to +1 of the last index
        data['count'][index] = data['count'][index - 1] + 1
        # data count column = data count column previous count + 1
    # (otherwise do nothing and leave it at 1)
    # move on to next index
    index += 1
data.head(10)
import numpy as np
unique_dates, unique_date_count = np.unique(visit_date, return_counts=True)
unique_date_count[:3]
to_drop_dates = []
unique_dates_index = 0
for row in unique_dates:
    for row in unique_date_count:
        while unique_dates_index < len(unique_dates):
            if unique_date_count[unique_dates_index] == 1:
                to_drop_dates.append(unique_dates[unique_dates_index])
            unique_dates_index += 1
to_drop_dates[:9]
The output of to_drop_dates:
[datetime.date(2023, 4, 2),
datetime.date(2023, 6, 12),
datetime.date(2023, 6, 15),
datetime.date(2023, 6, 16),
datetime.date(2023, 9, 1),
datetime.date(2023, 9, 17)]
data_index = 0
dropdates_index = 0
for row in visit_date:
    while data_index < len(visit_date):
        for row in to_drop_dates:
            while dropdates_index < len(to_drop_dates):
                if to_drop_dates[dropdates_index] == visit_date[data_index]:
                    data.drop(index = visit_date)
                dropdates_index += 1
        data_index += 1
data['Date'] = data.index.date
df2 = data[(data['Date'] == "2021-09-28")]
print("Filter rows by dates:\n",df2)
Output:
Filter rows by dates:
Empty DataFrame
Columns: [queue_id, doctor_id, visit_purpose, count, Date]
Index: []
I tried the following fix, but now it messes up the count column I built in part 1:
ndates = dict(data.index.value_counts())
keep_dates = [k for (k,v) in ndates.items() if v > 1]
udf = data.loc[data.index.isin(keep_dates)]
udf
output:
count / date / class
3 / 2020-10-19 / A
5 / 2020-10-19 / A
1 / 2023-10-20 / A
1 / 2023-10-20 / A
1 / 2023-10-09 / A
2 / 2023-10-09 / B
1 / 2023-08-07 / A
1 / 2023-08-07 / B
3 / 2023-08-07 / A
1 / 2023-08-07 / B
1 / 2023-02-03 / A
1 / 2023-02-03 / B
Expected output:
count / timestamp / class_id
1 / 2021-09-27 06:00:00 / A
2 / 2021-09-27 03:00:00 / A
3 / 2021-09-27 01:00:00 / A
1 / 2021-09-27 08:29:00 / C
1 / 2021-05-23 08:08:49 / B
2 / 2021-05-23 03:21:49 / B
1 / 2021-05-23 01:22:11 / C
I also tried the other fix and ran into the same problem; the count column may have to be taken into account:
out = data[data.groupby(data.index.date).transform('size').gt(1)]
date / count / class
2021-09-28 / 1 / C
2021-09-27 / 1 / A
2021-09-27 / 2 / A
2021-09-27 / 3 / A
2021-09-27 / 4 / A
2021-02-01 / 1 / A
2021-02-02 / 1 / B
2021-02-01 / 1 / A
2021-09-28 / 1 / C
2021-09-28 / 2 / C
Solution
You can use groupby with transform to remove the dates that occur only once:
out = data[data.groupby(data.index.date).transform('size').gt(1)]
Output:
>>> out
                     col1
2024-01-13 16:00:00     2
2024-01-16 18:00:00     5
2024-01-16 06:00:00     7
2024-01-13 18:00:00     8
Input:
>>> data
                     col1
2024-01-14 17:00:00     0
2024-01-11 06:00:00     1
2024-01-13 16:00:00     2  # 2 instances of 2024-01-13
2024-01-09 04:00:00     3
2024-01-17 07:00:00     4
2024-01-16 18:00:00     5  # 2 instances of 2024-01-16
2024-01-18 23:00:00     6
2024-01-16 06:00:00     7  # 2 instances of 2024-01-16
2024-01-13 18:00:00     8  # 2 instances of 2024-01-13
2024-01-07 22:00:00     9
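For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame above and applies the same filter. The names rows_per_date, checked, kept and string_match are only illustrative, and grouping on the single col1 column is my own choice to make sure transform returns a Series. The last lines also show one likely reason the check in the question printed an empty DataFrame: a column built from data.index.date holds datetime.date objects, and comparing those with a string matches nothing.

```python
import datetime

import pandas as pd

# Rebuild the sample frame from the answer (timestamps as the index).
index = pd.to_datetime([
    "2024-01-14 17:00:00", "2024-01-11 06:00:00", "2024-01-13 16:00:00",
    "2024-01-09 04:00:00", "2024-01-17 07:00:00", "2024-01-16 18:00:00",
    "2024-01-18 23:00:00", "2024-01-16 06:00:00", "2024-01-13 18:00:00",
    "2024-01-07 22:00:00",
])
data = pd.DataFrame({"col1": range(10)}, index=index)

# For every row, the number of rows that share its calendar date.
rows_per_date = data.groupby(data.index.date)["col1"].transform("size")

# Keep only the rows whose date occurs more than once.
out = data[rows_per_date.gt(1)]
print(out)

# Checking that a date survived: the Date column holds datetime.date objects,
# so compare against a datetime.date, not a string.
checked = out.assign(Date=out.index.date)
kept = checked[checked["Date"] == datetime.date(2024, 1, 13)]
string_match = checked[checked["Date"] == "2024-01-13"]  # matches no rows
print(len(kept), len(string_match))
```

Note that the filter returns a new DataFrame and leaves data untouched, so assign the result (as with out here) if you want to keep working with the filtered rows.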
Answered By - Corralien