Issue
I'm trying to read in an Excel document and then select a given number of random rows without replacement. I get the error below when I run it and would appreciate some guidance. I'm writing a Jupyter Notebook in VS Code.
# import libraries
import os
import subprocess
import sys
import pandas as pd
import numpy as np
import tkinter as tk
# allow user to browse for specific excel file
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
sizeOfSample = 10
# read in excel as dataframe after user selects file in explorer
df = pd.read_excel(file_path)
# select random rows from df to display
number_of_rows = df.shape[0]
random_indices = np.random.choice(number_of_rows, size=sizeOfSample, replace=False)
random_rows = df[random_indices, :]
print(random_rows)
This is the output I'm getting.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1716/1509119795.py in <module>
21 number_of_rows = initArr.shape[0]
22 random_indices = np.random.choice(number_of_rows, size=sizeOfSample, replace=False)
---> 23 random_rows = initArr[random_indices, :]
24
25 print (random_rows)
C:\Python39\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
C:\Python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3359 casted_key = self._maybe_cast_indexer(key)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
3363 raise KeyError(key) from err
C:\Python39\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
C:\Python39\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(array([ 109, 1280, 427, 531, 1563, 102, 1774, 802, 560, 0]), slice(None, None, None))' is an invalid key
Solution
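Replace:
random_rows = df[random_indices, :]
with:
random_rows = df.loc[random_indices, :]
Here, df[...] treats the tuple (random_indices, :) as a single column key (the traceback shows it going through self.columns.get_loc), which is why pandas reports it as an invalid key. .loc selects rows by label, and because read_excel gives the DataFrame a default 0 to n-1 index in this code, the labels match the positions returned by np.random.choice. If the index is not the default one, df.iloc[random_indices] selects by position instead.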
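A minimal sketch of the difference, using a small made-up DataFrame in place of the Excel file (the column names and values are hypothetical). np.random.choice returns positions, so iloc picks them regardless of the index labels; loc picks by label and returns the same rows only because the index here is the default RangeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet read with pd.read_excel(file_path)
df = pd.DataFrame({"id": range(100), "value": np.arange(100) * 2.0})
sizeOfSample = 10

# Draw 10 distinct positions in 0..n-1 (no replacement)
random_indices = np.random.choice(df.shape[0], size=sizeOfSample, replace=False)

# df[random_indices, :] would send the tuple (array, slice) to the column lookup and fail.
rows_by_label = df.loc[random_indices, :]   # label-based: matches positions only with a default RangeIndex
rows_by_position = df.iloc[random_indices]  # position-based: works for any index

print(rows_by_position)
```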
But you can simply use DataFrame.sample, which draws rows without replacement by default:
random_rows = df.sample(n=sizeOfSample, replace=False)
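A sketch of the sample route on the same kind of made-up data (hypothetical columns again). replace=False is the default, so no row is drawn twice; random_state is optional and is only passed here to make the draw repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the Excel sheet
df = pd.DataFrame({"id": range(100), "value": np.arange(100) * 2.0})
sizeOfSample = 10

# Draw 10 distinct rows; random_state fixes the seed for reproducibility
random_rows = df.sample(n=sizeOfSample, replace=False, random_state=0)
print(random_rows)

# With replace=False, n must not exceed len(df); otherwise pandas raises a ValueError.
```

Unlike the np.random.choice approach, sample keeps the original index labels of the chosen rows and needs no separate indexing step.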
Answered By - Corralien