Issue
** Problem **
Replace existing column value in destination data frame based on separate lookup type operations to match Code/Value column in another separate source data frame, and update, e.g., replace text in destination data frame column with Code/Value of source data. Basically replacing something like '10' with 'Your Full Name'.
** Code attempt with Key Error **
This effort raised a Key Error.
countynames.set_index('CountyCode')
employee['County.Code'] = countynames.lookup(countynames.index, countynames['CountyCode'])
** Potential Solution Idea **
Something similar to have apply() function lookup employee['County.Code'] in dataframe 'countynames' and replace/overwite/update the existing employee['County.Code'] with the countynames['Value'].
Looking into alternative approaches, as my first attempt resulted in KeyError.
### potential approach 1:
employee['County.Code'] = countynames.apply(lambda x: employee.loc[x['County.Code'], x['Value']], axis=1)
### potential approach 2:
employee['County.Code']<- lapply(employee, function(x) look$class[match(x, look$CountyCode)])
** Experimental code **
employee = pd.read_csv("employee_data.csv")
countynames = pd.read_csv("County Codes.csv")
employee['County.Code']
0 34
1 34
2 34
3 34
4 55
Name: County.Code, dtype: int64
Source, lookup data frame:
countynames.head()
CountyCode Value
0 1 Alameda
1 2 Alpine
2 3 Amador
3 4 Butte
4 5 Calaveras
** Error: KeyError **
Error is raised on columns.get_loc(item)
KeyError Traceback (most recent call last)
Input In [410], in <cell line: 2>()
1 countynames.set_index('CountyCode')
----> 2 employee['County.Code'] = countynames.lookup(countynames.index, countynames['CountyCode'])
File ~\anaconda3\lib\site-packages\pandas\core\frame.py:4602, in DataFrame.lookup(self, row_labels, col_labels)
4600 result = np.empty(n, dtype="O")
4601 for i, (r, c) in enumerate(zip(row_labels, col_labels)):
-> 4602 result[i] = self._get_value(r, c)
4604 if is_object_dtype(result):
4605 result = lib.maybe_convert_objects(result)
File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3615, in DataFrame._get_value(self, index, col, takeable)
3612 series = self._ixs(col, axis=1)
3613 return series._values[index]
-> 3615 series = self._get_item_cache(col)
3616 engine = self.index._engine
3618 if not isinstance(self.index, MultiIndex):
3619 # CategoricalIndex: Trying to use the engine fastpath may give incorrect
3620 # results if our categories are integers that dont match our codes
3621 # IntervalIndex: IntervalTree has no get_loc
File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3931, in DataFrame._get_item_cache(self, item)
3926 res = cache.get(item)
3927 if res is None:
3928 # All places that call _get_item_cache have unique columns,
3929 # pending resolution of GH#33047
-> 3931 loc = self.columns.get_loc(item)
3932 res = self._ixs(loc, axis=1)
3934 cache[item] = res
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
KeyError: 1
Solution
Always hard without having the data.
But try:
employee['County.Code'].replace(countynames.set_index("CountyCode")["Value"].to_dict(), inplace=True)
Answered By - bitflip
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.