Issue
I know I could silence the warning with pd.options.mode.chained_assignment = None, but I would rather have code that is free of the error.
My starting code:
import datetime
import altair as alt
import operator
import pandas as pd
s = pd.read_csv('../../data/aparecida-small-sample.csv', parse_dates=['date'])
city = s[s['city'] == 'Aparecida']
Based on @dpkandy's code:
city['total_cases'] = city['totalCases']
city['total_deaths'] = city['totalDeaths']
city['total_recovered'] = city['totalRecovered']
tempTotalCases = city[['date','total_cases']]
tempTotalCases["title"] = "Confirmed"
tempTotalDeaths = city[['date','total_deaths']]
tempTotalDeaths["title"] = "Deaths"
tempTotalRecovered = city[['date','total_recovered']]
tempTotalRecovered["title"] = "Recovered"
temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)
totalCases = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_cases:Q', title = None))
totalDeaths = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_deaths:Q', title = None))
totalRecovered = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_recovered:Q', title = None))
(totalCases + totalRecovered + totalDeaths).encode(color=alt.Color('title', scale = alt.Scale(range = ['#106466','#DC143C','#87C232']), legend = alt.Legend(title="Legend colour"))).properties(title = "Cumulative number of confirmed cases, deaths and recovered", width = 800)
This code works perfectly and renders the visualization, but it still shows the pandas error, which suggests using .loc[row_indexer,col_indexer] = value instead. I then read the documentation section "Returning a view versus a copy" that the message links to and tried the code below, but it still shows the same error. Here is the code with loc:
# 1st attempt
tempTotalCases.loc["title"] = "Confirmed"
tempTotalDeaths.loc["title"] = "Deaths"
tempTotalRecovered.loc["title"] = "Recovered"
# 2nd attempt
tempTotalCases["title"].loc = "Confirmed"
tempTotalDeaths["title"].loc = "Deaths"
tempTotalRecovered["title"].loc = "Recovered"
Here is the error message:
<ipython-input-6-f16b79f95b84>:6: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tempTotalCases["title"] = "Confirmed"
<ipython-input-6-f16b79f95b84>:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tempTotalDeaths["title"] = "Deaths"
<ipython-input-6-f16b79f95b84>:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tempTotalRecovered["title"] = "Recovered"
Jupyter and Pandas version:
$ jupyter --version
jupyter core : 4.7.1
jupyter-notebook : 6.3.0
qtconsole : 5.0.3
ipython : 7.22.0
ipykernel : 5.5.3
jupyter client : 6.1.12
jupyter lab : 3.1.0a3
nbconvert : 6.0.7
ipywidgets : 7.6.3
nbformat : 5.1.3
traitlets : 5.0.5
$ pip show pandas
Name: pandas
Version: 1.2.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/gus/PUC/.env/lib/python3.9/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: ipychart, altair
Update 2
I followed the answer and it worked, but there is another problem:
temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)
Error log:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value, self.name)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-7-b2649a676837> in <module>
17 tempTotalRecovered.loc["title"] = _("Recovered")
18
---> 19 temp = tempTotalCases.append(tempTotalDeaths)
20 temp = temp.append(tempTotalRecovered)
21
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity, sort)
7980 to_concat = [self, other]
7981 return (
-> 7982 concat(
7983 to_concat,
7984 ignore_index=ignore_index,
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
296 )
297
--> 298 return op.get_result()
299
300
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/reshape/concat.py in get_result(self)
514 obj_labels = obj.axes[1 - ax]
515 if not new_labels.equals(obj_labels):
--> 516 indexers[ax] = obj_labels.get_indexer(new_labels)
517
518 mgrs_indexers.append((obj._mgr, indexers))
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
3169
3170 if not self.is_unique:
-> 3171 raise InvalidIndexError(
3172 "Reindexing only valid with uniquely valued Index objects"
3173 )
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Solution
This SettingWithCopyWarning is a warning and not an error. The distinction matters: a warning means pandas isn't sure whether your code will produce the intended output and leaves that decision to the programmer, whereas an error means that something is definitely wrong.
The SettingWithCopyWarning is warning you about the difference between doing something like df['First selection']['Second selection'] and doing df.loc[:, ('First selection', 'Second selection')].
In the first case two separate events occur: df['First selection'] takes place, then the object it returns is used for the next selection, returned_df['Second selection']. pandas has no way to know whether returned_df is the original df or just a temporary 'view' of it. Most of the time it doesn't matter (see the docs for more info), but if you change a value on a temporary object you'll be confused as to why your code runs error free but the changes you made are not reflected in the original. Using .loc bundles 'First selection' and 'Second selection' into one call, so pandas can guarantee that the assignment operates on the original object.
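To make the difference concrete, here is a minimal sketch on made-up data. The frame, the mask and the values are assumptions for illustration only; the column names simply mirror the question. The warning is suppressed inside the demo so that only the effect on df is visible.

```python
import warnings

import pandas as pd

# Toy data; the column names mirror the question, the values are made up.
df = pd.DataFrame({"city": ["Aparecida", "Other"], "totalCases": [10, 20]})
mask = df["city"] == "Aparecida"

# Chained form: df[mask] builds a new object first, and the assignment
# lands on that temporary object, so df itself is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    df[mask]["totalCases"] = 0
cases_after_chained = int(df.loc[0, "totalCases"])

# Single .loc call: rows and column are selected in one step on df itself.
df.loc[mask, "totalCases"] = 99
cases_after_loc = int(df.loc[0, "totalCases"])
```

Here cases_after_chained still holds the original value, while cases_after_loc holds the new one, which is the silent failure the warning is trying to flag.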
The documentation you linked shows why your attempts to use .loc didn't work as you intended (e.g., taken from the docs):
def do_something(df):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    # We don't know whether this will modify df or not!
    foo['quux'] = value
    return foo
You have something similar in your code. Look at how tempTotalCases is created:
city = s[s['city'] == 'Aparecida']
# some lines of code
tempTotalCases = city[['date','total_cases']]
And then some more lines of code before you attempt to do:
tempTotalCases.loc["title"] = "Confirmed"
So pandas emits the warning.
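One common way to avoid the ambiguity is to take an explicit copy whenever you derive a new frame that you intend to modify. The sketch below uses a small made-up frame in place of the CSV from the question, and snake_case variable names of my own choosing; it is an illustration under those assumptions, not necessarily the only fix.

```python
import pandas as pd

# Stand-in for the CSV in the question; the values are made up.
s = pd.DataFrame({
    "date": pd.to_datetime(["2021-04-01", "2021-04-02", "2021-04-01"]),
    "city": ["Aparecida", "Aparecida", "Other"],
    "totalCases": [10, 12, 5],
})

# .copy() makes it explicit that city is an independent frame, not a slice of s.
city = s[s["city"] == "Aparecida"].copy()
city = city.rename(columns={"totalCases": "total_cases"})

# Same idea for the per-series frame: copy first, then add the column.
temp_total_cases = city[["date", "total_cases"]].copy()
temp_total_cases["title"] = "Confirmed"
```

Note also that tempTotalCases.loc["title"] = "Confirmed" addresses a row labelled "title" rather than a column. The column form with .loc would be tempTotalCases.loc[:, "title"] = "Confirmed".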
Separate from your original question, you might find df.rename() useful (see the docs). You'll be able to do something like:
city = city.rename(columns={'totalCases': 'total_cases',
                            'totalDeaths': 'total_deaths',
                            'totalRecovered': 'total_recovered'})
Answered By - Jason