Issue
I had this doubt: datasets often have the Age column in either int or float datatype (e.g. Titanic). Suppose the column has all float values: should you convert them all to int, or leave them as they are when feeding them to an ML model? Does converting have any adverse effect on prediction results, and what's the right way?
Solution
age is a continuous variable: you age with every moment that passes, not incrementally once a year, so the data type that most closely reflects reality is a float and not an integer. However, whether to use a float or an integer depends on the use case, e.g.:
- Are you using age as a feature describing how old people are? Better to use a float (e.g. a person who is 59.9 is older than a person who is 59.1 and may be more likely to develop certain medical conditions, or may be less physically fit and less likely to survive a sinking ship).
- Are you reporting on age groups? You might be better off rounding to the nearest integer (e.g. 39.9 -> 40, 34.2 -> 34) and potentially binning (e.g. 25-34, 35-44).
- Are you looking at age from a legal standpoint (e.g. analysis of underage drinking)? Then you should use the rounded-down int value (e.g. if the legal age is 16 and a person is 15.9, legally they are 15 and therefore drinking underage).
- etc.
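The three cases above can be sketched in plain Python. This is only an illustration under stated assumptions: the sample ages, the helper names (age_group, completed_years, is_underage) and the ten-year bin edges are hypothetical and not taken from any particular dataset or library.

```python
import math

# Hypothetical ages, for illustration only.
ages = [15.9, 34.2, 39.9, 59.1, 59.9]

# Case 1: age as a feature. Keeping the float preserves ordering within a year;
# truncating to int makes 59.1 and 59.9 indistinguishable.
assert 59.9 > 59.1
assert int(59.9) == int(59.1)


def age_group(age):
    """Case 2: round to the nearest integer, then bin into ten-year groups
    (..., 25-34, 35-44, ...). The bin edges are an assumption."""
    # floor(x + 0.5) rounds halves up; the built-in round() rounds halves to even.
    rounded = math.floor(age + 0.5)
    start = ((rounded - 5) // 10) * 10 + 5
    end = start + 9
    return f"{max(start, 0)}-{end}"


def completed_years(age):
    """Case 3: legal age counts completed years, so always round down."""
    return math.floor(age)


def is_underage(age, legal_age):
    return completed_years(age) < legal_age


for a in ages:
    print(a, age_group(a), completed_years(a), is_underage(a, 16))
```

Which of these transformations to apply, if any, is a modelling decision; the sketch only shows that each one answers a different question about the same float value.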
As a general remark, you'll often find that there is no single "right way" of dealing with data; it all depends on the use case.
Answered By - Max