Issue
I am trying to group a DataFrame by certain columns and then, for each group, pass one column's values as a list to a custom function or lambda and get a single aggregated result.
Here's a df:
orgid   appid   p  type  version
--------------------------------------
24e78b  4ef36d  1  None  3.3.7
24e78b  4ef36d  2  None  3.4.1
24e78b  4ef36d  1  None  3.3.7-beta-1
24e78b  4ef36d  1  None  3.4.0-mvn.1
24e78b  4ef36d  2  None  3.4.0-beta.5
24e78b  4ef36d  1  None  3.4.0-beta.1
24e78b  4ef36d  1  None  3.4.0
24e78b  4ef36d  1  None  3.3.5
I have a function that takes a list of versions and returns the highest version as a string.
>>> import semver
>>> versions = ['3.4.0-mvn.1', '3.4.0-beta.1', '3.4.0', '3.3.7-beta-1', '3.3.7', '3.3.5', '3.4.0-beta-1']
>>> str(max(map(semver.VersionInfo.parse, versions)))
'3.4.0'
Now I want to group the DataFrame and pass each group's version series to this function as a list, so that each group returns a single version string.
I tried:
>>> g = df.groupby(['orgid', 'appid', 'p', 'type'])
>>> g['version'].apply(lambda x: str(max(map(semver.VersionInfo.parse, x.tolist()))))
Series([], Name: version, dtype: float64)
I get an empty Series.
Expected output:
orgid   appid   p  type  version
24e78b  4ef36d  1  None  3.4.0
24e78b  4ef36d  2  None  3.4.1
I also looked at the post "Pandas group by multiple custom aggregate function on multiple columns", but couldn't get it right.
Solution
Try:
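import semver

# Parse each version string into a comparable VersionInfo object
df["version"] = df["version"].apply(semver.VersionInfo.parse)
# VersionInfo objects compare by semver precedence, so max() picks the highest version per group
out = df.groupby(["orgid", "appid", "p", "type"], as_index=False).max()
print(out)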
Prints:
orgid appid p type version
0 24e78b 4ef36d 1 None 3.4.0
1 24e78b 4ef36d 2 None 3.4.1
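A possible explanation for the empty Series in the question, offered as an assumption since the original frame is not shown in full: if the type column holds real None/NaN values rather than the string "None", groupby drops those rows by default (dropna=True), which leaves no groups at all. The sketch below uses a small made-up frame (the rows are illustrative, not the asker's data) to show the difference. The dropna parameter requires pandas 1.1 or later.

```python
import pandas as pd

# Illustrative frame; `type` holds real None values (an assumption about the asker's data).
df = pd.DataFrame(
    {
        "orgid": ["24e78b"] * 4,
        "appid": ["4ef36d"] * 4,
        "p": [1, 2, 1, 2],
        "type": [None] * 4,
        "version": ["3.3.7", "3.4.1", "3.4.0", "3.4.0-beta.5"],
    }
)

keys = ["orgid", "appid", "p", "type"]

# Default (dropna=True): rows whose group key contains None/NaN are excluded.
default_groups = df.groupby(keys).ngroups

# dropna=False keeps None/NaN as a valid group key.
kept_groups = df.groupby(keys, dropna=False).ngroups

print(default_groups, kept_groups)
```

If that is the cause, passing dropna=False to the groupby call above, or leaving type out of the grouping keys, should give one row per value of p.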
Answered By - Andrej Kesely