Python - Sorting within each subgroup and summing the first three values
I have a pandas DataFrame with 3 columns: state_name, county_name, and population (population is numeric). The question I want to answer: looking only at the 3 most populous counties in each state, which are the 3 most populous states? I think I first need to groupby state_name and county_name, and I can do that, but after that I'm confused about how to proceed. I'm new to pandas, so any guidance would help.
Here's some dummy data (please include a sample of your data in the future):
    state_name,county_name,population
    state1,state1_a,100
    state1,state1_b,8000
    state1,state1_c,75
    state1,state1_d,876
    state1,state1_e,2938
    state2,state2_a,200
    state2,state2_b,16000
    state2,state2_c,75
    state2,state2_d,876
    state2,state2_e,5876
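For a reproducible example that doesn't depend on a system clipboard, the same dummy data can be written as a DataFrame constructor. This is a minimal sketch; the column values are copied from the sample above.

```python
import pandas as pd

# The dummy data above as a copy-pasteable constructor
# (an alternative to pd.read_clipboard, which needs a system clipboard).
df = pd.DataFrame({
    "state_name": ["state1"] * 5 + ["state2"] * 5,
    "county_name": [f"state1_{c}" for c in "abcde"] + [f"state2_{c}" for c in "abcde"],
    "population": [100, 8000, 75, 876, 2938, 200, 16000, 75, 876, 5876],
})
print(df)
```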
Let's set the index to state_name and county_name, then select the 'population' column, which returns a MultiIndexed pandas.Series:
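    import pandas as pd

    # The sample is comma-separated, and read_clipboard's default separator is whitespace
    df = pd.read_clipboard(sep=',')  # could have passed index_col=[0, 1] here instead
    df = df.set_index(['state_name', 'county_name'])
    s = df.population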
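The index_col=[0, 1] shortcut can be sketched like this, assuming the CSV text is held in a string (csv_text is a hypothetical name) rather than on the clipboard. Passing index_col to read_csv builds the (state_name, county_name) MultiIndex while reading, so the separate set_index call isn't needed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the clipboard: the sample CSV as a string.
csv_text = """state_name,county_name,population
state1,state1_a,100
state1,state1_b,8000
state1,state1_c,75
state1,state1_d,876
state1,state1_e,2938
state2,state2_a,200
state2,state2_b,16000
state2,state2_c,75
state2,state2_d,876
state2,state2_e,5876
"""

# Columns 0 and 1 become the MultiIndex levels; select population as a Series.
s = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])["population"]
print(s)
```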
Now we can use Series.groupby and call nlargest on each group (this wouldn't work on a DataFrame, which is why we use a Series):
    s.groupby(level='state_name').nlargest(3)

    state_name  state_name  county_name
    state1      state1      state1_b        8000
                            state1_e        2938
                            state1_d         876
    state2      state2      state2_b       16000
                            state2_e        5876
                            state2_d         876
    Name: population, dtype: int64
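That gives the top three counties per state; the question also asks for their sum and the 3 most populous states. A minimal sketch of those remaining steps, assuming "most populous state" means the largest sum over its top three counties. It groups the result by level position 0 instead of by name, because (as in the output above) the state_name level can appear twice, and grouping by a duplicated level name raises an error.

```python
import io

import pandas as pd

csv_text = """state_name,county_name,population
state1,state1_a,100
state1,state1_b,8000
state1,state1_c,75
state1,state1_d,876
state1,state1_e,2938
state2,state2_a,200
state2,state2_b,16000
state2,state2_c,75
state2,state2_d,876
state2,state2_e,5876
"""
s = pd.read_csv(io.StringIO(csv_text)).set_index(["state_name", "county_name"])["population"]

# Top 3 counties per state (same call as above).
top3 = s.groupby(level="state_name").nlargest(3)

# Sum those three values per state; level 0 is the state in every case.
state_totals = top3.groupby(level=0).sum()

# The 3 states with the largest totals (only 2 exist in the dummy data).
top_states = state_totals.nlargest(3)
print(top_states)
```

With the dummy data this ranks state2 (16000 + 5876 + 876) above state1 (8000 + 2938 + 876).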