python - Calculate a row value by referring to previous rows in a pandas DataFrame
I'm trying to figure out a way to compare values in different rows of a DataFrame in order to calculate a new column.
I've found these ways:
- iterate over the rows (but I'm looking for a vectorized solution):
for index, row in df.iterrows(): ....
- merge the same DataFrame with itself multiple times, using shift on the index, like this:
d1 = data.shift()
data.merge(d1[["value col"]], how="inner", left_index=True, right_index=True)
Is there a way to access the current DataFrame inside the apply method? Something like:
dataframe.apply(my_function, axis=1)
def my_function(row, current_dataframe):
    index = row.name
    row_to_compare = current_dataframe.iloc[index - delta]
    row["new column"] = calc(row["value"], row_to_compare["value"])
    return row
Passing the DataFrame as an argument doesn't seem to work:
data.apply(date_diff, axis=1, args=(data))
or
data.apply(lambda row, df: date_diff(row, df), axis=1, args=(data))
It keeps saying:
> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Is there a way to make this work?
Thank you.
What do you want to calculate?
If it is simple enough, you can vectorize it completely. Note that you can add the column directly rather than having a separate merge step.
df["same"] = df[col] == df[col2].shift()
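To make the one-liner above concrete, here is a minimal sketch on made-up data. The column name "value" and the sample numbers are assumptions for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; the column name is an assumption.
df = pd.DataFrame({"value": [10, 12, 12, 15]})

# shift() moves every value down one row, so row i lines up with row i-1.
# The first row has no predecessor and gets NaN.
df["prev_value"] = df["value"].shift()

# True where the current value equals the previous row's value.
df["same"] = df["value"] == df["value"].shift()

# Difference from the previous row, computed for the whole column at once.
df["delta"] = df["value"] - df["value"].shift()

print(df)
```

Passing a number to shift, for example shift(2), compares with a row further back, so no merge is needed for that either.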
If it is a bit more complex, can you split it into multiple steps like the one above? That is still fast.
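Splitting a calculation into several vectorized steps might look like the sketch below. The "date" and "value" columns and the rate calculation are invented for illustration; only the pattern (shift, then combine whole columns) is the point.

```python
import pandas as pd

# Hypothetical data: the columns and the calculation are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-10"]),
    "value": [1.0, 4.0, 2.0],
})

# Step 1: line up the previous row's values with the current row.
prev_date = df["date"].shift()
prev_value = df["value"].shift()

# Step 2: compute intermediate quantities as whole-column operations.
days_elapsed = (df["date"] - prev_date).dt.days
change = df["value"] - prev_value

# Step 3: combine the intermediates into the final column.
df["rate_per_day"] = change / days_elapsed

print(df)
```

Each step is a single column operation, so the whole thing stays vectorized even though it takes a few lines.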
If you need multiple columns and rows, you have to use apply to process the data row by row or column by column, which is quite slow. The worst answer is to iterate! You should never need this.
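If apply really is needed, the DataFrame can be handed to the function through args, as long as args is a tuple. In Python, (data) is just data, while (data,) is a one-element tuple, which is most likely why the attempts in the question raise the "truth value of a DataFrame is ambiguous" error. The sketch below assumes a default integer index, and the function name, the "value" column, and DELTA are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the column name is an assumption.
df = pd.DataFrame({"value": [10, 12, 12, 15]})
DELTA = 1  # hypothetical offset: compare with the row DELTA positions earlier


def diff_with_earlier_row(row, frame, delta):
    # row.name is the index label; with the default RangeIndex
    # it is also the row's position.
    pos = row.name - delta
    if pos < 0:
        # No earlier row to compare with.
        return float("nan")
    return row["value"] - frame["value"].iloc[pos]


# args must be a tuple. With a single extra argument it would be (df,).
df["new_column"] = df.apply(diff_with_earlier_row, axis=1, args=(df, DELTA))

print(df)
```

For this particular calculation, df["value"] - df["value"].shift(DELTA) gives the same numbers without apply, so the row-wise version is only worth it when the logic cannot be expressed with column operations.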