python - Most efficient method to combine pandas DataFrames which have the same column value
For example, I have 2 dataframes that contain identical sample names but different feature data.
I want to compare how many samples exist in both dataframes.
data here
A dummy way to achieve this that I have thought about:
hit = 0
for i in range(len(df1)):
    for j in range(len(df2)):
        if df1.sample_name.iloc[i] == df2.sample_name.iloc[j]:
            hit += 1
I think this loop procedure may waste a lot of time. Is there a simpler technique to tackle this?
Besides, how can I extract the subset of each dataframe with identical sample_name, and connect the feature data into a new dataframe?
I have tried pd.concat(df1, df2, keys = 'sample_name')
Here's a vectorized approach using NumPy broadcasting to get the hit value -
np.count_nonzero(df1.sample_name.values[:,None] == df2.sample_name.values)
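As a rough, self-contained sketch of the above, and of the second part of the question (joining the feature data for shared samples), something like the following may help. The toy frames and the feature_a / feature_b column names are made up for illustration; only sample_name comes from the question. It uses .to_numpy() rather than .values on the assumption that a plain NumPy array is wanted regardless of the column dtype, and it suggests pd.merge with how="inner" as one possible way to do the join, not necessarily the only or fastest one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the "sample_name" column is taken from the question.
df1 = pd.DataFrame({"sample_name": ["s1", "s2", "s3", "s4"],
                    "feature_a": [1.0, 2.0, 3.0, 4.0]})
df2 = pd.DataFrame({"sample_name": ["s3", "s1", "s5"],
                    "feature_b": [10, 20, 30]})

# Broadcasting: compare every name in df1 against every name in df2.
# This builds a len(df1) x len(df2) boolean matrix and counts matching pairs.
names1 = df1["sample_name"].to_numpy()
names2 = df2["sample_name"].to_numpy()
hit = int(np.count_nonzero(names1[:, None] == names2))

# Alternative that avoids the full matrix: count df1 rows whose name occurs in df2.
# It agrees with the pair count above only when names are unique in each frame.
hit_isin = int(df1["sample_name"].isin(df2["sample_name"]).sum())

# Inner merge keeps only samples present in both frames and
# places their feature columns side by side.
merged = pd.merge(df1, df2, on="sample_name", how="inner")

print(hit, hit_isin)
print(merged)
```

Note that the broadcasting version allocates a len(df1) x len(df2) boolean array, so for large frames the isin or merge route is likely to use less memory.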