Python: efficiently convert an uneven list of lists to a minimal containing array padded with NaN

Consider the list of lists l:
l = [[1, 2, 3], [1, 2]]
If I convert this to an np.array, I'll get a one-dimensional object array with [1, 2, 3] in the first position and [1, 2] in the second position.
    print(np.array(l))

    [[1, 2, 3] [1, 2]]
I want this instead:

    print(np.array([[1, 2, 3], [1, 2, np.nan]]))

    [[ 1.  2.  3.]
     [ 1.  2. nan]]
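I can do this with a loop, but we all know how unpopular loops are:

    def box_pir(l):
        # Length of each inner list
        lengths = [i for i in map(len, l)]
        shape = (len(l), max(lengths))
        # Start from an all-NaN array, then fill each row's leading entries
        a = np.full(shape, np.nan)
        for i, r in enumerate(l):
            a[i, :lengths[i]] = r
        return a

    print(box_pir(l))

    [[ 1.  2.  3.]
     [ 1.  2. nan]]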
How do I do this in a fast, vectorized way?
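A side note on the object-array conversion described in the question: newer NumPy releases reject ragged input unless the object dtype is requested explicitly, so a minimal sketch that reproduces the starting point on both old and new versions might look like this. The variable names `ragged` and `obj_arr` are made up for illustration.

```python
import numpy as np

ragged = [[1, 2, 3], [1, 2]]

# Asking for dtype=object explicitly keeps each inner list as one element.
obj_arr = np.array(ragged, dtype=object)

print(obj_arr.shape)      # one slot per inner list
print(obj_arr.dtype)
print(type(obj_arr[0]))   # the elements are still plain Python lists
```

The result is one-dimensional, which is why it cannot be used directly for column-wise numeric work and padding is needed.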
Timing

Setup functions:
    %%cython
    import numpy as np

    def box_pir_cython(l):
        lengths = [len(item) for item in l]
        shape = (len(l), max(lengths))
        a = np.full(shape, np.nan)
        for i, r in enumerate(l):
            a[i, :lengths[i]] = r
        return a
    from itertools import zip_longest

    import numpy as np
    import pandas as pd

    def box_divikar(v):
        # Broadcasting + boolean indexing
        lens = np.array([len(item) for item in v])
        mask = lens[:, None] > np.arange(lens.max())
        out = np.full(mask.shape, np.nan)
        out[mask] = np.concatenate(v)
        return out

    def box_hpaulj(lol):
        # Pad column-wise with zip_longest, then transpose back
        return np.array(list(zip_longest(*lol, fillvalue=np.nan))).T

    def box_simon(lol):
        # Extend each row with NaNs up to the longest row
        max_len = len(max(lol, key=len))
        return np.array([x + [np.nan] * (max_len - len(x)) for x in lol])

    def box_dawg(lol):
        cols = len(max(lol, key=len))
        rows = len(lol)
        aoa = np.empty((rows, cols))
        aoa.fill(np.nan)
        for idx in range(rows):
            aoa[idx, 0:len(lol[idx])] = lol[idx]
        return aoa

    def box_pir(l):
        lengths = [len(item) for item in l]
        shape = (len(l), max(lengths))
        a = np.full(shape, np.nan)
        for i, r in enumerate(l):
            a[i, :lengths[i]] = r
        return a

    def box_pandas(l):
        return pd.DataFrame(l).values
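The setup above lists the candidates but not a harness for running them. One way to compare two of them could look like the sketch below, which uses the standard-library `timeit` module. The names `pad_loop`, `pad_mask`, and `sample`, along with the input sizes and repeat count, are assumptions made for illustration and are not taken from the original benchmark.

```python
import random
import timeit

import numpy as np

def pad_loop(rows):
    # Row-by-row fill, same idea as box_pir
    lengths = [len(r) for r in rows]
    a = np.full((len(rows), max(lengths)), np.nan)
    for i, r in enumerate(rows):
        a[i, :lengths[i]] = r
    return a

def pad_mask(rows):
    # Broadcasting + boolean indexing, same idea as box_divikar
    lens = np.array([len(r) for r in rows])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(rows)
    return out

# Hypothetical test data: 1000 rows with lengths between 1 and 50
random.seed(0)
sample = [list(range(random.randint(1, 50))) for _ in range(1000)]

for fn in (pad_loop, pad_mask):
    seconds = timeit.timeit(lambda: fn(sample), number=5)
    print(f"{fn.__name__}: {seconds / 5:.6f} s per call")
```

Relative rankings are likely to depend on the number of rows and on how uneven the row lengths are, so it is worth timing with data shaped like your own.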
This seems to be a close one of this question, where the padding was with zeros instead of NaNs. Interesting approaches were posted there, along with mine based on broadcasting and boolean-indexing. So, I would just modify one line from my post there to solve this case:
    def boolean_indexing(v, fillval=np.nan):
        lens = np.array([len(item) for item in v])
        # True where a column index falls inside that row's length
        mask = lens[:, None] > np.arange(lens.max())
        out = np.full(mask.shape, fillval)
        out[mask] = np.concatenate(v)
        return out
Sample run:

    In [32]: l
    Out[32]: [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

    In [33]: boolean_indexing(l)
    Out[33]:
    array([[  1.,   2.,   3.,  nan,  nan],
           [  1.,   2.,  nan,  nan,  nan],
           [  3.,   8.,   9.,   7.,   3.]])

    In [34]: boolean_indexing(l,-1)
    Out[34]:
    array([[ 1,  2,  3, -1, -1],
           [ 1,  2, -1, -1, -1],
           [ 3,  8,  9,  7,  3]])
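To see why the broadcasting and boolean-indexing approach lines the values up correctly, it may help to look at the intermediate arrays. The following is a small walk-through on the same sample input. The variable names `cols`, `mask`, and `out` are chosen here for illustration.

```python
import numpy as np

v = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

lens = np.array([len(item) for item in v])   # row lengths: 3, 2, 5
cols = np.arange(lens.max())                 # column indices: 0..4

# lens[:, None] has shape (3, 1) and cols has shape (5,), so the
# comparison broadcasts to a (3, 5) grid: True where the column
# index is still inside that row's length.
mask = lens[:, None] > cols
print(mask)

# Boolean assignment fills the True cells in row-major order, which is
# the same order np.concatenate produces from the rows.
out = np.full(mask.shape, np.nan)
out[mask] = np.concatenate(v)
print(out)
```

The dtype of the result follows the fill value passed to np.full, which is why filling with -1 in the sample run yields an integer array while np.nan yields floats.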
I have posted a few runtime results there for all the posted approaches on that Q&A, which could be useful.