Python - Efficiently convert uneven list of lists to minimal containing array padded with NaN


Consider the list of lists l:

l = [[1, 2, 3], [1, 2]] 

If I convert this to an np.array, I'll get a 1-dimensional object array with [1, 2, 3] in the first position and [1, 2] in the second position.

print(np.array(l))

[[1, 2, 3] [1, 2]]
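A side note on the conversion above: newer NumPy releases (1.24 and later, as far as I am aware) no longer build a ragged array implicitly and raise a ValueError instead, so the object array has to be requested explicitly. A minimal sketch, assuming a recent NumPy:

```python
import numpy as np

l = [[1, 2, 3], [1, 2]]

# Ask for an object array explicitly; each element is one of the inner lists.
ragged = np.array(l, dtype=object)

print(ragged.shape)   # (2,)
print(ragged.dtype)   # object
print(ragged[0])      # [1, 2, 3]
```

Either way the result is one-dimensional, which is why a padding step is needed to get a proper 2-D float array.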

I want this instead:

print(np.array([[1, 2, 3], [1, 2, np.nan]]))

[[  1.   2.   3.]
 [  1.   2.  nan]]

I can do this with a loop, but we all know how unpopular loops are:

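import numpy as np

def box_pir(l):
    # Length of every inner list
    lengths = [i for i in map(len, l)]
    # Smallest 2-D shape that contains every row
    shape = (len(l), max(lengths))
    # Start from an all-NaN array, then copy each row into place
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

print(box_pir(l))

[[  1.   2.   3.]
 [  1.   2.  nan]]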

How do I do this in a fast, vectorized way?


Timing

[Figure: two timing plots, images not available]

Setup functions

%%cython
import numpy as np

def box_pir_cython(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

from itertools import zip_longest

import numpy as np
import pandas as pd

def box_divikar(v):
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(v)
    return out

def box_hpaulj(lol):
    return np.array(list(zip_longest(*lol, fillvalue=np.nan))).T

def box_simon(lol):
    max_len = len(max(lol, key=len))
    return np.array([x + [np.nan] * (max_len - len(x)) for x in lol])

def box_dawg(lol):
    cols = len(max(lol, key=len))
    rows = len(lol)
    aoa = np.empty((rows, cols))
    aoa.fill(np.nan)
    for idx in range(rows):
        aoa[idx, 0:len(lol[idx])] = lol[idx]
    return aoa

def box_pir(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

def box_pandas(l):
    return pd.DataFrame(l).values
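Before comparing timings it is worth checking that the candidates actually return the same array. A minimal sketch, re-declaring two of the functions above so that it runs on its own; the input `sample` is made up for illustration:

```python
from itertools import zip_longest

import numpy as np


def box_pir(l):
    lengths = [len(item) for item in l]
    a = np.full((len(l), max(lengths)), np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a


def box_hpaulj(lol):
    return np.array(list(zip_longest(*lol, fillvalue=np.nan))).T


# Hypothetical ragged input, not taken from the original post.
sample = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

expected = box_pir(sample)
result = box_hpaulj(sample)

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(result, expected)
print(result.shape)  # (3, 5)
```

The same check can be repeated for each of the other functions before they go into a benchmark.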

This seems to be a close one to this question, where the padding was with zeros instead of NaNs. Interesting approaches were posted there, along with mine based on broadcasting and boolean-indexing. So, I would just modify one line from my post there to solve this case -

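def boolean_indexing(v, fillval=np.nan):
    # Length of each inner list
    lens = np.array([len(item) for item in v])
    # True where a row has a real value at that column index
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, fillval)
    # Drop the flattened values into the True positions
    out[mask] = np.concatenate(v)
    return out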

Sample run -

In [32]: l
Out[32]: [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

In [33]: boolean_indexing(l)
Out[33]:
array([[  1.,   2.,   3.,  nan,  nan],
       [  1.,   2.,  nan,  nan,  nan],
       [  3.,   8.,   9.,   7.,   3.]])

In [34]: boolean_indexing(l,-1)
Out[34]:
array([[ 1,  2,  3, -1, -1],
       [ 1,  2, -1, -1, -1],
       [ 3,  8,  9,  7,  3]])
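To see what the broadcasting step does, it can help to print the intermediate mask. A small sketch on the same sample input, with variable names mirroring the function above:

```python
import numpy as np

v = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

lens = np.array([len(item) for item in v])       # [3 2 5]

# Compare a (3, 1) column of lengths against a (5,) row of column indices.
# Broadcasting yields a (3, 5) mask that is True where a real value belongs.
mask = lens[:, None] > np.arange(lens.max())
print(mask.astype(int))
# [[1 1 1 0 0]
#  [1 1 0 0 0]
#  [1 1 1 1 1]]

# Boolean assignment fills the True slots in row-major order, which matches
# the order of the flattened input.
out = np.full(mask.shape, np.nan)
out[mask] = np.concatenate(v)
print(out)
```

The number of True entries in the mask equals the total number of input values, which is what makes the single assignment `out[mask] = ...` line up.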

I have posted a few runtime results there for all the posted approaches on that Q&A, which could be useful.
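For anyone who wants to reproduce such a comparison locally, here is a rough sketch using the standard-library `timeit` module. The helper `loop_fill`, the input size and the repeat count are my own arbitrary choices, and no timings are quoted because they depend on the machine and the data:

```python
import random
import timeit

import numpy as np


def boolean_indexing(v, fillval=np.nan):
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, fillval)
    out[mask] = np.concatenate(v)
    return out


def loop_fill(v, fillval=np.nan):
    # Plain Python loop over rows, for comparison (hypothetical helper).
    out = np.full((len(v), max(map(len, v))), fillval)
    for i, row in enumerate(v):
        out[i, :len(row)] = row
    return out


# Hypothetical benchmark data: 1000 rows of random length between 1 and 50.
random.seed(0)
data = [[random.random() for _ in range(random.randint(1, 50))]
        for _ in range(1000)]

timings = {}
for func in (boolean_indexing, loop_fill):
    # Total seconds for 20 calls; smaller is faster.
    timings[func.__name__] = timeit.timeit(lambda: func(data), number=20)
    print(func.__name__, timings[func.__name__])
```

Results are likely to shift with the number of rows and the spread of row lengths, so it makes sense to time against data shaped like the real workload.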

