Limits on Python Lists?
I'm trying to assemble a bunch of information into a usable array like this:
for (dirpath, dirnames, filenames) in walk('e:/machin lerning/econ/full_set'):
    ndata.extend(filenames)
for i in ndata:
    currfile = open('e:/machin lerning/econ/full_set/' + str(i), 'r')
    rawdata.append(currfile.read().splitlines())
    currfile.close()
rawdata = numpy.array(rawdata)
for order, file in enumerate(rawdata[:10]):
    for i in rawdata[order]:
        r = i.split(',')
        pdata.append(r)
    fdata.append(pdata)
    pdata = []
fdata = numpy.array(fdata)
plt.figure(1)
plt.plot(fdata[:,1,3])
Edit: after printing fdata.shape when using the first 10 txt files, i.e.
for order,file in enumerate(rawdata[:10]):
I see (10, 500, 7). If I don't limit the size of this, and instead use
for order,file in enumerate(rawdata):
then fdata.shape is (447,). This seems to happen whenever I increase the number of elements iterated through in the rawdata array above 13... It's not a specific location either - I changed it to
for order,file in enumerate(rawdata[11:24]):
and it worked fine. Aaaaahhh. In case it's useful, here's a sample of what the text files look like:
20080225,a,31.42,31.79,31.2,31.5,30575
20080225,aa,36.64,38.95,36.48,38.85,225008
20080225,aapl,118.59,120.17,116.664,119.74,448847
It looks like fdata is an object dtype array, and the error is in fdata[:,1,3]. That tries to index fdata with 3 indices: a slice, 1, and 3. If fdata is a 1d or 2d array, that produces the error - too many indices.
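For illustration, here is a minimal reproduction of that error; the data is made up, but the shapes mirror the situation described above:

```python
import numpy as np

# ragged nested lists collapse to a 1d object array
a = np.array([[1, 2, 3], [4, 5], []], dtype=object)
print(a.shape)  # (3,)

try:
    a[:, 1, 3]  # three indices into a 1d array
except IndexError as e:
    print(e)    # "too many indices" error
```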
When you get 'too many indices' errors, figure out the shape of the offending array. Don't guess. Add a debug statement: print(fdata.shape).
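A sketch of that kind of check, using stand-in data rather than the real files (the variable names follow the question's code): print the row count of each per-file list to locate the odd one out before calling np.array.

```python
import numpy as np

# hypothetical stand-in for the per-file lists built in the question's loop:
# two normal 500-line files and one empty one
fdata = [[['1'] * 7] * 500, [['1'] * 7] * 500, []]

arr = np.array(fdata, dtype=object)
print(arr.shape)  # (3,) - collapses because one entry is a different length

# report each file index whose line count doesn't match
for order, pdata in enumerate(fdata):
    if len(pdata) != 500:
        print(order, len(pdata))
```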
===================
Taking your file sample as a list of lines:
In [822]: txt = b"""20080225,a,31.42,31.79,31.2,31.5,30575
     ...: 20080225,aa,36.64,38.95,36.48,38.85,225008
     ...: 20080225,aapl,118.59,120.17,116.664,119.74,448847
     ...: """
In [823]: txt = txt.splitlines()
In [826]: fdata = []
In [827]: pdata = []
Read one 'file':
In [828]: for i in txt:
     ...:     r = i.split(b',')
     ...:     pdata.append(r)
     ...: fdata.append(pdata)
     ...:
In [829]: fdata
Out[829]: [[[b'20080225', b'a', b'31.42', b'31.79', b'31.2', b'31.5', b'30575'], ...]]
In [830]: np.array(fdata)
Out[830]:
array([[[b'20080225', b'a', b'31.42', b'31.79', b'31.2', b'31.5', b'30575'],
        ...]], dtype='|S8')
In [831]: _.shape
Out[831]: (1, 3, 7)
Read an 'identical' file:
In [832]: for i in txt:
     ...:     r = i.split(b',')
     ...:     pdata.append(r)
     ...: fdata.append(pdata)
In [833]: len(fdata)
Out[833]: 2
In [834]: np.array(fdata).shape
Out[834]: (2, 6, 7)
In [835]: np.array(fdata).dtype
Out[835]: dtype('S8')
Note the dtype - strings of 8 characters. Since one value per line is a string (the ticker), it can't convert the whole thing to numbers.
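When the files are uniform, you can still get numbers out by slicing off the string columns before converting; a sketch using the sample rows:

```python
import numpy as np

txt = [b"20080225,a,31.42,31.79,31.2,31.5,30575",
       b"20080225,aa,36.64,38.95,36.48,38.85,225008"]
rows = [line.split(b',') for line in txt]

arr = np.array(rows)             # fixed-width byte strings, e.g. dtype |S8
nums = arr[:, 2:].astype(float)  # drop date and ticker, convert the rest
print(nums.shape)  # (2, 5)
```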
Now read a different 'file' (one less line, and one less value on the last line):
In [836]: txt1 = b"""20080225,a,31.42,31.79,31.2,31.5,30575
     ...: 20080225,aa,36.64,38.95,36.48,38.85
     ...: """
In [837]: txt1 = txt1.splitlines()
In [838]: for i in txt1:
     ...:     r = i.split(b',')
     ...:     pdata.append(r)
     ...: fdata.append(pdata)
In [839]: len(fdata)
Out[839]: 3
In [840]: np.array(fdata).shape
Out[840]: (3, 8)
In [841]: np.array(fdata).dtype
Out[841]: dtype('O')
Now let's add an 'empty' file - one that appends no rows, just an empty pdata, []:

In [842]: fdata.append([])
In [843]: np.array(fdata).shape
Out[843]: (4,)
In [844]: np.array(fdata).dtype
Out[844]: dtype('O')
The array shape and dtype have totally changed. It can no longer create a uniform 3d array from the lines.
The shape after 10 files, (10, 500, 7), means 10 files, 500 lines each, 7 columns per line. One or more of the full set of 447 files must be different. The last iteration above suggests at least one is empty.
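One way to make the original loop robust, sketched with hypothetical stand-in data and the question's variable names: check each file's rows before stacking, and skip (or at least report) any file that doesn't match the expected shape.

```python
import numpy as np

# hypothetical parsed files: two uniform 3x7 files and one empty one
files = {'good1.txt': [['1'] * 7] * 3,
         'good2.txt': [['2'] * 7] * 3,
         'empty.txt': []}

fdata = []
for name, pdata in files.items():
    # skip files whose row count or column count doesn't match
    if len(pdata) != 3 or any(len(r) != 7 for r in pdata):
        print('skipping', name, 'with', len(pdata), 'rows')
        continue
    fdata.append(pdata)

fdata = np.array(fdata)
print(fdata.shape)  # (2, 3, 7) - a clean 3d array again
```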