xidel - How to always get the same number of results in XPath even if some tags are not present?
I am trying to crawl data from a website. The target site does not provide every detail for every profile. For example, one profile has a name and a birthday, while another has only a name.
I grab the tags with xidel and XPath, and it works like a charm, except when a few tags are missing (because the detail is not present).
So I am asking for a solution that fills these nonexistent tags with empty ones, so that every set of data ends up with the same length.
I convert the data to CSV afterwards, and when a tag is missing, the data is one column off.
My xidel request looks like this:
xidel 'http://www.icaec.org/users/index' -f '//section[@id="content-area"]//article//h5/a' -e 'concat("`",join(//div[@id="members-info"]/(h5 | span) | //div[@class="row pic-professionsal-details"]/div[2]/div | //div[@class="row pic-professionsal-details"]/following-sibling::div/div[1]//div,"`;`"),"`")' | sed "s/\"/\\\"/g" | sed "s/\`/\"/g" >> icaec.csv
The XPath expression in question is this one:
'concat("`",join(//div[@id="members-info"]/(h5 | span) | //div[@class="row pic-professionsal-details"]/div[2]/div | //div[@class="row pic-professionsal-details"]/following-sibling::div/div[1]//div,"`;`"),"`")'
which is more or less a concatenation of
//div[@id="members-info"]/(h5 | span)
//div[@class="row pic-professionsal-details"]/div[2]/div
//div[@class="row pic-professionsal-details"]/following-sibling::div/div[1]//div
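To make the misalignment concrete, here is a minimal Python sketch (not xidel) that mimics what a plain node union does: it joins whatever elements happen to exist. The sample XML, the element names (`profile`, `name`, `birthday`, `city`) and the function `naive_rows` are all made up for illustration and are not taken from the target site.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second profile has no <birthday> element.
SAMPLE = """
<profiles>
  <profile><name>Alice</name><birthday>1980-01-01</birthday><city>Paris</city></profile>
  <profile><name>Bob</name><city>Rome</city></profile>
</profiles>
"""


def naive_rows(xml_text):
    """Collect the text of whatever child elements exist in each profile."""
    root = ET.fromstring(xml_text)
    return [[child.text for child in profile] for profile in root.iter("profile")]


rows = naive_rows(SAMPLE)
for row in rows:
    # Rows come out with different lengths, so columns no longer line up.
    print(";".join(row))
```

In the second row the city lands in the birthday column, which is the "one column off" effect described above.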
xidel supports XPath and XQuery 3.0, so you can create sequences that replace missing items with a default. For instance, given
<items>
  <item>
    <foo>foo 1</foo>
    <bar>bar 1</bar>
  </item>
  <item>
    <foo>foo 2</foo>
  </item>
  <item>
    <bar>bar 3</bar>
  </item>
</items>
the XQuery 3.0 expression
string-join(//item!string-join(((foo, 'foo default')[1], (bar, 'bar default')[1]), ';'), ' ')
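outputs the following, because (foo, 'foo default')[1] takes the first item of the sequence, which is the foo element when it exists and the default string otherwise: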
foo 1;bar 1 foo 2;bar default foo default;bar 3
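For readers who post-process the data outside xidel, the same default-per-field idea can be sketched in Python. This is only an illustration of the technique under stated assumptions, not part of the xidel answer: the `FIELDS` mapping and the function names `rows_with_defaults` and `to_csv` are hypothetical, and the input is the `<items>` sample from above.

```python
import csv
import io
import xml.etree.ElementTree as ET

ITEMS = """
<items>
  <item><foo>foo 1</foo><bar>bar 1</bar></item>
  <item><foo>foo 2</foo></item>
  <item><bar>bar 3</bar></item>
</items>
"""

# Hypothetical field list: tag name mapped to the default used when it is missing.
FIELDS = {"foo": "foo default", "bar": "bar default"}


def rows_with_defaults(xml_text, fields=FIELDS):
    """Return one row per <item>, always with len(fields) columns."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        # findtext returns the given default when the child element is absent.
        rows.append([item.findtext(tag, default) for tag, default in fields.items()])
    return rows


def to_csv(rows):
    """Serialize rows as semicolon-separated CSV; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";", lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = rows_with_defaults(ITEMS)
print(to_csv(rows), end="")
```

Using an empty string as the default in `FIELDS` gives empty columns instead of placeholder text, which matches what the question asks for. Letting the `csv` module do the quoting also avoids the backtick and `sed` substitution step.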