PowerShell - Loop and join 2 functions to move files into folders by category and rename them by original title
To match each file name to its category (primary subject), I use this code:
gci *.pdf | foreach {
    # look up the abstract page for the file name and print its primary subject
    (iwr "https://arxiv.org/abs/$($_.BaseName)") -match 'primary-subject">(.*?)</span>'; $matches[1]
}
To give an idea of what I mean: http://i.imgur.com/57kjjr6.png
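The subject lookup above can also be written as a small function that works on an HTML string, which makes it possible to try it without a network call. This is only a sketch: `Get-PrimarySubject` and the sample HTML fragment are hypothetical names and data, not from the original post; the regex is the one used above.

```powershell
# Hypothetical helper (not from the original post): extract the primary subject
# from the HTML of an abstract page with the same regex as above.
function Get-PrimarySubject {
    param([Parameter(Mandatory=$true)][string]$Html)

    # [regex]::Match does not rely on the automatic $matches variable,
    # which keeps its previous value when a -match fails.
    $m = [regex]::Match($Html, 'primary-subject">(.*?)</span>')
    if ($m.Success) { $m.Groups[1].Value } else { $null }
}

# Sample fragment, made up for illustration.
$sample = '<span class="primary-subject">Number Theory (math.NT)</span>; Combinatorics (math.CO)'
Get-PrimarySubject -Html $sample
```

Returning `$null` when nothing matches lets the caller skip a file instead of silently reusing the subject of the previous file.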
To rename the files independently I can use the code below, but that is not useful on its own, because I would have to process folder by folder and that takes a long time (the number of folders is high).
# all PDFs | rename { query the arXiv abstract page for the file name, use the page title + ".pdf" }
Get-ChildItem *.pdf | Rename-Item -NewName {
    $title = (Invoke-WebRequest "https://arxiv.org/abs/$($_.BaseName)").ParsedHtml.title
    $title = $title -replace '[\\/:\*\?"<>\|]', '-'   # replace characters forbidden
    "$title.pdf"                                      # in file names with -
}
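The title clean-up step can be isolated and checked on its own. A minimal sketch, assuming a hypothetical helper name `ConvertTo-SafeFileName` and a made-up sample title; the character class is the one from the rename snippet, and the whitespace collapse is an extra assumption in case a page title contains line breaks.

```powershell
# Hypothetical helper (not from the original post): turn a page title
# into a file name that Windows accepts.
function ConvertTo-SafeFileName {
    param([Parameter(Mandatory=$true)][string]$Title)

    # Same character class as in the rename snippet: \ / : * ? " < > |
    $safe = $Title -replace '[\\/:\*\?"<>\|]', '-'
    # Assumption: collapse runs of whitespace (including newlines) to one space.
    $safe = ($safe -replace '\s+', ' ').Trim()
    "$safe.pdf"
}

ConvertTo-SafeFileName -Title 'What is a "good" bound? Part 1/2'
# → What is a -good- bound- Part 1-2.pdf
```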
I need to create folders like these (without the "[folder]" prefix):
[folder] information theory (cs.it)
[folder] number theory (math.nt)
....
I am trying to join the 2 operations:
moving the files by subject
[folder] geometric topology (cs.it)
|
|__ [file] 1611.00066
|__ [file] .....
[folder] number theory (math.nt)
|
|__ [file] 1611.00057
and renaming them by title
[folder] geometric topology (cs.it)
|
|__ [file] 1611.00066
|__ [file] .....
[folder] number theory (math.nt)
|
|__ [file] 1611.00057
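Joining the two operations comes down to computing, for each PDF, one destination of the form root\subject\title.pdf. The sketch below only builds that plan and touches no files. `Get-MovePlan`, the `sorted` root folder and the property names `File`, `Subject` and `Title` are assumptions made for illustration; the two sample entries reuse IDs, subjects and titles that appear elsewhere on this page.

```powershell
# Hypothetical helper (not from the original post): compute where each file
# should end up, without moving anything yet.
function Get-MovePlan {
    param(
        [Parameter(Mandatory=$true)][string]$Root,
        [Parameter(Mandatory=$true)][object[]]$Items
    )
    foreach ($item in $Items) {
        [pscustomobject]@{
            Source      = $item.File
            # root \ subject folder \ title.pdf
            Destination = [IO.Path]::Combine($Root, $item.Subject, "$($item.Title).pdf")
        }
    }
}

$items = @(
    [pscustomobject]@{ File = '1611.00066.pdf'; Subject = 'geometric topology (math.gt)'; Title = 'many haken heegaard splittings' }
    [pscustomobject]@{ File = '1611.00057.pdf'; Subject = 'number theory (math.nt)';      Title = 'holomorphy of adjoint $l$ functions quasisplit a2' }
)

$plan = Get-MovePlan -Root 'sorted' -Items $items
$plan | Format-List
```

Once the plan looks right, each entry could be applied by creating the destination folder with `New-Item -ItemType Directory -Force` and then calling `Move-Item` or `Copy-Item` with the source and destination.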
To loop over the files and join the operations I made a .ps1 file, but the code I put in it doesn't work:
$res = Invoke-WebRequest "https://arxiv.org/abs/$($_.BaseName)"
$rootpath = "c:\temp"

function Clean-InvalidFileNameChars {
    param(
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [string]$Name
    )
    $invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
    $re = "[{0}]" -f [regex]::Escape($invalidChars)
    $res = ($Name -replace $re)
    return $res.Substring(0, [Math]::Min(260, $res.Length))
}

function Clean-InvalidPathChars {
    param(
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [string]$Name
    )
    $invalidChars = [IO.Path]::GetInvalidPathChars() -join ''
    $re = "[{0}]" -f [regex]::Escape($invalidChars)
    $res = ($Name -replace $re)
    return $res.Substring(0, [Math]::Min(248, $res.Length))
}

gci *.pdf | foreach {
    (iwr "https://arxiv.org/abs/$($_.BaseName)") -match 'primary-subject">(.*?)</span>'; $matches[1]
}

# get the data, cut it with the format template, group by subject, clean title and subject for use as dir and file name
$grousubject = $res.ParsedHtml.body.outerText | ConvertFrom-String -TemplateContent $template |
    select @{N="Subject";E={Clean-InvalidPathChars $_.subject}}, @{N="Title";E={Clean-InvalidFileNameChars $_.title}} |
    group Subject

# create dirs and files
$grousubject | %{ $path = "$rootpath\$($_.Name)"; $_.Group.Title | %{ New-Item -ItemType File -Path "$path\$_" -Force } }

Get-ChildItem *.pdf | Rename-Item -NewName {
    $title = (Invoke-WebRequest "https://arxiv.org/abs/$($_.BaseName)").ParsedHtml.title
    $title = $title -replace '[\\/:\*\?"<>\|]', '-'   # replace characters forbidden
    "$title.pdf"                                      # in file names with -
}
My PowerShell version is 4.
Edit: Esmeraldo's solution works: http://i.imgur.com/neio868.png
Thank you.
function Clean-InvalidFileNameChars {
    param(
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [string]$Name
    )
    $invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
    $re = "[{0}]" -f [regex]::Escape($invalidChars)
    $res = ($Name -replace $re)
    return $res.Substring(0, [Math]::Min(260, $res.Length))
}

function Clean-InvalidPathChars {
    param(
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [string]$Name
    )
    $invalidChars = [IO.Path]::GetInvalidPathChars() -join ''
    $re = "[{0}]" -f [regex]::Escape($invalidChars)
    $res = ($Name -replace $re)
    return $res.Substring(0, [Math]::Min(248, $res.Length))
}

$rootpath = "c:\temp2"
$rootpathresult = "c:\tempresult"

$template = @'
[3] arxiv:1611.00057 [pdf, ps, other]
title: {title*:holomorphy of adjoint $l$ functions quasisplit a2}
authors: joseph hundley
comments: 18 pages
subjects: {subject:number theory (math.nt)}
[4] arxiv:1611.00066 [pdf, other]
title: {title*:many haken heegaard splittings}
authors: alessandro sisto
comments: 12 pages, 3 figures
subjects: {subject:geometric topology (math.gt)}
[5] arxiv:1611.00067 [pdf, ps, other]
title: {title*:subsumed homoclinic connections , infinitely many coexisting attractors in piecewise-linear maps}
authors: david j.w. simpson, christopher p. tuffley
subjects: {subject:dynamical systems (math.ds)}
[21] arxiv:1611.00114 [pdf, ps, other]
title: {title*:faces of highest weight modules , universal weyl polyhedron}
authors: gurbir dhillon, apoorva khare
comments: recall preliminaries , results companion paper arxiv:1606.09640
subjects: {subject:representation theory (math.rt)}; combinatorics (math.co); metric geometry (math.mg)
'@

# extract the useful data and clean it
$listbook = gci $rootpath -File -Filter *.pdf |
    foreach { New-Object psobject -Property @{ file = $_.FullName; books = ((iwr "https://arxiv.org/abs/$($_.BaseName)").ParsedHtml.body.outerText | ConvertFrom-String -TemplateContent $template) } } |
    select file -ExpandProperty books |
    select file, @{N="Subject";E={Clean-InvalidPathChars $_.subject}}, @{N="Title";E={Clean-InvalidFileNameChars $_.title}}

# build the dirs, then copy + rename each file
$listbook | %{ $newpath = "$rootpathresult\$($_.Subject)"; New-Item -ItemType Directory -Path "$newpath" -Force; Copy-Item $_.file "$newpath\$($_.Title).pdf" -Force }
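ConvertFrom-String was introduced in Windows PowerShell 5.0, so on version 4 a regex-based variant of the extraction step may be needed. The following is a minimal sketch of that idea, not the method used in the answer above: `Get-ArxivInfo` is a hypothetical helper, the sample HTML is made up, and it assumes the title can be read from the page's `<title>` element and the subject from the `primary-subject` span used in the question.

```powershell
# Hypothetical helper (not part of the answer above): pull title and primary
# subject out of the raw HTML of an abstract page and make both file-system safe.
function Get-ArxivInfo {
    param([Parameter(Mandatory=$true)][string]$Html)

    # (?si): "." also matches newlines, and tag names match case-insensitively.
    $title   = [regex]::Match($Html, '(?si)<title>(.*?)</title>').Groups[1].Value
    $subject = [regex]::Match($Html, 'primary-subject">(.*?)</span>').Groups[1].Value

    # Same idea as Clean-InvalidFileNameChars: strip characters that the
    # current platform does not allow in file names.
    $invalid = "[{0}]" -f [regex]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ''))

    New-Object PSObject -Property @{
        Title   = ([System.Net.WebUtility]::HtmlDecode($title)   -replace $invalid, '').Trim()
        Subject = ([System.Net.WebUtility]::HtmlDecode($subject) -replace $invalid, '').Trim()
    }
}

# Sample page, made up for illustration.
$html = '<html><head><title>A/B testing &amp; more</title></head>' +
        '<body><span class="primary-subject">Number Theory (math.NT)</span></body></html>'

$info = Get-ArxivInfo -Html $html
"{0}\{1}.pdf" -f $info.Subject, $info.Title
```

In a full script the HTML would come from the `Content` property of the `Invoke-WebRequest` result, fetched once per file so that the title and the subject come from the same request.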