AppleScript: parsing HTML from a site

What I'm trying to do is get the names of TV shows from a Wikipedia page.

OK, this is what I did first:

property showsWebList : {}

tell application "Safari"
    set loadDelay to 2 -- in seconds; test for your system
    make new document at end of every document
    set URL of document 1 to "http://en.wikipedia.org/wiki/list_of_television_programs_by_name"
    delay loadDelay
    set nrOfUls to do JavaScript "document.getElementById('mw-content-text').querySelectorAll('ul').length;" in document 1
    set nrOfUls to nrOfUls - 1 as number
    log nrOfUls
    repeat with ws from 1 to nrOfUls
        delay loadDelay
        set nrOfLis to do JavaScript "document.getElementById('mw-content-text').getElementsByTagName('ul')[" & ws & "].querySelectorAll('li').length;" in document 1
        set nrOfLis to nrOfLis - 1 as number
        log nrOfLis

        repeat with rs from 0 to nrOfLis
            delay 0.3
            set aShow to do JavaScript "document.getElementById('mw-content-text').getElementsByTagName('ul')[" & ws & "].getElementsByTagName('li')[" & rs & "].getElementsByTagName('i')[0].getElementsByTagName('a')[0].innerHTML;" in document 1
            -- skip entries where the JavaScript returned nothing
            if aShow is not missing value and aShow is not "" then
                copy aShow to the end of showsWebList
            end if
        end repeat
    end repeat
end tell

And it works the way I want it to. The problem is that it takes 15 minutes until it's done, and I have to keep the Safari document in front the whole time. So I thought I would grab the whole HTML and parse it myself. That is not easy. This is how the code looks now:

tell application "Safari"
    make new document at end of every document
    set URL of document 1 to "http://en.wikipedia.org/wiki/list_of_television_programs_by_name"

    delay 4

    set orgHTML to do JavaScript "document.getElementById('mw-content-text').innerHTML;" in document 1
    set orgHTML to orgHTML as text
    set readyText to extractBetween(orgHTML, "<li><i><a ", "</a></i></li>")
    log (item 0 of readyText)
    set removeArray to extractBetween(readyText, "href", ">")
    set completeArray to {}
    repeat with rt from 0 to (count readyText)
        repeat with ra from 0 to (count removeArray)
            if (item ra of removeArray) is in (item rt of readyText) then
                set completeName to trim_line((item rt of readyText), (item ra of removeArray), 1)
                set end of completeArray to completeName
            end if
        end repeat
    end repeat
    log completeArray

end tell

on extractBetween(searchText, startText, endText)
    set tid to AppleScript's text item delimiters -- save them for later.
    set AppleScript's text item delimiters to startText -- find the first one.
    set liste to text items of searchText
    set AppleScript's text item delimiters to endText -- find the end one.
    set extracts to {}
    repeat with subText in liste
        if subText contains endText then
            copy text item 1 of subText to the end of extracts
        end if
    end repeat
    set AppleScript's text item delimiters to tid -- back to the original values.
    return extracts
end extractBetween

on trim_line(this_text, trim_chars, trim_indicator)
    -- 0 = beginning, 1 = end, 2 = both
    set x to the length of the trim_chars
    -- trim beginning
    if the trim_indicator is in {0, 2} then
        repeat while this_text begins with the trim_chars
            try
                set this_text to characters (x + 1) thru -1 of this_text as string
            on error
                -- the text contains nothing but the trim characters
                return ""
            end try
        end repeat
    end if
    -- trim ending
    if the trim_indicator is in {1, 2} then
        repeat while this_text ends with the trim_chars
            try
                set this_text to characters 1 thru -(x + 1) of this_text as string
            on error
                -- the text contains nothing but the trim characters
                return ""
            end try
        end repeat
    end if
    return this_text
end trim_line

It's not as smooth, and it's not working. Somehow it seems I can't get the items out of the list, because it doesn't see each entry as a list item. Can anyone help me out?

cheers

I recommend a different approach. Download the source with curl, and grab the title between the tags. The whole script takes under 2 seconds. Start with this:

property baseURL : "http://en.wikipedia.org/wiki/list_of_television_programs_by_name"

set rawHTML to do shell script "curl '" & baseURL & "'"
set preTag to "\" title=\"" -- " title="
set oTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to preTag
set rawList to text items of rawHTML
set nameList to {}
repeat with eachLine in rawList
    set theOff to offset of ">" in eachLine
    set thisName to text 1 thru (theOff - 2) of eachLine
    -- add error checking here to skip the opening non-title hits, and fine-tune the precise title string
    set nameList to nameList & return & thisName
end repeat
set AppleScript's text item delimiters to oTID
return nameList

Add a little error checking, and tweak the preTag and postTag to whatever fits best.
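As a rough illustration of that error checking, here is a minimal sketch. The handler name extractTitles, the choice of a double quote as postTag, and the sample HTML fragment are all assumptions made for this example, not part of the original answer. It skips the chunk before the first hit, ignores empty results, and restores the text item delimiters even if something goes wrong.

```applescript
-- Hypothetical helper: collects the value of every title="..." attribute in someHTML.
on extractTitles(someHTML)
    set preTag to "\" title=\"" -- same delimiter as in the answer: " title="
    set postTag to "\"" -- assumption: the title ends at the next double quote
    set oTID to AppleScript's text item delimiters
    set nameList to {}
    try
        set AppleScript's text item delimiters to preTag
        set rawList to text items of someHTML
        set AppleScript's text item delimiters to postTag
        -- Item 1 is everything before the first hit, so start at item 2.
        repeat with i from 2 to (count rawList)
            set chunk to item i of rawList
            if chunk is not "" then
                set thisName to text item 1 of chunk
                if thisName is not "" then set end of nameList to thisName
            end if
        end repeat
    on error errMsg number errNum
        -- Put the delimiters back before passing the error on.
        set AppleScript's text item delimiters to oTID
        error errMsg number errNum
    end try
    set AppleScript's text item delimiters to oTID
    return nameList
end extractTitles

-- Made-up sample fragment in the shape of the list markup from the question.
set sampleHTML to "<li><i><a href=\"/wiki/Lost\" title=\"Lost (TV series)\">Lost</a></i></li><li><i><a href=\"/wiki/Monk\" title=\"Monk (TV series)\">Monk</a></i></li>"
extractTitles(sampleHTML)
-- result: {"Lost (TV series)", "Monk (TV series)"}
```

On the real page this will also pick up title attributes from links that are not shows, so some extra filtering is still needed, and HTML entities such as &amp; are left undecoded.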


Brent
