java - Get bounding indices of non-unique words in a string

Suppose I have the following string:

 (def strg "apple orange apple")

I'd like to get the bounding indices of each non-unique word in the string. The first occurrence of apple should have bounding indices (0, 4), while the second occurrence of apple should have bounding indices (13, 17).

One approach I've been playing with is to first store the indices of each character in the string and then, for each index n, identify word boundaries by looking for a space at n-1 (yes, this misses beginning-of-string words). If that condition is met, iterate through the next k characters until a space is hit; the character at the position before that space is the second bounding index. Here is the first part of my (failed) code:

 (for [ch strg]
   (let [indx (int (.indexOf strg (str ch)))]
     (cond (= (subs ch indx-1) " ")
       ;; continue with the rest of the logic described above

Any ideas (Clojure, Java, or Python are all fine) would be appreciated.
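For reference, the character-scanning idea described in the question can be made to work. Below is a minimal sketch in Python (one of the languages the question allows); the function name `word_bounds` is hypothetical, and it assumes words are separated by plain space characters only. It tracks where the current word started rather than looking back for a space at n-1, so a word at the very beginning of the string is not missed. The bounds are inclusive, matching the (0, 4) and (13, 17) pairs asked for.

```python
def word_bounds(s):
    """Return (word, first_index, last_index) triples with inclusive bounds."""
    bounds = []
    start = None  # index where the current word began, or None between words
    for i, ch in enumerate(s):
        if ch != " " and start is None:
            # First character of a new word.
            start = i
        elif ch == " " and start is not None:
            # A space ends the current word; its last character is at i - 1.
            bounds.append((s[start:i], start, i - 1))
            start = None
    if start is not None:
        # The string ended while still inside a word.
        bounds.append((s[start:], start, len(s) - 1))
    return bounds


strg = "apple orange apple"
result = word_bounds(strg)
```

Here `result` holds one triple per word, in order of appearance; grouping the triples by word then shows which words repeat.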

It is more typical in Clojure/Java to use the index of the starting character and the index one past the ending character, i.e. [0, 5] and [13, 18] instead. Java's Matcher returns the start and end of each match in this manner.

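(def strg "apple orange apple")

;; Lazily collect [start end] for every match of re in s.
(defn re-indices [re s]
  (let [m (re-matcher re s)]
    ((fn step []
       (when (. m find)
         (cons [(. m start) (. m end)] (lazy-seq (step))))))))

;; \S+ matches runs of non-whitespace characters, i.e. the words.
(re-indices #"\S+" strg)
;=> ([0 5] [6 12] [13 18])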

and subs uses them appropriately:

(->> (re-indices #"\S+" strg)
     (group-by (partial apply subs strg)))
;=> {"apple" [[0 5] [13 18]], "orange" [[6 12]]}

From here you can filter the map to keep only the substring keys that have more than one pair of indices.
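Since the question also accepts Python, here is a rough equivalent of the whole pipeline, including the final filtering step. This is only a sketch: the function name `non_unique_word_spans` is hypothetical, and it assumes a "word" is any run of non-whitespace characters. Python's `re` match objects report `start()` and `end()` in the same half-open style as Java's Matcher, so `s[start:end]` gives back the word.

```python
import re
from collections import defaultdict


def non_unique_word_spans(s):
    """Map each repeated word to its list of (start, end) half-open spans."""
    spans = defaultdict(list)
    for m in re.finditer(r"\S+", s):
        # m.end() is one past the last character, like Matcher.end() in Java.
        spans[m.group()].append((m.start(), m.end()))
    # Keep only the words that occur more than once.
    return {word: pairs for word, pairs in spans.items() if len(pairs) > 1}


strg = "apple orange apple"
repeated = non_unique_word_spans(strg)
```

For the example string, `repeated` contains a single key, `"apple"`, with the two spans (0, 5) and (13, 18); subtract one from each end index if inclusive bounds are wanted.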
