java - Get bounding indices of non-unique words in a string


Suppose I have the following string:

    (def strg "apple orange apple")

I'd like to get the bounding indices of each non-unique word in the string. The first occurrence of apple should have bounding indices (0, 4), while the second occurrence of apple should have bounding indices (13, 17).

One approach I've been playing with is to first store the indices of each character in the string and then, for each index n, identify word boundaries by looking for a space at n-1 (yes, this misses beginning-of-string words). If that condition is met, iterate through the next k characters until a space is hit; the character at the position before the space is the second bounding index. Here is the first part of my (failed) code:

    (for [ch strg]
      (let [indx (int (.indexOf strg (str ch)))]
        (cond (= (subs strg (dec indx) indx) " ")
              ;; continue with the rest of the above-described code logic
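For comparison, here is a minimal Java sketch of the same scanning idea (the question says Java is fine). Instead of checking for a space at n-1, it walks the string once, so a word at the very start of the string is handled too. It assumes words are separated only by ' ' characters (tabs and other whitespace are not treated as separators) and returns inclusive (first, last) positions as in the question; the class and method names are hypothetical. It reports every word, so picking out the non-unique ones remains a separate step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds inclusive (first, last) bounds of every
// space-separated word by scanning the string once.
class ScanBounds {
    static List<int[]> inclusiveBounds(String s) {
        List<int[]> bounds = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            // Skip any run of spaces before the next word.
            if (s.charAt(i) == ' ') {
                i++;
                continue;
            }
            int start = i;
            // Advance until the next space or the end of the string.
            while (i < s.length() && s.charAt(i) != ' ') {
                i++;
            }
            // i now points at the space (or s.length()), so the word's
            // last character is at i - 1.
            bounds.add(new int[] {start, i - 1});
        }
        return bounds;
    }

    public static void main(String[] args) {
        for (int[] b : inclusiveBounds("apple orange apple")) {
            System.out.println("(" + b[0] + ", " + b[1] + ")");
        }
    }
}
```

For "apple orange apple" this prints (0, 4), (6, 11) and (13, 17).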

Any ideas (Clojure, Java, or Python are fine) would be appreciated.

It is more typical in Clojure/Java to use the index of the starting character and the index 1 past the ending character, i.e. [0, 5] and [13, 18] instead. Java's Matcher returns the start and end of each match in this manner.

    (def strg "apple orange apple")

    (defn re-indices [re s]
      (let [m (re-matcher re s)]
        ((fn step []
           (when (. m find)
             (cons [(. m start) (. m end)] (lazy-seq (step))))))))

    ;; \S+ matches runs of non-whitespace characters, i.e. the words
    (re-indices #"\S+" strg)
    ;=> ([0 5] [6 12] [13 18])
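The same [start, end) pairs can be collected in Java with java.util.regex.Matcher directly. This is a minimal sketch; reIndices is a hypothetical name mirroring the Clojure function, and unlike the lazy sequence above it builds the whole list eagerly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReIndices {
    // Collects [start, end) pairs for every match of the pattern, the same
    // convention Matcher.start()/Matcher.end() use (end is exclusive).
    static List<int[]> reIndices(Pattern re, String s) {
        List<int[]> pairs = new ArrayList<>();
        Matcher m = re.matcher(s);
        while (m.find()) {
            pairs.add(new int[] {m.start(), m.end()});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // \S+ matches runs of non-whitespace characters, i.e. the words.
        for (int[] p : reIndices(Pattern.compile("\\S+"), "apple orange apple")) {
            System.out.println("[" + p[0] + " " + p[1] + "]");
        }
    }
}
```

This prints [0 5], [6 12] and [13 18], matching the Clojure result.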

and subs takes its indices the same way (start inclusive, end exclusive), so the pairs can be passed to it directly:

    (->> (re-indices #"\S+" strg)
         (group-by (partial apply subs strg)))
    ;=> {"apple" [[0 5] [13 18]], "orange" [[6 12]]}
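A rough Java counterpart of the group-by step, as a sketch only: it assumes a LinkedHashMap so the words keep their order of first appearance, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBySubstring {
    // Maps each word to the list of its [start, end) index pairs.
    static Map<String, List<int[]>> groupWords(String s) {
        Map<String, List<int[]>> groups = new LinkedHashMap<>();
        Matcher m = Pattern.compile("\\S+").matcher(s);
        while (m.find()) {
            // substring(start, end) uses the same end-exclusive convention as subs.
            String word = s.substring(m.start(), m.end());
            groups.computeIfAbsent(word, k -> new ArrayList<>())
                  .add(new int[] {m.start(), m.end()});
        }
        return groups;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<int[]>> e : groupWords("apple orange apple").entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey());
            for (int[] p : e.getValue()) {
                sb.append(" [").append(p[0]).append(' ').append(p[1]).append(']');
            }
            System.out.println(sb);
        }
    }
}
```

This prints "apple [0 5] [13 18]" followed by "orange [6 12]".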

From here, you can keep only the substring keys that have more than one index pair.
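Putting the steps together, here is a hedged end-to-end sketch in Java that keeps only the repeated words and converts Matcher's exclusive end back to the inclusive (0, 4) and (13, 17) bounds the question asked for. It assumes words are runs of non-whitespace characters (\S+); NonUniqueWords and nonUniqueBounds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonUniqueWords {
    // Returns only words that occur more than once, each mapped to its
    // inclusive (first, last) character positions.
    static Map<String, List<int[]>> nonUniqueBounds(String s) {
        Map<String, List<int[]>> groups = new LinkedHashMap<>();
        Matcher m = Pattern.compile("\\S+").matcher(s);
        while (m.find()) {
            groups.computeIfAbsent(m.group(), k -> new ArrayList<>())
                  // Convert Matcher's exclusive end to an inclusive last index.
                  .add(new int[] {m.start(), m.end() - 1});
        }
        // Keep only the keys that have more than one index pair.
        groups.values().removeIf(pairs -> pairs.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<int[]>> e : nonUniqueBounds("apple orange apple").entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey());
            for (int[] p : e.getValue()) {
                sb.append(" (").append(p[0]).append(", ").append(p[1]).append(')');
            }
            System.out.println(sb);
        }
    }
}
```

For the example string this prints "apple (0, 4) (13, 17)"; orange is dropped because it occurs only once.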

