Ruby on Rails - How to parse related data and store the values to database


I'm trying to parse through a web page, collect values, and store them in a database.

Here is my code, with the database code commented out:

    require 'nokogiri'
    require 'open-uri'

    doc = Nokogiri::HTML(URI.open("https://example.com/colors"))
    colors = doc.css(".colorcircle")
    colors_name = doc.css(".zw-m-c-txt")

    colors.each do |ele|
      hex_code = ele.attr('style').split(";").first.split(":").last
      colors_name.each do |name|
        color_name = name.text
        puts " ++++++ hex_code #{hex_code}"
        puts " ++++++ color_name  #{color_name}"
        # color = colors.find_by(:hex_code => hex_code)
        # if color.present?
        #   color.update_attributes(:name => color_name)
        # else
        #   model.colors.create(:name => color_name, :hex_code => hex_code)
        # end
      end
    end

Here is the relevant HTML source of the page:

    <span class="colorcircle" style="background-color:#eeeff4;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> white orchid pearl </span></p>
    <span class="colorcircle" style="background-color:#acabb0;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> modern steel metallic </span></p>
    <span class="colorcircle" style="background-color:#220909;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> golden brown metallic </span></p>
    <span class="colorcircle" style="background-color:#43161b;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> carnelian red pearl </span></p>
    <span class="colorcircle" style="background-color:#e8f1fa;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> alabaster silver </span></p>

I am not able to loop through the values sequentially and store them in the database. Here is the current output:

    ++++++ color_name    white orchid pearl
    ++++++ hex_code #eeeff4
    ++++++ color_name    white orchid pearl
    ++++++ hex_code #acabb0
    ++++++ color_name    white orchid pearl
    ++++++ hex_code #220909
    ++++++ color_name    white orchid pearl
    ++++++ hex_code #43161b
    ++++++ color_name    white orchid pearl
    ++++++ hex_code #e8f1fa
    ++++++ color_name    modern steel metallic
    ++++++ hex_code #eeeff4
    ++++++ color_name    modern steel metallic
    ++++++ hex_code #acabb0
    ++++++ color_name    modern steel metallic
    ++++++ hex_code #220909
    ++++++ color_name    modern steel metallic
    ++++++ hex_code #43161b
    ++++++ color_name    modern steel metallic

This is the expected output:

    hex_code      #eeeff4
    color_name    white orchid pearl
    hex_code      #acabb0
    color_name    modern steel metallic
    hex_code      #220909
    color_name    golden brown metallic

How can I get the expected output and save each hex_code to the database with its corresponding color name?

Here's what I'd do if I wanted that data:

    require 'nokogiri'

    # DATA reads everything after the __END__ marker at the bottom of the script.
    doc = Nokogiri::HTML(DATA.read)

    data = doc.search('.colorcircle').map { |span|
      hex = span['style'][/#([^;]+);$/, 1]
      color = span.next_element.at('span').text.strip
      [ hex, color ]
    }.to_h
    # => {"eeeff4"=>"white orchid pearl",
    #     "acabb0"=>"modern steel metallic",
    #     "220909"=>"golden brown metallic",
    #     "43161b"=>"carnelian red pearl",
    #     "e8f1fa"=>"alabaster silver"}

    __END__
    <span class="colorcircle" style="background-color:#eeeff4;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> white orchid pearl </span></p>
    <span class="colorcircle" style="background-color:#acabb0;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> modern steel metallic </span></p>
    <span class="colorcircle" style="background-color:#220909;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> golden brown metallic </span></p>
    <span class="colorcircle" style="background-color:#43161b;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> carnelian red pearl </span></p>
    <span class="colorcircle" style="background-color:#e8f1fa;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> alabaster silver </span></p>

Which, when used with:

    data.each do |k, v|
      puts "hex_code: %s\ncolor_name: %s" % [k, v]
    end

Would output:

    hex_code: eeeff4
    color_name: white orchid pearl
    hex_code: acabb0
    color_name: modern steel metallic
    hex_code: 220909
    color_name: golden brown metallic
    hex_code: 43161b
    color_name: carnelian red pearl
    hex_code: e8f1fa
    color_name: alabaster silver
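If you would rather stay close to the original code, note that the nested loop is what pairs every hex code with every name. A minimal sketch of pairing the two lists by position with `zip` might look like the following. The plain arrays are stand-ins for the values pulled from the `.colorcircle` and `.zw-m-c-txt` nodes, and the sketch assumes both lists have the same length and are in the same document order.

```ruby
# Stand-ins for the values extracted from the two node lists
# (assumed to be the same length and in the same document order).
hex_codes   = ["#eeeff4", "#acabb0", "#220909"]
color_names = ["white orchid pearl", "modern steel metallic", "golden brown metallic"]

# zip pairs element i of one list with element i of the other,
# instead of visiting every combination the way a nested loop does.
pairs = hex_codes.zip(color_names)

pairs.each do |hex_code, color_name|
  puts "hex_code      #{hex_code}"
  puts "color_name    #{color_name}"
end
```

Each pair could then be handed to whatever find-or-create logic the commented-out database code was meant to perform.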

But, there are tables on the internet with these associations. Rather than parse one and try to inject it into a database table, I'd recommend finding one and creating a module or class that stores the data as constants or hashes, so you don't have to hit the database to extract the values. You want the absolute fastest access possible if you're using the values to set colors in pages, or if you're presenting correlations of values to color names. Or create a static page that is rendered once, since these associations and definitions are not going to change.

Databases are great things, but this doesn't seem like the time for one.
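As a rough illustration of the constants-or-hashes idea, here is a minimal sketch. The module name `CarColors`, the constant `NAMES_BY_HEX`, and the helper `name_for` are hypothetical names chosen for this example; the entries are simply the pairs from the page shown earlier.

```ruby
# Hypothetical module; the names here are illustrative assumptions.
module CarColors
  # Frozen so the table cannot be modified by accident at runtime.
  NAMES_BY_HEX = {
    "eeeff4" => "white orchid pearl",
    "acabb0" => "modern steel metallic",
    "220909" => "golden brown metallic",
    "43161b" => "carnelian red pearl",
    "e8f1fa" => "alabaster silver"
  }.freeze

  # Look up a color name, accepting the hex code with or without
  # a leading "#" and in either letter case. Returns nil if unknown.
  def self.name_for(hex)
    NAMES_BY_HEX[hex.delete_prefix("#").downcase]
  end
end

CarColors.name_for("#ACABB0") # => "modern steel metallic"
```

A lookup like this is an in-memory hash access, so no query is needed to render a page.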


This

ele.attr('style').split(";").first.split(":").last 

is brutal.

Extracting the hex code from the string is a great application for string slicing or a regular expression. There are multiple ways:

    style = "background-color:#eeeff4;"

    style.split(':').last.chop # => "#eeeff4"
    style[-8..-2] # => "#eeeff4"
    style[/(#\h{3,6});$/, 1] # => "#eeeff4"

Using the slice [-8..-2] is error-prone because it assumes the value is 6 characters long, and hex values for colors don't have to be. #fff is equivalent to #ffffff, for instance, so handling both the 3- and 6-character variants is important.
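To make the difference concrete, here is a small sketch comparing the fixed slice with the regular expression on a short value. The helper `expand_hex` is a hypothetical name added for illustration; it doubles each digit of a 3-character code.

```ruby
short = "background-color:#fff;"
long  = "background-color:#eeeff4;"

# The fixed slice only works when the value is exactly 6 hex digits.
long[-8..-2]   # => "#eeeff4"
short[-8..-2]  # => "or:#fff"

# The regular expression handles both lengths.
long[/(#\h{3,6});$/, 1]   # => "#eeeff4"
short[/(#\h{3,6});$/, 1]  # => "#fff"

# Hypothetical helper: expand a 3-digit code to its 6-digit equivalent
# by doubling each digit, and leave other lengths unchanged.
def expand_hex(hex)
  digits = hex.delete_prefix("#")
  digits = digits.chars.map { |c| c * 2 }.join if digits.length == 3
  "#" + digits
end

expand_hex("#fff")     # => "#ffffff"
expand_hex("#eeeff4")  # => "#eeeff4"
```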

In the example above I used /#([^;]+);$/, which isn't quite as concise as /(#\h{3,6});$/, but they've both got tradeoffs, so take your pick if you want to use a regex. How they work is for you to figure out. Remember that not every opportunity to hit data calls for the golden regular expression hammer; use them when they're the best tool, because they can open the door to darkness and usher in the Lord of Bugs.

And, I deliberately excluded the # in the hex values. Adding it wastes space on a redundant character and slows lookups, though in tables your mileage might vary.
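One way to get a feel for those tradeoffs is to run both patterns against a few inputs. This is only a sketch: the constant names `LOOSE` and `STRICT` and the sample strings are made up for the comparison.

```ruby
LOOSE  = /#([^;]+);$/    # anything after "#" up to the final ";", "#" not captured
STRICT = /(#\h{3,6});$/  # "#" plus 3 to 6 hex digits before the final ";", "#" captured

# A normal value: both match, but only STRICT keeps the "#".
"background-color:#eeeff4;"[LOOSE, 1]   # => "eeeff4"
"background-color:#eeeff4;"[STRICT, 1]  # => "#eeeff4"

# Garbage after the "#": LOOSE accepts it, STRICT rejects it.
"background-color:#zzz;"[LOOSE, 1]      # => "zzz"
"background-color:#zzz;"[STRICT, 1]     # => nil

# Five hex digits is not a valid CSS hex color, yet both patterns accept it.
"background-color:#fffff;"[LOOSE, 1]    # => "fffff"
"background-color:#fffff;"[STRICT, 1]   # => "#fffff"
```

Neither pattern is a full validator, which is part of why the choice depends on how much you trust the input.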

