Ruby on Rails - How to parse related data and store the values to database
I'm trying to parse through a web page, collect values, and store them in the database.
Here is my code, with the database code commented out:
    require 'nokogiri'
    require 'open-uri'

    doc = Nokogiri::HTML(open("https://example.com/colors"))
    colors = doc.css(".colorcircle")
    colors_name = doc.css(".zw-m-c-txt")

    colors.each do |ele|
      hex_code = ele.attr('style').split(";").first.split(":").last
      colors_name.each do |name|
        color_name = name.text
        puts " ++++++ hex_code #{hex_code}"
        puts " ++++++ color_name #{color_name}"
        # color = colors.find_by(:hex_code => hex_code)
        # if color.present?
        #   color.update_attributes(:name => color_name)
        # else
        #   model.colors.create(:name => color_name, :hex_code => hex_code)
        # end
      end
    end

Here is the HTML source from the page, for detail:
    <span class="colorcircle" style="background-color:#eeeff4;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> white orchid pearl </span></p>
    <span class="colorcircle" style="background-color:#acabb0;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> modern steel metallic </span></p>
    <span class="colorcircle" style="background-color:#220909;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> golden brown metallic </span></p>
    <span class="colorcircle" style="background-color:#43161b;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> carnelian red pearl </span></p>
    <span class="colorcircle" style="background-color:#e8f1fa;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> alabaster silver </span></p>

I am not able to loop through them sequentially and store them in the database. Here is the current output:
    ++++++ color_name white orchid pearl
    ++++++ hex_code #eeeff4
    ++++++ color_name white orchid pearl
    ++++++ hex_code #acabb0
    ++++++ color_name white orchid pearl
    ++++++ hex_code #220909
    ++++++ color_name white orchid pearl
    ++++++ hex_code #43161b
    ++++++ color_name white orchid pearl
    ++++++ hex_code #e8f1fa
    ++++++ color_name modern steel metallic
    ++++++ hex_code #eeeff4
    ++++++ color_name modern steel metallic
    ++++++ hex_code #acabb0
    ++++++ color_name modern steel metallic
    ++++++ hex_code #220909
    ++++++ color_name modern steel metallic
    ++++++ hex_code #43161b
    ++++++ color_name modern steel metallic

This is the expected output:
    hex_code #eeeff4
    color_name white orchid pearl
    hex_code #acabb0
    color_name modern steel metallic
    hex_code #220909
    color_name golden brown metallic

How do I get the expected output and save it to the database, with each hex_code matched to its corresponding color name?
Here's what I'd do if I wanted that data:
    require 'nokogiri'

    # DATA reads the sample HTML placed after __END__ at the bottom of this file.
    doc = Nokogiri::HTML(DATA.read)

    data = doc.search('.colorcircle').map { |span|
      # Capture the hex digits between "#" and the trailing ";".
      hex = span['style'][/#([^;]+);$/, 1]
      # The color name lives in the <p> that immediately follows the <span>.
      color = span.next_element.at('span').text.strip
      [ hex, color ]
    }.to_h
    # => {"eeeff4"=>"white orchid pearl",
    #     "acabb0"=>"modern steel metallic",
    #     "220909"=>"golden brown metallic",
    #     "43161b"=>"carnelian red pearl",
    #     "e8f1fa"=>"alabaster silver"}

    __END__
    <span class="colorcircle" style="background-color:#eeeff4;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> white orchid pearl </span></p>
    <span class="colorcircle" style="background-color:#acabb0;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> modern steel metallic </span></p>
    <span class="colorcircle" style="background-color:#220909;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> golden brown metallic </span></p>
    <span class="colorcircle" style="background-color:#43161b;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> carnelian red pearl </span></p>
    <span class="colorcircle" style="background-color:#e8f1fa;"></span>
    <p class="zw-m-c-txt"> <span class="fnt-14"> alabaster silver </span></p>

Which, when used with:
    data.each do |k, v|
      puts "hex_code: %s\ncolor_name: %s" % [k, v]
    end

would output:
    hex_code: eeeff4
    color_name: white orchid pearl
    hex_code: acabb0
    color_name: modern steel metallic
    hex_code: 220909
    color_name: golden brown metallic
    hex_code: 43161b
    color_name: carnelian red pearl
    hex_code: e8f1fa
    color_name: alabaster silver

But, there are tables on the Internet of these associations. Rather than parse one and try to inject it into a database table, I'd recommend finding one and creating a module or class that stores the data as constants or hashes, so you don't have to hit the database to extract the values. You want the absolute fastest access possible if you're using the values to set colors in pages, or if you're presenting correlations of values to color names. Or create a static page to be rendered, as these associations and definitions are not going to change.

Databases are great things, but this doesn't seem like the time for one.
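As a rough sketch of the constants approach, assuming the five pairs parsed above are the whole data set: the module name `PaintColors` and its two lookup helpers are hypothetical names picked for illustration, not part of any library.

```ruby
# Hypothetical module; the name and helper methods are illustrative only.
module PaintColors
  # Hex code (without the leading "#") => color name, as parsed from the page.
  HEX_TO_NAME = {
    "eeeff4" => "white orchid pearl",
    "acabb0" => "modern steel metallic",
    "220909" => "golden brown metallic",
    "43161b" => "carnelian red pearl",
    "e8f1fa" => "alabaster silver"
  }.freeze

  # Reverse lookup table, built once when the file is loaded.
  NAME_TO_HEX = HEX_TO_NAME.invert.freeze

  # Accepts "eeeff4", "#eeeff4" or "#EEEFF4"; returns nil when unknown.
  def self.name_for(hex)
    HEX_TO_NAME[hex.to_s.delete_prefix("#").downcase]
  end

  # Accepts a color name with stray whitespace or capitals; returns nil when unknown.
  def self.hex_for(name)
    NAME_TO_HEX[name.to_s.strip.downcase]
  end
end

puts PaintColors.name_for("#EEEFF4")          # prints "white orchid pearl"
puts PaintColors.hex_for("Alabaster Silver")  # prints "e8f1fa"
```

Both tables are frozen hashes held in memory, so a lookup is a single hash access with no query involved.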
This

    ele.attr('style').split(";").first.split(":").last

is brutal.
Extracting the hex code from the string is a great application for string slicing or a regular expression. There are multiple ways:
    style = "background-color:#eeeff4;"
    style.split(':').last.chop  # => "#eeeff4"
    style[-8..-2]               # => "#eeeff4"
    style[/(#\h{3,6});$/, 1]    # => "#eeeff4"

Using the slice [-8..-2] is error-prone because it assumes the value is six characters long, but hex values for colors don't have to be. #fff is equivalent to #ffffff, for instance, so handling both the three- and six-character variants is important.
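One way to handle both lengths is to expand the short form so every stored value has six digits. This is only a sketch: `normalize_hex` is a hypothetical helper name, and it assumes that only three- and six-digit values matter, so anything else comes back as nil.

```ruby
# Hypothetical helper: pulls a hex color out of a style string and returns it
# as six lowercase digits without the leading "#", or nil if none is found.
def normalize_hex(style)
  # Try six digits first, then three; \b rejects lengths such as four or five.
  hex = style[/#(\h{6}|\h{3})\b/, 1]
  return nil unless hex

  # Expand shorthand by doubling each digit: "fff" becomes "ffffff".
  hex = hex.chars.map { |c| c * 2 }.join if hex.length == 3
  hex.downcase
end

normalize_hex("background-color:#eeeff4;")  # => "eeeff4"
normalize_hex("background-color:#FFF;")     # => "ffffff"
normalize_hex("background-color:red;")      # => nil
```

Normalizing before storing means "#fff" and "#ffffff" end up under the same key.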
In the example above I used /#([^;]+);$/, which isn't quite as concise as /(#\h{3,6});$/, but they've both got tradeoffs, so take your pick if you want to use a regex. How they work is for you to figure out, but remember that not every opportunity is a chance to hit data with the golden regular expression hammer; use them only when they're the best tool, because they can open the door to darkness and usher in the Lord of Bugs.
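To make the tradeoffs a little more concrete, here is a small comparison of the two patterns. The sample style strings are made up for illustration; only the first one comes from the page in the question.

```ruby
loose_re  = /#([^;]+);$/     # the pattern used in the parsing example; drops the "#"
strict_re = /(#\h{3,6});$/   # only hex digits allowed; the capture includes the "#"

samples = [
  "background-color:#eeeff4;",              # from the question
  "background-color:#fff;",                 # shorthand value
  "background-color:#nothex;",              # not a valid hex color
  "background-color:#eeeff4",               # no trailing semicolon
  "background-color:#eeeff4; width:10px;"   # another declaration follows
]

results = samples.map { |s| [s, s[loose_re, 1], s[strict_re, 1]] }

results.each do |style, loose, strict|
  puts "#{style.ljust(40)} loose=#{loose.inspect} strict=#{strict.inspect}"
end
```

The loose pattern accepts "nothex" while the strict one rejects it. Both return nil for the last two samples, because each is anchored to a semicolon at the end of the line.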
And, I deliberately excluded the # from the hex values. Adding it wastes space on a redundant character in lookups and in tables, but your mileage might vary.