ruby - Opening multiple HTML files & outputting to .txt with Nokogiri
Just wondering whether these two tasks can be done using Nokogiri or via more basic Ruby commands.
    require 'open-uri'
    require 'nokogiri'
    require "net/http"
    require "uri"

    doc = Nokogiri.parse(open("example.html"))

    # Print the content of every author meta tag (either capitalization)
    doc.xpath("//meta[@name='author' or @name='Author']/@content").each do |metaauth|
      puts "Author: #{metaauth}"
    end

    # Print the content of every keywords meta tag (either capitalization)
    doc.xpath("//meta[@name='keywords' or @name='Keywords']/@content").each do |metakey|
      puts "Keywords: #{metakey}"
    end

    # etc...
Question 1: I'm trying to parse a directory of .html documents, get the information from their meta HTML tags, and output the results to a text file if possible. I tried a simple *.html wildcard replacement, but it didn't seem to work (at least not with Nokogiri.parse(open()); maybe it works with ::HTML or ::XML).
Question 2: More important, is it possible to send all of the meta content output to a text file, replacing the puts command?
Also, forgive me if the code is overly complicated for the simple task being performed; I'm a little new to Nokogiri / XPath / Ruby.
Thanks.
I have some similar code. Please refer to:
    require 'csv'
    require 'nokogiri'

    module MyParser
      HTML_FILE_DIR = "your html file dir"

      def self.run(options = {})
        # Skip ".", ".." and any other dot-entries in the directory
        file_list = Dir.entries(HTML_FILE_DIR).reject { |f| f =~ /^\./ }
        result = file_list.map do |file|
          html = File.read("#{HTML_FILE_DIR}/#{file}")
          doc = Nokogiri::HTML(html)
          parse_to_hash(doc)
        end
        write_csv(result)
      end

      # Returns one row (an array of extracted values) per document
      def self.parse_to_hash(doc)
        array = []
        array << doc.css("your select conditions").first.content
        # ... add selector code (CSS or XPath)
        array
      end

      def self.write_csv(result)
        ::CSV.open("your output file name", 'w') do |csv|
          result.each { |row| csv << row }
        end
      end
    end

    MyParser.run
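The module above writes CSV. For the plain-text output asked about in question 2, here is a minimal sketch using only core Ruby, assuming the files can be matched with a glob pattern. The method name `each_html_to_text` and its parameters are made up for illustration. `Dir.glob` expands the `*.html` wildcard (which `open` does not do on its own), and calling `puts` on the opened file replaces the bare `puts`. The extraction step is left to a block so the XPath queries from the question can be dropped in.

```ruby
# Hypothetical helper: reads every file matching `pattern`, passes its HTML
# to the given block, and writes the lines the block returns to `out_path`.
def each_html_to_text(pattern, out_path)
  File.open(out_path, "w") do |out|
    # Dir.glob expands the wildcard; sort gives a stable file order.
    Dir.glob(pattern).sort.each do |path|
      html = File.read(path)
      out.puts "File: #{File.basename(path)}"
      # The block is expected to return an array of strings for this file.
      yield(html).each { |line| out.puts line }
    end
  end
end

# With Nokogiri loaded, the block could reuse the XPath from the question:
#   each_html_to_text("pages/*.html", "meta.txt") do |html|
#     doc = Nokogiri::HTML(html)
#     doc.xpath("//meta[@name='author']/@content").map { |a| "Author: #{a}" }
#   end
```

The paths `pages/*.html` and `meta.txt` in the comment are placeholders. Opening the output file once, outside the loop, means every document's results go to the same file instead of overwriting it on each pass.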