ruby - Opening multiple HTML files & outputting to .txt with Nokogiri


Just wondering whether these two tasks can be done using Nokogiri or via more basic Ruby commands.

    require 'open-uri'
    require 'nokogiri'
    require "net/http"
    require "uri"

    doc = Nokogiri.parse(open("example.html"))

    doc.xpath("//meta[@name='author' or @name='Author']/@content").each do |metaauth|
      puts "author: #{metaauth}"
    end

    doc.xpath("//meta[@name='keywords' or @name='Keywords']/@content").each do |metakey|
      puts "keywords: #{metakey}"
    end

    etc...

Question 1: I'm trying to parse a directory of .html documents, get the information from their meta HTML tags, and output the results to a text file if possible. I tried a simple *.html wildcard replacement, but that didn't seem to work (at least not with Nokogiri.parse(open()); maybe it works with ::HTML or ::XML).

Question 2: More importantly, is it possible to send all of the meta content output to a text file, replacing the puts command?

Also, forgive me if the code is overly complicated for the simple task being performed; I'm a little new to Nokogiri / XPath / Ruby.

Thanks.

I have some similar code.
Please refer to:

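    require 'nokogiri'
    require 'csv'

    module MyParser
      # Placeholder: replace with your HTML file directory.
      HTML_FILE_DIR = 'your html file dir'

      def self.run(options = {})
        # Skip ".", ".." and other dot files.
        file_list = Dir.entries(HTML_FILE_DIR).reject { |f| f =~ /^\./ }

        result = file_list.map do |file|
          html = File.read("#{HTML_FILE_DIR}/#{file}")
          doc = Nokogiri::HTML(html)
          parse_to_hash(doc)
        end
        write_csv(result)
      end

      # Despite the name, this returns an array: one CSV row per document.
      def self.parse_to_hash(doc)
        array = []
        # Placeholder: replace with your select conditions.
        array << doc.css('your select conditions').first.content
        # ... add more selector code (CSS or XPath) here

        array
      end

      def self.write_csv(result)
        # Placeholder: replace with your output file name.
        ::CSV.open('your output file name', 'w') do |csv|
          result.each { |row| csv << row }
        end
      end
    end

    MyParser.run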
