strptime - Subsetting data based on a date range in R


Update

I've managed to load the first 1,000,000 rows of the data using the following code:

newfile <- read.table("course_4_proj_1.txt", header = TRUE, sep = ";", na.strings = "?", nrows = 1000000, stringsAsFactors = TRUE)

This is what head() returns, FYI:

head(newfile)
        date     time global_active_power global_reactive_power voltage global_intensity
1 16/12/2006 17:24:00               4.216                 0.418  234.84             18.4
2 16/12/2006 17:25:00               5.360                 0.436  233.63             23.0
3 16/12/2006 17:26:00               5.374                 0.498  233.29             23.0
4 16/12/2006 17:27:00               5.388                 0.502  233.74             23.0
5 16/12/2006 17:28:00               3.666                 0.528  235.68             15.8
6 16/12/2006 17:29:00               3.520                 0.522  235.02             15.0
  sub_metering_1 sub_metering_2 sub_metering_3
1              0              1             17
2              0              1             16
3              0              2             17
4              0              1             17
5              0              1             17
6              0              2             17

Now I need to subset it, because I only need to use the data for the dates 2007-02-01 and 2007-02-02. I think I need to convert the date and time variables to Date/Time classes in R using the strptime() and as.Date() functions, but I'm not clear on how to do that. What is the simplest/cleanest way to do this?

If size/memory is not an issue:

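newfile <- read.table("course_4_proj_1.txt", header = TRUE, sep = ";", na.strings = "?", nrows = 1000000, stringsAsFactors = FALSE)
# Combine the two text columns, then parse them (day/month/4-digit year, 24-hour time).
newfile$datetime <- paste(newfile$date, newfile$time)
# as.Date() keeps only the calendar date, which is enough for subsetting by day.
newfile$datetime <- as.Date(newfile$datetime, format = "%d/%m/%Y %H:%M:%S")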
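To make the conversion step concrete, here is a minimal sketch on a made-up three-row data frame laid out like the file above. The `toy` data, the `day` column, and the choice of `tz = "UTC"` are assumptions for illustration only, not part of the original answer. It shows that `as.Date()` drops the time of day, while `as.POSIXct()` and `strptime()` keep it.

```r
# Toy rows in the same layout as the file (values are illustrative only).
toy <- data.frame(
  date = c("16/12/2006", "01/02/2007", "02/02/2007"),
  time = c("17:24:00", "00:00:00", "23:59:00"),
  stringsAsFactors = FALSE
)

# One string per row, e.g. "16/12/2006 17:24:00".
stamp <- paste(toy$date, toy$time)
fmt <- "%d/%m/%Y %H:%M:%S"

# as.Date() parses the full string but keeps only the calendar date.
toy$day <- as.Date(stamp, format = fmt)

# as.POSIXct() keeps the time of day. tz = "UTC" is an assumption made here
# so that the result does not depend on the local time zone.
toy$datetime <- as.POSIXct(stamp, format = fmt, tz = "UTC")

# strptime() does the same parsing but returns a POSIXlt object,
# which is usually converted to POSIXct before being stored in a data frame.
lt <- strptime(stamp, format = fmt, tz = "UTC")

str(toy)
```

If only whole days matter, as in this question, the `Date` column is probably sufficient; the `POSIXct` column is the one to reach for when the time of day is needed later, for example on a plot axis.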

If your computer is weak and puny and you can add packages, consider the data.table package:

library(data.table)
newfile <- fread("course_4_proj_1.txt", na.strings = "?")
# := adds the parsed column by reference, without copying the table.
newfile[, datetime := as.Date(paste(date, time), format = "%d/%m/%Y %H:%M:%S")]

And there are further optimizations one can use. I found the answers here useful.

One can subset the data.frame in the normal way. Here is a method using dplyr:

library(dplyr)
# Keep 2007-02-01 and 2007-02-02: lower bound inclusive, upper bound exclusive.
subsetted <- filter(newfile, datetime >= as.Date("2007-02-01 00:00:00"), datetime < as.Date("2007-02-03 00:00:00"))
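For comparison, the "normal way" without dplyr might look like the sketch below. It uses a made-up four-row data frame so it runs on its own; the names `toy`, `start`, `end`, `keep`, `subsetted_base` and `subsetted_alt` are hypothetical and not from the original answer.

```r
# Toy data: one row before, two rows inside, and one row after the target range.
toy <- data.frame(
  date = c("31/01/2007", "01/02/2007", "02/02/2007", "03/02/2007"),
  time = c("23:59:00", "00:00:00", "23:59:00", "00:00:00"),
  stringsAsFactors = FALSE
)
toy$datetime <- as.Date(paste(toy$date, toy$time), format = "%d/%m/%Y %H:%M:%S")

start <- as.Date("2007-02-01")  # inclusive lower bound
end   <- as.Date("2007-02-03")  # exclusive upper bound

# Logical indexing; the is.na() guard drops rows whose date failed to parse,
# which would otherwise come back as all-NA rows.
keep <- !is.na(toy$datetime) & toy$datetime >= start & toy$datetime < end
subsetted_base <- toy[keep, ]

# subset() expresses the same filter and drops NA comparisons by itself.
subsetted_alt <- subset(toy, datetime >= start & datetime < end)

print(subsetted_base)
```

Using a half-open range (`>= start`, `< end`) means the same bounds would still select both full days if the column were later switched to a date-time class.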
