strptime - Subsetting data based on a date range in R
Update
I've managed to load the first 1,000,000 rows of the data using the following code:
newfile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings="?", nrows=1000000, stringsAsFactors=TRUE)
This is what head() returns, FYI:
> head(newfile)
        date     time global_active_power global_reactive_power voltage global_intensity
1 16/12/2006 17:24:00               4.216                 0.418  234.84             18.4
2 16/12/2006 17:25:00               5.360                 0.436  233.63             23.0
3 16/12/2006 17:26:00               5.374                 0.498  233.29             23.0
4 16/12/2006 17:27:00               5.388                 0.502  233.74             23.0
5 16/12/2006 17:28:00               3.666                 0.528  235.68             15.8
6 16/12/2006 17:29:00               3.520                 0.522  235.02             15.0
  sub_metering_1 sub_metering_2 sub_metering_3
1              0              1             17
2              0              1             16
3              0              2             17
4              0              1             17
5              0              1             17
6              0              2             17
Now I need to subset it, because I only need the data for the dates 2007-02-01 and 2007-02-02. I think I need to convert the date and time variables to date/time classes in R using the strptime() and as.Date() functions, but I'm not clear on how to do that. What is the simplest/cleanest way to do this?
If size/memory is not an issue:
newfile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings="?", nrows=1000000, stringsAsFactors=FALSE)
# Combine the date and time columns into one string, then parse it
newfile$datetime <- paste(newfile$date, newfile$time)
newfile$datetime <- as.Date(newfile$datetime, format = "%d/%m/%Y %H:%M:%S")
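Note that as.Date() keeps only the calendar date. If the time of day is also needed, a minimal sketch using as.POSIXct() might look like the following. The `toy` data frame and its values are made up for illustration, and tz = "UTC" is an assumption about how the timestamps should be interpreted.

```r
# Toy data in the same layout as the file (values are illustrative only)
toy <- data.frame(
  date = c("16/12/2006", "01/02/2007"),
  time = c("17:24:00", "00:01:00"),
  stringsAsFactors = FALSE
)

# Parse the combined string into a date-time; tz = "UTC" is an assumption
toy$datetime <- as.POSIXct(
  paste(toy$date, toy$time),
  format = "%d/%m/%Y %H:%M:%S",
  tz = "UTC"
)

# as.Date() on the result recovers the calendar date alone
toy$day <- as.Date(toy$datetime)
```

Keeping a POSIXct column alongside a Date column makes it possible to filter by day and still plot against the exact time.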
If your computer is weak and puny, and you can add packages, consider the data.table package:
library(data.table)
newfile <- fread("course_4_proj_1.txt", na.strings = "?")
# := adds the datetime column by reference, without copying the table
newfile[, datetime := as.Date(paste(date, time), format = "%d/%m/%Y %H:%M:%S")]
There are further optimizations one can use. I found the answers here useful.
One can then subset the data.frame in the normal way. Here is a method using dplyr, with the bounds set to the 2007 dates asked about in the question:
library(dplyr)
subsetted <- filter(newfile, datetime >= as.Date("2007-02-01 00:00:00"), datetime < as.Date("2007-02-03 00:00:00"))
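For comparison, the same kind of date-range subset can be written in base R with logical indexing. This is only a sketch on a made-up `toy` data frame (its name and values are illustrative, not taken from the file), targeting the two days named in the question:

```r
# Toy data frame standing in for newfile (values are illustrative only)
toy <- data.frame(
  datetime = as.Date(c("2007-01-31", "2007-02-01", "2007-02-02", "2007-02-03")),
  voltage  = c(234.84, 233.63, 233.29, 233.74)
)

# Keep rows from 2007-02-01 up to, but not including, 2007-02-03
keep <- toy$datetime >= as.Date("2007-02-01") &
  toy$datetime < as.Date("2007-02-03")
subsetted <- toy[keep, ]
```

Using a half-open range (>= start, < day after end) avoids having to spell out the last timestamp of the final day.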