Calculate Difference between dates by group in R
I'm using logistic exposure to calculate the hatching success of bird nests. My data set is quite extensive: I have ~2,000 nests, each with a unique ID ("clutchid"). I need to calculate the number of days a given nest was exposed ("exposure"), or more simply, the difference between the first and last day. I used the following code:
hs_hatch$exposure = NA
for (i in 2:nrow(hs_hatch)) {
  hs_hatch$exposure[i] = hs_hatch$datevisit[i] - hs_hatch$datevisit[i - 1]
}
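where hs_hatch is my dataset and datevisit is the actual date. The problem is that R calculates an exposure value for the first date of each clutch (which doesn't make sense), because the loop simply subtracts the previous row's date, even when that row belongs to a different clutch.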
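To make the problem concrete, here is a small sketch on a made-up two-clutch data frame (hypothetical values, with dates already stored as Date rather than text). Differencing row by row, as the loop does, ignores clutchid, so the first visit of clutch 2 is subtracted from the last visit of clutch 1:

```r
# Hypothetical toy data: two clutches, dates already parsed as Date
toy <- data.frame(
  clutchid = c(1, 1, 2, 2),
  datevisit = as.Date(c("2012-03-15", "2012-04-03", "2012-03-18", "2012-04-04"))
)

# Same logic as the loop in the question: each row minus the row above,
# with no check on whether the two rows belong to the same clutch
toy$exposure <- NA
for (i in 2:nrow(toy)) {
  toy$exposure[i] <- toy$datevisit[i] - toy$datevisit[i - 1]
}

toy
```

Row 3, the first visit of clutch 2, ends up with -16: a gap between two different nests, not an exposure.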
What I need is to calculate the difference between the first and last date for a given clutch. I've looked at the following:
exposure = ddply(hs_hatch, "clutchid", summarize,
                 orderfrequency = as.numeric(diff.Date(datevisit)))

df %>%
  mutate(exposure = as.Date(hs_hatch$datevisit, "%Y-%m-%d")) %>%
  group_by(clutchid) %>%
  arrange(exposure) %>%
  mutate(lag = lag(datevisit), difference = datevisit - lag)
I'm still learning R, so any help would be appreciated.
Edit: below is a sample of the data I'm using:
hs_hatch <- structure(list(
  clutchid = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
               4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L),
  datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/18/2012", "3/20/2012", "3/22/2012", "4/3/2012", "4/4/2012",
                "3/22/2012", "4/3/2012", "4/4/2012",
                "3/18/2012", "3/20/2012", "3/22/2012", "4/2/2012", "4/3/2012", "4/4/2012",
                "3/20/2012", "3/22/2012", "3/25/2012", "3/27/2012", "4/4/2012", "4/5/2012"),
  year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L),
  survive = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -25L),
  .Names = c("clutchid", "datevisit", "year", "survive"),
  spec = structure(list(cols = structure(list(
      clutchid = structure(list(), class = c("collector_integer", "collector")),
      datevisit = structure(list(), class = c("collector_character", "collector")),
      year = structure(list(), class = c("collector_integer", "collector")),
      survive = structure(list(), class = c("collector_integer", "collector"))),
    .Names = c("clutchid", "datevisit", "year", "survive")),
    default = structure(list(), class = c("collector_guess", "collector"))),
    .Names = c("cols", "default"), class = "col_spec"))
Collecting some of the comments:
Load dplyr
We need the dplyr package. If you also load other packages, e.g. plyr, they can cause conflicts when both packages have functions with the same name. So let's load just dplyr.
library(dplyr)
In the future, you may wish to load tidyverse instead; it includes dplyr as well as other related packages for graphics, etc.
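If plyr does need to stay loaded, one general R way to sidestep the masking (not something from the original thread) is to call the function with its namespace prefix. A minimal sketch with a made-up data frame (df, g, x and total are hypothetical names):

```r
library(dplyr)

# Hypothetical data, only to show the call
df <- data.frame(g = c(1, 1, 2), x = c(1, 2, 5))

# The dplyr:: prefix picks dplyr's summarize() explicitly, so the result is
# the same even if a package loaded later (e.g. plyr) masks `summarize`
res <- df %>%
  group_by(g) %>%
  dplyr::summarize(total = sum(x))

res
```

Base R's conflicts(detail = TRUE) lists which attached packages mask each other's names, which helps spot this kind of clash.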
Converting dates
Let's convert the datevisit variable from character strings into something R can interpret as a date. Once we do this, R can calculate differences in days by subtracting two dates from each other.
hs_hatch <- hs_hatch %>%
  mutate(date_visit = as.Date(datevisit, "%m/%d/%Y"))
The date format %m/%d/%Y is different from your original code. The format needs to match how the dates are written in your data: datevisit has dates as month/day/year with a four-digit year, so use %m/%d/%Y.
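A quick check of why the format string matters, on two dates taken from the sample. The %Y-%m-%d layout does not match these strings and gives NA, while %y reads only the first two digits of the year ("20", i.e. 2020) and silently ignores the rest:

```r
# Two dates copied from the sample, stored as month/day/four-digit-year text
x <- c("3/15/2012", "4/3/2012")

good  <- as.Date(x, format = "%m/%d/%Y")  # %Y: four-digit year, matches the data
dashy <- as.Date(x, format = "%Y-%m-%d")  # wrong layout: every value is NA
short <- as.Date(x, format = "%m/%d/%y")  # %y: two-digit year, read as 2020

good
dashy
short

# Once parsed, subtracting two Dates gives the gap in days as a difftime
good[2] - good[1]
as.numeric(good[2] - good[1])
```

The %y case is the more dangerous one: it raises no error, so the wrong year only shows up if you inspect the parsed dates.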
Also, you don't need to specify the dataset for datevisit inside mutate, as in hs_hatch$datevisit, because mutate is already looking in hs_hatch. The code hs_hatch %>% ... says 'use hs_hatch for the following steps'.
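Beyond being unnecessary, writing hs_hatch$datevisit inside a grouped pipeline can quietly give wrong answers, because $ pulls the whole ungrouped column. A small sketch with made-up data (df, g and x are hypothetical names):

```r
library(dplyr)

# Hypothetical data: two groups with very different ranges
df <- data.frame(g = c(1, 1, 2, 2), x = c(1, 3, 10, 20))

res <- df %>%
  group_by(g) %>%
  summarize(
    per_group = min(x),     # bare column name: evaluated within each group
    whole_col = min(df$x)   # df$x: always the full, ungrouped column
  )

res
```

per_group gives 1 and 10, but whole_col gives 1 for both groups, which is exactly the kind of silent error a per-clutch minimum date would suffer.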
Calculating exposures
To calculate exposure, we need to find the first date, the last date, and the difference between the two, for each set of rows sharing a clutchid. We use summarize, which collapses the data to one row per clutchid.
exposure <- hs_hatch %>%
  group_by(clutchid) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            exposure = last_visit - first_visit)
first_visit = min(date_visit) finds the minimum date_visit for each clutchid separately, because we grouped the data with group_by(clutchid).
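exposure = last_visit - first_visit takes the newly calculated first_visit and last_visit and finds the difference between them in days. Subtracting two Date values returns a difftime; if you need exposure as a plain number (e.g. for modelling), wrap it in as.numeric().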
This creates the following result:
  clutchid first_visit last_visit exposure
     <int> <date>      <date>        <dbl>
1        1 2012-03-15  2012-04-03       19
2        2 2012-03-18  2012-04-04       17
3        3 2012-03-22  2012-04-04       13
4        4 2012-03-18  2012-04-04       17
5        5 2012-03-20  2012-04-05       16
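A self-contained version of the whole answer, as a sketch: it re-enters the question's sample as a plain data.frame instead of the dput tibble, and (an addition not in the original answer) wraps the difference in as.numeric() so exposure is a plain number of days rather than a difftime. A base R tapply() cross-check is included for anyone who would rather avoid dplyr/plyr conflicts altogether.

```r
library(dplyr)

# The question's sample, re-entered as a plain data frame (same values as the dput)
hs_hatch <- data.frame(
  clutchid = rep(1:5, times = c(5, 5, 3, 6, 6)),
  datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/18/2012", "3/20/2012", "3/22/2012", "4/3/2012", "4/4/2012",
                "3/22/2012", "4/3/2012", "4/4/2012",
                "3/18/2012", "3/20/2012", "3/22/2012", "4/2/2012", "4/3/2012", "4/4/2012",
                "3/20/2012", "3/22/2012", "3/25/2012", "3/27/2012", "4/4/2012", "4/5/2012")
)

exposure <- hs_hatch %>%
  mutate(date_visit = as.Date(datevisit, "%m/%d/%Y")) %>%
  group_by(clutchid) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            # as.numeric() drops the difftime class, leaving a plain day count
            exposure = as.numeric(last_visit - first_visit))

exposure

# Base R cross-check: tapply() applies the function to each clutch's dates
d <- as.Date(hs_hatch$datevisit, "%m/%d/%Y")
base_exposure <- tapply(d, hs_hatch$clutchid, function(v) as.numeric(max(v) - min(v)))
base_exposure
```

Both routes give 19, 17, 13, 17 and 16 days for clutches 1 to 5.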
If you want to keep the original rows, you can use mutate in place of summarize; the per-clutch values are then repeated on every row of that clutch.
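A sketch of the row-preserving variant, on a made-up subset of the sample (clutches 1 and 3 only). It also adds a hypothetical interval column, the lag() idea from the question applied inside the grouping, so the first visit of each clutch gets NA instead of a cross-clutch difference:

```r
library(dplyr)

# Subset of the question's sample: clutch 1 and clutch 3
hs_hatch <- data.frame(
  clutchid = c(1, 1, 1, 1, 1, 3, 3, 3),
  datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/22/2012", "4/3/2012", "4/4/2012")
)

visits <- hs_hatch %>%
  mutate(date_visit = as.Date(datevisit, "%m/%d/%Y")) %>%
  group_by(clutchid) %>%
  arrange(date_visit, .by_group = TRUE) %>%
  mutate(
    # same value repeated on every row of the clutch
    exposure = as.numeric(max(date_visit) - min(date_visit)),
    # hypothetical column: days since the previous visit within the same clutch;
    # lag() restarts in each group, so each clutch's first visit is NA
    interval = as.numeric(date_visit - lag(date_visit))
  ) %>%
  ungroup()

visits
```

All eight rows are kept, and within each clutch the intervals add up to exposure (3 + 2 + 12 + 2 = 19 for clutch 1, 12 + 1 = 13 for clutch 3).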