Calculate Difference between dates by group in R

- July 15, 2010

I'm using logistic exposure to calculate hatching success for bird nests. My data set is quite extensive and I have ~2,000 nests, each with a unique ID ("clutchid"). I need to calculate the number of days a given nest was exposed ("exposure"), or more simply, the difference between the first and last day. I used the following code:

hs_hatch$exposure = NA
for (i in 2:nrow(hs_hatch)) {
  hs_hatch$exposure[i] = hs_hatch$datevisit[i] - hs_hatch$datevisit[i-1]
}

where hs_hatch is my dataset and datevisit is the actual date. The problem is that R is calculating an exposure value for the first date (which doesn't make sense).

What I need is to calculate the difference between the first and last date for a given clutch. I've also looked at the following:

exposure = ddply(hs_hatch, "clutchid", summarize,
                 orderfrequency = as.numeric(diff.Date(datevisit)))

df %>%
  mutate(exposure = as.Date(hs_hatch$datevisit, "%Y-%m-%d")) %>%
  group_by(clutchid) %>%
  arrange(exposure) %>%
  mutate(lag = lag(datevisit), difference = datevisit - lag)

I'm still learning R, so any help would be appreciated.

Edit: Below is a sample of the data I'm using:

hs_hatch <- structure(list(
  clutchid = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
               4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L),
  datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012",
                "4/3/2012", "3/18/2012", "3/20/2012", "3/22/2012", "4/3/2012",
                "4/4/2012", "3/22/2012", "4/3/2012", "4/4/2012", "3/18/2012",
                "3/20/2012", "3/22/2012", "4/2/2012", "4/3/2012", "4/4/2012",
                "3/20/2012", "3/22/2012", "3/25/2012", "3/27/2012", "4/4/2012",
                "4/5/2012"),
  year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L),
  survive = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -25L),
  .names = c("clutchid", "datevisit", "year", "survive"),
  spec = structure(list(
    cols = structure(list(
      clutchid = structure(list(), class = c("collector_integer", "collector")),
      datevisit = structure(list(), class = c("collector_character", "collector")),
      year = structure(list(), class = c("collector_integer", "collector")),
      survive = structure(list(), class = c("collector_integer", "collector"))),
      .names = c("clutchid", "datevisit", "year", "survive")),
    default = structure(list(), class = c("collector_guess", "collector"))),
    .names = c("cols", "default"), class = "col_spec"))

Collecting some of the comments...

Load `dplyr`

We need only the dplyr package for this problem. If you load other packages as well, e.g. plyr, it can cause conflicts when both packages have functions with the same name. Let's load only dplyr.

library(dplyr)

In the future, you may wish to load tidyverse instead; it includes dplyr and other related packages for graphics, etc.

Converting dates

Let's convert the datevisit variable from character strings to something R can interpret as a date. Once we do this, R can calculate differences in days by subtracting two dates from each other.

hs_hatch <- hs_hatch %>%
  mutate(date_visit = as.Date(datevisit, "%m/%d/%Y"))

The date format %m/%d/%Y is different from the one in your original code. The date format needs to match how the dates look in your data. datevisit stores dates as month/day/year with a four-digit year, so we use %m/%d/%Y.

Also, you don't need to specify the dataset for datevisit inside mutate, as in hs_hatch$datevisit, because mutate is already looking in hs_hatch. The code hs_hatch %>% ... says 'use hs_hatch for the following steps'.
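To see why the format string matters, here is a small base R sketch. The vector x is a made-up pair of strings in the same month/day/year layout as datevisit, not part of the original data handling; a format that does not match the layout gives NA instead of an error, which is easy to miss.

```r
# Hypothetical sample strings in the same layout as datevisit
x <- c("3/15/2012", "4/3/2012")

# %m/%d/%Y matches month/day/four-digit-year, so these parse as intended
parsed <- as.Date(x, format = "%m/%d/%Y")
print(parsed)

# A format that does not match the layout returns NA rather than failing
mismatched <- as.Date(x, format = "%Y-%m-%d")
print(mismatched)

# Subtracting two Date values gives a difftime measured in days
print(parsed[2] - parsed[1])
```

Checking for unexpected NA values right after the conversion is a cheap way to catch a wrong format string.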

Calculating exposures

To calculate exposure, we need to find the first date, the last date, and then the difference between the two, for each set of rows that share a clutchid. We use summarize, which collapses the data to one row per clutchid.

exposure <- hs_hatch %>%
  group_by(clutchid) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            exposure = last_visit - first_visit)

first_visit = min(date_visit) will find the minimum date_visit for each clutchid separately, since we are using group_by(clutchid).

exposure = last_visit - first_visit takes the newly calculated first_visit and last_visit and finds the difference in days.

This creates the following result:

  clutchid first_visit last_visit exposure
     <int>      <date>     <date>    <dbl>
1        1  2012-03-15 2012-04-03       19
2        2  2012-03-18 2012-04-04       17
3        3  2012-03-22 2012-04-04       13
4        4  2012-03-18 2012-04-04       17
5        5  2012-03-20 2012-04-05       16
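If you want to cross-check these numbers without dplyr, the same first-to-last span can be sketched in base R. This is an alternative for verification, not the approach used above; the data frame visits is an excerpt of the sample data (clutches 1 and 3 only) and its name is made up for this example.

```r
# Excerpt of the sample data: clutches 1 and 3, dates as stored in datevisit
visits <- data.frame(
  clutchid  = c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L),
  datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/22/2012", "4/3/2012", "4/4/2012"),
  stringsAsFactors = FALSE
)
visits$date_visit <- as.Date(visits$datevisit, format = "%m/%d/%Y")

# For each clutchid: last visit minus first visit, as a plain number of days
exposure_days <- tapply(
  as.numeric(visits$date_visit),
  visits$clutchid,
  function(d) max(d) - min(d)
)
print(exposure_days)
```

The result is a named vector keyed by clutchid, so the values for clutches 1 and 3 can be compared directly with the exposure column in the table.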

If you want to keep all of the original rows, you can use mutate in place of summarize.
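A minimal sketch of that mutate variant, assuming the same column names as above. The data frame visits is again an excerpt of the sample data (clutches 1 and 3), and with_exposure is a name chosen only for this example. Wrapping the difference in as.numeric is an extra step, not part of the code above, to store plain numbers of days.

```r
library(dplyr)

# Excerpt of the sample data: clutches 1 and 3
visits <- data.frame(
  clutchid  = c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L),
  datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/22/2012", "4/3/2012", "4/4/2012"),
  stringsAsFactors = FALSE
)

with_exposure <- visits %>%
  mutate(date_visit = as.Date(datevisit, "%m/%d/%Y")) %>%
  group_by(clutchid) %>%
  # mutate keeps every visit row and repeats the per-clutch value on each one
  mutate(exposure = as.numeric(max(date_visit) - min(date_visit),
                               units = "days")) %>%
  ungroup()

print(with_exposure)
```

Every row of a clutch then carries the same exposure value, which is convenient if the per-visit rows feed a later model.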
