Calculate difference between dates by group in R


I'm using logistic exposure to calculate hatching success for bird nests. My data set is quite extensive: I have ~2,000 nests, each with a unique ID ("clutchid"). I need to calculate the number of days a given nest was exposed ("exposure"), or more simply, the difference between the first and last day. I used the following code:

hs_hatch$exposure = NA
for (i in 2:nrow(hs_hatch)) {
    hs_hatch$exposure[i] = hs_hatch$datevisit[i] - hs_hatch$datevisit[i - 1]
}

where hs_hatch is my dataset and datevisit is the actual date. The problem is that R is calculating an exposure value for the first date of each clutch (which doesn't make sense).

What I need is to calculate the difference between the first and last date for a given clutch. I've also looked into the following:

exposure = ddply(hs_hatch, "clutchid", summarize,
                 orderfrequency = as.numeric(diff.Date(datevisit)))

df %>%
    mutate(exposure = as.Date(hs_hatch$datevisit, "%Y-%m-%d")) %>%
    group_by(clutchid) %>%
    arrange(exposure) %>%
    mutate(lag = lag(datevisit), difference = datevisit - lag)

I'm still learning R, so any help would be appreciated.

Edit: Below is a sample of the data I'm using

hs_hatch <- structure(list(
    clutchid = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
                 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L),
    datevisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012",
                  "4/3/2012", "3/18/2012", "3/20/2012", "3/22/2012", "4/3/2012",
                  "4/4/2012", "3/22/2012", "4/3/2012", "4/4/2012", "3/18/2012",
                  "3/20/2012", "3/22/2012", "4/2/2012", "4/3/2012", "4/4/2012",
                  "3/20/2012", "3/22/2012", "3/25/2012", "3/27/2012", "4/4/2012",
                  "4/5/2012"),
    year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
             2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
             2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L),
    survive = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -25L),
    .names = c("clutchid", "datevisit", "year", "survive"),
    spec = structure(list(
        cols = structure(list(
            clutchid = structure(list(), class = c("collector_integer", "collector")),
            datevisit = structure(list(), class = c("collector_character", "collector")),
            year = structure(list(), class = c("collector_integer", "collector")),
            survive = structure(list(), class = c("collector_integer", "collector"))),
            .names = c("clutchid", "datevisit", "year", "survive")),
        default = structure(list(), class = c("collector_guess", "collector"))),
        .names = c("cols", "default"), class = "col_spec"))

Collecting the comments...

Load dplyr

We need the dplyr package for this problem. If you load other packages as well, e.g. plyr, it can cause conflicts when both packages have functions with the same name. Let's load only dplyr.

library(dplyr) 

In the future, you may wish to load tidyverse instead; it includes dplyr and other related packages for graphics, etc.
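If plyr does have to be loaded alongside dplyr, one common way to sidestep the name clash is to call functions with an explicit package prefix. The following is only a sketch on a made-up toy data frame (toy, visits, and n_visits are hypothetical names, not part of the original data):

```r
library(dplyr)

# Hypothetical toy data: three visit rows across two clutches
toy <- data.frame(clutchid = c(1L, 1L, 2L), visits = c(1, 1, 1))

# The dplyr:: prefix states explicitly which package's functions are meant,
# regardless of which other packages are attached
res <- dplyr::summarize(dplyr::group_by(toy, clutchid), n_visits = sum(visits))
res
```

With the prefix in place, the order in which packages were loaded no longer decides which summarize is used.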

Converting dates

Let's convert the datevisit variable from character strings into something R can interpret as a date. Once we do this, R can calculate differences in days by subtracting two dates from each other.

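hs_hatch <- hs_hatch %>%
    mutate(date_visit = as.Date(datevisit, "%m/%d/%Y"))

The date format %m/%d/%Y is different from the one in your original code. The date format needs to match how the dates look in your data. datevisit has dates as month/day/year with a four-digit year, so we use %m/%d/%Y (uppercase %Y matches a four-digit year, while lowercase %y expects a two-digit year).

Also, you don't need to specify the dataset for datevisit inside mutate, as in hs_hatch$datevisit, because mutate is already looking in hs_hatch. The code hs_hatch %>% ... says 'use hs_hatch for the following steps'.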
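As a quick check of the format string, here is a minimal sketch in base R using two date strings taken from the sample data (the first and last visit of clutch 1). The variable names visits, parsed, and gap are only for illustration:

```r
# Two date strings from the sample data (month/day/four-digit year)
visits <- c("3/15/2012", "4/3/2012")

# %Y reads a four-digit year, so both strings parse as dates in 2012
parsed <- as.Date(visits, format = "%m/%d/%Y")

# Subtracting two Date values gives a difftime measured in days
gap <- parsed[2] - parsed[1]

# Convert the difftime to a plain number of days
as.numeric(gap, units = "days")
```

If the format did not match the data, as.Date would return NA or a wrong year, so it is worth inspecting a few parsed values before computing differences.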

Calculating exposures

To calculate exposure, we need to find the first date, the last date, and then the difference between the two, for each set of rows with the same clutchid. We use summarize, which collapses the data to one row per clutchid.

exposure <- hs_hatch %>%
    group_by(clutchid) %>%
    summarize(first_visit = min(date_visit),
              last_visit = max(date_visit),
              exposure = last_visit - first_visit)

first_visit = min(date_visit) will find the minimum date_visit for each clutchid separately, since we are using group_by(clutchid).

exposure = last_visit - first_visit takes the newly calculated first_visit and last_visit and finds the difference in days.

This creates the following result:

  clutchid first_visit last_visit exposure
     <int>      <date>     <date>    <dbl>
1        1  2012-03-15 2012-04-03       19
2        2  2012-03-18 2012-04-04       17
3        3  2012-03-22 2012-04-04       13
4        4  2012-03-18 2012-04-04       17
5        5  2012-03-20 2012-04-05       16

If you want to keep all the original rows, you can use mutate in place of summarize.
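The mutate variant might look like the sketch below. It uses a small hypothetical subset of the sample data (two clutches, with dates already parsed) rather than the full hs_hatch, and the names nests and nests_exposure are made up for the example. Converting the difference with as.numeric is an added assumption, useful if a plain number of days is wanted downstream:

```r
library(dplyr)

# Hypothetical subset of the sample data: a few visits from clutches 1 and 2
nests <- data.frame(
  clutchid = c(1L, 1L, 1L, 2L, 2L),
  date_visit = as.Date(c("2012-03-15", "2012-03-18", "2012-04-03",
                         "2012-03-18", "2012-04-04"))
)

# mutate keeps every visit row and repeats the per-clutch values on each one
nests_exposure <- nests %>%
  group_by(clutchid) %>%
  mutate(first_visit = min(date_visit),
         last_visit = max(date_visit),
         exposure = as.numeric(last_visit - first_visit, units = "days")) %>%
  ungroup()

nests_exposure
```

Every row of a clutch carries the same exposure value here, so the result still has one row per visit rather than one row per clutch.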

