--- title: "Introduction to Relational Event Models" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to Relational Event Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Abstract The **redeem** package provides tools for the scalable estimation of continuous-time network models. While its primary focus is on models that explicitly incorporate duration (Durational Event Models), the framework natively supports standard **Relational Event Models (REM)** for point-process interaction data via the `rem()` function. This vignette illustrates how to estimate REMs, interpret the results, and evaluate model fit using simulated data. ## Theoretical Framework: Relational Event Models A standard Relational Event Model (REM) conceptualizes social interactions as instantaneous events in continuous time. It models a stream of interactions between actors without associated durations (e.g., sending an email, posting a tweet, or a discrete behavioral action). The generating mechanism of a REM is a multidimensional point process, where each potential directed dyad $(i,j)$ has a distinct, continuous-time intensity (or rate) of interaction: $$\lambda_{ij}(\mathscr{H}_t, \beta, \alpha, \gamma) = \exp(s_{ij}(\mathscr{H}_t)^\top \beta + \alpha_i + \alpha_j + f(t, \gamma))$$ where: - $\mathscr{H}_t$ is the history of interactions up to time $t$. - $s_{ij}(\mathscr{H}_t)$ is a vector of dynamic network statistics capturing the structural history of interactions up to time $t$. - $\beta$ are the structural parameters governing these statistics. - $\alpha_i$ and $\alpha_j$ are actor-specific baseline activity and popularity parameters (degree effects). - $f(t, \gamma) = \sum_{q=1}^Q \gamma_q \mathbb{I}(c_{q-1} \le t < c_q)$ is a baseline step-function that captures temporal variations in the overall intensity. The indicator function $\mathbb{I}(c_{q-1} \le t < c_q)$ takes the value 1 if $c_{q-1} \le t < c_q$ and 0 otherwise, where $0 = c_0 < c_1 < \dots < c_Q$ are specified change points and $\gamma = (\gamma_1, \dots, \gamma_Q)^\top$ is the baseline parameter vector (with $\gamma_1 = 0$ imposed for model identifiability). ## Summary Statistics and Degree Effects The package implements several key statistics to capture network dynamics. For full mathematical definitions and descriptions of the available transformations (e.g., `log`, `sig`, `bin`), please refer to the `redeem_terms` documentation. When specifying a REM, it is crucial to balance structural network statistics with actor-specific heterogeneity features: - **Structural Effects**: Variables like **Inertia** ($s_{ij}(\mathscr{H}_t) = N_{ij}(t)$) and **Common Partners** (shared coworkers or associates) capture complex endogenous dependencies reflecting the structural embedding of the network sequence. - **Degree Effects (Actor Heterogeneity)**: Including sender ($\alpha_i$) and receiver ($\alpha_j$) degree effects is important to account for inherent individual activity. Omitting these degree effects often causes an omitted variable bias where structural parameter estimates artificially inflate to absorb underlying actor heterogeneity. - **Temporal Effects**: Including terms that adapt to the baseline time accounts for non-stationarity in the overall intensity. Overlooking temporal fluctuations assumes all point occurrences are identically sequenced in exponential arrival gaps without burstiness or varying base density. ## Example: Simulating and Fitting a REM Let's simulate a basic relational event stream and fit a scalable REM. ```{r setup} library(redeem) ``` ### Data Preparation For a standard REM, the event sequence is represented as a matrix with three main columns: `time`, `from`, and `to`. (If a `type` column is present, standard REM treats all interactions as instantaneous events of type 1). ```{r data} # Simulated instantaneous event sequence n_nodes <- 10 events <- matrix(c( 1.0, 1, 2, 1.5, 3, 4, 2.0, 2, 1, 2.8, 1, 3, 3.5, 4, 3, 4.0, 1, 4 ), ncol = 3, byrow = TRUE) colnames(events) <- c("time", "from", "to") ``` ### Model Fitting We estimate the REM using the `rem()` function. The model `formula` is specified using the model terms documented in `redeem_terms`. In this simple example, we fit an intercept-only model. Complex combinations of structural network statistics and explicit node parameters can be mapped analogously. ```{r fit} # Fit the Relational Event Model fit_rem <- rem( events = events, n_nodes = n_nodes, formula = ~1, control = control.redeem(estimate = "Blockwise") ) # View summaries using `summary.redeem_result` summary(fit_rem) ``` ### Interpretation The estimated coefficients $\beta$ represent how the covariates influence the incidence of instantaneous events. A positive coefficient indicates that higher values of the corresponding statistic structurally increase the rate at which $(i, j)$ interacts. ## Simulation and Model Diagnostics Ensuring your estimated REM accurately reflects the observed data involves rigorous simulation and residual checking tests against continuous network evolution. ### Predicting and Simulating Events The **redeem** package enables generating brand-new event networks using the estimated parameters. Utilizing the `simulate()` method on the fitted object: ```r # Simulate networks from the fitted REM simulated_events <- simulate(fit_rem, nsim = 100, time_horizon = 10) ``` Comparing networks generated from this simulation against your observed data provides a holistic check. If macroscopic patterns (e.g., degree distribution, inter-arrival times, sequence clustering) match comprehensively, the model exhibits good structural and generating fit. ### Residual Analysis To statistically diagnose the fit at the dyad level, you can assess the model's unobserved error using **Cox-Snell residuals**. The concept relies on the property that if the specified intensity $\lambda_{ij}(t)$ is the true generating model, then the integrated cumulative intensity computed up to the precise time of an observed dyadic event will behave like a standard exponential random variable $Exp(1)$. The **redeem** package provides the `get_residuals()` function to automate this check. It calculates the cumulative intensities for all dyads and returns Kaplan-Meier estimates of the residual survival function alongside the theoretical $Exp(1)$ curve. ```{r residuals} # Extract residuals for diagnostics using `get_residuals()` # Note: Ensure return_data = TRUE was set in `control.redeem()` resids <- get_residuals(fit_rem) # Plot the Kaplan-Meier estimate of the residual survival vs. Theoretical Exp(1) plot(resids$time, resids$surv, type = "l", log = "y", xlab = "Cox-Snell Residuals", ylab = "Survival Probability", main = "Cox-Snell Residual Diagnostic") lines(resids$time, resids$theoretical, col = "red", lty = 2) legend("topright", legend = c("Empirical", "Theoretical Exp(1)"), col = c("black", "red"), lty = 1:2) ``` If the model is accurately specified, the empirical survival curve (black line) should closely follow the theoretical exponential decay (red dashed line). Significant deviations, especially in the tails, suggest that the model fails to capture certain temporal dynamics or that there is unmodeled heterogeneity among dyads.