### **Question**: Does this figure represent additive or multiplicative decomposition?
Thalles 2019
## Understanding Time Series | Stationarity
A time series is said to be **stationary** if its statistical properties such as mean, variance, and autocorrelation do not change over time. Many forecasting methods assume that the time series is stationary. The **Augmented Dickey-Fuller (ADF) test** is a statistical test for stationarity: its null hypothesis is that the series has a unit root (is non-stationary), so a small p-value is evidence of stationarity.
### Strict Stationarity
The joint distribution of any set of observations is unchanged when all of them are shifted in time. This is a strong assumption that is rarely met in practice.
### Trend Stationarity
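The series is stationary around a deterministic trend: once the trend is removed, the remaining fluctuations have a constant mean, variance, and autocorrelation. This is a weaker requirement that is more commonly assumed in practice.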
Wikipedia 2024
## Understanding Time Series | Differencing
**Differencing** is a technique used to make a time series **stationary** by computing the difference between consecutive observations. Differencing can help remove trends and seasonality from a time series.
$$ Y_t' = Y_t - Y_{t-1} $$
Where:
- $Y_t$ is the observation at time $t$.
- $Y_t'$ is the differenced observation at time $t$.
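
As a minimal sketch of the formula above, first differencing can be written in plain Python. The function name `difference`, its `d` argument, and the sample values are illustrative assumptions, not part of the original material.

```python
def difference(series, d=1):
    """Apply first differencing d times: Y'_t = Y_t - Y_{t-1}."""
    values = list(series)
    for _ in range(d):
        # Pair each observation with its predecessor and subtract.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical series with a linear trend (illustrative values only).
trend = [3, 5, 7, 9, 11]
print(difference(trend))       # [2, 2, 2, 2]
print(difference(trend, d=2))  # [0, 0, 0]
```

Each round of differencing shortens the series by one observation, and here a single round is enough to turn the linear trend into a constant.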
Wise, 2020
## Understanding Time Series | Autocorrelation
### Autocorrelation
A measure of the correlation between a time series and a lagged version of itself.
$$ \text{Corr}(X_t, X_{t-k}) $$
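
A rough sketch of how the lag-$k$ autocorrelation can be estimated from data, using the common sample estimator that normalizes by the lag-0 sum of squares. The function name `autocorrelation` and the sample series are assumptions made for illustration.

```python
def autocorrelation(series, k):
    """Sample autocorrelation of `series` at lag k (k >= 0)."""
    n = len(series)
    mean = sum(series) / n
    # Lag-0 sum of squared deviations (the variance up to a 1/n factor).
    denom = sum((x - mean) ** 2 for x in series)
    # Lag-k sum over the pairs (X_t, X_{t-k}).
    numer = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return numer / denom


# Hypothetical series that alternates up and down (illustrative values only).
zigzag = [1, -1, 1, -1, 1, -1, 1, -1]
print(autocorrelation(zigzag, 0))  # 1.0
print(autocorrelation(zigzag, 1))  # -0.875
print(autocorrelation(zigzag, 2))  # 0.75
```

The alternating series is negatively correlated with itself one step back and positively correlated two steps back, which is the pattern an ACF plot would show.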
### Partial Autocorrelation
A measure of the correlation between a time series and a lagged version of itself, controlling for the values of the time series at all shorter lags.
$$ \text{Corr}(X_t, X_{t-k} | X_{t-1}, X_{t-2}, \ldots, X_{t-k+1}) $$
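
One standard way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion. The sketch below is a minimal version of it; the function name `partial_autocorrelations` and the input values are assumptions for illustration. The input uses the theoretical autocorrelations of an AR(1) process with coefficient 0.5, namely $0.5^k$ at lag $k$.

```python
def partial_autocorrelations(rho):
    """Durbin-Levinson recursion.

    `rho` holds autocorrelations [rho_0, rho_1, ..., rho_K] with rho_0 == 1.
    Returns the partial autocorrelations for lags 1..K.
    """
    max_lag = len(rho) - 1
    pacf = []
    prev = []  # prev[j - 1] stores phi_{k-1, j}
    for k in range(1, max_lag + 1):
        numer = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        denom = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = numer / denom
        # Update the lower-order coefficients, then append phi_{k, k}.
        curr = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5 (illustrative).
print(partial_autocorrelations([1.0, 0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
```

The autocorrelation decays gradually, but the partial autocorrelation is zero after lag 1 because the shorter lag already explains the dependence.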
@osama063, 2016
## Understanding Time Series | Checkpoint TL;DR
### Seasonal Decomposition
A technique used to separate a time series into its trend, seasonal, and residual components.
### Stationarity
A time series is said to be stationary if its statistical properties (mean, variance, autocorrelation) do not change over time.
### Differencing
A technique used to make a time series stationary by computing the difference between consecutive observations.
### Autocorrelation
A measure of the correlation between a time series and a lagged version of itself. Partial autocorrelation controls for the values of the time series at all shorter lags.
## Time Series Forecasting | Introduction
Time series forecasting is the process of predicting future values based on past observations. Time series forecasting is used in a wide range of applications, such as sales forecasting, weather forecasting, and stock price prediction.
The **ARIMA** (Autoregressive Integrated Moving Average) model is a popular time series forecasting model that combines autoregressive, moving average, and differencing components.
Before we dive into ARIMA, let's first discuss two simpler time series forecasting models to build intuition for the components of ARIMA: **Moving Average (MA)** and **Autoregressive (AR)** Models.
## Time Series Forecasting | Autoregressive Models
**Autoregressive Models (AR)**: A type of time series model that predicts future values from past observations. The AR model assumes that each value is a linear combination of its previous values plus white noise. It is primarily used to capture how the series depends on its own recent past, including persistent or cyclical behavior.
AR(1) $$ X_t = \phi_1 X_{t-1} + c + \epsilon_t $$
Where:
- $X_t$ is the observed value at time $t$.
- $\phi_1$ is a learnable parameter of the model.
- $c$ is a constant term (intercept).
- $\epsilon_t$ is the white noise at time $t$.
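
To make the AR(1) recursion concrete, here is a small sketch that generates a series from a fixed list of noise values. The function name `simulate_ar1`, the starting value `x0`, and all numbers are hypothetical choices for illustration.

```python
def simulate_ar1(phi_1, c, noise, x0):
    """Generate X_t = phi_1 * X_{t-1} + c + eps_t for each eps_t in `noise`."""
    series = []
    prev = x0
    for eps in noise:
        prev = phi_1 * prev + c + eps
        series.append(prev)
    return series


# Hypothetical parameters and noise values (illustrative only).
print(simulate_ar1(phi_1=0.5, c=2.0, noise=[0.0, 1.0, -1.0], x0=4.0))
# [4.0, 5.0, 3.5]
```

In practice the noise values are random draws; fixing them here keeps the arithmetic easy to follow by hand.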
## Time Series Forecasting | Autoregressive Models
AR(p) $$ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \ldots + \phi_p X_{t-p} + c + \epsilon_t $$
Where:
- $X_t$ is the observed value at time $t$.
- $p$ is the number of lag observations included in the model.
- $\phi_1, \phi_2, \ldots, \phi_p$ are the parameters of the model.
- $c$ is a constant term (intercept).
- $\epsilon_t$ is the white noise at time $t$.
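
Given already-estimated parameters, a one-step AR($p$) prediction is a weighted sum of the last $p$ observations plus the intercept (the noise term has mean zero, so it drops out of the prediction). The sketch below assumes hypothetical parameter values; `predict_ar` is an illustrative name.

```python
def predict_ar(history, phis, c):
    """One-step AR(p) prediction: c + sum_i phi_i * X_{t-i}."""
    p = len(phis)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # history[-1] is X_{t-1}, history[-2] is X_{t-2}, and so on.
    return c + sum(phi * history[-i] for i, phi in enumerate(phis, start=1))


# Hypothetical AR(2) with phi_1 = 0.5, phi_2 = 0.25, c = 1 (illustrative only).
print(predict_ar([8.0, 4.0], phis=[0.5, 0.25], c=1.0))  # 5.0
```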
## Time Series Forecasting | Moving Average
**Moving Average (MA) Models**: A type of time series model that predicts future values based on past prediction errors. An MA model's primary utility is to account for noise and short-term deviations from the mean.
MA(1) $$ X_t = \theta_1 \epsilon_{t-1} + \mu + \epsilon_t$$
Where:
- $X_t$ is the observed value at time $t$.
- $\theta_1$ is a learnable parameter of the model.
- $\mu$ is the mean of the time series.
- $\epsilon_t$ is the white noise at time $t$.
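Example with $\mu = 10$ and $\theta_1 = 0.5$, where $\widehat{X}_t = \mu + \theta_1 \epsilon_{t-1}$ is the prediction (taking $\epsilon_0 = 0$) and $X_t = \widehat{X}_t + \epsilon_t$: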
| t | $\widehat{X}_t$ | $\epsilon_t$ | $X_t$ |
|---|------------|--------------|-------|
| 1 | 10 | -2 | 8 |
| 2 | 9 | 1 | 10 |
| 3 | 10.5 | 0 | 10.5 |
| 4 | 10 | 2 | 12 |
| 5 | 11 | -1 | 10 |
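
The table above can be reproduced with a few lines of Python. This is a sketch only: the function name `ma1_path` is an assumption, and the error before the first time step is taken to be zero, as in the first row of the table.

```python
def ma1_path(mu, theta_1, errors):
    """Return (t, prediction, error, observation) rows for an MA(1) model."""
    rows = []
    prev_error = 0.0  # assumption: no error before the first time step
    for t, eps in enumerate(errors, start=1):
        prediction = mu + theta_1 * prev_error
        observed = prediction + eps
        rows.append((t, prediction, eps, observed))
        prev_error = eps
    return rows


# Same mu, theta_1, and errors as the table above.
for row in ma1_path(mu=10.0, theta_1=0.5, errors=[-2.0, 1.0, 0.0, 2.0, -1.0]):
    print(row)
# (1, 10.0, -2.0, 8.0)
# (2, 9.0, 1.0, 10.0)
# (3, 10.5, 0.0, 10.5)
# (4, 10.0, 2.0, 12.0)
# (5, 11.0, -1.0, 10.0)
```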
## Time Series Forecasting | Moving Average
MA(q) $$ X_t = \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + \mu + \epsilon_t$$
Where:
- $X_t$ is the observed value at time $t$.
- $q$ is the number of lagged prediction errors included in the model.
- $\theta_1, \theta_2, \ldots, \theta_q$ are the learnable parameters.
- $\mu$ is the mean of the time series.
- $\epsilon_t$ is the white noise at time $t$.
## Time Series Forecasting | ARMA
**Autoregressive Moving Average (ARMA)**: A type of time series model that combines autoregressive and moving average components.
The ARMA model is defined as:
$$ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \ldots + \phi_p X_{t-p} + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + c + \epsilon_t $$
Where:
- $X_t$ is the observed value at time $t$.
- $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive parameters.
- $\theta_1, \theta_2, \ldots, \theta_q$ are the moving average parameters.
- $c$ is a constant term (intercept).
- $\epsilon_t$ is the white noise at time $t$.
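
The ARMA equation needs past errors, which are not observed directly. A common approach, sketched below under simplifying assumptions, is to estimate each error as the residual between the observation and the model's own one-step prediction. The function name `arma_one_step`, the parameter values, and the choice to treat values before the start of the series as zero are all illustrative assumptions.

```python
def arma_one_step(series, phis, thetas, c):
    """One-step ARMA(p, q) predictions along `series`.

    Past errors are estimated as residuals: eps_t = X_t - prediction_t.
    Terms that would reach before the start of the series are skipped,
    which treats those values as zero (an assumption of this sketch).
    """
    predictions, errors = [], []
    for t, x in enumerate(series):
        ar_part = sum(phi * series[t - i]
                      for i, phi in enumerate(phis, start=1) if t - i >= 0)
        ma_part = sum(theta * errors[t - i]
                      for i, theta in enumerate(thetas, start=1) if t - i >= 0)
        pred = c + ar_part + ma_part
        predictions.append(pred)
        errors.append(x - pred)
    return predictions, errors


# Hypothetical ARMA(1, 1) with phi_1 = 0.5, theta_1 = 0.5, c = 1 (illustrative).
preds, errs = arma_one_step([2.0, 3.0, 2.0], phis=[0.5], thetas=[0.5], c=1.0)
print(preds)  # [1.0, 2.5, 2.75]
print(errs)   # [1.0, 0.5, -0.75]
```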
## Time Series Forecasting | ARIMA
**Autoregressive Integrated Moving Average (ARIMA)**: A type of time series model that combines autoregressive, moving average, and differencing components.
The ARIMA model is defined as:
$$ y_t' = \phi_1 y_{t-1}' + \phi_2 y_{t-2}' + \ldots + \phi_p y_{t-p}' + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + c + \epsilon_t $$
Where:
- $y_t'$ is the differenced observation at time $t$ (the series after differencing $d$ times).
- $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive parameters.
- $\theta_1, \theta_2, \ldots, \theta_q$ are the moving average parameters.
- $c$ is a constant term (intercept).
- $\epsilon_t$ is the white noise at time $t$.
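
The "integrated" part means the model works on the differenced series and then undoes the differencing to produce a forecast on the original scale. The sketch below walks through that for the simplest case, $p = 1$, $d = 1$, $q = 0$, with parameters assumed to be already known. The function name `arima_110_forecast` and the numbers are hypothetical.

```python
def arima_110_forecast(series, phi_1, c):
    """One-step forecast from an ARIMA(1, 1, 0) model with known parameters.

    1. Difference:  y'_t     = y_t - y_{t-1}
    2. Predict:     y'_{t+1} = phi_1 * y'_t + c
    3. Integrate:   y_{t+1}  = y_t + y'_{t+1}
    """
    last_diff = series[-1] - series[-2]
    next_diff = phi_1 * last_diff + c
    return series[-1] + next_diff


# Hypothetical values: the last change was +4, so the model expects +2 next.
print(arima_110_forecast([10.0, 12.0, 16.0], phi_1=0.5, c=0.0))  # 18.0
```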
## Time Series Forecasting | Practical ARIMA
ARIMA takes three parameters when fitting a model ($p$, $d$, $q$).
| Parameter | Description | Estimation |
|-----------|-------------|------------|
| $p$ | The number of lag observations included in the model (lag order for autoregression). | Where there is a dropoff in Partial Autocorrelation Function (PACF) (with gradual decline in ACF). |
| $d$ | The number of times that the raw observations are differenced (degree of differencing). | Minimum amount of differencing required to achieve a significant Augmented Dickey-Fuller Test (ADF). |
| $q$ | The number of prediction errors included in the model (order of moving average). | Where there is a dropoff in the Autocorrelation Function (ACF) (with gradual decline in PACF). |
## Time Series Forecasting | Practical ARIMA
**Question**: What is a reasonable value of $p$ based on the following?
Spur Economics 2022
## Time Series Forecasting | Practical ARIMA
**Question**: What is a reasonable value of $d$ based on the following?
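```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

timeseries = ...  # 1-D array of observations

# Difference the series 0, 1, and 2 times and run the ADF test on each version.
for d in range(0, 3):
    diffed = np.diff(timeseries, n=d)  # n=0 returns the series unchanged
    result = adfuller(diffed)  # result[0]: test statistic, result[1]: p-value
    print(f"ADF Statistic for d={d}: {result[0]} p-value: {result[1]}")
```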
```text
ADF Statistic for d=0: -2.5 p-value: 0.1
ADF Statistic for d=1: -3.2 p-value: 0.04
ADF Statistic for d=2: -4.1 p-value: 0.01
```
## Time Series Forecasting | Practical ARIMA
**Question**: What is a reasonable value of $q$ based on the following?
Spur Economics 2022
## Walk-Forward Validation
In walk-forward validation, the model is trained on historical data and then used to make predictions on future data. The model is then retrained on the updated historical data and used to make predictions on the next future data point. This process is repeated until all future data points have been predicted.
### Train / Validate Period
The historical data used to train and validate the time series model.
### Test Period
The future data used to evaluate the generalization performance of the time series model.
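
The retrain-and-predict loop described above can be sketched with an expanding training window. The names `walk_forward` and `naive_forecast` and the data are assumptions for illustration; the naive "repeat the last value" model is only a placeholder for whatever model is actually being evaluated.

```python
def walk_forward(series, initial_train_size, fit_predict):
    """Yield (actual, prediction) pairs using an expanding training window."""
    for t in range(initial_train_size, len(series)):
        train = series[:t]               # everything observed so far
        prediction = fit_predict(train)  # refit, then predict one step ahead
        yield series[t], prediction


def naive_forecast(train):
    """Placeholder model: predict the last observed value."""
    return train[-1]


# Hypothetical data (illustrative values only).
data = [10, 12, 11, 13, 14]
print(list(walk_forward(data, initial_train_size=3, fit_predict=naive_forecast)))
# [(13, 11), (14, 13)]
```

The model never sees an observation before predicting it, which is what keeps the evaluation honest for time-ordered data.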
Peeratiyuth, 2018
Karaman, 2005
## Evaluation Metrics
**Mean Absolute Error (MAE)**: The average of the absolute errors between the predicted and actual values.
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
**Mean Squared Error (MSE)**: The average of the squared errors between the predicted and actual values.
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
**Root Mean Squared Error (RMSE)**: The square root of the average of the squared errors between the predicted and actual values.
$$ RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } $$
**Mean Absolute Percentage Error (MAPE)**: The average of the absolute percentage errors between the predicted and actual values.
$$ MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% $$
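
The four formulas translate directly into code. The sketch below uses plain Python; the function names and sample values are illustrative assumptions. Note that MAPE divides by the actual values, so it is undefined whenever an actual value is zero.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    # Undefined if any actual value is zero (division by zero).
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actuals and predictions (illustrative values only).
actual = [10.0, 20.0]
predicted = [13.0, 16.0]
print(mae(actual, predicted))             # 3.5
print(mse(actual, predicted))             # 12.5
print(round(rmse(actual, predicted), 3))  # 3.536
print(round(mape(actual, predicted), 3))  # 25.0
```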
## Summary | What We Covered
| Term | Description |
|------|-------------|
| **Seasonal Decomposition** | A technique used to separate a time series into its trend, seasonal, and residual components. |
| **Stationarity** | A time series is said to be stationary if its statistical properties (mean, variance, autocorrelation) do not change over time. |
| **Differencing** | A technique used to make a time series stationary by computing the difference between consecutive observations. |
| **Autocorrelation** | A measure of the correlation between a time series and a lagged version of itself. Partial autocorrelation controls for the values of the time series at all shorter lags. |
| **ARIMA** | A type of time series model that combines autoregressive, moving average, and differencing components. |
| **Walk-Forward Validation** | A method for evaluating the generalization performance of a time series model. |
| **Evaluation Metrics** | Regression metrics used to evaluate the performance of a time series model. |
## Summary | What We Didn't Cover
| Topic | Description |
|-------|-------------|
| **Seasonal ARIMA** | An [extension of ARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html) that includes seasonal components. |
| **SARIMAX** | An [extension of ARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html) that includes seasonal components and exogenous variables. |
| **Box-Jenkins Approach** | A [systematic method](https://en.wikipedia.org/wiki/Box%E2%80%93Jenkins_method) for identifying, estimating, and diagnosing ARIMA models. |
| **Maximum Likelihood Estimation** | A [method](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) for estimating the parameters of a statistical model by maximizing the likelihood function. |
| **Prophet** | A [forecasting tool](https://facebook.github.io/prophet/) developed by Facebook that is designed for forecasting time series data with strong seasonal patterns. |
| **SOTA Models** | Models like [TiDE](https://arxiv.org/abs/2304.08424) and [TSMixer](https://arxiv.org/abs/2303.06053) are state-of-the-art models for time series forecasting (2023+). |
# Exit Poll
## On a scale of 1-5, how confident are you with the **time series** methods such as:
1. Seasonal Decomposition
2. Stationarity & Differencing
3. Autocorrelation
4. ARIMA
# Agenda
# **Part 1**: Storytelling
- Importance of storytelling in data science.
- What every good data story should include.
- What every good presentation should include.
# **Part 2**: Ethics of Data Science
- Consent for Data
- Privacy
- Examining Bias
- Who is Held Accountable?
- Radicalization and Misinformation
# **Part 3**: Career Tips
- Effective Networking
- Monetizing Your Curiosity
- Building a Personal Brand
## Storytelling | Importance
Data science is not just about the results. It is about the **story** that the data tells.
An often-overlooked aspect of data science is the ability to **communicate** and **convince** others of the results of your analysis.
JustInsighting, 2024
## Storytelling | Every Data Story Must Include ...
1. Background of Problem
2. Statement of Assumptions
3. Motivation for Solving the Problem
4. Explanation of your Analysis
5. Declared Limitations & Future Improvements
## Storytelling | Background of Problem
### What is the problem you are trying to solve?
We are trying to predict the price of a house based on its square footage.
## Storytelling | Statement of Assumptions
### What assumptions are you making in your analysis?
We assume that the data we are training on represents the general population.
### What are the implications of these assumptions?
If this assumption is incorrect, the model may fail to generalize.
## Storytelling | Motivation for Solving the Problem
### Why is it important to solve this problem?
Predicting the price of a house can help buyers and sellers make informed decisions.
## Storytelling | Explanation of your Analysis
### How did you analyze the data?
We used linear regression to predict the price of a house based on its square footage.
### How do you interpret the results?
Our linear model predicts that the price of a house increases by **$100 per square foot**. Note that we don't report MSE or RMSE here.
## Storytelling | Declared Limitations & Future Improvements
### What are the limitations of your analysis?
The model may not be accurate for houses with unique features, such as a swimming pool.
### How can you improve the analysis in the future?
You can collect more data on houses with swimming pools to improve the accuracy of the model.
## Storytelling Recap | Every Data Story Must Include ...
1. Background of Problem
2. Statement of Assumptions
3. Motivation for Solving the Problem
4. Explanation of your Analysis
5. Declared Limitations & Future Improvements
## Storytelling | Every Good **Presentation** Must Include ...
1. Clear and concise slides
2. A compelling narrative
3. Energy and confidence
## Storytelling | Clear and Concise Slides
### What makes a slide **clear** and **concise**?
- Use bullet points to summarize key points.
- Use visuals to illustrate complex concepts.
- Use a consistent font and color scheme.
## Storytelling | A Compelling Narrative
### What makes a narrative **compelling**?
- Tell a story that engages the audience.
- Use examples and anecdotes to illustrate key points.
- Use humor and emotion to connect with the audience.
## Storytelling | Energy and Confidence
### How can you project **energy** and **confidence**?
- Speak clearly and with sufficient volume.
- Make eye contact with the audience.
- Use body language to emphasize key points.
## Ethics of Data Science | Topics
1. Consent for Data
2. Privacy
3. Examining Bias
4. Accountability
5. Radicalization and Misinformation
## Ethics of Data Science | Consent for Data
### Why is consent important in data science?
- To protect the privacy of individuals.
- To ensure that data is used ethically and responsibly.
### How can you obtain consent for data?
- Inform individuals about how their data will be used.
- Obtain explicit consent before collecting or using data.
Euractiv, 2024
## Ethics of Data Science | Consent for Data
### **Opt-in** vs. **Opt-out**
- Opt-in: Individuals must actively consent to the use of their data.
- Opt-out: Individuals must actively decline the use of their data.
### **Granular** vs. **Broad**
- Granular: Individuals can choose how their data is used.
- Broad: Individuals have limited control over how their data is used.
## Ethics of Data Science | Privacy
### Why is privacy important in data science?
- To protect the personal information of individuals.
- To prevent the misuse of data for malicious purposes.
### How can you protect privacy in data science?
- Anonymize data to remove personally identifiable information.
- Encrypt data to prevent unauthorized access.
- Limit access to data to authorized individuals.
SBPhotos, 2018
## Ethics of Data Science | Privacy Compliance with Regulations
| Regulation | Description |
|------------|-------------|
| **General Data Protection Regulation (GDPR)** | GDPR is a European Union regulation that protects the personal data of EU citizens and residents. |
| **Health Insurance Portability and Accountability Act (HIPAA)** | HIPAA assures that an individual’s health information is properly protected by setting use and disclosure standards. |
| **California Consumer Privacy Act (CCPA)** | The CCPA is a state statute intended to enhance privacy rights and consumer protection for residents of California, United States. The CCPA is the first state statute to require businesses to provide consumers with the ability to opt out of the sale of their personal information. |
## Ethics of Data Science | Examining Bias
### Why is bias a concern in data science?
- Bias can lead to unfair or discriminatory outcomes.
- Bias can perpetuate stereotypes and reinforce inequality.
### How can you identify and address bias in data science?
- Examine the data for bias in the collection or labeling process.
- Apply [fairness-aware machine learning](https://en.wikipedia.org/wiki/Fairness_(machine_learning)) techniques to mitigate bias.
Google, 2018
## Ethics of Data Science | Accountability
### **Scenario**: A self-driving car causes an accident, resulting in injury or death.
### Who should be held accountable?
- The manufacturer of the car.
- The developer of the software.
- The owner of the car.
- The government.
Nygard, 2021
## Ethics of Data Science | Radicalization and Misinformation
### How can data science be used to **radicalize** and **misinform** people?
- By manipulating data to support false narratives.
- By targeting vulnerable populations with misleading information.
- By hyper-recommending content that reinforces extremist views.
### How can you combat radicalization and misinformation in data science?
- Fact-checking and verifying sources.
- Promoting trust, media literacy, and critical thinking.
- Implementing algorithms that prioritize accuracy and credibility.
TikTok, 2024
## Ethics of Data Science | Recap
1. Consent for Data
2. Privacy
3. Examining Bias
4. Accountability
5. Radicalization and Misinformation