How to Calculate Lag in a Pandas DataFrame: A Step-by-Step Guide for Analyzing Delinquency Trends
To calculate these lags, start from a table that contains the customer_id, binned_due_date, days_after_due_date, and delinquency columns from your original data. The 7-day lag (d7_t-1) and the 30-day lag (d30_t-1) of the delinquency column can then be computed with the SQL LAG window function. The query below assumes one row per customer per day, so that looking back 7 rows means looking back 7 days:
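SELECT
    customer_id,
    binned_due_date,
    days_after_due_date,
    delinquency,
    -- LAG(x, n) returns x from n rows earlier within the same customer (NULL if none)
    -- The aliases are quoted because a hyphen is not allowed in an unquoted identifier
    LAG(delinquency, 7) OVER (PARTITION BY customer_id ORDER BY binned_due_date, days_after_due_date) AS "d7_t-1",
    LAG(delinquency, 30) OVER (PARTITION BY customer_id ORDER BY binned_due_date, days_after_due_date) AS "d30_t-1"
FROM your_table;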
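To try the query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. The table contents are made-up toy values (delinquency = day / 100, two customers with days 0 to 39) chosen only so the lagged values are easy to check, and the sketch assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Toy data (made up for illustration): two customers, days 0..39 each,
# with delinquency = day / 100 so each lagged value is easy to verify.
rows = [
    (cid, "2020-01-01", day, day / 100)
    for cid in (1, 2)
    for day in range(40)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (customer_id INTEGER, binned_due_date TEXT, "
    "days_after_due_date INTEGER, delinquency REAL)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)

# LAG(x, n) returns x from n rows earlier in the window, or NULL if there is none.
query = """
SELECT
    customer_id,
    days_after_due_date,
    delinquency,
    LAG(delinquency, 7) OVER (PARTITION BY customer_id
                              ORDER BY binned_due_date, days_after_due_date) AS "d7_t-1",
    LAG(delinquency, 30) OVER (PARTITION BY customer_id
                               ORDER BY binned_due_date, days_after_due_date) AS "d30_t-1"
FROM your_table
ORDER BY customer_id, days_after_due_date
"""
result = conn.execute(query).fetchall()

# Show customer 1, days 28..31: d30_t-1 becomes non-NULL starting at day 30
for row in result[28:32]:
    print(row)
```

For customer 1, the row for day 30 carries d7_t-1 = 0.23 (day 23) and d30_t-1 = 0.0 (day 0). Because of PARTITION BY customer_id, customer 2's first 7 rows get NULL for d7_t-1 instead of borrowing values from customer 1.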
If you use Python and the pandas library, here is the equivalent code:
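import pandas as pd

# Assuming df is the DataFrame containing your original data,
# with one row per customer per day (a single sample row is shown here)
df = pd.DataFrame({
    'customer_id': [33179018],
    'binned_due_date': ['2020-01-01'],
    'days_after_due_date': [0],
    'delinquency': [0.286724]
})

# Put each customer's rows in chronological order before shifting
df = df.sort_values(['customer_id', 'binned_due_date', 'days_after_due_date'])

# d7_t-1: delinquency from 7 rows (days) earlier for the same customer
df['d7_t-1'] = df.groupby('customer_id')['delinquency'].shift(7)
# d30_t-1: delinquency from 30 rows (days) earlier for the same customer
df['d30_t-1'] = df.groupby('customer_id')['delinquency'].shift(30)

print(df)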
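This code builds a DataFrame with the customer_id, binned_due_date, days_after_due_date, and delinquency columns, calculates the lag of the delinquency column for 7 days (d7_t-1) and 30 days (d30_t-1), and prints the result to the console. With only one sample row, both lag columns are NaN because there is no earlier row to look back to; load your full data to get non-empty lags.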
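Both shift(n) in pandas and LAG(x, n) in SQL count rows, not days, so they equal a 7- or 30-day lag only when every day is present. If some days are missing, one alternative (a sketch, not the method from the original source) is to align rows on the day value itself: copy the table, add n to days_after_due_date, and left-merge it back. The helper name day_lag and the toy data are hypothetical, and the sketch assumes days_after_due_date is counted within each binned_due_date and that each (customer_id, binned_due_date, days_after_due_date) key appears only once.

```python
import pandas as pd


def day_lag(df, n_days, col="delinquency"):
    """Hypothetical helper: value of `col` exactly n_days earlier for the same
    customer and due-date bin, or NaN if that earlier day has no row."""
    keys = ["customer_id", "binned_due_date", "days_after_due_date"]
    past = (
        df[keys + [col]]
        .rename(columns={col: "lagged"})
        # Move each observation forward n_days so it lines up with the later day
        .assign(days_after_due_date=lambda p: p["days_after_due_date"] + n_days)
    )
    # A left merge keeps the left frame's row order, so the result aligns with df
    merged = df[keys].merge(past, on=keys, how="left")
    return merged["lagged"].to_numpy()


# Toy data (made up for illustration): customer 1 has no row for day 3
days_c1 = [0, 1, 2, 4, 5, 6, 7, 8, 9]
days_c2 = list(range(10))
df = pd.DataFrame({
    "customer_id": [1] * len(days_c1) + [2] * len(days_c2),
    "binned_due_date": "2020-01-01",
    "days_after_due_date": days_c1 + days_c2,
})
df["delinquency"] = df["days_after_due_date"] / 10
df = df.sort_values(
    ["customer_id", "binned_due_date", "days_after_due_date"]
).reset_index(drop=True)

# Day-aligned lag versus the row-based lag, for comparison
df["d7_t-1"] = day_lag(df, 7)
df["d7_rows"] = df.groupby("customer_id")["delinquency"].shift(7)

print(df[df["customer_id"] == 1])
```

For customer 1, day 9 gets d7_t-1 = 0.2 (the value from day 2) from the day-aligned lag, while the row-based shift(7) returns 0.1 from day 1 because the missing day 3 shifts the row count. Calling day_lag(df, 30) would give the d30_t-1 column in the same way.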
Last modified on 2023-05-17