Figure from da_b_da on Unsplash.

This code applies GBDT (Gradient Boosting Decision Trees), using the gbm package in R, to Onboard Travel Survey data. It explores the relationship between the distance people walk to access transit stops and the built environment, while controlling for trip and demographic variables. If you find it useful for your work, please consider citing our paper:

Tao, T., Wang, J., & Cao, X. (2020). Exploring the non-linear associations between spatial attributes and walking distance to transit. Journal of Transport Geography, 82, 102560. https://doi.org/10.1016/j.jtrangeo.2019.102560

1 Package and data

## load the package
library(gbm)
## Loaded gbm 2.1.8
## load the dataset
urbanloc <- read.csv('DisToTransit_final_data.csv')

2 Modelling

Tune the hyperparameters with a grid search over tree depth. Note that this step is computationally expensive and time-consuming.
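
Because the search is expensive, it can help to dry-run the same kind of loop on a small synthetic data set first. The sketch below is only an illustration under stated assumptions: the data frame `toy`, its columns, and the values chosen for `n.trees`, `shrinkage`, and `cv.folds` are hypothetical and are not taken from the original analysis. It assumes the model is fitted with cross-validation, so that `cv.error` is available for picking the best iteration.

```r
library(gbm)

## hypothetical synthetic data, NOT the survey data
set.seed(2017)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n), x3 = rnorm(n))
toy$y <- sin(2 * pi * toy$x1) + toy$x2 * toy$x3 + rnorm(n, sd = 0.3)

## small grid: tree depth from 1 to 3
toy_grid <- expand.grid(
  interaction.depth = seq(1, 3),
  optimal_trees = 0, ## iteration with the lowest cross-validated error
  min_RMSE = 0       ## RMSE at that iteration
)

for (i in 1:nrow(toy_grid)) {

  set.seed(2017)
  toy.tune <- gbm(
    y ~ x1 + x2 + x3,
    data = toy,
    distribution = 'gaussian',
    n.trees = 200,     ## assumed value, kept small for speed
    interaction.depth = toy_grid$interaction.depth[i],
    shrinkage = 0.1,   ## assumed value
    cv.folds = 5,      ## assumed value
    verbose = FALSE
  )

  ## for a gaussian model, cv.error is the cross-validated squared-error loss
  ## at each iteration, so its square root is an RMSE
  toy_grid$optimal_trees[i] <- which.min(toy.tune$cv.error)
  toy_grid$min_RMSE[i] <- sqrt(min(toy.tune$cv.error))
}

## rank the depths by cross-validated RMSE
toy_grid[order(toy_grid$min_RMSE), ]
```

If the toy loop finishes quickly and fills in both result columns, the same pattern should scale to the survey data, where the full search is run below.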

## create a list to store the parameters and results
hyper_grid <- expand.grid(
  interaction.depth = seq(1, 49),  ## tree depth from 1 to 49
  optimal_trees = 0, ## create a variable to store the number of iterations with best model prediction performance
  min_RMSE = 0 ## create a variable to store RMSE
)

## hyperparameter tuning (grid search over tree depth)
for (i in 1:nrow(hyper_grid)) {
  
  set.seed(2017) ## set the seed for reproducibility
  gbm.tune <- gbm(
    WalkDis ~ PeakHou + TripDis + NumTrans + WorkDes ## formula
    + PerPopWorAgeS + Per0CarHouS + PerLowWagWorHomS + PopDenS +
      JobDenS + EmHouMixS + NetDenS + IntDenS
    + Mal + Whi + Blk + You + Sen + HouInc + Lab + Veh + DriLic +
      TraPas,
    data = urbanloc, 
    distribution = 'gaussian', ## distribution of the dependent variable
    n.trees =