Back to Article
Article Notebook
Download Source

Final Project Abstract - Lab 7

Authors
Affiliation

Bella Conrad

Colorado State University

Zachary Cramton

Colorado State University

Rachel Delorie

Colorado State University

Urbanization, Density and Access to Public Parks in the United States

Abstract

Introduction

Since 2008, the majority of the world’s population has lived in urban areas, a result of urbanization in developing countries (Beall et al., 2010; Kohlhase, 2013). The United States developed earlier than many nations, with more than 50 percent of the population living in urban areas.

Definition of Urban Areas in Census History

Prior to the 2020 Census urban areas were defined as any area with greater than 2500 people. Following the 2010 census urban clusters described areas with populations greater than 2,500 and less than 50,000; urbanized areas described areas with a population greater than 50,000. For the 2020 Census the threshold was changed to 5000 people (Ratcliffe, 2022).

by the 14th Census in 1920. In the century since the 1920 census the percentage of individuals living in urban areas has increased to 80.7% (Slack & Jensen, 2020). As more people moved to urban areas, those areas expanded forming urbanized areas and large cities.

Urban planning has existed for centuries out of necessity, historically dominated by efficiency and utilitarianism, optimizing the world we live in for profitability, corporate productivity, and automobile-based mobility. This optimization came with sacrifices, which now impact an increasingly large majority of the population. In recent years, the discipline has begun to prioritize human factors over utilitarian efficiency. Thousands of years of living in rural settings makes urban living hard for most people’s biology. Connection to nature and time outdoors even in small amounts has been shown to be a vital part maintaining physical and mental health [DOTHIS:: Find additional source]. In an effort to make urban spaces more livable, planners are turning to parks and natural areas to connect people to nature.

Equity issues aside, overturning and correcting more than a century of bad planning is a daunting task. Many cities filled in and built up over the course of the 20th century as land became a premium commodity [DOTHIS:: Find additional source]. Does this density present significant challenges for today’s planning professionals? This research seeks to investigate the relationship between urban demographics like density and park access. In exploring this relationship, we hypothesize that there is an intermediate/sublinear relationship between urban population density and public open space availability.

Data Overview

This report uses data from the UN-Habitat Urban Indicators Database and the ParkServe® Database maintained by the Trust for Public Land. The UN data relates to the UN-SDG 11.7.1 pertaining to access to open spaces and green areas.

The January 2025 version of the UN Open Spaces and Green Areas data includes the average share of urban areas allocated to streets and open public spaces as well as the share of the urban population with convenient access to an open public space.

UN Definition

In this case, the UN defines “convenient access to an open public space” as the “urban population withing 400 meters walking distance along the street network to an open public space” (May et al., 2000).

These data collected by the UN were collected in 2020 and provided as a .xls format spreadsheet. These data were converted to .csv format with Microsoft Excel. The ParkServe® data selected for use is the 2020 data set to match the year the UN data was recorded. Specifically, this report uses elements of the City Park Facts: Acreage & Park System Highlights. The ParkServe® data is much less synthesized and was available as a .xml file. The file was structured for viewing as a spreadsheet rather than for further analysis and included multiple worksheets withing the workbook. In converting the file to a .csv file, the data spread across multiple worksheets was collated in a single worksheet and converted to a summarized dataset .csv file.

These data are lacking a shared numerical position data type but share a city name column formatted as “city_name, two_letter_state_abbreviation”. There is not perfect overlap between cities with data in each database however, there are 25 cities shared between the datasets. Cities present in only one data set will be culled when the data is joined.

Methods

  1. Clean the data. The raw data were downloaded as Excel spreadsheets, some reformatting in Excel was required to effectively exporting as a .csv file and importing the new summarized file to RStudio. Remaining data cleaning will occur in R as needed including any header changes or additional columns needed.
  2. Conduct Exploratory Data Analysis (EDA).
  3. Join datasets by “city name” to have a complete working dataset. These data will be combined into a single data frame with an inner join because there is a large number of cities listed in one data set but not the other. The new dataset will include only cities found in both datasets, with columns from both.
Limiting Scope

The cities found in only one dataset will be cut from the data to accommodate the limited scope of the project. With a bigger scope it is possible that additional data could be used to understand these patterns with more depth.

  1. Prep data and split it into training and testing datasets. Perform a 10-fold cross-validation on training data.
  2. Create a recipe.
  3. Set up several models in regression mode.
  4. Create a workflow set including the previously written models and the recipe.
  5. Map function over workflow using workflow map.
  6. Using the highest performing model, fit the data and augment.
  7. Plot and graph data to visually display test results.
  8. Explore using the model to predict values for cities included in only one document (if time allows).

Exploratory Data Analysis (EDA)

The data has already been discussed in general terms in the data overview section. There will be a readme file created to elaborate on the sources, formatting and manipulation required for each dataset before joining them into the urban_parks_data data frame. In general terms, prior to importing into RStudio, the .xml files the data came in were opened in MS Excel; the sheets were formatted to be converted to .csv files including condensing multiple worksheets of the ParkServe® data into a sinigle sheet for easier conversion to a .csv file. While some of the cleaning done in Excel could have been completed in RStudio, it was not efficient to do so. Using Excel was faster and more flexible for that use case. Similar reformatting was required with the UN data as the headers were unreasonably long by default. The readme that will be created for each (or both) files will include a more detailed summary of what each variable means.

In [1]:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
Warning: package 'here' was built under R version 4.4.3
here() starts at C:/Users/Zacha/github/CSU/ESS 330/ess_330_project_proposal
library(flextable)

Attaching package: 'flextable'

The following object is masked from 'package:purrr':

    compose
library(patchwork)
Warning: package 'patchwork' was built under R version 4.4.3
# Create some visualizations and descriptions of what data you have, where you got it, and how and if you need to clean and manipulate it for your project

# Import data from csvs and clean NAs
parkserve_data <- read_csv(here("data", "clean_data/parkserve_summarized_facts_2020.csv")) %>% 
  drop_na()
Rows: 100 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): city_name, city_pop, land_area, revised_area, percent_designed_par...
dbl  (5): parkland_area, designed_park_area, natural_park_area, parkland_per...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
un_land_use_data <- read_csv(here("data", "clean_data/un_land_use.csv")) %>% 
  drop_na()
Rows: 59 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): city_name
dbl (2): mean_percent_built_open_space, mean_percent_open_space_access

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Add columns and finish cleaning parkserve data
clean_parkserve_data <- parkserve_data %>%
  mutate(
    across(-city_name, ~ as.numeric(.x)),  # Convert all columns to numeric except for city name
    parkland_percent = parkland_percent * 100, # Convert parkland percent from ratio
    # Fix design/natural park area percentage calculations
    percent_designed_parks = ifelse(parkland_area == 0, NA, (designed_park_area / parkland_area) * 100),
    percent_natural_parks = ifelse(parkland_area == 0, NA, (natural_park_area / parkland_area) * 100),
    #New calculation for designed-natural area ratio
    dn_area_ratio = ifelse(percent_natural_parks == 0, NA, percent_designed_parks / percent_natural_parks)
  )

  
  
# Join data removing cities found in only one of the two datasets
urban_parks_data <- clean_parkserve_data %>% 
  inner_join(un_land_use_data, by = "city_name")  

# Basic data structure exploration
glimpse(urban_parks_data)
Rows: 25
Columns: 18
$ city_name                      <chr> "Anchorage, AK", "Atlanta, GA", "Boston…
$ city_pop                       <dbl> 299100, 498059, 687725, 2744859, 377963…
$ land_area                      <dbl> 1090997, 85217, 30897, 145686, 49726, 2…
$ revised_area                   <dbl> 1086019, 84250, 29175, 136796, 46880, 2…
$ parkland_area                  <dbl> 914138, 5293, 5072, 13609, 3170, 20352,…
$ designed_park_area             <dbl> 2417, 3864, 2556, 8593, 1792, 10974, 40…
$ natural_park_area              <dbl> 911721, 1429, 2516, 4430, 1378, 9378, 1…
$ parkland_percent               <dbl> 84.173297, 6.282493, 17.384747, 9.94839…
$ percent_designed_parks         <dbl> 0.2644021, 73.0020782, 50.3943218, 63.1…
$ percent_natural_parks          <dbl> 99.735598, 26.997922, 49.605678, 32.551…
$ pop_density                    <dbl> 0.28, 5.91, 23.57, 20.07, 8.06, 6.39, 7…
$ parkland_per_1k_pop            <dbl> 3056.295553, 10.627255, 7.375041, 4.744…
$ park_units                     <dbl> 228, 416, 373, 645, 179, 397, 302, 70, …
$ park_units_per_10k_pop         <dbl> 7.622869, 8.352424, 5.423680, 2.349847,…
$ percent_half_mile_walk         <dbl> 0.75091, 0.72447, 0.99790, 0.98220, 0.8…
$ dn_area_ratio                  <dbl> 0.00265103, 2.70398880, 1.01589825, 1.9…
$ mean_percent_built_open_space  <dbl> 22.0, 13.8, 19.8, 17.2, 20.0, 20.4, 20.…
$ mean_percent_open_space_access <dbl> 71.9, 21.2, 68.2, 47.8, 29.8, 41.9, 43.…
# Descriptive Stats

  # Write function to round numeric columns to two decimal places
  round_numeric <- function(df) {
    df %>% 
      mutate(across(where(is.numeric), ~round(.x, 2)))
  }

  # Summarize stats by variable
    desc_stats_parks <- urban_parks_data %>% 
      select(where(is.numeric)) %>% 
      pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>% 
      group_by(variable) %>% 
      summarize(mean = mean(value, na.rm = TRUE), 
                median = median(value, na.rm = TRUE), 
                sd = sd(value, na.rm = TRUE),
                Q1 = quantile(value, 0.25, na.rm = TRUE),
                Q3 = quantile(value, 0.75, na.rm = TRUE)) %>% 
      round_numeric()
      
      # Print descriptive stats with flextable
      desc_stats_flex <- flextable(desc_stats_parks) %>%
        set_caption("Summarized Urban Parks Statistics") %>% 
        set_header_labels(
          variable = "Variable",
          mean = "Mean",
          median = "Median",
          sd = "Standard Deviation",
          Q1 = "1st Quartile (Q1)",
          Q3 = "3rd Quartile (Q3)") %>% 
        autofit()

# Find Top/Bottom cities for percent parkland
  
  # Select relevant columns
    simplified_vars <- c("city_name", "city_pop", "revised_area", "pop_density", "parkland_area", "dn_area_ratio", "parkland_percent", "parkland_per_1k_pop", "percent_half_mile_walk")
  
  # Filter top/bottom 10 cities
  top10_park_percent <- urban_parks_data %>% 
    arrange(desc(parkland_percent)) %>% 
    slice_head(n = 10) %>% 
    select(all_of(simplified_vars)) %>% 
    round_numeric()
  
  bottom10_park_percent <- urban_parks_data %>% 
    arrange(parkland_percent) %>% 
    slice_head(n = 10) %>% 
    select(all_of(simplified_vars)) %>% 
    round_numeric()
  
  # Create top/bottom 10 flextables w/ function
  # Create function
  make_best_worst_flextbl <-function(df, caption) {
    flextable(df) %>% 
    set_caption(caption) %>%
    set_header_labels(
      city_name = "City Name",
      city_pop = "City Population",
      revised_area = "City Land Area (Revised) (Acres)",
      pop_density = "Population Density (People/Acre)",
      parkland_area = "City Parkland Area (Acres)",
      parkland_percent = "Percent Parkland",
      parkland_per_1k_pop = "Parkland Per (1000) Capita",
      percent_half_mile_walk = "Percent of Residents within 0.5 Miles of a Park",
      dn_area_ratio = "Designed-Natural Park Area Ratio (Designed Park (%) / Natural Park (%)") %>% 
    autofit()
  }
  
  top10_park_percent_flex <- make_best_worst_flextbl(top10_park_percent, "Top 10 Cities for Parkland Percentage")
  
  bottom10_park_percent_flex <- make_best_worst_flextbl(bottom10_park_percent, "Top 10 Cities for Parkland Percentage")
  
# Make plots to visualize the data
# Histogram: Land Area
land_area_plot <- ggplot(urban_parks_data, aes(x = as.numeric(land_area))) +
  geom_histogram(bins = 20, fill = "steelblue", color = "white") +
  labs(x = "Land Area (Acres)", y = "Frequency", title = "Distribution of City Land Areas") +
  theme_minimal()

# Scatterplot: Land Area vs Parkland Area
land_vs_park_area_plot <- ggplot(urban_parks_data, aes(x = as.numeric(land_area), y = parkland_area)) +
  geom_point(color = "forestgreen") +
  labs(x = "Land Area (Acres)", y = "Parkland Area\n(Acres)", title = "Land Area vs Parkland Area") +
  scale_x_continuous(labels = scales::label_comma()) +
  scale_y_continuous(labels = scales::label_comma()) +
  theme_minimal()

# Scatterplot: Population Density vs Parkland Percent
density_vs_park_percent_plot <- ggplot(urban_parks_data, aes(x = as.numeric(pop_density), y = parkland_percent)) +
  geom_point(color = "darkorange") +
  labs(x = "Population Density (People/Acre)", y = "Percent Parkland", title = "Population Density vs\nParkland Percent") +
  scale_x_continuous(labels = scales::label_comma()) +
  scale_y_continuous(labels = scales::label_comma()) +
  theme_minimal()

# Scatterplot: Designed-Natural Park Area Ratio vs Parkland Percent
dn_area_ratio_vs_park_percent_plot <- ggplot(urban_parks_data, aes(x = dn_area_ratio, y = parkland_percent)) +
  geom_point(color = "mediumvioletred") +
  labs(x = "Designed-Natural Park Area Ratio", y = "Percent Parkland", title = "Designed-Natural Park Area\nRatio vs Parkland Percent") +
  scale_x_continuous(labels = scales::label_comma()) +
  scale_y_continuous(labels = scales::label_comma()) +
  theme_minimal()

# Scatterplot: Percent Designed Parks vs Percent Natural Parks
designed_vs_natural_parks_plot <- ggplot(urban_parks_data, aes(x = percent_designed_parks, y = percent_natural_parks)) +
  geom_point(color = "cadetblue") +
  labs(x = "Percent Designed Parks", y = "Percent Natural\nParks", title = "Percent Designed vs Natural Parks") +
  theme_minimal()

# Scatterplot: Percent Open Space Access vs Percent Built Open Space
open_space_access_vs_built_plot <- ggplot(urban_parks_data, aes(x = mean_percent_open_space_access, y = mean_percent_built_open_space)) +
  geom_point(color = "seagreen") +
  labs(x = "Mean % Open Space Access", y = "Mean % Built\nOpen Space", title = "Accessibility of Built Open Space") +
  theme_minimal()

# Combine all plots in one figure using patchwork (optional)
all_eda_plots <- (land_area_plot | land_vs_park_area_plot) / 
  (density_vs_park_percent_plot | dn_area_ratio_vs_park_percent_plot) / 
  (designed_vs_natural_parks_plot| open_space_access_vs_built_plot) +
  plot_layout(guides = "collect")

# Display data summary and visualization

  # Display flextables
  desc_stats_flex

Variable

Mean

Median

Standard Deviation

1st Quartile (Q1)

3rd Quartile (Q3)

city_pop

1,113,559.56

655,061.00

1,696,534.55

377,963.00

1,006,142.00

designed_park_area

5,559.20

3,864.00

5,020.97

2,652.00

5,785.00

dn_area_ratio

2.42

1.02

4.90

0.26

1.56

land_area

178,285.40

88,800.00

224,466.29

53,723.00

201,635.00

mean_percent_built_open_space

18.71

18.20

3.23

17.20

20.60

mean_percent_open_space_access

42.06

40.00

17.04

29.80

52.80

natural_park_area

48,113.72

4,538.00

180,621.84

1,429.00

22,527.00

park_units

460.80

302.00

811.06

179.00

416.00

park_units_per_10k_pop

4.55

4.54

1.72

3.28

5.42

parkland_area

53,824.36

9,478.00

180,158.96

5,075.00

28,312.00

parkland_per_1k_pop

143.17

12.09

607.26

9.15

28.14

parkland_percent

15.31

12.54

15.58

6.76

17.38

percent_designed_parks

44.66

50.28

27.85

20.43

60.90

percent_half_mile_walk

0.76

0.80

0.20

0.63

0.96

percent_natural_parks

53.78

49.61

28.29

32.55

79.57

pop_density

9.52

6.39

9.84

3.59

12.41

revised_area

175,389.60

87,844.00

222,631.64

52,765.00

196,098.00

  top10_park_percent_flex

City Name

City Population

City Land Area (Revised) (Acres)

Population Density (People/Acre)

City Parkland Area (Acres)

Designed-Natural Park Area Ratio (Designed Park (%) / Natural Park (%)

Percent Parkland

Parkland Per (1000) Capita

Percent of Residents within 0.5 Miles of a Park

Anchorage, AK

299,100

1,086,019

0.28

914,138

0.00

84.17

3,056.30

0.75

New Orleans, LA

386,105

107,655

3.59

27,775

0.11

25.80

71.94

0.80

Washington, DC

702,321

38,955

18.03

9,478

1.09

24.33

13.50

0.98

New York, NY

8,627,852

187,946

45.91

40,190

1.01

21.38

4.66

0.99

San Diego, CA

1,399,844

205,918

6.80

39,385

0.29

19.13

28.14

0.81

Virginia Beach, VA

457,832

159,341

2.87

28,312

0.26

17.77

61.84

0.68

Boston, MA

687,725

29,175

23.57

5,072

1.02

17.38

7.38

1.00

Honolulu, HI

1,006,142

379,885

2.65

57,141

0.09

15.04

56.79

0.79

Minneapolis, MN

421,339

33,958

12.41

5,075

8.47

14.94

12.04

0.98

Jacksonville, FL

925,142

467,298

1.98

67,707

0.14

14.49

73.19

0.35

  bottom10_park_percent_flex 

City Name

City Population

City Land Area (Revised) (Acres)

Population Density (People/Acre)

City Parkland Area (Acres)

Designed-Natural Park Area Ratio (Designed Park (%) / Natural Park (%)

Percent Parkland

Parkland Per (1000) Capita

Percent of Residents within 0.5 Miles of a Park

Durham, NC

275,758

68,678

4.02

2,665

0.11

3.88

9.66

0.51

Memphis, TN

655,061

196,098

3.34

9,194

1.10

4.69

9.15

0.46

Winston-Salem, NC

248,839

83,917

2.97

4,263

7.34

5.08

17.13

0.37

Detroit, MI

660,960

87,844

7.52

5,102

3.63

5.81

7.72

0.80

Toledo, OH

277,467

51,169

5.42

3,175

1.41

6.20

11.44

0.81

Atlanta, GA

498,059

84,250

5.91

5,293

2.70

6.28

10.63

0.72

Cleveland, OH

377,963

46,880

8.06

3,170

1.30

6.76

8.39

0.83

Dallas, TX

1,378,903

215,676

6.39

20,352

1.17

9.44

14.76

0.71

St. Louis, MO

310,144

39,090

7.93

3,749

23.66

9.59

12.09

0.98

Chicago, IL

2,744,859

136,796

20.07

13,609

1.94

9.95

4.74

0.98

  # Display patchwork plots
  all_eda_plots

References

Beall, J., Guha-Khasnobis, B., & Kanbur, R. (2010). Urbanization and development: Multidisciplinary perspectives. Oxford University Press.
Kohlhase, J. E. (2013). The new urban world 2050: Perspectives, prospects and problems. Regional Science Policy & Practice, 5(2), 153–166.
May, R., Rex, K., Bellini, L., Sadullah, S., Nishi, E., James, F., & Mathangani, A. (2000). UN habitat indicators database: Evaluation as a source of the status of urban development problems and programs. Cities, 17(3), 237–244.
Ratcliffe, M. (2022). Redefining Urban Areas following the 2020 Census. In Census.gov. https://www.census.gov/newsroom/blogs/random-samplings/2022/12/redefining-urban-areas-following-2020-census.html
Slack, T., & Jensen, L. (2020). The changing demography of rural and small-town america. Population Research and Policy Review, 39(5), 775–783.