Lab 2: Minnesota Tree Growth

ESS 330 - Quantitative Reasoning

library(quarto)
Warning: package 'quarto' was built under R version 4.4.3
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)
library(here)
Warning: package 'here' was built under R version 4.4.3
here() starts at C:/Users/Zacha/github/CSU/ESS_330/csu_ess_330
# Data Import
tree_dat <- read.csv(here("data", "lab_data", "minnesota_tree_data", "tree_dat.csv"))
# Question 1: Read in the Minnesota tree growth dataset. Use glimpse to understand the structure and names of the dataset. Describe the structure and what you see in the dataset.
glimpse(tree_dat)
Rows: 131,386
Columns: 8
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ stand   <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A…
$ year    <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 19…
$ species <chr> "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA"…
$ age     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ inc     <dbl> 0.930, 0.950, 0.985, 0.985, 0.715, 0.840, 0.685, 0.940, 1.165,…
$ rad_ib  <dbl> 10.78145, 11.73145, 12.71645, 13.70145, 14.41645, 15.25645, 15…

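Question 1: The dataset has 131,386 rows and 8 columns: treeID, standID, stand, year, species, age, inc, and rad_ib. The ID, year, and age columns are integers, stand and species are character codes, and inc and rad_ib are doubles. Each row appears to be one annual record for one tree, since treeID repeats while year and age increase by one.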

# Question 2: How many records have been made in stand 1?
tree_dat %>%
    filter(standID == 1) %>%
    tally() # Find the number of recorded data from stand 1.
    n
1 979

Question 2: 979 records have been made in stand 1.

# Question 3: How many records of the Abies balsamea and Pinus strobus species have been made?
tree_dat %>%
    filter(species %in% c("ABBA", "PIST")) %>%
    count(species) # Find the number of times two species have been recorded.
  species     n
1    ABBA 13033
2    PIST  4188

Question 3: There are 13,033 records of Abies balsamea and 4,188 records of Pinus strobus.

# Question 4: How many trees are older than 200 years old in the last year of the dataset?
last_year <- max(tree_dat$year, na.rm = TRUE) # Find the last year of the data set.

tree_dat %>%
  filter(year == last_year, age > 200) %>%
  tally() # Counts trees older than 200 years in the last year.
  n
1 7

Question 4: In the last year recorded (2007), there were 7 trees older than 200 years.

# Question 5: What is the oldest tree in the dataset found using slice_max?
tree_dat %>%
  slice_max(order_by = age, n = 1)
  treeID standID stand year species age  inc rad_ib
1     24       2    A2 2007    PIRE 269 0.37 308.84

Question 5: The oldest tree in the dataset is treeID 24, a Pinus resinosa that was 269 years old in 2007.

# Question 6: Find the oldest 5 trees recorded in 2001. Use the help docs to understand optional parameters
tree_dat %>%
  filter(year == 2001) %>%
  slice_max(order_by = age, n = 5)
  treeID standID stand year species age   inc  rad_ib
1     24       2    A2 2001    PIRE 263 0.210 306.880
2     25       2    A2 2001    PIRE 259 0.280 156.210
3   1595      24    F1 2001    FRNI 212 0.579 156.267
4   1598      24    F1 2001    FRNI 206 0.394 130.251
5   1712      26    F3 2001    FRNI 206 0.168 154.354

Question 6: The five oldest trees measured in 2001 include three Fraxinus nigra (FRNI) individuals and two Pinus resinosa (PIRE) individuals. Oldest to youngest: 263 (PIRE), 259 (PIRE), 212 (FRNI), 206 (FRNI), 206 (FRNI).

# Question 7: Using slice_sample, how many trees are in a 30% sample of those recorded in 2002?
tree_dat %>%
  filter(year == 2002) %>%
  slice_sample(prop = 0.3) %>%
  tally()
    n
1 687

Question 7: A 30% sample of the trees recorded in 2002 contains 687 trees.

# Question 8: Filter all trees in stand 5 in 2007. Sort this subset by descending radius at breast height (rad_ib) and use slice_head() to get the top three trees. Report the tree IDs.
tree_dat %>%
  filter(year == 2007, standID == 5) %>%
  arrange(desc(rad_ib)) %>%
  slice_head(n = 3)
  treeID standID stand year species age   inc   rad_ib
1    128       5    A6 2007    PIST  82 0.885 238.8850
2    157       5    A6 2007    PIRE  85 0.900 217.8700
3    135       5    A6 2007    PIMA  84 0.110 210.1874

Question 8: The three largest trees in stand 5 when measured in 2007 were (in descending order) treeIDs: 128, 157, 135.

# Question 9: Reduce your full data.frame to [treeID, stand, year, and radius at breast height]. Filter to only those in stand 3 with records from 2007, and use slice_min to pull the smallest three trees measured that year.
tree_dat %>%
  select("treeID", "standID", "year", "rad_ib") %>%
  filter(standID == 3, year == 2007) %>%
  slice_min(order_by = rad_ib, n = 3)
  treeID standID year rad_ib
1     50       3 2007 47.396
2     56       3 2007 48.440
3     36       3 2007 54.925

Question 9: The three smallest trees measured in stand 3 in 2007 were (from smallest to largest) treeIDs 50, 56, and 36. Their radii ranged from 47.396 to 54.925 mm.

# Question 10: Use select to remove the stand column. Use glimpse to show the dataset.
tree_dat %>% 
  select(-"stand") %>% 
  glimpse()
Rows: 131,386
Columns: 7
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ year    <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 19…
$ species <chr> "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA"…
$ age     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ inc     <dbl> 0.930, 0.950, 0.985, 0.985, 0.715, 0.840, 0.685, 0.940, 1.165,…
$ rad_ib  <dbl> 10.78145, 11.73145, 12.71645, 13.70145, 14.41645, 15.25645, 15…
# Question 11: Look at the help document for dplyr::select and examine the “Overview of selection features”. Identify an option (there are multiple) that would help select all columns with the string “ID” in the name. Use glimpse to view the remaining data set.
tree_dat %>% 
  select(contains("ID")) %>% # Could also use "ends_with()" or "matches()", though the former is limited in application requiring the column headers to end in "ID".
  glimpse()
Rows: 131,386
Columns: 2
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
# Question 12: Find a selection pattern that captures all columns with either ‘ID’ or ‘stand’ in the name. Use glimpse to verify the selection.
tree_dat %>% 
  select(contains("ID") | contains("stand")) %>% 
  glimpse()
Rows: 131,386
Columns: 3
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ stand   <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A…
# Question 13: Looking back at the data dictionary, rename rad_ib and inc to include _[unit] in the name. Unlike earlier options, be sure that this renaming is permanent, and stays with your data.frame (e.g. <-). Use glimpse to view your new data.frame.
tree_dat_mm <- tree_dat %>% 
  rename("rad_ib_mm" = "rad_ib", "inc_mm" = "inc")
glimpse(tree_dat_mm)
Rows: 131,386
Columns: 8
$ treeID    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ standID   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ stand     <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", …
$ year      <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, …
$ species   <chr> "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABB…
$ age       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ inc_mm    <dbl> 0.930, 0.950, 0.985, 0.985, 0.715, 0.840, 0.685, 0.940, 1.16…
$ rad_ib_mm <dbl> 10.78145, 11.73145, 12.71645, 13.70145, 14.41645, 15.25645, …
# Question 14: A key measurement in forestry is “basal area”. The metric is computed with the formula:
  # BA(m2) = 0.00007854⋅DBH^2
# Where DBH is the diameter at breast height (cm). Use mutate to compute DBH in centimeters, and BA in m2 (HINT: Make sure rad_ib is in cm prior to computing the diameter!). What is the mean BA_m2 of the species POTR in 2007?

# Create columns for dia. at breast height (cm) and basal area (m^2).
tree_dat_exp <- tree_dat_mm %>% 
  mutate(DBH_cm = 2*rad_ib_mm/10, BA_m2 = 0.00007854*(DBH_cm)^2)

# Filter for year 2007 and species POTR.
tree_dat_exp %>%   
  filter(species == "POTR", year == 2007) %>% 
  pull(BA_m2) %>% 
  mean()           # Find mean BA (m^2).
[1] 0.03696619

Question 14: The mean basal area of Populus tremuloides (POTR) in 2007 was 0.0370 m^2.
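
The constant in the basal area formula can be checked from geometry. This is a minimal sketch, assuming basal area is the circular cross-section of the stem, pi * (DBH / 2)^2 in cm^2, converted to m^2 by dividing by 10,000. The variable names are illustrative, and the radius is the first rad_ib value printed by glimpse above.

```r
# Assumption: basal area = pi * (DBH / 2)^2 in cm^2, and 1 m^2 = 10,000 cm^2,
# so the combined constant is pi / 4 / 10000 = pi / 40000.
ba_constant <- pi / 40000

# First rad_ib value shown by glimpse(), converted from mm radius to cm diameter.
dbh_cm <- 2 * 10.78145 / 10

ba_lab  <- 0.00007854 * dbh_cm^2   # formula as given in the lab
ba_geom <- ba_constant * dbh_cm^2  # same quantity from first principles

# The two agree to within the rounding of the published constant.
signif(ba_constant, 4)
all.equal(ba_lab, ba_geom, tolerance = 1e-4)
```

The lab's 0.00007854 is therefore pi / 40000 rounded to four significant figures, which is why DBH must already be in centimeters before it is squared.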

# Question 15: Let's say for the sake of our study, trees are not established until they are 5 years of age. Use if_else to add a boolean column to our dataset called established that is TRUE if the age is greater than 5 and FALSE if less than or equal to five. Once added, use count (see ?count) to determine how many records are from established trees.
tree_dat_exp %>% 
  mutate(established = if_else(age > 5, TRUE, FALSE)) %>% 
  filter(established) %>% 
  count(established)   # Count by the new column rather than by the constant TRUE.
  established      n
1        TRUE 122503

Question 15: There are 122,503 records from established trees.
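
Because the comparison age > 5 already returns TRUE or FALSE, it can serve as the established flag on its own. A small base R sketch on made-up ages (toy values, not taken from tree_dat) shows the idea and the boundary case at exactly 5 years:

```r
# Hypothetical ages for ten records (illustration only, not from tree_dat).
age <- c(1, 3, 5, 6, 8, 12, 5, 40, 2, 7)

# The comparison itself yields a logical vector.
# A tree aged exactly 5 is NOT established, because the test is strictly > 5.
established <- age > 5

# table() counts both classes; sum() counts the TRUE values directly.
table(established)
n_established <- sum(established)
n_established
```

Counting both classes at once, rather than filtering to TRUE first, also makes it easy to confirm that the two counts add up to the total number of records.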

# Question 16: Use mutate and case_when to add a new column to your data.frame that classifies each tree into the proper DBH_class. Once done, limit your dataset to the year 2007 and report the number of each class with count.

# Add a DBH_class column to tree_dat_exp that classifies trees by DBH_cm.
tree_dat_exp <- tree_dat_exp %>% 
  mutate(DBH_class = case_when(
    DBH_cm > 0 & DBH_cm <= 2.5 ~ "seedling",
    DBH_cm > 2.5 & DBH_cm <= 10 ~ "sapling",
    DBH_cm > 10 & DBH_cm <= 30 ~ "pole",
    DBH_cm > 30 ~ "sawlog"
  ))

# Counts trees by DBH class for 2007.
tree_dat_exp %>% 
  filter(year == 2007) %>%
  count(DBH_class) %>% 
  complete(DBH_class = c("seedling", "sapling", "pole", "sawlog"), fill = list(n=0))
# A tibble: 4 × 2
  DBH_class     n
  <chr>     <int>
1 pole       1963
2 sapling     252
3 sawlog       76
4 seedling      0

Question 16: In 2007 there were 0 seedlings, 252 saplings, 1963 poles and 76 sawlogs recorded.
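
The same size classes can be sketched with base R's cut(), which returns a factor so that empty classes such as seedling still appear in the counts without a separate complete() step. The diameters below are made-up toy values, not records from tree_dat, and the break points are the ones used in the case_when call above.

```r
# Hypothetical diameters in cm (illustration only, not from tree_dat).
dbh_cm <- c(5, 10, 12, 28, 35)

# right = TRUE gives the intervals (0, 2.5], (2.5, 10], (10, 30], (30, Inf],
# which match the case_when() conditions used for DBH_class.
dbh_class <- cut(
  dbh_cm,
  breaks = c(0, 2.5, 10, 30, Inf),
  labels = c("seedling", "sapling", "pole", "sawlog"),
  right = TRUE
)

# table() on a factor reports every level, including those with zero records.
class_counts <- table(dbh_class)
class_counts
```

A factor also keeps the classes in size order (seedling, sapling, pole, sawlog) rather than the alphabetical order that a character column produces.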

# Question 17: Compute the mean DBH (in cm) and standard deviation of DBH (in cm) for all trees in 2007. Explain the values you found and their statistical meaning.
tree_dat_exp %>% 
  filter(year == 2007) %>% 
  summarize(
    mean(DBH_cm),  # Calculate mean of DBH_cm
    sd(DBH_cm)     # Calculate stdev
  )
  mean(DBH_cm) sd(DBH_cm)
1     16.09351   6.138643

Question 17: In 2007 the mean diameter at breast height was 16.094 cm and the standard deviation was 6.139 cm. The mean is the average DBH across all 2007 records, and the standard deviation measures the typical spread of individual trees around that average. If DBH were approximately normally distributed, about 68% of records would fall within one standard deviation of the mean, roughly 10.0 to 22.2 cm.
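
How much of the data sits within one standard deviation depends on the shape of the distribution. As a rough guide, the sketch below assumes DBH is approximately normal, which this lab does not test, and uses the mean and standard deviation printed above. The variable names are illustrative.

```r
# Summary statistics reported for 2007 (from the output above).
mean_dbh <- 16.09351
sd_dbh   <- 6.138643

# Assumption: DBH is roughly normal. Share of values within +/- 1 sd of the mean.
within_1sd <- pnorm(1) - pnorm(-1)

# Interval covered by mean +/- 1 sd, in cm.
one_sd_interval <- c(lower = mean_dbh - sd_dbh, upper = mean_dbh + sd_dbh)

# Half-width (cm) of the interval that would hold the middle 50% instead.
half_width_50 <- qnorm(0.75) * sd_dbh

round(within_1sd, 3)
round(one_sd_interval, 1)
round(half_width_50, 2)
```

Under that assumption, one standard deviation either side of the mean covers about 68% of records, while the middle 50% spans only about 0.67 standard deviations either side.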

# Question 18: Compute the per species mean tree age using only those ages recorded in 2003. Identify the three species with the oldest mean age.

# Filter tree data for the year 2003
tree_dat_2003 <- tree_dat %>% 
  filter(year == 2003)
# Compute mean age per species
mean_species_age_2003 <- tree_dat_2003 %>% 
  group_by(species) %>% 
  summarize(mean_age_2003 = mean(age, na.rm = TRUE)) # Calculate mean age, excluding NA values.

# Sort by mean age in descending order and identify the three species with the oldest (highest) means.
oldest_3_species_2003 <- mean_species_age_2003 %>% 
  arrange(desc(mean_age_2003)) %>%
  head(3)

# Print results
print("Mean Age by Species:")
[1] "Mean Age by Species:"
mean_species_age_2003
# A tibble: 15 × 2
   species mean_age_2003
   <chr>           <dbl>
 1 ABBA             32.5
 2 ACRU             61.7
 3 ACSA             70.2
 4 BEPA             72.8
 5 FRNI             83.1
 6 LALA             71.5
 7 PIBA             38.5
 8 PIGL             42.3
 9 PIMA             64.1
10 PIRE             51.3
11 PIST             73.3
12 POGR             73.0
13 POTR             48.8
14 QURU             64.0
15 THOC            127. 
print("Oldest Species by Mean Age:")
[1] "Oldest Species by Mean Age:"
oldest_3_species_2003
# A tibble: 3 × 2
  species mean_age_2003
  <chr>           <dbl>
1 THOC            127. 
2 FRNI             83.1
3 PIST             73.3

Question 18: The three species with the oldest mean ages in 2003 are Thuja occidentalis, Fraxinus nigra, and Pinus strobus, with mean ages of 126.6, 83.1, and 73.3 years, respectively.

# Question 19: In a single summarize call, find the number of unique years with records in the data set along with the first and last year recorded.
tree_dat %>% 
  summarize(
    unique_years = n_distinct(year),       # Count of unique years
    first_year = min(year, na.rm = TRUE),  # First year recorded
    last_year = max(year, na.rm = TRUE)    # Last year recorded
  )
  unique_years first_year last_year
1          111       1897      2007

Question 19: There are 111 unique years with records, starting in 1897 and ending in 2007.

# Question 20: Determine the stands with the largest number of unique years recorded. Report all stands with largest (or tied with the largest) temporal record.

# Identify the number of unique years per stand
stand_record_counts <- tree_dat %>%
  group_by(stand) %>% 
  summarize(unique_years = n_distinct(year, na.rm = TRUE)) # Count unique years per stand

# Find the max number of unique years
max_unique_years <- max(stand_record_counts$unique_years, na.rm = TRUE)

# Filter for stands with the max unique year counts
stands_with_max_years <- stand_record_counts %>% 
  filter(unique_years == max_unique_years)

# Print Result
print("Stands With Max Year Counts:")
[1] "Stands With Max Year Counts:"
stands_with_max_years
# A tibble: 5 × 2
  stand unique_years
  <chr>        <int>
1 A1             111
2 D1             111
3 D2             111
4 D3             111
5 F1             111

Question 20: The following five stands have data from all 111 years: A1, D1, D2, D3, F1.

# Final Question: Use a combination of dplyr verbs to compute these values and report the 3 species with the fastest growth, and the 3 species with the slowest growth. (** You will need to use either lag() or diff() in your computation. You can learn more about each in the Help pages)

# Calculate annual growth rate
tree_growth <- tree_dat_exp %>% 
  arrange(species, treeID, year) %>%               # Set correct ordering
  group_by(species, treeID) %>% 
  mutate(growth_rate = DBH_cm - lag(DBH_cm)) %>%   # Calculate annual change in DBH (cm)
  ungroup()

# Compute mean growth rate per species
species_growth <- tree_growth %>% 
  group_by(species) %>% 
  summarize(mean_growth_rate = mean(growth_rate, na.rm = TRUE))

# Identify the 3 fastest/slowest growing species
fastest_species <- species_growth %>%    # 3 fastest growing species
  arrange(desc(mean_growth_rate)) %>% 
  head(3)

slowest_species <- species_growth %>%    # 3 slowest growing species
  arrange(mean_growth_rate) %>% 
  head(3)

# Print results
print("Fastest Growing Species:")
[1] "Fastest Growing Species:"
fastest_species
# A tibble: 3 × 2
  species mean_growth_rate
  <chr>              <dbl>
1 PIRE               0.358
2 POTR               0.331
3 PIBA               0.326
print("Slowest Growing Species:")
[1] "Slowest Growing Species:"
slowest_species
# A tibble: 3 × 2
  species mean_growth_rate
  <chr>              <dbl>
1 LALA               0.150
2 THOC               0.153
3 QURU               0.168

Question 21: The three fastest growing species are Pinus resinosa, Populus tremuloides, and Pinus banksiana. The three slowest growing species are Larix laricina, Thuja occidentalis, and Quercus rubra. Growth is measured as the mean annual increase in diameter at breast height: the fastest species (Pinus resinosa) averaged ~0.358 cm per year and the slowest (Larix laricina) averaged ~0.150 cm per year.
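
The year-over-year difference computed with DBH_cm - lag(DBH_cm) can also be written with diff(), the other option the question mentions. This is a minimal base R sketch on a made-up two-tree table (toy values, not from tree_dat), using ave() to take differences within each tree so that one tree's first record is never compared with another tree's last.

```r
# Hypothetical records for two trees (illustration only, not from tree_dat).
toy <- data.frame(
  treeID = c(1, 1, 1, 2, 2, 2),
  year   = c(2005, 2006, 2007, 2005, 2006, 2007),
  DBH_cm = c(10.0, 10.4, 10.9, 20.0, 20.1, 20.3)
)

# Order by tree and year so each difference is between consecutive years.
toy <- toy[order(toy$treeID, toy$year), ]

# diff() returns one fewer value than its input, so pad with a leading NA;
# ave() applies the function separately within each treeID.
toy$growth_rate <- ave(toy$DBH_cm, toy$treeID,
                       FUN = function(x) c(NA, diff(x)))

# The first record of each tree has no previous year, so it is NA and is
# dropped from the mean by na.rm = TRUE.
mean_growth <- mean(toy$growth_rate, na.rm = TRUE)
toy
mean_growth
```

Both approaches leave the first record of every tree as NA, which is why the species means above are computed with na.rm = TRUE.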