Boston has one of the oldest and most developed public transportation systems in the United States, including subways, buses, ferries, and commuter rail. MBTA (Massachusetts Bay Transportation Authority) is the main public transportation system in the Boston area. As transit access continues to influence urban dynamics, recent trends show an increasing concentration of housing near transit lines, with a notable increase of rental prices over the last ten years. This raises an essential question for urban planners: Do households prefer to reside in neighborhoods with rich-transit access? To explore this, this policy brief conducts a Transit-Oriented Development (TOD) analysis of Boston, utilizing data from the Massachusetts Bay Transportation Authority (MBTA) subway stations alongside 5-year American Community Survey (ACS) data from 2012 and 2022. Our findings indicate a marked change over the decade, revealing that households are more willing to pay a premium for living close to transit from 2012 to 2022.

1 Data Wrangling Work

The spatial analysis began with data wrangling by using Boston’s subway arc and node data, which includes 5 lines and 153 stations, provided by MBTA, along with ACS data from 2012 and 2022 for 438 tracts within Suffolk County, provided by the U.S. Census Bureau. The result of data wrangling work is as follows:

a. load packages and functions

# Load Libraries

library(tidyverse)
library(tidycensus)
library(sf)
library(kableExtra)
library(mapview)
library(ggplot2)
library(viridis)
library(scales)

# Source of MultipleRingBuffer
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

options(scipen=999)
options(tigris_class = "sf")

b. get 2012 and 2022 census data of Boston

census_api_key("e6ed37e145b901c8c418a193d5bc174948045488", overwrite = TRUE)


tracts12 <- 
  get_acs(geography = "tract", 
          variables = c("B01003_001E","B02001_002E",
                        "B15001_050E","B15001_009E",
                        "B19013_001E","B25058_001E",
                        "B06012_002E"), 
          year=2012, 
          state=25, 
          county=025, 
          geometry=TRUE, 
          output="wide") %>%
  st_transform('EPSG:2249') %>%
  subset(GEOID != "25025990101") %>%
  rename(TotalPop = B01003_001E, 
         Whites = B02001_002E,
         FemaleBachelors = B15001_050E, 
         MaleBachelors = B15001_009E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E) %>%
  dplyr::select(-NAME, -starts_with("B")) %>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop,0),
         pctBachelors = ifelse(TotalPop > 0, ((FemaleBachelors + MaleBachelors) / TotalPop),0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         year = "2012") %>%
  mutate(tractArea=st_area(geometry)) %>%
  mutate(tractpopDens=TotalPop*1000/units::drop_units(tractArea)) %>%
  dplyr::select(-Whites, -FemaleBachelors, -MaleBachelors, -TotalPoverty) 


tracts22 <- 
  get_acs(geography = "tract", 
          variables = c("B01003_001E","B02001_002E",
                        "B15001_050E","B15001_009E",
                        "B19013_001E","B25058_001E",
                        "B06012_002E"), 
          year=2022, 
          state=25, 
          county=025, 
          geometry=TRUE, 
          output="wide") %>%
  st_transform('EPSG:2249') %>%
  rename(TotalPop = B01003_001E, 
         Whites = B02001_002E,
         FemaleBachelors = B15001_050E, 
         MaleBachelors = B15001_009E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E) %>%
  dplyr::select(-NAME, -starts_with("B")) %>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop,0),
         pctBachelors = ifelse(TotalPop > 0, ((FemaleBachelors + MaleBachelors) / TotalPop),0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         year = "2022") %>%
  mutate(tractArea=st_area(geometry)) %>%
  mutate(tractpopDens=TotalPop*1000/units::drop_units(tractArea)) %>%
  dplyr::select(-Whites, -FemaleBachelors, -MaleBachelors, -TotalPoverty) 

allTracts <- rbind(tracts12,tracts22)

c. load Boston transit data

bostonRailRoutes <- st_read("https://opendata.arcgis.com/datasets/a5a1e75b2f194465890151ee9c3a1d97_1.geojson")
bostonRailStop <- st_read("https://opendata.arcgis.com/datasets/a5a1e75b2f194465890151ee9c3a1d97_0.geojson")

bostonRailRoutes <- bostonRailRoutes%>%
  dplyr::select(route_long_name,line_id)%>%
  st_transform(st_crs(tracts12))


bostonRailStop <- bostonRailStop%>%
  dplyr::select(stop_name)%>%
  st_transform(st_crs(tracts12))

In this study, TOD area is defined as the total area of tracts whose centroids are located in 0.5 miles of each transit station. The following figure shows the buffer within 0.5 mile of each transit station. The color of line represents each service line, and black dots denote transfer stations.

d. create 0.5-mile buffer for each transit station

stopBuffer <- st_buffer(bostonRailStop, 2640)

stopUnion <- st_union(stopBuffer)

MBTABuffers <- 
  rbind(
     stopBuffer %>%
      mutate(Legend = "Buffer") %>%
      dplyr::select(Legend),
     stopUnion %>%
      st_sf() %>%
      mutate(Legend = "Unioned Buffer"))

buffer <- filter(MBTABuffers, Legend=="Unioned Buffer")
ggplot() +
  geom_sf(data=st_union(tracts22), fill = "transparent")+
  geom_sf(data=buffer, fill = "gray", color = "gray", alpha =0.2)+
  geom_sf(data = bostonRailRoutes,
          aes(color = line_id),
          show.legend = "line",
          size = 3)+
  scale_color_manual(values = c("#43a2ca","#31a514","#8856a7","#fe9929","#e31a1c"))+
  geom_sf(data=bostonRailStop, show.legend = "point", size = 0.5) +
  facet_wrap(~Legend) + 
  labs(title="Buffer within 0.5 mile of Each Transit Station", 
       caption="Figure 1",
       color="Line") +
  mapTheme()

e. substract TOD and Non-TOD tracts of Boston

selectCentroids12 <-
  st_centroid(tracts12)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts12, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(TotalPop) %>%
  mutate(Selection_Type = "Select by Centroids")

selectCentroids22 <-
  st_centroid(tracts22)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts22, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(TotalPop) %>%
  mutate(Selection_Type = "Select by Centroids")


selectCentroids12_union <- st_union(selectCentroids12)%>%
  st_sf() %>%
  mutate(year = "2012")
selectCentroids22_union <- st_union(selectCentroids22)%>%
  st_sf() %>%
  mutate(year = "2022")
intersections1222_union <- rbind(selectCentroids22_union, selectCentroids12_union)
allTracts.group <- 
  rbind(
    st_centroid(allTracts)[buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2012", MedRent * 1.24, MedRent)) 

2 Indicator Analysis

a. Four small-multiple (2012 & 2022+) visualizations comparing four selected Census variables across time and space (TOD vs. non-TOD)

Here we choose 4 indicators - population density, median monthly rent, percentage of poverty and percentage of bachelors in Boston and analyze their spatial distribution in TOD and non-TOD areas in 2012 and 2022.

Population Density

ggplot() +
  geom_sf(data=allTracts.group, aes(fill = tractpopDens)) +
  geom_sf(data=intersections1222_union, fill='transparent', color = '#b30000', lwd = 0.7)+
  facet_wrap(~year)+
  scale_fill_continuous(low = "#edf8fb", high="#006d2c",name = "population density \n(thousand ppl per square foot)") +
  labs(
    title = "Population Density by Tract, 2012-2022",
    caption = "Figure 2a") +
  theme_void()

Figure 2a shows the population density of Boston per tract in 2012 and 2022. The darker color symbolizes higher density. And the area within red polygon represents TOD area. From the picture, We can see that the population density in TOD areas is much higher than that in non-TOD areas, which indicates people’s demand for housing is higher in the TOD area.

Median Monthly Rent

ggplot() +
  geom_sf(data=allTracts.group, aes(fill = MedRent.inf)) +
  geom_sf(data=intersections1222_union, fill='transparent', color = '#b30000', lwd = 0.7)+
  facet_wrap(~year)+
  scale_fill_continuous(low = "#eff3ff", high="#08519c",name = "Median Monthly Rent") +
  labs(
    title = "Median Monthly Rent by Tract, 2012-2022",
    caption = "Figure 2b") +
  theme_void()

Figure 2b shows the median monthly rent of Boston per tract in 2012 and 2022. The darker color symbolizes higher rent. And the area within red polygon represents TOD area. From the picture, We can see that the rent in 2022 is higher than that in 2012.color in TOD areas is darker than that in non-TOD areas, which indicates a higher rent in TOD area. The rent spatial pattern shows the impact of transit on house marketing.

Percentage of Poverty

ggplot() +
  geom_sf(data=allTracts.group, aes(fill = pctPoverty)) +
  geom_sf(data=intersections1222_union, fill='transparent', color = '#b30000', lwd = 0.7)+
  facet_wrap(~year)+
  scale_fill_continuous(low ="#fee5d9", high="#a50f15",name = "percentage") +
  labs(
    title = "Propercentage of poverty by Tract, 2012-2022",
    caption = "Figure 2c") +
  theme_void()

Figure 2c shows the percentage of poverty of Boston per tract in 2012 and 2022. The darker color symbolizes higher value.The proportion of poverty in some tracts near subway stations in TOD areas is higher than the rest of the areas. There was no notable difference in Poverty between 2012 and 2022 in other regions.

Percentage of Bachelors

ggplot() +
  geom_sf(data=allTracts.group, aes(fill = pctBachelors)) +
  geom_sf(data=intersections1222_union, fill='transparent', color = '#b30000', lwd = 0.7)+
  facet_wrap(~year)+
  scale_fill_continuous(low = "#f2f0f7", high="#54278f",name = "percentage") +
  labs(
    title = "percentage of bachelors by Tract, 2012-2022",
    caption = "Figure 2d") +
  theme_void()

Figure 2d shows the percentage of bachelors of Boston per tract in 2012 and 2022. The darker color symbolizes higher value.The percentage of bachelors in some tracts in TOD areas is 15-20%, which is higher than the proportion of bachelors in the non-TOD areas, which is less than 5%. This indicates a cluster of high-education people in TOD areas.

2 Bar Chart Analysis

allTracts.Summary <- 
  st_drop_geometry(allTracts.group) %>%
  group_by(year, TOD) %>%
  summarize(Median_Monthly_Rent = mean(MedRent, na.rm = T),
            Population_Density = mean(tractpopDens, na.rm = T),
            Percent_Bachelor = mean(pctBachelors, na.rm = T),
            Percent_Poverty = mean(pctPoverty, na.rm = T))

allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  mutate(Variable = factor(Variable, levels = c("Population_Density", "Median_Monthly_Rent", "Percent_Bachelor", "Percent_Poverty"))) %>%
  ggplot(aes(year, Value, fill = TOD)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Variable, scales = "free", ncol=5) +
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Indicator differences across time and space") +
  theme(legend.position="bottom")

This grouped bar chart provides a comprehensive view of key demographic and economic changes in Boston’s TOD and Non-TOD areas from 2012 to 2022.

We can see all the four indicators are higher in TOD area than non-TOD areas in both 2012 and 2022. Specifically, population density and bachelor percentage of TOD areas are obviously much higher that that of non-TOD areas.

From the perspective of changes between 2012 and 2022, there is a decline of poverty percentage. The population density and bachelor percentage has only slight change by year. In contrast, the rent increased notably from 2012 to 2022.

3 Table Analysis

allTracts.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, Value, -year.TOD) %>%
  mutate(Variable = factor(Variable, levels = c("Population_Density", "Median_Monthly_Rent", "Percent_White", "Percent_Bachelor", "Percent_Poverty")))%>%
  mutate(Value = round(Value, 2)) %>%
  spread(year.TOD, Value) %>%
  
  kable(caption = "Transit Indicator Table in 2012 and 2022") %>%
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 1")
Transit Indicator Table in 2012 and 2022
Variable 2012: Non-TOD 2012: TOD 2022: Non-TOD 2022: TOD
Population_Density 0.51 1.02 0.56 1.05
Median_Monthly_Rent 1026.74 1192.21 1632.97 1976.48
Percent_Bachelor 0.02 0.05 0.02 0.04
Percent_Poverty 0.18 0.21 0.15 0.17

Table 1

This table compares the specific values of the four indicators mentioned above between TOD and non-TOD area in 2012 and 2022. From the table, we can see that:

Population density in TOD areas rose by about 0.1, compared to a 0.03 raise in non-TOD areas, suggesting demographic displacement in transit-accessible neighborhoods.

In TOD areas, median monthly rents rose by approximately 66% from 2012 to 2022, reflecting increased demand and appeal near transit stations.

Percentage of Bachelors are consistently higher those in non-TOD areas, highlighting a trend of educational clustering around transit hubs.

Poverty rates in TOD areas slightly decreased, but they remained 13% higher than in non-TOD areas by 2022, indicating ongoing socioeconomic disparities.

4 Graduated symbol maps of population and rent within 0.5 mile of each transit station

a. Population graduated symbol map

tract_centorids = st_centroid(allTracts.group)

tracts_in_buffer = st_join(stopBuffer,tract_centorids,join=st_intersects)
stopBuffer = st_intersection(stopBuffer, st_geometry(st_union(allTracts.group)))
buffer_population = tracts_in_buffer %>%
  group_by(stop_name,year) %>%
  summarise(totpop.buffer = sum(TotalPop, na.rm = TRUE))%>%
  na.omit()

ggplot()+
  geom_sf(data = allTracts,fill = NA, color = "gray")+
  geom_point(data = buffer_population,
             alpha = 0.3,
             aes(size = totpop.buffer,
                 color = totpop.buffer,
                 geometry = geometry),
             stat = "sf_coordinates")+
  facet_wrap(~year)+
  scale_color_viridis(option="A", labels = comma)+
  scale_size_continuous(range=c(0.8,6), labels = comma)+
  labs(title="Population within 0.5 mile of Each Transit Stn", 
       subtitle="Boston (Suffolk County), MA", 
       caption="Figure 3a") +
  guides(color = guide_legend("Population"), size = guide_legend("Population")) +
  theme_void()

Figure 3a shows the total population within a 0.5-mile radius of each transit station in Boston, while larger and brighter circles indicating a larger population. We can see that the distribution in 2012 and 2022 are similar. In the central region, the color is brightest, representing the largest population.

b. Median Monthly Rent graduated symbol map

buffer_rent = tracts_in_buffer %>%
  group_by(stop_name,year) %>%
  summarise(rent.buffer = mean(MedRent.inf, na.rm = TRUE))%>%
  na.omit()
## `summarise()` has grouped output by 'stop_name'. You can override using the
## `.groups` argument.
ggplot()+
  geom_sf(data = allTracts,fill = NA, color = "gray")+
  geom_point(data = buffer_rent,
             alpha = 0.3,
             aes(size = rent.buffer,
                 color = rent.buffer,
                 geometry = geometry),
             stat = "sf_coordinates")+
  facet_wrap(~year)+
  scale_color_viridis(option="D", labels = comma)+
  scale_size_continuous(range=c(0.8,6), labels = comma)+
  labs(title="Median Rent within 0.5 mile of Each Transit Stn", 
       subtitle="Boston (Suffolk County), MA", 
       caption="Figure 3b") +
  guides(color = guide_legend("Median Monthly Rent ($)"), size = guide_legend("Median Monthly Rent ($)")) +
  theme_void()

Figure 3b shows the average median monthly rent within a 0.5-mile radius of each transit station in Boston, while larger circles indicating a larger population. We can see that the overall distribution pattern in 2012 and 2022 are similar, while rents in 2022 are much higher than that in 2012, indicating the economic impact of transit accessibility on local housing markets.

5 Relationship between Average Median Monthly Rent and Distance to Transit Stations

MBTA_MRB <- multipleRingBuffer(st_union(bostonRailStop), 34320, 2640)

allTracts.rings <-
  st_join(st_centroid(dplyr::select(allTracts, GEOID, year)),
          MBTA_MRB) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(allTracts.group, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) #convert to miles

# Group by distance and year, then average the total rent
rent_summary <- allTracts.rings %>%
  group_by(distance, year) %>%
  summarize(total_rent = mean(MedRent, na.rm = TRUE))

# Plot distance vs average rents
ggplot(rent_summary, aes(x = distance, y = total_rent, color = factor(year))) +
  geom_line() +  # Create a line plot
  scale_x_continuous(limits = c(0.5, 3.6), breaks = seq(0.5, 3.5, by = 0.5)) +
  scale_y_continuous(limits = c(800, 2000), breaks = seq(800, 2000, by = 200)) +
  labs(title = "Average Median Monthly Rent by Distance from MBTA Stops",
       x = "Distance from Transit Station (miles)",
       y = "Average Median Monthly Rent ($)",
       color = "Year") +
  theme_minimal()

This line graph illustrates the relationship between average median monthly rent and distance from the nearest subway station in Boston. The data reveals that average rents within 0.5 miles of subway stations are obviously higher in 2022 than in 2012, indicating a growing premium for transit proximity. The average rent decreases with increasing distance from subway stations in general both in 2012 and 2022. However, beyond 1.5 miles rents stabilize, suggesting that the impact of transit access on rent lessens at greater distances. Overall, the graph highlights the rising value of locations near subway stations over the decade and underscores how transit accessibility affects housing costs.

Conclusion

The analysis indicates that households value transit-rich neighborhoods, based on the evidence of rising population density, rents and educational clustering near transit stations. Specifically, within a 0.5-mile radius of subway stations, rents are much higher compared to areas farther away, reflecting a clear willingness to pay a premium for transit access. This trend highlights the importance of transit in shaping residential preferences.

However, this demand varies across the region. Central and well-connected transit stations has a higher demand, while this influence decreases with distance. These findings highlight the necessity for policies that encourage the growth of transit-oriented developments in high-demand areas to improve housing affordability and ensure fair access to transit benefits.