About

This website contains all the data and R-code necessary to replicate the results of the study: Air Quality Alerts and School Absences: Evidence from New York City.

Author

Luis Sarmiento

Published

November 15, 2022

Introduction

The website contains four sections:

  1. Data Sets – Includes a description of the data on school absences, the air quality index, demographic controls, and weather covariates.

  2. Descriptive Statistics – Descriptive statistics on school absences and the forecasts of the air quality index.

  3. Methodology –Describes the empirical strategy I use to estimate the effect of air quality health advisories on school absences in New York City. It includes four sections; ordinary least squares, instrumental variables, regression discontinuity, and synthetic DiD.

  4. Results – presents and discusses the main results of the research project.

Replication

All the raw files needed to run the code are in the dropbox repository

If you want to replicate the study:

Download all files of the dropbox repository while considering your computer space constraints.1

The files in the repository have a folder-like structure with the following key sections:

  1. 01_RawData – Includes the raw data obtained from the New York State Department of Education, the New York State Department of Environmental Conservation, The Environmental Protection Agency, and the USA Census Buro.

  2. 02_GenData – Contains the cleaned version of the raw data-files I use throughout the study.

  3. 03_scripts – Includes all the R-scripts I use to transform the raw data into a useful format, perform descriptive statistics, and run my empirical specifications. I divide this folder into six scripts:

  • 01_absences: Loads and clean the raw data on school absences
  • 02_aqi: Loads and clean the raw data on the forecast and real measures of the air quality index
  • 03_dem: Loads and clean demographic covariates obtained from the Census Buro.
  • 04_weather: Loads and clean the data on weather controls
  • 05_RegData: Constructs the data set for estimating the effect of air quality alerts on school absences with regression discontinuity and OLS estimators.
  • 06_rd: Contains the code to estimate the effect with regression discontinuity designs
  1. AQAs_School_Absences_NYC – This folder contains the Quarto files necessary to replicate this website

Raw data files

As previously mentioned, the folder 01_RawData contains all the raw files. It is divided into N different repositories

  • 01_schools includes all files related to school absences and school characteristics in New York City
  • 02_AQI has the data on the air quality index forecast from the NYSDEC and the AQI measures from the EPA
  • 03_weather contains several data files of weather covariates
  • 04_shp is a repository of shape (or spatial) files
  • 05_dem includes data on neighbourhood-level socio-demographic characteristics from the US Census Buro

Footnotes

  1. I recommend downloading the files to an external hard drive to avoid memory issues↩︎