Sample Midterm 1

Instructions

  • This exam consists of three questions. Answer all parts of each question.

  • Justify your answers with clear arguments, equations, and diagrams where applicable.

  • Write your answers legibly. Illegible answers will not be graded.

Question 1: Instrumental Variables (IV)

Study Overview and Key Variables

This exam covers the influential study by Acemoglu, Johnson, and Robinson (2001), which examines the impact of colonial-era institutions on modern economic performance. The authors use an instrumental variables approach, with historical settler mortality rates serving as an instrument for institutional quality.

Key Variables

  • Dependent Variable: GDP per capita - Represents economic output per individual and serves as a measure of economic performance.

  • Independent Variable: Institution Quality - Assessed by indicators such as protection against expropriation and respect for property rights.

  • Instrumental Variable: Settler Mortality - Historical settler mortality rates, used as an instrument to address the endogeneity of institution quality.

Exam Questions

Question 1

Using the OLS regression result:

GDP per capita = 4.65 + 0.52 ⋅ Institution Quality + ε

explain what the coefficient of Institution Quality implies about its impact on GDP per capita.
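For reference, an OLS line of this form can be reproduced in Python. The sketch below uses simulated data calibrated to the exam's coefficients; the variable names and data are illustrative, not drawn from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cross-section calibrated to the exam's OLS line
# (intercept 4.65, slope 0.52); the data themselves are synthetic.
n = 500
inst_quality = rng.uniform(3, 10, n)
gdp = 4.65 + 0.52 * inst_quality + rng.normal(0, 0.5, n)

# OLS via least squares: regress GDP per capita on a constant and quality.
X = np.column_stack([np.ones(n), inst_quality])
beta, *_ = np.linalg.lstsq(X, gdp, rcond=None)
print(f"intercept = {beta[0]:.2f}, slope = {beta[1]:.2f}")
```

With statsmodels, `smf.ols("gdp ~ inst_quality", data=...).fit()` would additionally report standard errors and confidence intervals for the same coefficients.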

Question 2

Discuss why Settler Mortality is used as an instrumental variable for Institution Quality in the study. Refer to the first-stage regression equation:

Institution Quality = 9.34 − 0.61 ⋅ Settler Mortality + u

Question 3

Interpret the second-stage regression of the 2SLS estimation:

GDP per capita = 1.91 + 0.94 ⋅ Institution Quality + ν

What does this tell us about the causal impact of institutions on economic performance compared to the OLS estimate?
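A minimal sketch of the OLS-versus-2SLS comparison, using simulated data loosely calibrated to the exam's coefficients. The confounder `c` and all magnitudes are assumptions for illustration, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic data: 'mortality' is the instrument; 'c' is an unobserved
# confounder that biases OLS toward zero relative to the causal effect.
mortality = rng.normal(4.6, 1.0, n)
c = rng.normal(0, 1, n)
quality = 9.34 - 0.61 * mortality + c + rng.normal(0, 0.3, n)
gdp = 1.91 + 0.94 * quality - 0.8 * c + rng.normal(0, 0.3, n)

def ols(x, y):
    """Slope and intercept from least squares with a constant term."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS: inconsistent because quality is correlated with the omitted c.
b_ols = ols(quality, gdp)

# 2SLS by hand: the first stage projects quality on the instrument;
# the second stage regresses GDP on the fitted values.
fs = ols(mortality, quality)
quality_hat = fs[0] + fs[1] * mortality
b_2sls = ols(quality_hat, gdp)

print(f"OLS slope  = {b_ols[1]:.2f}")
print(f"2SLS slope = {b_2sls[1]:.2f}  (true effect = 0.94)")
```

In practice one would use a packaged estimator (e.g., `IV2SLS` from linearmodels) rather than this manual second stage, which does not produce correct 2SLS standard errors.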

Question 4

Illustrate, with a diagram, the causal pathway posited by Acemoglu, Johnson, and Robinson (2001) between settler mortality, institution quality, and GDP per capita. How does this pathway support the study’s conclusions?

Question 5

Critically evaluate the methodology of using historical settler mortality as an instrument for institution quality. What are the potential limitations of this approach, and how might they affect the study’s conclusions?

Question 2: Difference-in-Differences (DiD)

Study Overview and Key Variables

This problem set covers the study by Diamond, McQuade, and Qian (2019), which analyzes the impact of the 1994 rent control expansion in San Francisco on tenants, landlords, and the overall housing market. The study exploits a natural experiment created by a ballot initiative that extended rent control to certain multi-family buildings.

Key Variables

  • Dependent Variables: Tenant mobility, landlord responses such as building conversions, and changes in the housing market.

  • Independent Variable: The status of rent control, determined by whether the building fell under the 1994 expansion based on its construction date.

  • Difference-in-Differences Design: The analysis compares outcomes before and after the 1994 law change, between buildings affected by the law and those that were not.

Problem Set Questions

Question 1

Given the DiD estimation equation used in the study:

Y_ist = δ + τ ⋅ D_ist + γ ⋅ X_ist + ε_ist

where Y_ist represents tenant mobility or a landlord response, D_ist is a dummy variable indicating treatment (affected by rent control), and X_ist includes other control variables. Explain how to interpret the coefficient τ.
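The interpretation of τ can be checked on simulated data: with no controls, the DiD coefficient equals the difference of before/after mean changes across the two groups. The outcome and effect size below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Synthetic two-period panel: 'treated' buildings fall under the 1994
# expansion; tau is the assumed true effect on the outcome (e.g., a
# tenant-mobility measure). All numbers here are illustrative.
treated = rng.integers(0, 2, n)   # treatment-group indicator
post = rng.integers(0, 2, n)      # post-1994 indicator
tau = -0.15                       # assumed true DiD effect
y = (0.5 + 0.2 * treated + 0.1 * post
     + tau * treated * post + rng.normal(0, 0.3, n))

# The DiD estimator: difference of before/after changes across groups.
def mean_y(t, p):
    return y[(treated == t) & (post == p)].mean()

tau_hat = (mean_y(1, 1) - mean_y(1, 0)) - (mean_y(0, 1) - mean_y(0, 0))
print(f"tau_hat = {tau_hat:.3f}  (true tau = {tau})")
```

The same estimate would come out of an OLS regression of y on treated, post, and their interaction, since that model is saturated in the two indicators.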

Question 2

Discuss the selection of buildings constructed just before and after the 1980 cutoff as a natural experiment for studying rent control’s impact. Why does this approach help address potential endogeneity issues?

Question 3

Interpret the findings from the study regarding tenant mobility and landlord responses to the rent control expansion. What do these results suggest about the short-term benefits and long-term consequences of rent control?

Question 4

Using a diagram, illustrate the concept of a difference-in-differences analysis with respect to the rent control study. Show how the comparison of changes over time between the treatment and control groups helps isolate the impact of rent control.

Question 5

Critically assess the potential limitations of using a DiD approach in the context of rent control’s impact on San Francisco’s housing market. Consider factors such as changes in the external housing market, the role of other housing policies, and potential spillover effects.

Question 3: Prompt Engineering

This question explores the role of prompt engineering in leveraging ChatGPT for academic research, specifically in the context of replicating empirical papers on causal estimation techniques. Consider how you use ChatGPT to comprehend research articles, develop code for data analysis, and refine this code to mirror the empirical findings of a study. Additionally, reflect on your participation in preparing a group presentation on causal analysis.

Reading Journal Articles

Identify the title of the research paper you are examining for your presentation on causal estimation techniques. Describe which chatbots (e.g., Data Science Class for Economic and Social Issues) you are employing and the specific prompts you utilize for different purposes: comprehending the paper, generating code for data visualizations and summary statistics, and estimating the econometric model. Indicate if different chatbots serve distinct functions in your workflow.

Validating the Code

Explain the process of validating the code produced by ChatGPT. Were adjustments to the prompts or the code itself necessary to accurately replicate the study’s primary findings? Provide a detailed account of how you managed to replicate the paper’s results. Mention whether you utilize platforms such as Google Colab or Visual Studio Code for coding and if GitHub Desktop is used for code versioning and sharing on GitHub. Include your GitHub account link or username.

Contribution to Causal Presentation

Elaborate on your contributions to the group presentation focused on causal analysis. Discuss whether your role involves assisting with prompt engineering, code validation, or both. Describe how you support your peers in these tasks and any strategies you use to enhance the collaborative effort.

Applying Machine Learning Methods

Discuss any machine learning methods that could be applied to enhance the estimation accuracy and results of the paper you are working on. Consider techniques such as regularization, ensemble methods, or deep learning approaches that might offer novel insights or more robust findings compared to traditional econometric models. How might these methods be integrated into your analysis, and what potential improvements do you anticipate they could bring? Reflect on the specific steps or modifications required to implement these machine learning methods within your project’s workflow.

Rubric for Question 3: Prompt Engineering

Total Points: 8

1. Reading Journal Articles (2 Points)

  • Identification and Utilization of Resources (0.5 Points): Clear identification of the research paper and chatbots used for different project aspects. Demonstrates understanding of each chatbot's contribution.

  • Prompt Design and Application (0.5 Points): Effective description of the prompts for paper comprehension, code generation, and econometric analysis.

  • Workflow Integration (0.5 Points): Explanation of how different chatbots serve distinct functions within the research workflow.

  • Clarity and Detail (0.5 Points): Provides detailed descriptions showing thorough engagement with the tools.

2. Validating the Code (2 Points)

  • Validation Process (0.5 Points): Describes the steps for validating code produced by ChatGPT.

  • Adjustments and Refinements (0.5 Points): Details specific adjustments to prompts or code needed to accurately replicate the study's findings.

  • Tool Utilization (0.5 Points): Mentions the use of development platforms such as Google Colab or Visual Studio Code, and GitHub for code sharing.

  • Documentation and Sharing (0.5 Points): Provides a GitHub account link or username for project access.

3. Contribution to Causal Presentation (2 Points)

  • Role and Responsibilities (0.5 Points): Outlines individual roles in the group presentation.

  • Support and Collaboration (0.5 Points): Describes support provided to peers and strategies to enhance teamwork.

  • Presentation Development (0.5 Points): Reflects on integrating research findings into the presentation.

  • Engagement and Contribution (0.5 Points): Demonstrates significant contributions to the group's objectives.

4. Applying Machine Learning Methods (2 Points)

  • Innovation in Methods (0.5 Points): Discusses machine learning methods that could enhance the study's accuracy.

  • Integration Strategy (0.5 Points): Explains how the new methods would be integrated into the analysis.

  • Anticipated Improvements (0.5 Points): Reflects on the potential improvements machine learning methods could bring.

  • Implementation Steps (0.5 Points): Details the steps required to implement these methods.


Sample Midterm 2

Instructions

  • This exam consists of three questions. Answer all parts of each question.

  • Justify your answers with clear arguments, equations, and diagrams where applicable.

  • Write your answers legibly. Illegible answers will not be graded.

Question 1: Machine Learning Enhancements in IV Analysis

  1. Theoretical Foundations: Explain the limitations of traditional Instrumental Variables (IV) analysis in handling complex econometric data. Discuss how the integration of machine learning techniques, specifically Lasso and Double Lasso Regression, addresses these limitations. Highlight the theoretical underpinnings that make these methods suitable for enhancing IV analysis.

  2. Lasso Regression in Econometrics: Lasso Regression has emerged as a pivotal technique in econometric analysis.

    1. Describe the principle behind Lasso Regression and its contribution to variable selection and regularization in the context of high-dimensional data. Include the Lasso Regression formula and explain each component.

    2. Discuss how Lasso Regression can be utilized to improve the robustness and accuracy of models in historical econometric studies, referencing Nathan Nunn’s 2008 study on the economic impacts of Africa’s slave trades as an example.

  3. Double Lasso Regression for Causal Inference: Double Lasso Regression enhances causal inference by addressing endogeneity through a two-step variable selection process.

    1. Outline the steps involved in Double Lasso Regression. Provide the mathematical formulation for each step and describe the role of each variable and parameter in the process.

    2. Explain how Double Lasso Regression can be applied to historical econometric data to uncover causal relationships, using practical examples where appropriate.

  4. Causal Forests in Econometric Analysis: Causal Forests extend the capabilities of Random Forest algorithms to address causal inference, offering a powerful tool for non-linear econometric analysis.

    1. Describe the concept of Causal Forests and how they differ from traditional Random Forest algorithms in the context of econometric analysis.

    2. Provide an example of how Causal Forests could be applied to analyze the heterogeneous effects of historical events on economic outcomes, drawing parallels to studies like that of Nathan Nunn.
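The two-step selection behind Double Lasso (post-double-selection) can be sketched with scikit-learn. The data are simulated and the treatment effect is assumed, so this illustrates the mechanics rather than any particular study:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 500, 50

# Synthetic data: treatment d and outcome y both depend on a few of the
# p candidate controls; the true treatment effect is 1.0. The names and
# coefficients are illustrative, not from Nunn (2008).
X = rng.normal(0, 1, (n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n)
y = 1.0 * d + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1, n)

# Step 1: lasso of the outcome on the controls.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Step 2: lasso of the treatment on the controls.
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)

# Step 3: OLS of y on d plus the union of the selected controls.
keep = sorted(set(sel_y) | set(sel_d))
Z = np.column_stack([np.ones(n), d, X[:, keep]])
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
print(f"double-lasso treatment effect = {beta[1]:.2f}  (true = 1.0)")
```

Taking the union of the two selected sets is the point of the procedure: it guards against omitting a control that matters for either the outcome or the treatment equation.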

Question 2: Long Short-Term Memory (LSTM) Networks

  1. Introduction to LSTM: Discuss traditional RNNs’ limitations in processing sequential data and how LSTM networks address these by handling long-term dependencies effectively.

  2. LSTM Architecture: Explain the LSTM unit architecture, focusing on the cell state and the roles of the input, forget, and output gates. Detail how these components interact to maintain information over sequences.

    Input Gate:

    i_t = σ(W_ii ⋅ x_t + b_ii + W_hi ⋅ h_(t−1) + b_hi)
    (1)

    Variables:

    • i_t: Input gate's activation vector at time t.

    • x_t: Input vector at time step t.

    • h_(t−1): Output vector from the previous time step t−1.

    • W_ii, W_hi: Weight matrices for the input vector x_t and the previous output vector h_(t−1).

    • b_ii, b_hi: Bias terms for the input gate.

    • σ: Sigmoid function.

    Forget Gate:

    f_t = σ(W_if ⋅ x_t + b_if + W_hf ⋅ h_(t−1) + b_hf)
    (2)

    Variables:

    • f_t: Forget gate's activation vector at time t.

    • W_if, W_hf: Weight matrices for the input vector x_t and the previous output vector h_(t−1).

    • b_if, b_hf: Bias terms for the forget gate.

    Output Gate:

    o_t = σ(W_io ⋅ x_t + b_io + W_ho ⋅ h_(t−1) + b_ho)
    (3)

    Variables:

    • o_t: Output gate's activation vector at time t.

    • W_io, W_ho: Weight matrices for the input vector x_t and the previous output vector h_(t−1).

    • b_io, b_ho: Bias terms for the output gate.

  3. Activation Functions in LSTM: Discuss the sigmoid (σ) and hyperbolic tangent (tanh) activation functions’ roles in LSTM operations. Include their equations and effects on gate operations and cell state updates.

  4. Application in Predicting Stock Market Returns: Illustrate how LSTM architecture is particularly suitable for predicting stock market returns, emphasizing its advantages in modeling financial time series data.
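The gate equations above can be traced in a small NumPy sketch. As a simplification, each gate here uses a single weight matrix on the concatenated [x_t, h_(t−1)] vector, which is equivalent to the separate input-side and hidden-side terms; the weights are random, not trained:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    W and b hold one weight matrix and bias per gate, applied to the
    concatenation of x_t and h_prev (a layout simplification of the
    separate W_*i x_t and W_h* h_(t-1) terms).
    """
    z = np.concatenate([x_t, h_prev])
    i_t = sigmoid(W["i"] @ z + b["i"])   # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])   # forget gate
    g_t = np.tanh(W["g"] @ z + b["g"])   # candidate cell values
    o_t = sigmoid(W["o"] @ z + b["o"])   # output gate
    c_t = f_t * c_prev + i_t * g_t       # cell-state update
    h_t = o_t * np.tanh(c_t)             # hidden-state output
    return h_t, c_t

# Tiny demo with random weights: input size 3, hidden size 4.
rng = np.random.default_rng(0)
nx, nh = 3, 4
W = {k: rng.normal(0, 0.1, (nh, nx + nh)) for k in "ifgo"}
b = {k: np.zeros(nh) for k in "ifgo"}
h, c = np.zeros(nh), np.zeros(nh)
for x in rng.normal(0, 1, (5, nx)):      # run five time steps
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

A trained network (for example, a Keras `LSTM` layer) learns these matrices from data; the recurrence itself is this step applied across the sequence.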

Question 3: Python Data Science in Economics and Finance

  1. Downloading Financial Data using yfinance: Analyze “AAPL” stock performance over the last five years.

    1. Write Python code using the yfinance library to download daily stock prices of “AAPL” for the past five years.

    2. Write a brief ChatGPT prompt to request guidance or explanation on how to use the yfinance library for downloading stock data.

  2. Data Manipulation and Conversion: Given economic_data.csv with columns for Country, Year, GDP, InflationRate, and UnemploymentRate.

    1. Use pandas to read the CSV file and fill missing values with the column means.

    2. Write a brief ChatGPT prompt to request assistance on handling missing data in pandas.

  3. Visualization and Summary Statistics: You have GDP growth rates for various countries.

    1. Create a line plot for “Country A” and “Country B” GDP growth over the last decade using matplotlib or seaborn.

    2. Calculate and display the mean, median, and standard deviation of GDP growth rates.

    3. Write a brief ChatGPT prompt to request assistance on creating visualizations with matplotlib or seaborn.

  4. Estimating Econometric Models: Use panel_data.csv with variables: Country, Year, PolicyChange, OutcomeVariable.

    1. Describe an IV approach to estimate the impact of PolicyChange on OutcomeVariable using Python.

    2. Explain estimating a Difference-in-Differences (DiD) model, identifying treatment and control groups.

    3. Write a brief ChatGPT prompt for guidance on implementing IV and DiD models in Python.

  5. Estimating an LSTM Model to Predict Stock Returns: Predict “AAPL” future stock returns using historical prices.

    1. Outline preprocessing steps for LSTM modeling, mentioning specific transformations or scaling.

    2. Provide an overview of setting up and training an LSTM model with TensorFlow and Keras, including model architecture and techniques.

    3. Write a brief ChatGPT prompt for guidance on LSTM model development and training with TensorFlow and Keras.
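For items 2 and 3, a self-contained sketch of the pandas steps; an inline sample stands in for economic_data.csv, whose real contents are not reproduced here:

```python
import io
import pandas as pd

# Stand-in for economic_data.csv (illustrative values only).
csv = io.StringIO(
    "Country,Year,GDP,InflationRate,UnemploymentRate\n"
    "A,2020,1.2,2.0,5.0\n"
    "A,2021,,2.5,4.8\n"
    "B,2020,3.4,,6.1\n"
    "B,2021,3.6,1.8,\n"
)
df = pd.read_csv(csv)

# Fill missing values in the numeric columns with the column means.
num_cols = ["GDP", "InflationRate", "UnemploymentRate"]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Summary statistics, as asked for in the visualization item.
stats = df["GDP"].agg(["mean", "median", "std"])
print(df)
print(stats)
```

A line plot per country could follow with, e.g., `df.pivot(index="Year", columns="Country", values="GDP").plot()` on top of matplotlib.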