GSBA 604

Empirical Research Methods

Homework




 
  1. Simple Regression Modeling & Graphical Methods & Research Proposal (due at 5pm of Jan. 18 in Alex Liu's mailbox at Bridge Hall 401): (1) describe your data; (2) select 2 variables to construct a simple regression model: Y = β0 + β1X; (3) use OLS to estimate β0, β1, and σ²; (4) perform the following hypothesis test: NH: Y = β0 vs. AH: Y = β0 + β1X; (5) produce scatter plots of Y against ALL your potential predictors (independent variables); (6) choose a set of predictors to create a multiple regression model, Y = β0 + β1X1 + ... + βnXn, as your initial model; (7) use OLS to estimate β0, ..., βn and σ²; (8) write the first draft of your research proposal, which should include at least (a) a description of your variables, (b) hypotheses, (c) the proposed model, and (d) strategies to build your model. [READ Chapter 26 in GHJ]

    Suggestions for writing your first research report: (a) describe the theories supporting your hypotheses and list some reference articles/books; (b) state your research goal clearly (population, sample, hypotheses, ...); (c) discuss which variables you selected and why (a table listing all your variables AND their definitions / values is needed); (d) report descriptive statistics for all your variables (this lets us check the standard deviations of your key variables to ensure there is enough variance); (e) estimate a simple regression and relate it to your key hypothesis to see whether the simplest method can satisfy your research goal (usually it cannot, so we go to the next step); (f) perform hypothesis testing and give your first evaluation of the research question; (g) try various plots AND connect them to your reasons for selecting variables & models; (h) estimate your initial multiple regression model, relate it to your research question, AND compare it to your simple regression model; (i) discuss your current results to open your next chapter.
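As a rough illustration of steps (2) to (4), the sketch below estimates β0, β1, and σ² by OLS and computes the t statistic for β1 and the F statistic for NH: Y = β0 vs. AH: Y = β0 + β1X. The function name `simple_ols` and the five data points are made up for illustration; they are not course data, and the statistics still have to be compared against t and F critical values (or run through your statistical package) to get p-values.

```python
import math

def simple_ols(x, y):
    """OLS for the simple model Y = b0 + b1*X (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - 2)              # unbiased estimate of the error variance
    t_b1 = b1 / math.sqrt(sigma2 / sxx) # t statistic for NH: b1 = 0
    f_stat = (sst - sse) / sigma2       # F statistic, 1 and n-2 degrees of freedom
    return {"b0": b0, "b1": b1, "sigma2": sigma2, "t_b1": t_b1, "f_stat": f_stat}

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
print(fit)
```

In a simple regression the F statistic equals the square of the t statistic for β1, which is a handy check on your own output.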

  2. First Diagnostics of Your Model (Due 11am of Feb. 4): Use the methods learned so far to detect outliers, non-constant variance, and non-normality. Correct any problems you detect. Then re-estimate your multiple regression model.

    When writing your research report #2, please answer the following questions: (a) Are you satisfied with your results from your last homework? (Also provide a short summary of those results.) (b) Do your results from the first assignment support your hypotheses or theories so far? (c) Do all the assumptions of a regression model make sense for your theories or data? (d) What are your reasons for choosing your correction methods? (For example, why did you decide to delete all the outliers? Did you study your outliers?) (e) After correcting the detected problems, do your results (OLS estimates) become more meaningful for your research goal?
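One possible way to screen for outliers and influential cases is sketched below for the simple regression case: leverage, internally studentized residuals, and Cook's distance. The function name, the data, and the |r| > 2 cutoff are assumptions chosen for illustration, not the methods prescribed in lecture. The sketch does not cover non-constant variance or non-normality, which need their own plots and tests.

```python
import math

def regression_diagnostics(x, y):
    """Case diagnostics for Y = b0 + b1*X (hypothetical helper)."""
    n = len(x)
    p = 2  # number of estimated coefficients (intercept and slope)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)
    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage of this case
        r = e / math.sqrt(s2 * (1 - h))       # internally studentized residual
        cook = r ** 2 * h / (p * (1 - h))     # Cook's distance
        rows.append({"leverage": h, "student": r, "cook": cook})
    return rows

# Made-up data: the fifth case (index 4) is far above the trend
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 25.0, 12.1, 13.8, 16.2, 17.9, 20.1]
diag = regression_diagnostics(x, y)
flagged = [i for i, d in enumerate(diag) if abs(d["student"]) > 2]
print("Cases to study before deciding what to do:", flagged)
```

Flagging a case is only the start; question (d) above asks you to study why it is unusual before deleting it.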


  3. Variable Transformation & Variable Selection (Due 11am of Feb. 19): Use the learned methods to detect non-linearity and collinearity problems. Correct the problems if detected. Also use the learned dummy variable methods to handle your use of discrete variables if any. Finalize your selection of variables to improve your model. Please present your results in a professional way accepted in your field.
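For the collinearity check, one common diagnostic is the variance inflation factor (VIF). The sketch below computes VIFs as the diagonal of the inverse of the predictors' correlation matrix. The function names, the data, and any cutoff you apply to the result (10 is a frequently quoted rule of thumb) are assumptions for illustration; your statistical package reports the same quantity.

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Made-up predictors: x2 is almost exactly 2 * x1, x3 is unrelated
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [5, 1, 4, 2, 6, 3]
vifs = vif([x1, x2, x3])
print([round(v, 1) for v in vifs])
```

A large VIF on x1 and x2 here signals that one of the pair should be dropped, combined, or transformed before you finalize your variable selection.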

  4. Generalized Linear Regression (Due 11am of March 4): Use GLIM models to fit your data and improve your modeling. 

    Recode your dependent variable onto a 1/0 scale and perform a logistic regression. Carefully compare your new results to the previous results obtained from estimating the linear regression model. Specifically, use a table to present the results and compare the Wald test to the t-test, the likelihood ratio test to the F-test, and the pseudo R-square to R-square. Importantly, explain the differences in the B coefficients between these two models. Tell your readers whether this comparison reveals anything interesting.

    Suggested format for writing your report: (1) summarize the results from your last report; (2) justify recoding your dependent variable onto a 0/1 scale, for example because (a) your dependent variable is categorical, or (b) you want to know why certain units fall into the top 5 or top 10 group (say, why some companies have achieved exceptional performance); (3) use tables to summarize your logistic regression results and explain them; (4) use tables to summarize your comparison of the logistic results vs. the linear model results; (5) tell whether this comparison helped your research. Is this logistic regression model an improvement over your linear regression model OR a good complement to it?
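To make the comparison concrete, here is a minimal sketch of a one-predictor logistic regression fitted by Newton-Raphson, reporting the Wald z statistic, the likelihood ratio chi-square, and McFadden's pseudo R-square. The helper names, the cutoff of 25, and the data are invented for illustration, and McFadden's measure is only one of several pseudo R-squares your software may print.

```python
import math

def recode_binary(values, cutoff):
    """Recode a continuous outcome to 1 (at or above cutoff) or 0."""
    return [1 if v >= cutoff else 0 for v in values]

def fit_logistic(x, y, iterations=25):
    """Logistic regression of y on one predictor x (hypothetical helper)."""
    def probs(b0, b1):
        return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

    def information(p):
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        return h00, h01, h11, h00 * h11 - h01 ** 2

    b0 = b1 = 0.0
    for _ in range(iterations):
        p = probs(b0, b1)
        g0 = sum(yi - pi for yi, pi in zip(y, p))             # score for b0
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))  # score for b1
        h00, h01, h11, det = information(p)
        b0 += (h11 * g0 - h01 * g1) / det                     # Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    p = probs(b0, b1)
    h00, h01, h11, det = information(p)
    se_b1 = math.sqrt(h00 / det)                              # from inverse information
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    p_bar = sum(y) / len(y)
    ll0 = len(y) * (p_bar * math.log(p_bar) + (1 - p_bar) * math.log(1 - p_bar))
    return {"b0": b0, "b1": b1, "wald_z": b1 / se_b1,
            "lr_chi2": 2 * (ll - ll0), "pseudo_r2": 1 - ll / ll0}

# Made-up data: recode a performance score into high (1) vs. low (0)
performance = [12, 15, 11, 30, 14, 35, 18, 40, 38, 45]
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = recode_binary(performance, 25)
result = fit_logistic(x, y)
print(result)
```

Note that the logistic B is a change in log-odds per unit of X, while the linear B is a change in Y itself, so the two are not directly comparable in size.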


  5. Model Validation & Missing Values (Due 11am of March 25): Use the learned methods to handle missing values in your dataset. Re-estimate your final model and perform all the necessary diagnostics. Split your sample and conduct a cross validation of your final model. This is the time for you to revisit all your diagnostics methods used as a way of searching for your FINAL model.

    Suggested format for your writing: (1) summarize all the decisions you have made, (2) summarize the results from your last report, (3) report your work of studying and handling missing values, (4) compare your results before and after taking care of missing values, (5) report your model validation results and summarize its meaning to your research. 
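The sketch below illustrates two simple treatments of missing values (listwise deletion and mean imputation) and a split-sample validation that fits the model on one half of the cases and measures prediction error on the other half. All function names, the random seed, and the data are assumptions for illustration; the methods covered in lecture may be more refined, and a simple regression stands in for your final model.

```python
import math
import random

def drop_missing(rows):
    """Listwise deletion: keep only cases with no missing (None) values."""
    return [row for row in rows if None not in row]

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def fit_line(x, y):
    """OLS estimates (b0, b1) for Y = b0 + b1*X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def split_sample_validation(x, y, seed=0):
    """Fit on a random half, report RMSE of predictions on the other half."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, holdout = idx[:half], idx[half:]
    b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
    errors = [y[i] - (b0 + b1 * x[i]) for i in holdout]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return b0, b1, rmse

# Made-up cases (x, y) with two incomplete rows
rows = [(1, 5.1), (2, None), (3, 8.9), (None, 4.0), (4, 11.2),
        (5, 12.8), (6, 15.1), (7, 17.0), (8, 19.2)]
complete = drop_missing(rows)
x = [r[0] for r in complete]
y = [r[1] for r in complete]
b0, b1, rmse = split_sample_validation(x, y)
print(len(complete), b0, b1, rmse)
```

Comparing the estimates from `drop_missing` with those from an imputed dataset is one way to produce the before/after comparison asked for in item (4).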


  6. Final Diagnostics and Non-OLS Estimations (Due 1:30pm of April 8):
    OPTION 1: Use S-plus and select one non-OLS method (L1, LTS, LMS, Ridge, CART-Logit) to obtain estimates, apply cross-validation, and compare with your last results. This may improve on outliers, collinearity, or model specification.
    OPTION 2: Use SPSS to estimate the coefficients of your final model. Then follow slides 8 & 9 from lecture 19 to calculate your optimal shrinkage estimates and SR estimates.
    Apply cross-validation & compare with your last results (especially calibration plots).
    Suggested format for your writing: (1) summarize your last results, (2) identify one weakness, (3) describe your selected non-OLS method, (4) compare results, (5) summarize the improvements, if any.
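As one example of a non-OLS method from the list in OPTION 1, the sketch below implements ridge regression in closed form: the slopes solve (X'X + λI)b = X'y on centered predictors, with the intercept left unpenalized. This is a generic textbook version, not the S-plus routine or the shrinkage formulas from lecture 19. The function names, the data, and the penalty value λ = 10 are invented for illustration, and in practice predictors are usually also scaled to unit variance before penalizing.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def ridge(columns, y, lam):
    """Ridge estimates (intercept, slopes); lam = 0 reproduces OLS."""
    n = len(y)
    means = [sum(c) / n for c in columns]
    y_bar = sum(y) / n
    xc = [[v - m for v in c] for c, m in zip(columns, means)]  # centered predictors
    yc = [v - y_bar for v in y]
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(xc[i], xc[j])) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(xc[i], yc)) for i in range(k)]
    slopes = solve(xtx, xty)
    intercept = y_bar - sum(b * m for b, m in zip(slopes, means))
    return intercept, slopes

# Made-up data generated exactly from y = 1 + 2*x1 + 0.5*x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [4.0, 5.5, 9.0, 10.5, 14.0]
ols_intercept, ols_slopes = ridge([x1, x2], y, 0.0)
ridge_intercept, ridge_slopes = ridge([x1, x2], y, 10.0)
print(ols_slopes, ridge_slopes)
```

The ridge slopes are pulled toward zero relative to OLS; whether that trade of bias for variance helps is what the cross-validation comparison is meant to show.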

  7. Summarizing Modeling Strategies & Modern Presentation/Interpretation Methods (Due 11am of April 22)

    A) Use PowerPoint to create six slides (6 ONLY) to form your presentation:
    1) Research question (hypothesis, population, variables)
    2) Findings – confirm or reject hypothesis
    3) More findings – relationship, your model, Bs, tests, C.I.
    4) Why are the tests, CIs, and Bs meaningful? Did the diagnostics work?
    5) How good is your model – validation results
    6) Conclusion – modeling strategies & great finding

    B) Write a summary and discussion of your modeling strategies (NO more than 3 pages). Among variable selection or recoding, function selection (variable transformation and interactions), case handling (outliers), and estimation method selection, which decision has the most important impact on your results? You may use the content of lecture 24 to do a method comparison.


  8. Final Research Paper (Due 3pm of May 7)
    Your Paper Should Include:
    Introduction (introducing your question)
    Literature Review (why your question is good)
    Operationalization (concepts -> variables & data)
    Research Design (specification of your regression equation, diagnostics)
    Results (presenting results with tables & graphs)
    Discussion (relating results to question & hypotheses)
    Conclusion (confirm or reject hypotheses)

    Please do your best to follow the AMR Style Guide to Authors.

Copyright @ 2001-2002  ResearchMethods.org