QUICK REVIEW

[Paper Review] Estimating grouped data models with a binary dependent variable and fixed effects: What are the issues

Nathaniel Beck|arXiv (Cornell University)|Sep 18, 2018

Electoral Systems and Political Participation | References: 8 | Cited by: 23

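One-sentence summary

The paper argues that, for social-science models with a binary dependent variable and fixed effects, conditional logit (CLOGIT) is preferable to ordinary least squares (OLS) and to logit with group-specific intercepts (FELOGIT). CLOGIT estimates the parameters at least as accurately as FELOGIT, and the paper proposes a constrained FELOGIT estimator that uses the CLOGIT estimates as prior information; its marginal-effect estimates are at least as good as those from OLS, and much better when groups are small or numerous.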

ABSTRACT

This article deals with a simple issue: if we have grouped data with a binary dependent variable and want to include fixed effects (group specific intercepts) in the specification, is Ordinary Least Squares (OLS) in any way superior to a (conditional) logit form? In particular, what are the consequences of using OLS instead of a fixed effects logit model with respect to the latter dropping all units which show no variability in the dependent variable while the former allows for estimation using all units? First, we show that the discussion of the incidental parameters problem is based on an assumption about the kinds of data being studied; for what appears to be the common use of fixed effect models in political science the incidental parameters issue is illusory. Turning to linear models, we see that OLS yields a linear combination of the estimates for the units with and without variation in the dependent variable, and so the coefficient estimates must be carefully interpreted. The article then compares two methods of estimating logit models with fixed effects, and shows that the Chamberlain conditional logit is as good as or better than a logit analysis which simply includes group specific intercepts (even though the conditional logit technique was designed to deal with the incidental parameters problem!). Related to this, the article discusses the estimation of marginal effects using both OLS and logit. While it appears that a form of logit with fixed effects can be used to estimate marginal effects, this method can be improved by starting with conditional logit and then using those parameter estimates to constrain the logit with fixed effects model. This method produces estimates of sample average marginal effects that are at least as good as OLS, and much better when group size is small or the number of groups is large.

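Motivation and objectives

Resolve the confusion in applied work over whether to use OLS, FELOGIT, or CLOGIT for fixed-effects models with a binary dependent variable.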
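Clarify that in typical political science data, where the number of groups is fixed and group sizes vary, the incidental parameters problem is illusory.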
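Show that OLS is inefficient and potentially misleading for estimating marginal effects, because its estimate is a linear combination of the estimates for groups with and without variation in the outcome.
Propose and evaluate a constrained FELOGIT estimator that uses the CLOGIT estimates as prior information to improve precision.
Show that CLOGIT-based marginal-effect estimates are more reliable than OLS, especially when groups are small or numerous.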
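
The "linear combination" point can be checked numerically. The sketch below is an illustration, not code from the paper: it assumes a single regressor and simulated data (the names `within_slope`, `G`, `n` and the data-generating process are hypothetical). Groups whose outcome never varies have a within-group slope of exactly zero, so the fixed-effects OLS slope on all groups equals the slope from the varying groups scaled down by their share of the within-group variation in x.

```python
import numpy as np

def within_slope(x, y, g):
    """Fixed-effects (within) OLS slope for one regressor, plus within-group SS of x."""
    xd, yd = x.astype(float), y.astype(float)  # astype returns copies
    for k in np.unique(g):
        m = g == k
        xd[m] -= xd[m].mean()  # demean x within the group
        yd[m] -= yd[m].mean()  # demean y within the group
    ssx = float(xd @ xd)
    return float(xd @ yd) / ssx, ssx

# Hypothetical simulated data: 200 groups of 4 with widely spread intercepts,
# so that a sizeable share of groups is all-0 or all-1.
rng = np.random.default_rng(1)
G, n = 200, 4
g = np.repeat(np.arange(G), n)
x = rng.normal(size=G * n)
alpha = np.repeat(rng.normal(0.0, 2.0, size=G), n)
y = (rng.random(G * n) < 1 / (1 + np.exp(-(alpha + x)))).astype(float)

# Flag observations that belong to groups containing both 0s and 1s.
group_mean = np.bincount(g, weights=y) / n
varying = ((group_mean > 0) & (group_mean < 1))[g]

b_all, ssx_all = within_slope(x, y, g)
b_var, ssx_var = within_slope(x[varying], y[varying], g[varying])
weight = ssx_var / ssx_all  # share of within-group x variation in varying groups

# b_all is b_var shrunk toward the zero slope of the no-variation groups.
print(b_all, b_var * weight)
```

The two printed numbers agree up to floating-point error, which is why an OLS coefficient estimated on all groups cannot be read as the effect within the groups that actually vary.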

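Proposed method

Compare OLS, FELOGIT (logit with dummy-variable group effects), and CLOGIT (Chamberlain's conditional logit) in simulation studies across different group sizes and success probabilities.
Assess the relative accuracy of the estimators by root mean squared error (RMSE), comparing OLS and constrained FELOGIT against CLOGIT as the benchmark.
Propose a constrained FELOGIT approach in which the coefficient estimates are constrained to the CLOGIT estimates, improving efficiency and reducing bias.
Compute sample average marginal effects with OLS and with constrained FELOGIT, and assess their reliability and uncertainty.
Use simulation-based resampling to account for the uncertainty of marginal-effect estimates under the constrained FELOGIT framework.
Analyze the pattern, common in real data, in which many groups show no variation in the binary outcome, and show that OLS treats these groups as informative when they are not.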
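
The two-step idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes one covariate, balanced groups, and simulated data, and every function name here (`cond_loglik`, `fit_clogit`, `group_intercepts`, `average_marginal_effect`) is hypothetical. Step one maximizes the Chamberlain conditional likelihood for the slope; step two holds that slope fixed, solves for each group's intercept, and averages beta * p * (1 - p) over the sample. Groups with no variation get intercepts that diverge, so their marginal effect is set to zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_loglik(beta, X, Y):
    """Conditional log-likelihood; X and Y have shape (groups, group_size)."""
    k = Y.sum(axis=1).astype(int)  # successes per group (the sufficient statistic)
    W = np.exp(beta * X)
    # e[g, m] = sum over all size-m subsets of group g of the product of weights.
    e = np.zeros((X.shape[0], X.shape[1] + 1))
    e[:, 0] = 1.0
    for j in range(X.shape[1]):
        e[:, 1:] = e[:, 1:] + W[:, [j]] * e[:, :-1]
    denom = e[np.arange(len(k)), k]
    # Groups with k = 0 or k = group_size contribute exactly zero.
    return float(np.sum(beta * (X * Y).sum(axis=1) - np.log(denom)))

def fit_clogit(X, Y, lo=-5.0, hi=5.0, iters=60):
    """Maximize the concave conditional log-likelihood by ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cond_loglik(m1, X, Y) < cond_loglik(m2, X, Y):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def group_intercepts(beta, X, Y, iters=60):
    """With beta fixed, solve sum(p) = sum(y) in each group by bisection."""
    lo = np.full(X.shape[0], -30.0)
    hi = np.full(X.shape[0], 30.0)
    target = Y.sum(axis=1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        too_high = sigmoid(mid[:, None] + beta * X).sum(axis=1) > target
        hi = np.where(too_high, mid, hi)
        lo = np.where(too_high, lo, mid)
    return (lo + hi) / 2

def average_marginal_effect(beta, X, Y):
    """Sample average of beta * p * (1 - p); no-variation groups contribute zero."""
    k = Y.sum(axis=1)
    varying = (k > 0) & (k < X.shape[1])
    p = sigmoid(group_intercepts(beta, X, Y)[:, None] + beta * X)
    return float(np.mean(beta * p * (1 - p) * varying[:, None]))

# Hypothetical simulated data: 500 groups of 5, true slope 1.
rng = np.random.default_rng(0)
G, n, beta_true = 500, 5, 1.0
alpha = rng.normal(0.0, 1.5, size=(G, 1))
X = rng.normal(size=(G, n))
p_true = sigmoid(alpha + beta_true * X)
Y = (rng.random((G, n)) < p_true).astype(float)

beta_hat = fit_clogit(X, Y)
ame_hat = average_marginal_effect(beta_hat, X, Y)
ame_true = float(np.mean(beta_true * p_true * (1 - p_true)))
print(beta_hat, ame_hat, ame_true)
```

The ternary search is used only to keep the sketch dependency-free; with several covariates a Newton or quasi-Newton optimizer, or an existing conditional-logit routine, would be the more natural choice.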

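Results

Research questions

RQ1: When many groups show no variation in the dependent variable, is OLS a reasonable alternative for estimating fixed-effects models of binary outcomes?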
RQ2: In social-science data with a fixed number of groups, is the incidental parameters problem a real concern?
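RQ3: How does CLOGIT compare with FELOGIT in the accuracy of parameter estimates, particularly with small groups or many groups?
RQ4: Can a constrained FELOGIT estimator, built by using the CLOGIT estimates as prior information, outperform OLS in estimating marginal effects?
RQ5: When groups show no within-group variation, is OLS a reliable way to estimate marginal effects, or does it lead to misleading inference?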

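Key findings

Even when FELOGIT is nearly unbiased, CLOGIT still beats it on mean squared error, because CLOGIT conditions on sufficient statistics.
The constrained FELOGIT estimator (with the CLOGIT estimates as prior information) lowers the RMSE of marginal effects by up to 25% relative to OLS, with the largest gains when groups are small or numerous.
In the simulations, constrained FELOGIT is 25% more efficient than OLS, which makes using OLS roughly equivalent to discarding about a third of the data.
In simulations where group probabilities cluster around 0.5, the constrained FELOGIT estimator is 8% more efficient than OLS; in the high-probability cases the gain reaches 25%.
The uncertainty of constrained FELOGIT marginal-effect estimates is well approximated by resampling, and the method usually outperforms OLS and is rarely worse.
The paper concludes that, unless there are serious computational or endogeneity constraints, researchers should prefer constrained FELOGIT to OLS for estimating marginal effects.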
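
The link between a 25% efficiency gap and "about a third of the data" can be reconstructed with a short calculation. This is a back-of-the-envelope reading, not a derivation from the paper: it assumes bias is negligible, that RMSE shrinks in proportion to 1/sqrt(n), and that the 25% figure means the OLS RMSE is 1.25 times the constrained FELOGIT RMSE at the same sample size. The symbols r, n, and n' are introduced here for illustration.

```latex
\begin{align*}
\mathrm{RMSE}_{\mathrm{OLS}}(n) &= r \cdot \mathrm{RMSE}_{\mathrm{C}}(n),
  \qquad \mathrm{RMSE}_{\mathrm{C}}(n) \propto n^{-1/2} \\
\mathrm{RMSE}_{\mathrm{C}}(n') = \mathrm{RMSE}_{\mathrm{OLS}}(n)
  &\iff \frac{1}{\sqrt{n'}} = \frac{r}{\sqrt{n}}
  \iff n' = \frac{n}{r^{2}} \\
r = 1.25 &\implies \frac{n'}{n} = \frac{1}{1.5625} = 0.64
\end{align*}
```

Under these assumptions, OLS on the full sample is about as precise as the constrained estimator on 64% of it, which is a loss of roughly a third of the data. If the 25% figure instead means that the constrained RMSE is 0.75 times the OLS RMSE, then r = 1/0.75 and n'/n = 0.5625, a somewhat larger loss.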

This review was generated by AI and reviewed by human editors.