Research talk:Reading time/Work log/2018-11-18
Sunday, November 18, 2018
Wrapping up Robustness Checks for Model 3
How sensitive is the HDI analysis to the choice of development rather than another variable such as education level? I refit yesterday's models using the variable "mean years of schooling" in place of HDI. I chose years of schooling because the UN dataset has a lot of missing data for literacy. This variable is the average number of years the adult population of a country has spent in school. The results have roughly the same interpretation as those for HDI. Note that years of schooling is a component of HDI, so this result isn't that surprising. I standardized MeanSchooling before adding it to the model.
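A minimal sketch of the standardization step and of the polynomial and interaction terms that appear in the table for this section. The function names, the country values, and the use of the sample standard deviation are assumptions made for illustration; the log does not say how the standardization was actually done.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

def schooling_terms(ms_z, mobile, degree=4):
    """Build the MeanSchooling terms for one page view (hypothetical helper).

    ms_z   -- standardized mean years of schooling for the reader's country
    mobile -- 1 for a mobile page view, 0 for desktop
    """
    row = {"mobile": mobile}
    for d in range(1, degree + 1):
        name = "MeanSchooling" if d == 1 else f"MeanSchooling^{d}"
        row[name] = ms_z ** d
    row["mobile:MeanSchooling"] = mobile * ms_z
    return row

# Made-up mean years of schooling for five countries (not the UN data).
years = [4.0, 6.5, 8.0, 10.5, 13.0]
z = standardize(years)
rows = [schooling_terms(v, mobile=1) for v in z]
```

After standardization, a one-unit change in MeanSchooling is a change of one standard deviation, and a value of zero corresponds to a country with average schooling.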
 | model 2 | model 3 | model 3 with quadratic MS | model 3 with cubic MS | model 3 with quartic MS | model 3 with quadratic MS:mobile | |
---|---|---|---|---|---|---|---|
Intercept | 9.9485 (0.0077)*** | 10.0198 (0.0078)*** | 10.0304 (0.0078)*** | 10.0324 (0.0078)*** | 10.0469 (0.0078)*** | 10.0511 (0.0078)*** | |
mobile | -0.1985 (0.0011)*** | -0.3101 (0.0017)*** | -0.3001 (0.0017)*** | -0.2999 (0.0017)*** | -0.2939 (0.0017)*** | -0.3026 (0.0019)*** | |
MeanSchooling | -0.0733 (0.0007)*** | -0.1431 (0.0011)*** | -0.0965 (0.0017)*** | -0.0992 (0.0018)*** | -0.0398 (0.0023)*** | -0.0239 (0.0026)*** | |
mobile : MeanSchooling | | 0.1252 (0.0015)*** | 0.1141 (0.0015)*** | 0.1140 (0.0015)*** | 0.1099 (0.0015)*** | 0.0836 (0.0027)*** | |
mobile : MeanSchooling^2 | | | | | | 0.0248 (0.0021)*** | |
MeanSchooling^2 | | | -0.0394 (0.0010)*** | -0.0447 (0.0018)*** | -0.1211 (0.0025)*** | -0.1372 (0.0028)*** | |
MeanSchooling^3 | | | | 0.0042 (0.0011)*** | -0.0438 (0.0016)*** | -0.0429 (0.0016)*** | |
MeanSchooling^4 | | | | | 0.0440 (0.0010)*** | 0.0444 (0.0010)*** | |
Revision length (bytes) | 0.1677 (0.0005)*** | 0.1673 (0.0005)*** | 0.1677 (0.0005)*** | 0.1677 (0.0005)*** | 0.1684 (0.0005)*** | 0.1685 (0.0005)*** | |
time to first paint | -0.0157 (0.0006)*** | -0.0157 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | |
time to dom interactive | 0.0036 (0.0009)*** | 0.0037 (0.0009)*** | 0.0036 (0.0009)*** | 0.0036 (0.0009)*** | 0.0036 (0.0009)*** | 0.0036 (0.0009)*** | |
sessionlength | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | |
lastinsessionTRUE | 0.6190 (0.0011)*** | 0.6161 (0.0011)*** | 0.6152 (0.0011)*** | 0.6152 (0.0011)*** | 0.6152 (0.0011)*** | 0.6152 (0.0011)*** | |
nthinsession | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | |
dayofweekMon | 0.0952 (0.0020)*** | 0.0951 (0.0020)*** | 0.0950 (0.0020)*** | 0.0949 (0.0020)*** | 0.0943 (0.0020)*** | 0.0942 (0.0020)*** | |
dayofweekSat | 0.0087 (0.0020)*** | 0.0060 (0.0020)** | 0.0057 (0.0020)** | 0.0057 (0.0020)** | 0.0058 (0.0020)** | 0.0056 (0.0020)** | |
dayofweekSun | 0.0271 (0.0020)*** | 0.0252 (0.0020)*** | 0.0244 (0.0020)*** | 0.0244 (0.0020)*** | 0.0246 (0.0020)*** | 0.0244 (0.0020)*** | |
dayofweekThu | 0.0487 (0.0020)*** | 0.0484 (0.0020)*** | 0.0482 (0.0020)*** | 0.0482 (0.0020)*** | 0.0479 (0.0020)*** | 0.0478 (0.0020)*** | |
dayofweekTue | 0.0283 (0.0020)*** | 0.0286 (0.0020)*** | 0.0286 (0.0020)*** | 0.0286 (0.0020)*** | 0.0278 (0.0020)*** | 0.0277 (0.0020)*** | |
dayofweekWed | 0.0674 (0.0020)*** | 0.0671 (0.0020)*** | 0.0669 (0.0020)*** | 0.0669 (0.0020)*** | 0.0664 (0.0020)*** | 0.0663 (0.0020)*** | |
usermonth4 | 0.0099 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | |
usermonth5 | 0.0123 (0.0096) | 0.0116 (0.0096) | 0.0112 (0.0096) | 0.0112 (0.0096) | 0.0111 (0.0096) | 0.0110 (0.0096) | |
usermonth6 | -0.0075 (0.0099) | -0.0080 (0.0098) | -0.0083 (0.0098) | -0.0083 (0.0098) | -0.0084 (0.0098) | -0.0086 (0.0098) | |
usermonth7 | -0.0463 (0.0098)*** | -0.0469 (0.0098)*** | -0.0465 (0.0098)*** | -0.0465 (0.0098)*** | -0.0462 (0.0098)*** | -0.0463 (0.0098)*** | |
usermonth8 | -0.0105 (0.0098) | -0.0111 (0.0098) | -0.0110 (0.0098) | -0.0111 (0.0098) | -0.0108 (0.0098) | -0.0108 (0.0098) | |
usermonth9 | 0.0421 (0.0077)*** | 0.0426 (0.0077)*** | 0.0426 (0.0077)*** | 0.0426 (0.0077)*** | 0.0427 (0.0077)*** | 0.0426 (0.0077)*** | |
usermonth10 | -0.0012 (0.0076) | -0.0035 (0.0076) | -0.0039 (0.0076) | -0.0038 (0.0076) | -0.0022 (0.0076) | -0.0022 (0.0076) | |
R2 | 0.0520 | 0.0527 | 0.0528 | 0.0528 | 0.0530 | 0.0530 | |
Adj. R2 | 0.0520 | 0.0526 | 0.0528 | 0.0528 | 0.0530 | 0.0530 | |
Num. obs. | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | |
RMSE | 14.3861 | 14.3810 | 14.3799 | 14.3799 | 14.3785 | 14.3784 | |
***p < 0.001, **p < 0.01, *p < 0.05 |
Comparing the two plots, our substantive results do not seem to depend on choosing HDI rather than education level. The only difference is that readers in the highest-education contexts may read for somewhat longer, but my intuition is that this is an artifact of using a high-degree polynomial, which may be overfitting. As with HDI, out-of-sample predictive performance improves as we increase the degree of the MeanSchooling polynomial, but increasing the order of the interaction term hurts out-of-sample predictive performance.
Rmse | Rsqr | name |
---|---|---|
1.677632 | 0.0536712 | model 2 |
1.678465 | 0.0570225 | model 3 |
1.679589 | 0.0563081 | model 3 with quadratic MS |
1.679736 | 0.0562688 | model 3 with cubic MS |
1.679118 | 0.0579871 | model 3 with quartic MS |
1.679154 | 0.0576906 | model 3 with quadratic MS:mobile |
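For reference, a sketch of how out-of-sample RMSE and R-squared like those in the table above can be computed from held-out observations and model predictions. The definition of R-squared used here (one minus residual over total sum of squares) is an assumption; the log does not state which definition produced the Rsqr column, and the numbers below are made up.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; on held-out data this can be negative."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up held-out outcomes and predictions, for illustration only.
held_out = [9.2, 10.1, 11.4, 8.7]
preds = [9.5, 10.0, 10.9, 9.1]

out_of_sample_rmse = rmse(held_out, preds)
out_of_sample_r2 = r_squared(held_out, preds)
```

Lower RMSE and higher R-squared on held-out data both indicate better predictive performance, which is why a term that improves in-sample fit can still look worse in this table.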
Model Diagnostics: Residual Plots
Adding mobile:lastinsession to Model 3
Adding this interaction to the model improves its fit and predictive performance. More importantly, it leads to a quite striking change in conclusions. We now find that, accounting for the other terms in the model:
- Mobile readers dwell for longer on average.
- This is true in countries that are more developed than average.
- However, in less developed countries, the "device gap" reemerges and mobile readers have shorter dwell times than desktop readers on average.
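The bullets above can be read off the coefficients directly. The sketch below uses the "model 3" column of the regression table in this section and computes the predicted mobile-minus-desktop difference as a function of HDI. It assumes HDI was standardized in the same way as MeanSchooling, which this entry does not state; the function and constant names are made up for illustration.

```python
# Coefficients copied from the "model 3" column of the regression table.
MOBILE = 0.0174          # mobile
MOBILE_X_HDI = 0.1064    # mobile : HDI
MOBILE_X_LAST = -0.6568  # mobileTRUE:lastinsessionTRUE

def mobile_effect(hdi_z, last_in_session=False):
    """Predicted mobile-minus-desktop difference in the outcome.

    hdi_z is HDI in standard deviations from the mean (assumed scaling).
    """
    effect = MOBILE + MOBILE_X_HDI * hdi_z
    if last_in_session:
        effect += MOBILE_X_LAST
    return effect

# HDI value at which the gap changes sign for views that are
# not the last in their session.
break_even_hdi = -MOBILE / MOBILE_X_HDI

for hdi_z in (-1.0, 0.0, 1.0):
    print(hdi_z, round(mobile_effect(hdi_z), 4))
```

Under these coefficients the difference is positive at average HDI and changes sign roughly 0.16 standard deviations below the mean, which is where the "device gap" reappears for views that are not last in their session.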
 | model 2 | model 3 | model 3 with quadratic HDI | model 3 with cubic HDI | model 3 with quartic HDI | model 3 with quadratic HDI:mobile | |
---|---|---|---|---|---|---|---|
Intercept | 9.7900 (0.0077)*** | 9.8504 (0.0078)*** | 9.8534 (0.0078)*** | 9.8197 (0.0078)*** | 9.8326 (0.0078)*** | 9.8321 (0.0079)*** | |
mobile | 0.1134 (0.0015)*** | 0.0174 (0.0023)*** | 0.0284 (0.0023)*** | 0.0347 (0.0023)*** | 0.0373 (0.0023)*** | 0.0387 (0.0023)*** | |
Human Development Index | -0.0891 (0.0009)*** | -0.1501 (0.0014)*** | -0.0883 (0.0022)*** | -0.0195 (0.0026)*** | 0.0063 (0.0028)* | -0.0006 (0.0034) | |
mobile : HDI | | 0.1064 (0.0019)*** | 0.0931 (0.0019)*** | 0.0847 (0.0019)*** | 0.0838 (0.0019)*** | 0.0950 (0.0036)*** | |
mobile : HDI^2 | | | | | | -0.0104 (0.0028)*** | |
HDI^2 | | | -0.0518 (0.0014)*** | 0.0201 (0.0020)*** | -0.0628 (0.0042)*** | -0.0560 (0.0046)*** | |
HDI^3 | | | | -0.0807 (0.0016)*** | -0.0904 (0.0017)*** | -0.0908 (0.0017)*** | |
HDI^4 | | | | | 0.0392 (0.0018)*** | 0.0390 (0.0018)*** | |
Revision length (bytes) | 0.1679 (0.0005)*** | 0.1678 (0.0005)*** | 0.1678 (0.0005)*** | 0.1677 (0.0005)*** | 0.1678 (0.0005)*** | 0.1677 (0.0005)*** | |
time to first paint | -0.0161 (0.0006)*** | -0.0160 (0.0006)*** | -0.0157 (0.0006)*** | -0.0155 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | |
time to dom interactive | 0.0031 (0.0009)*** | 0.0031 (0.0009)*** | 0.0030 (0.0009)*** | 0.0031 (0.0009)*** | 0.0031 (0.0009)*** | 0.0031 (0.0009)*** | |
sessionlength | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | |
lastinsessionTRUE | 0.9461 (0.0015)*** | 0.9412 (0.0015)*** | 0.9404 (0.0015)*** | 0.9404 (0.0015)*** | 0.9401 (0.0015)*** | 0.9402 (0.0015)*** | |
nthinsession | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | |
dayofweekMon | 0.0891 (0.0020)*** | 0.0891 (0.0020)*** | 0.0890 (0.0020)*** | 0.0892 (0.0020)*** | 0.0890 (0.0020)*** | 0.0891 (0.0020)*** | |
dayofweekSat | 0.0148 (0.0020)*** | 0.0130 (0.0020)*** | 0.0127 (0.0020)*** | 0.0121 (0.0020)*** | 0.0123 (0.0020)*** | 0.0124 (0.0020)*** | |
dayofweekSun | 0.0316 (0.0020)*** | 0.0304 (0.0020)*** | 0.0298 (0.0020)*** | 0.0290 (0.0020)*** | 0.0292 (0.0020)*** | 0.0293 (0.0020)*** | |
dayofweekThu | 0.0507 (0.0020)*** | 0.0506 (0.0020)*** | 0.0505 (0.0020)*** | 0.0506 (0.0020)*** | 0.0504 (0.0020)*** | 0.0504 (0.0020)*** | |
dayofweekTue | 0.0307 (0.0020)*** | 0.0310 (0.0020)*** | 0.0310 (0.0020)*** | 0.0313 (0.0020)*** | 0.0310 (0.0020)*** | 0.0311 (0.0020)*** | |
dayofweekWed | 0.0708 (0.0019)*** | 0.0707 (0.0019)*** | 0.0705 (0.0019)*** | 0.0707 (0.0019)*** | 0.0705 (0.0019)*** | 0.0705 (0.0019)*** | |
usermonth4 | 0.0095 (0.0096) | 0.0096 (0.0096) | 0.0096 (0.0096) | 0.0098 (0.0096) | 0.0097 (0.0096) | 0.0097 (0.0096) | |
usermonth5 | 0.0105 (0.0096) | 0.0102 (0.0096) | 0.0099 (0.0096) | 0.0097 (0.0096) | 0.0098 (0.0096) | 0.0098 (0.0096) | |
usermonth6 | -0.0095 (0.0098) | -0.0098 (0.0098) | -0.0099 (0.0098) | -0.0099 (0.0098) | -0.0099 (0.0098) | -0.0098 (0.0098) | |
usermonth7 | -0.0482 (0.0098)*** | -0.0489 (0.0098)*** | -0.0485 (0.0098)*** | -0.0480 (0.0098)*** | -0.0479 (0.0098)*** | -0.0478 (0.0098)*** | |
usermonth8 | -0.0128 (0.0097) | -0.0134 (0.0097) | -0.0133 (0.0097) | -0.0126 (0.0097) | -0.0127 (0.0097) | -0.0126 (0.0097) | |
usermonth9 | 0.0385 (0.0076)*** | 0.0391 (0.0076)*** | 0.0390 (0.0076)*** | 0.0393 (0.0076)*** | 0.0392 (0.0076)*** | 0.0392 (0.0076)*** | |
usermonth10 | 0.0023 (0.0075) | 0.0021 (0.0075) | 0.0012 (0.0075) | 0.0005 (0.0075) | 0.0007 (0.0075) | 0.0007 (0.0075) | |
mobileTRUE:lastinsessionTRUE | -0.6636 (0.0021)*** | -0.6568 (0.0021)*** | -0.6575 (0.0021)*** | -0.6584 (0.0021)*** | -0.6582 (0.0021)*** | -0.6584 (0.0021)*** | |
R2 | 0.0613 | 0.0616 | 0.0617 | 0.0620 | 0.0620 | 0.0620 | |
Adj. R2 | 0.0613 | 0.0616 | 0.0617 | 0.0619 | 0.0620 | 0.0620 | |
Num. obs. | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | |
RMSE | 14.3154 | 14.3131 | 14.3121 | 14.3102 | 14.3099 | 14.3099 | |
***p < 0.001, **p < 0.01, *p < 0.05 |
Out of sample predictions
Rmse | Rsqr | name |
---|---|---|
1.677372 | 0.0542021 | model 2 |
1.675276 | 0.0700483 | model 3 |
1.677113 | 0.0688259 | model 3 with quadratic HDI |
1.676338 | 0.0718099 | model 3 with cubic HDI |
1.677631 | 0.0741799 | model 3 with quartic HDI |
1.677640 | 0.0743893 | model 3 with quadratic HDI:mobile |
Residuals for M3v2
We still have clusters in the predicted vs. residuals plot. Adding the interaction term has improved things to some extent. However, it appears that it may be difficult to fully correct for the desktop + last-in-session patterns. This motivates checking whether our results are robust to very long reading times.
Robustness check: removing long dwell times
Removing dwell times longer than 1 hour improves the model diagnostics, and the qualitative conclusions from the above model still hold.
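A sketch of the filtering step, together with the share of observations it removes according to the "Num. obs." rows of the regression tables on this page. The field name `dwell_ms` and the assumption that dwell times are stored in milliseconds are hypothetical; only the two observation counts come from the log.

```python
ONE_HOUR_MS = 60 * 60 * 1000  # assumes dwell times are recorded in milliseconds

def drop_long_dwell(views, max_ms=ONE_HOUR_MS):
    """Keep only page views whose dwell time is at most max_ms."""
    return [v for v in views if v["dwell_ms"] <= max_ms]

# Made-up page views: 30 seconds, 20 minutes, and 2 hours.
views = [
    {"dwell_ms": 30_000},
    {"dwell_ms": 1_200_000},
    {"dwell_ms": 7_200_000},
]
kept = drop_long_dwell(views)

# Observation counts reported in the regression tables.
n_all = 9873641   # before removing long dwell times
n_kept = 9787783  # after removing long dwell times
share_removed = (n_all - n_kept) / n_all
```

By those counts the filter drops 85,858 page views, a little under 1% of the sample, so agreement between the two sets of estimates suggests that the conclusions are not driven by that small tail.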
Marginal Effects Plot
Regression Tables
 | model 2 | model 3 | model 3 with quadratic HDI | model 3 with cubic HDI | model 3 with quartic HDI | model 3 with quadratic HDI:mobile | |
---|---|---|---|---|---|---|---|
Intercept | 9.7791 (0.0074)*** | 9.8431 (0.0074)*** | 9.8465 (0.0074)*** | 9.8099 (0.0075)*** | 9.8208 (0.0075)*** | 9.8213 (0.0075)*** | |
mobile | 0.1312 (0.0014)*** | 0.0302 (0.0022)*** | 0.0430 (0.0022)*** | 0.0499 (0.0022)*** | 0.0521 (0.0022)*** | 0.0508 (0.0022)*** | |
Human Development Index | -0.0911 (0.0009)*** | -0.1556 (0.0014)*** | -0.0844 (0.0021)*** | -0.0095 (0.0025)*** | 0.0124 (0.0027)*** | 0.0188 (0.0032)*** | |
mobile : HDI | | 0.1117 (0.0018)*** | 0.0963 (0.0019)*** | 0.0871 (0.0019)*** | 0.0863 (0.0019)*** | 0.0759 (0.0034)*** | |
mobile : HDI^2 | | | | | | 0.0097 (0.0027)*** | |
HDI^2 | | | -0.0596 (0.0013)*** | 0.0186 (0.0019)*** | -0.0517 (0.0040)*** | -0.0581 (0.0044)*** | |
HDI^3 | | | | -0.0878 (0.0015)*** | -0.0960 (0.0016)*** | -0.0956 (0.0016)*** | |
HDI^4 | | | | | 0.0333 (0.0017)*** | 0.0335 (0.0017)*** | |
Revision length (bytes) | 0.1643 (0.0004)*** | 0.1643 (0.0004)*** | 0.1643 (0.0004)*** | 0.1642 (0.0004)*** | 0.1642 (0.0004)*** | 0.1642 (0.0004)*** | |
time to first paint | -0.0250 (0.0006)*** | -0.0248 (0.0006)*** | -0.0245 (0.0006)*** | -0.0243 (0.0006)*** | -0.0243 (0.0006)*** | -0.0243 (0.0006)*** | |
time to dom interactive | 0.0034 (0.0008)*** | 0.0034 (0.0008)*** | 0.0032 (0.0008)*** | 0.0034 (0.0008)*** | 0.0033 (0.0008)*** | 0.0033 (0.0008)*** | |
sessionlength | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | |
lastinsessionTRUE | 0.7958 (0.0014)*** | 0.7905 (0.0014)*** | 0.7895 (0.0014)*** | 0.7895 (0.0014)*** | 0.7893 (0.0014)*** | 0.7892 (0.0014)*** | |
nthinsession | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | |
dayofweekMon | 0.0795 (0.0019)*** | 0.0795 (0.0019)*** | 0.0794 (0.0019)*** | 0.0796 (0.0019)*** | 0.0795 (0.0019)*** | 0.0795 (0.0019)*** | |
dayofweekSat | 0.0098 (0.0019)*** | 0.0078 (0.0019)*** | 0.0075 (0.0019)*** | 0.0069 (0.0019)*** | 0.0071 (0.0019)*** | 0.0071 (0.0019)*** | |
dayofweekSun | 0.0333 (0.0019)*** | 0.0320 (0.0019)*** | 0.0313 (0.0019)*** | 0.0305 (0.0019)*** | 0.0307 (0.0019)*** | 0.0307 (0.0019)*** | |
dayofweekThu | 0.0524 (0.0019)*** | 0.0524 (0.0019)*** | 0.0522 (0.0019)*** | 0.0523 (0.0019)*** | 0.0522 (0.0019)*** | 0.0522 (0.0019)*** | |
dayofweekTue | 0.0332 (0.0019)*** | 0.0334 (0.0019)*** | 0.0334 (0.0019)*** | 0.0338 (0.0019)*** | 0.0336 (0.0019)*** | 0.0336 (0.0019)*** | |
dayofweekWed | 0.0693 (0.0019)*** | 0.0692 (0.0019)*** | 0.0690 (0.0019)*** | 0.0691 (0.0019)*** | 0.0690 (0.0019)*** | 0.0690 (0.0019)*** | |
usermonth4 | 0.0024 (0.0092) | 0.0025 (0.0092) | 0.0025 (0.0092) | 0.0027 (0.0092) | 0.0026 (0.0092) | 0.0026 (0.0092) | |
usermonth5 | -0.0002 (0.0091) | -0.0005 (0.0091) | -0.0009 (0.0091) | -0.0011 (0.0091) | -0.0011 (0.0091) | -0.0011 (0.0091) | |
usermonth6 | -0.0127 (0.0094) | -0.0129 (0.0094) | -0.0131 (0.0094) | -0.0132 (0.0093) | -0.0131 (0.0093) | -0.0131 (0.0093) | |
usermonth7 | -0.0498 (0.0093)*** | -0.0506 (0.0093)*** | -0.0501 (0.0093)*** | -0.0495 (0.0093)*** | -0.0495 (0.0093)*** | -0.0495 (0.0093)*** | |
usermonth8 | -0.0148 (0.0093) | -0.0154 (0.0093) | -0.0153 (0.0093) | -0.0146 (0.0093) | -0.0146 (0.0093) | -0.0146 (0.0093) | |
usermonth9 | 0.0384 (0.0073)*** | 0.0389 (0.0073)*** | 0.0388 (0.0073)*** | 0.0392 (0.0073)*** | 0.0391 (0.0073)*** | 0.0391 (0.0073)*** | |
usermonth10 | -0.0020 (0.0072) | -0.0023 (0.0072) | -0.0033 (0.0072) | -0.0041 (0.0072) | -0.0039 (0.0072) | -0.0039 (0.0072) | |
mobileTRUE:lastinsessionTRUE | -0.5163 (0.0020)*** | -0.5092 (0.0020)*** | -0.5099 (0.0020)*** | -0.5109 (0.0020)*** | -0.5107 (0.0020)*** | -0.5105 (0.0020)*** | |
R2 | 0.0518 | 0.0522 | 0.0524 | 0.0527 | 0.0527 | 0.0527 | |
Adj. R2 | 0.0518 | 0.0522 | 0.0524 | 0.0527 | 0.0527 | 0.0527 | |
Num. obs. | 9787783 | 9787783 | 9787783 | 9787783 | 9787783 | 9787783 | |
RMSE | 13.5945 | 13.5919 | 13.5905 | 13.5882 | 13.5879 | 13.5879 | |
***p < 0.001, **p < 0.01, *p < 0.05 |