The Rotterdam Ophthalmic Institute, which is affiliated with the Rotterdam Eye Hospital, has a dataset that contains longitudinal visual field data of 139 glaucoma patients. The main page is here, and the dataset has been used in studies such as Robust and Censored Modeling and Prediction of Progression in Glaucomatous Visual Fields.

I downloaded the data and did the following: I took the first visual field of every patient and calculated, for every point, the mean sensitivity threshold across all patients. Then I did the same for the last visual field. I subtracted the mean sensitivities of the first and the last test and plotted the differences. Here is the result:
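The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the plots: the record layout `(patient, visit, point, threshold)`, the function name and the toy values are assumptions made for the example, and the change is taken as last minus first, so a negative value means a loss of sensitivity. With the real data the same steps would be run separately for each eye.

```python
from collections import defaultdict
from statistics import mean


def mean_change_per_point(records):
    """Mean threshold of the last field minus mean threshold of the first field.

    records: iterable of (patient, visit, point, threshold) tuples.
    This layout is hypothetical; 'visit' can be any sortable value
    such as a test date or a visit number.
    """
    # Group the thresholds as {patient: {visit: {point: threshold}}}.
    fields = defaultdict(lambda: defaultdict(dict))
    for patient, visit, point, threshold in records:
        fields[patient][visit][point] = threshold

    first = defaultdict(list)
    last = defaultdict(list)
    for visits in fields.values():
        ordered = sorted(visits)
        # Only the earliest and the latest field of each patient are used.
        for point, threshold in visits[ordered[0]].items():
            first[point].append(threshold)
        for point, threshold in visits[ordered[-1]].items():
            last[point].append(threshold)

    return {p: mean(last[p]) - mean(first[p]) for p in first if p in last}


# Toy records, invented for illustration only (not taken from the dataset).
toy_records = [
    ("p1", 1, "A", 20), ("p1", 1, "B", 10),
    ("p1", 2, "A", 18), ("p1", 2, "B", 12),
    ("p1", 3, "A", 16), ("p1", 3, "B", 14),
    ("p2", 1, "A", 30), ("p2", 1, "B", 0),
    ("p2", 2, "A", 26), ("p2", 2, "B", 4),
]
change = mean_change_per_point(toy_records)
print(change)
```

In the toy data, point A loses sensitivity between the first and the last field and point B gains it, which is the kind of pattern the plot is meant to reveal.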



Most of the points that lose sensitivity are located nasally. I expected that, but I didn't expect point B. That point gains sensitivity in both eyes, is located nasally and is surrounded by points that are losing sensitivity, which is odd. Then there is point A: it is located where the blind spot is, and it is losing sensitivity. Points A and B are also symmetrically located, so I thought there could be a transcription error in which A and B were switched. To learn more about those points I plotted a histogram of the thresholds at each of them. Here is point B for the right eye:



In the first visual field over 50 patients out of 138 didn't see that point, and in the last one just over 30. Also, at the higher thresholds quite a few people do better in the last one. Let's look at the histogram for point A:



Here the difference is smaller, and many more people didn't see point A in both tests. That is expected, since it is the blind spot, and it confirms that the points are not switched.

The fact that the same point in both eyes has that characteristic is quite interesting. I am going to leave it here for now and might look into it in a future article. Meanwhile, if you have any idea why this is happening, please let me know here or via LinkedIn.

Here is the code for the data preparation and for the graphics. I checked it, but if you find a mistake there, please let me know as well.


There are many studies about the prevalence of refractive error among different populations. To see how we can show a table as a graph I will use the Namil study: refractive errors in a rural Korean adult population. Table 2 shows the prevalence of refractive error by age and sex with 95% confidence intervals (c.i.); the last line gives the totals for the whole sample population.

I will make a graphic of that table. It is based on the one from this post, but now with two factors and a totally different look.



Some comments on the graphic. The panels in the last row are for all age groups together, the panels in the right-hand column are for both genders together, and the bottom right panel is for the whole sample population. That is why it has narrower c.i.: the population is larger. The other panels are subgroups of the population in the bottom right one.

If we look at that one we can then compare different groups. For example, in the whole sample population there are more hyperopes than myopes. Within the different groups this is true for all except the youngest age category (40 to 49 years old, men and women) and the men over 80. However, this age group for men has the smallest sample size, with n = 26 for all refractive errors, and therefore the widest c.i. You can find other interesting differences between groups, for example among the emmetropes, or simply how low the prevalence of high myopia (over -6 D) is in all the groups. Remember that the prevalence of refractive error here is for a rural population in Korea.
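To see why a small group gets a wide c.i., here is a minimal sketch using the normal-approximation (Wald) interval for a proportion. This is an assumption made for illustration: I am not claiming it is the method used in the study. The prevalence of 20% and the larger sample size of 2600 are hypothetical values; only n = 26 comes from the table.

```python
from math import sqrt


def wald_ci(prevalence, n, z=1.96):
    """Approximate 95% c.i. for a proportion (normal approximation).

    The interval is not clipped to [0, 1], so it is only sensible when
    the prevalence is not too close to 0 or 1 for the given n.
    """
    half_width = z * sqrt(prevalence * (1 - prevalence) / n)
    return prevalence - half_width, prevalence + half_width


p = 0.20  # hypothetical prevalence, not a value from the study

small_group = wald_ci(p, 26)    # n = 26, as for the men over 80
large_group = wald_ci(p, 2600)  # hypothetical group 100 times larger

width_small = small_group[1] - small_group[0]
width_large = large_group[1] - large_group[0]

print(f"n = 26:   {small_group[0]:.3f} to {small_group[1]:.3f}")
print(f"n = 2600: {large_group[0]:.3f} to {large_group[1]:.3f}")
print(f"width ratio: {width_small / width_large:.1f}")
```

The width of the interval shrinks with the square root of the sample size, so a group 100 times larger has an interval about 10 times narrower for the same prevalence.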

You can find the code to reproduce the graphic here.

Graphics can show you the characteristics of your data faster than a table or a summary statistic. But they can also show you more unexpected things.

I will show you this with the following example. Imagine you have a dataset with 44 points: the mean of x is 9, the mean of y is 7.5, the correlation between x and y is 0.816, and the linear regression line has an intercept of 3 and a slope of 0.5. Here is the plot:



Now we divide the original table of 44 points into 4 tables of 11 points. Each of these tables has the same mean of x and y, the same correlation and the same linear regression intercept and slope as the original table. I will plot them now, using a different color for each of the 4 tables:




There seem to be different patterns for the different categories. I will plot the 4 of them separately:



The characteristics of each table are quite different, although the four of them have the same mean of x and y, the same correlation and the same intercept and slope for the linear regression. Furthermore, the 4 tables have the same variance of x and y. The information you get when you plot them by category is quite different from the information you get if you look at their simple summary statistics by category.

As you might have already noticed, the four datasets are Anscombe's quartet; I just put them all together at the beginning. You can find the code to get the data and reproduce the plots here.
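If you want to check the summary statistics yourself without plotting anything, the following is a minimal sketch in plain Python. It is not the code linked above: the data are the published values of Anscombe's quartet, and the helper name `summary` is just something I made up for the example.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet. Sets I, II and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Means, Pearson correlation and least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "r": sxy / sqrt(sxx * syy),
        "intercept": my - slope * mx,
        "slope": slope,
    }


summaries = {name: summary(x, y) for name, (x, y) in quartet.items()}

# The 44-point table from the beginning is the four sets put together.
all_x = [a for x, _ in quartet.values() for a in x]
all_y = [b for _, y in quartet.values() for b in y]
summaries["all 44 points"] = summary(all_x, all_y)

for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four sets and the pooled table give practically the same numbers, which is exactly why the summary statistics alone cannot tell them apart.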