Assignment: Testing a Potential Moderator
04 Sep 2016
Program and outputs
%matplotlib inline
import pandas
import numpy
import scipy.stats  # import the stats submodule explicitly; pearsonr lives there
import seaborn
import statsmodels.formula.api as smf
from matplotlib import pyplot as plt
data = pandas.read_csv('gapminder.csv')
print('Total number of countries: {0}'.format(len(data)))
Total number of countries: 213
# Convert columns to numeric; values that cannot be parsed become NaN
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')
clean = data[['incomeperperson', 'alcconsumption', 'lifeexpectancy']].dropna()
print('Total remaining countries: {0}'.format(len(clean)))
Total remaining countries: 171
sub1 = clean[(clean['lifeexpectancy'] <= 70)]
sub2 = clean[(clean['lifeexpectancy'] > 70)]
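def plot(data):
    # Start a new figure so successive calls do not draw on the same axes
    plt.figure()
    # Scatter plot with a fitted regression line
    seaborn.regplot(x="incomeperperson", y="alcconsumption", data=data)
    plt.xlabel('Income per Person')
    plt.ylabel('Alcohol Consumption')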
print('Association between income and alcohol consumption for those with a SHORTER life expectancy')
print(('correlation coefficient', 'p value'))
print(scipy.stats.pearsonr(sub1['incomeperperson'], sub1['alcconsumption']))
plot(sub1)
print('Association between income and alcohol consumption for those with a LONGER life expectancy')
print(('correlation coefficient', 'p value'))
print(scipy.stats.pearsonr(sub2['incomeperperson'], sub2['alcconsumption']))
plot(sub2)
Association between income and alcohol consumption for those with a SHORTER life expectancy
('correlation coefficient', 'p value')
(0.24564412558074125, 0.038937397029065977)
Association between income and alcohol consumption for those with a LONGER life expectancy
('correlation coefficient', 'p value')
(0.20396295528103475, 0.04180701551814929)
Summary
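I ran a Pearson correlation between income per person and alcohol consumption separately for two groups of countries, using life expectancy as the potential moderator: SHORTER (70 years or less) and LONGER (more than 70 years). Both subgroups show a weak but statistically significant positive correlation: r = 0.25 (p = 0.039) for the shorter group and r = 0.20 (p = 0.042) for the longer group. Income therefore explains only about 6% and 4% of the variance in alcohol consumption, respectively. Because the association has the same direction and a similar strength in both subgroups, life expectancy does not appear to moderate the relationship between income per person and alcohol consumption.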
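One way to check whether the two subgroup correlations actually differ is Fisher's r-to-z test. This was not part of the original analysis. The sketch below uses only the standard library. The function name `fisher_z_test` is hypothetical, and the test assumes the two subgroups are independent samples with more than 3 observations each.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent
    Pearson correlations (hypothetical helper, not from the original post)."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 observations")
    # Fisher transformation: atanh(r) is approximately normal
    # with variance 1 / (n - 3)
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Sanity check: identical correlations give z = 0 and p = 1
z, p = fisher_z_test(0.5, 50, 0.5, 50)
print(z, p)
```

In the notebook above this could be called as `fisher_z_test(0.2456, len(sub1), 0.2040, len(sub2))`. A large p-value would mean the data give no evidence that the two correlations differ, which is what one would expect if life expectancy is not a moderator.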