Assignment: Generating a Correlation Coefficient
22 Aug 2016Program and outputs
%matplotlib inline
import pandas
import numpy
import scipy
import seaborn
import matplotlib.pyplot as plt
data = pandas.read_csv('gapminder.csv')
print('Total number of countries: {0}'.format(len(data)))
Total number of countries: 213
# Convert numeric types
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')
clean = data[['incomeperperson', 'alcconsumption', 'lifeexpectancy']].dropna()
scat1 = seaborn.regplot(x='incomeperperson', y='lifeexpectancy', fit_reg=True, data=clean, logx=True)
plt.xlabel('Income per Person')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot for the Association between Income per Person and Life Expectancy')
print('Association between Income per Person and Life Expectancy')
print(scipy.stats.pearsonr(clean['incomeperperson'], clean['lifeexpectancy']))
Association between Income per Person and Life Expectancy
(0.59534257610705288, 8.874239655396052e-18)
scat2 = seaborn.regplot(x='alcconsumption', y='lifeexpectancy', fit_reg=True, data=clean)
plt.xlabel('Alcohol Consumption')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot for the Association between Alcohol Consumption and Life Expectancy')
print('Association between Alcohol Consumption and Life Expectancy')
print(scipy.stats.pearsonr(clean['alcconsumption'], clean['lifeexpectancy']))
Association between Alcohol Consumption and Life Expectancy
(0.29893173996914729, 7.1401036474160216e-05)
Summary
There is a significant difference between my two graphs, and the second one definitely doesn’t look significant.
There is a strong correlation between Income per Person and Life Expectancy. This has a very low p value, so we can say that this is significant.
There is a much weaker correlation between Alcohol Consumption and Life Expectancy, but the p value is still fairly low, so at least we can be confident about it.