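Day 3
Drawing Graphs
04-1 What Is Data Visualization?
The example dataset for this section is Anscombe's quartet. The English statistician Frank Anscombe created it to underscore the importance of statistical graphics. The quartet consists of four datasets, each with two continuous variables. All four have nearly identical means, variances, correlations, and regression lines, so judging by descriptive statistics alone they can look like the same dataset. Plotting them, however, makes it immediately clear that each one follows a different pattern. This is why visualization is such an important part of data analysis.
The Anscombe's quartet dataset is available through the seaborn library.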
In [1]:
import seaborn as sns
anscombe = sns.load_dataset("anscombe")
print(anscombe)
   dataset     x      y
0        I  10.0   8.04
1        I   8.0   6.95
2        I  13.0   7.58
3        I   9.0   8.81
4        I  11.0   8.33
5        I  14.0   9.96
6        I   6.0   7.24
7        I   4.0   4.26
8        I  12.0  10.84
9        I   7.0   4.82
10       I   5.0   5.68
11      II  10.0   9.14
12      II   8.0   8.14
13      II  13.0   8.74
14      II   9.0   8.77
15      II  11.0   9.26
16      II  14.0   8.10
17      II   6.0   6.13
18      II   4.0   3.10
19      II  12.0   9.13
20      II   7.0   7.26
21      II   5.0   4.74
22     III  10.0   7.46
23     III   8.0   6.77
24     III  13.0  12.74
25     III   9.0   7.11
26     III  11.0   7.81
27     III  14.0   8.84
28     III   6.0   6.08
29     III   4.0   5.39
30     III  12.0   8.15
31     III   7.0   6.42
32     III   5.0   5.73
33      IV   8.0   6.58
34      IV   8.0   5.76
35      IV   8.0   7.71
36      IV   8.0   8.84
37      IV   8.0   8.47
38      IV   8.0   7.04
39      IV   8.0   5.25
40      IV  19.0  12.50
41      IV   8.0   5.56
42      IV   8.0   7.91
43      IV   8.0   6.89
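To see how close the four datasets really are, the summary statistics can be computed directly. The following is a minimal sketch using only the Python standard library; the values are copied by hand from the printed DataFrame above, and the helper name `summarize` is made up for this illustration.

```python
from statistics import mean, variance

# Values copied from the printed anscombe DataFrame above.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # datasets I, II, III share x
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson correlation, and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summary = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the means, correlation, slope, and intercept come out the same for all four datasets, and the variance of `y` differs only in the second decimal place. The plots later in this post are what actually tell the datasets apart.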
04-2 What Is the matplotlib Library?
matplotlib is a widely used Python visualization library. It is highly flexible, letting you control every element of a graph.
In [2]:
import matplotlib.pyplot as plt
dataset_1 = anscombe[anscombe['dataset'] == 'I']
plt.plot(dataset_1['x'], dataset_1['y'])
plt.show()
In [3]:
plt.plot(dataset_1['x'], dataset_1['y'], 'o')
plt.show()
In [4]:
dataset_2 = anscombe[anscombe['dataset'] == 'II']
dataset_3 = anscombe[anscombe['dataset'] == 'III']
dataset_4 = anscombe[anscombe['dataset'] == 'IV']
In [5]:
fig = plt.figure()
axes1 = fig.add_subplot(2, 2, 1)
axes2 = fig.add_subplot(2, 2, 2)
axes3 = fig.add_subplot(2, 2, 3)
axes4 = fig.add_subplot(2, 2, 4)
plt.show()
In [6]:
fig = plt.figure()
axes1 = fig.add_subplot(2, 2, 1)
axes2 = fig.add_subplot(2, 2, 2)
axes3 = fig.add_subplot(2, 2, 3)
axes4 = fig.add_subplot(2, 2, 4)
axes1.plot(dataset_1['x'], dataset_1['y'], 'o')
axes2.plot(dataset_2['x'], dataset_2['y'], 'o')
axes3.plot(dataset_3['x'], dataset_3['y'], 'o')
axes4.plot(dataset_4['x'], dataset_4['y'], 'o')
plt.show()
In [9]:
fig = plt.figure()
axes1 = fig.add_subplot(2, 2, 1)
axes2 = fig.add_subplot(2, 2, 2)
axes3 = fig.add_subplot(2, 2, 3)
axes4 = fig.add_subplot(2, 2, 4)
axes1.plot(dataset_1['x'], dataset_1['y'], 'o')
axes2.plot(dataset_2['x'], dataset_2['y'], 'o')
axes3.plot(dataset_3['x'], dataset_3['y'], 'o')
axes4.plot(dataset_4['x'], dataset_4['y'], 'o')
axes1.set_title("dataset_1")
axes2.set_title("dataset_2")
axes3.set_title("dataset_3")
axes4.set_title("dataset_4")
fig.suptitle("Anscombe Data")
fig.set_tight_layout(True)
plt.show()
In [10]:
dataset_1.describe()
Out[10]:
| | x | y |
---|---|---|
count | 11.000000 | 11.000000 |
mean | 9.000000 | 7.500909 |
std | 3.316625 | 2.031568 |
min | 4.000000 | 4.260000 |
25% | 6.500000 | 6.315000 |
50% | 9.000000 | 7.580000 |
75% | 11.500000 | 8.570000 |
max | 14.000000 | 10.840000 |
In [11]:
dataset_2.describe()
Out[11]:
| | x | y |
---|---|---|
count | 11.000000 | 11.000000 |
mean | 9.000000 | 7.500909 |
std | 3.316625 | 2.031657 |
min | 4.000000 | 3.100000 |
25% | 6.500000 | 6.695000 |
50% | 9.000000 | 8.140000 |
75% | 11.500000 | 8.950000 |
max | 14.000000 | 9.260000 |
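The 25%, 50%, and 75% rows in the two `describe()` outputs above can be reproduced by hand. This sketch assumes pandas's default behavior of linear interpolation between sorted values, and uses `statistics.quantiles` with `method="inclusive"` as a standard-library stand-in that follows the same rule; it is not what pandas calls internally. The `y` values are copied from the printed DataFrame earlier in this post.

```python
from statistics import quantiles

# y columns of datasets I and II, copied from the printed anscombe DataFrame.
y_1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# n=4 splits the data into quartiles; "inclusive" treats the data as the
# whole population and interpolates linearly between neighbouring values.
quartiles_1 = [round(q, 3) for q in quantiles(y_1, n=4, method="inclusive")]
quartiles_2 = [round(q, 3) for q in quantiles(y_2, n=4, method="inclusive")]

print("dataset I  25%/50%/75%:", quartiles_1)
print("dataset II 25%/50%/75%:", quartiles_2)
```

Although the two datasets share the same mean and almost the same standard deviation, their quartiles differ, which already hints that the distributions of `y` are not the same shape.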
In [12]:
tips = sns.load_dataset("tips")
print(tips)
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]
In [13]:
fig = plt.figure()
axes1 = fig.add_subplot(1, 1, 1)
axes1.hist(data=tips, x="total_bill", bins=10)
axes1.set_title("Histogram of Total Bill")
axes1.set_xlabel("Total Bill")
axes1.set_ylabel("Frequency")
plt.show()  # distribution of the total bill amounts
In [14]:
import matplotlib.pyplot as plt
# Use a font that includes Korean glyphs (Malgun Gothic ships with Windows)
plt.rc('font', family='Malgun Gothic')
In [15]:
scatter_plot = plt.figure()
axes1 = scatter_plot.add_subplot(1, 1, 1)
axes1.scatter(tips['total_bill'], tips['tip'])
axes1.set_title('Scatterplot of Total Bill vs Tip')
axes1.set_xlabel("Total Bill")
axes1.set_ylabel('Tip')
plt.show()
In [16]:
boxplot = plt.figure()
axes1 = boxplot.add_subplot(1, 1, 1)
axes1.boxplot(
x = [
tips[tips['sex'] == 'Female']['tip'],
tips[tips['sex'] == 'Male']['tip']
],
labels = ['Female', 'Male']
)
axes1.set_xlabel('Sex')
axes1.set_ylabel('Tip')
axes1.set_title('Boxplot of Tips by Sex')
plt.show()
In [18]:
colors = {"Female": "#f1a340", "Male": "#998ec3"}
scatter_plot = plt.figure()
axes1 = scatter_plot.add_subplot(1, 1, 1)
axes1.scatter(data=tips, x='total_bill', y='tip', s=tips['size']**2*10, c=tips['sex'].map(colors), alpha=0.5)
axes1.set_title('Colored by Sex and Sized by Size')
axes1.set_xlabel('Total Bill')
axes1.set_ylabel('Tip')
scatter_plot.suptitle('Total Bill vs Tip')
plt.show()
In [19]:
import seaborn as sns
tips = sns.load_dataset('tips')
In [20]:
sns.set_context("paper")
In [21]:
hist, ax = plt.subplots()
sns.histplot(data=tips, x='total_bill', ax=ax)
ax.set_title('Total Bill Histogram')
plt.show()
In [22]:
den, ax = plt.subplots()
sns.kdeplot(data=tips, x='total_bill', ax=ax)
ax.set_title('Total Bill Density')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Unit Probability')
plt.show()
In [23]:
rug, ax = plt.subplots()
sns.rugplot(data=tips, x='total_bill', ax=ax)
sns.histplot(data=tips, x='total_bill', ax=ax)
ax.set_title('Rug Plot and Histogram of Total Bill')
plt.show()
In [24]:
fig= sns.displot(data=tips, x='total_bill', kde=True, rug=True)
fig.set_axis_labels(x_var='Total Bill', y_var='Count')
fig.figure.suptitle('Distribution of Total Bill')
plt.show()
C:\Users\offse\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight self._figure.tight_layout(*args, **kwargs)
In [27]:
count, ax = plt.subplots()
sns.countplot(data=tips, x='day', palette='viridis', ax=ax)
ax.set_title('Count of days')
ax.set_xlabel('Day of the Week')
ax.set_ylabel('Frequency')
plt.show()
In [28]:
scatter, ax = plt.subplots()
sns.scatterplot(data=tips, x='total_bill', y='tip', ax=ax)
ax.set_title('Scatter Plot of Total Bill and Tip')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()
In [29]:
reg, ax = plt.subplots()
sns.regplot(data=tips, x='total_bill', y='tip', ax=ax)
ax.set_title('Regression Plot of Total Bill and Tip')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()
In [30]:
fig = sns.lmplot(data=tips, x='total_bill', y='tip')
plt.show()
In [31]:
joint = sns.jointplot(data=tips, x='total_bill', y='tip')
joint.set_axis_labels(xlabel='Total Bill', ylabel='Tip')
joint.figure.suptitle('Joint Plot of Total Bill and Tip', y=1.03)
plt.show()
In [33]:
hexbin = sns.jointplot(data=tips, x='total_bill', y='tip', kind='hex')
hexbin.set_axis_labels(xlabel='Total Bill', ylabel='Tip')
hexbin.figure.suptitle('Hexbin Joint Plot of Total Bill and Tip', y=1.03)
plt.show()
In [34]:
kde, ax = plt.subplots()
sns.kdeplot(data=tips, x="total_bill", y="tip", fill=True, ax=ax)
ax.set_title('Kernel Density Plot of Total Bill and Tip')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()
In [35]:
kde2d = sns.jointplot(data=tips, x="total_bill", y="tip", kind="kde")
kde2d.set_axis_labels(xlabel='Total Bill', ylabel='Tip')
kde2d.figure.suptitle('KDE Joint Plot of Total Bill and Tip', y=1.03)
plt.show()
In [36]:
import numpy as np
bar, ax = plt.subplots()
sns.barplot(data=tips, x="time", y="total_bill", estimator=np.mean, ax=ax)
ax.set_title('Bar Plot of Average Total Bill for Time of Day')
ax.set_xlabel('Time of Day')
ax.set_ylabel('Average Total Bill')
plt.show()
In [38]:
box, ax = plt.subplots()
sns.boxplot(data=tips, x='time', y="total_bill", ax=ax)
ax.set_title('Box Plot of Total Bill by Time of Day')
ax.set_xlabel('Time of Day')
ax.set_ylabel('Total Bill')
plt.show()
In [39]:
violin, ax = plt.subplots()
sns.violinplot(data=tips, x="time", y="total_bill", ax=ax)
ax.set_title('Violin plot of total bill by time of day')
ax.set_xlabel('Time of day')
ax.set_ylabel('Total Bill')
plt.show()
In [43]:
box_violin, (ax1, ax2) = plt.subplots(nrows=1, ncols=2)
sns.boxplot(data=tips, x='time', y='total_bill', ax=ax1)
sns.violinplot(data=tips, x='time', y='total_bill', ax=ax2)
ax1.set_title('Box Plot')
ax1.set_xlabel('Time of Day')
ax1.set_ylabel('Total Bill')
ax2.set_title('Violin plot')
ax2.set_xlabel('Time of Day')
ax2.set_ylabel('Total Bill')
box_violin.suptitle('Comparison of Box Plot With Violin Plot')
box_violin.set_tight_layout(True)
plt.show()
In [44]:
fig = sns.pairplot(data=tips)
fig.figure.suptitle('Pairwise Relationship of the Tips Data', y=1.03)
plt.show()
In [46]:
pair_grid = sns.PairGrid(tips, diag_sharey=False)
pair_grid = pair_grid.map_upper(sns.regplot)
pair_grid = pair_grid.map_lower(sns.kdeplot)
pair_grid = pair_grid.map_diag(sns.histplot)
plt.show()
In [48]:
violin, ax = plt.subplots()
sns.violinplot(data=tips,
x="time",
y="total_bill",
hue="smoker",
split=True,
palette="viridis",
ax=ax)
Out[48]:
<Axes: xlabel='time', ylabel='total_bill'>
In [50]:
scatter = sns.lmplot(data=tips, x="total_bill", y="tip", hue="smoker", fit_reg=False, palette="viridis")
plt.show()
In [51]:
fig = sns.pairplot(tips, hue="time", palette="viridis")
plt.show()
In [52]:
fig, ax = plt.subplots()
# hue already encodes time, so total_bill goes on the x-axis
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", size="size", palette="viridis", ax=ax)
plt.show()
In [53]:
anscombe = sns.load_dataset("anscombe")
In [55]:
anscombe_plot = sns.relplot(data=anscombe,
x="x",
y="y",
kind="scatter",
col="dataset",
col_wrap=2,
height=2,
aspect=1.6)
anscombe_plot.figure.set_tight_layout(True)
plt.show()
In [57]:
colors = {
    "Yes": "#f1a340",  # orange
    "No": "#998ec3",   # purple
}
facet2 = sns.relplot(data=tips,
x="total_bill",
y="tip",
hue="smoker",
style="sex",
kind="scatter",
col="day",
row="time",
palette=colors,
height=1.7)
facet2.set_titles(row_template="{row_name}", col_template="{col_name}")
sns.move_legend(facet2, loc="lower center", bbox_to_anchor=(0.5, 1), ncol=2, title=None, frameon=True)
facet2.figure.set_tight_layout(True)
plt.show()
In [58]:
facet = sns.FacetGrid(tips, col='time')
facet.map(sns.histplot, 'total_bill')
plt.show()
In [59]:
facet = sns.FacetGrid(tips,
col='day',
col_wrap=2,
hue='sex',
palette="viridis")
facet.map(plt.scatter, 'total_bill', 'tip')
facet.add_legend()
plt.show()
In [60]:
facet = sns.FacetGrid(tips,
col='time',
row='smoker',
hue='sex',
palette="viridis")
facet.map(plt.scatter, 'total_bill', 'tip')
plt.show()
In [62]:
plt.rc('axes', unicode_minus=False)
facet = sns.catplot(x="day",
y="total_bill",
hue="sex",
data=tips,
row="smoker",
col="time",
kind="violin",
height=3)
plt.show()
In [63]:
fig, ax = plt.subplots()
sns.violinplot(data=tips, x="time", y="total_bill", hue="sex", split=True, ax=ax)
plt.show()
In [66]:
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    sns.violinplot(data=tips,
                   x="time",
                   y="total_bill",
                   hue="sex",
                   split=True,
                   ax=ax)
plt.show()
In [67]:
seaborn_styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
fig = plt.figure()
for idx, style in enumerate(seaborn_styles):
    plot_position = idx + 1
    with sns.axes_style(style):
        ax = fig.add_subplot(2, 3, plot_position)
        violin = sns.violinplot(data=tips, x="time", y="total_bill", ax=ax)
        violin.set_title(style)
fig.set_tight_layout(True)
plt.show()
In [68]:
import pandas as pd
contexts = pd.DataFrame({
"paper": sns.plotting_context("paper"),
"notebook": sns.plotting_context("notebook"),
"talk": sns.plotting_context("talk"),
"poster": sns.plotting_context("poster"),
})
print(contexts)
                       paper  notebook    talk  poster
axes.linewidth           1.0      1.25   1.875     2.5
grid.linewidth           0.8      1.00   1.500     2.0
lines.linewidth          1.2      1.50   2.250     3.0
lines.markersize         4.8      6.00   9.000    12.0
patch.linewidth          0.8      1.00   1.500     2.0
xtick.major.width        1.0      1.25   1.875     2.5
ytick.major.width        1.0      1.25   1.875     2.5
xtick.minor.width        0.8      1.00   1.500     2.0
ytick.minor.width        0.8      1.00   1.500     2.0
xtick.major.size         4.8      6.00   9.000    12.0
ytick.major.size         4.8      6.00   9.000    12.0
xtick.minor.size         3.2      4.00   6.000     8.0
ytick.minor.size         3.2      4.00   6.000     8.0
font.size                9.6     12.00  18.000    24.0
axes.labelsize           9.6     12.00  18.000    24.0
axes.titlesize           9.6     12.00  18.000    24.0
xtick.labelsize          8.8     11.00  16.500    22.0
ytick.labelsize          8.8     11.00  16.500    22.0
legend.fontsize          8.8     11.00  16.500    22.0
legend.title_fontsize    9.6     12.00  18.000    24.0
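The table above is easier to read once each column is divided by the `notebook` column: every parameter is scaled by the same factor within a context. The sketch below checks this on three rows copied from the printed table; the helper name `scale_factors` is invented for this illustration.

```python
# Three rows copied from the printed contexts table above.
context_rows = {
    "font.size": {"paper": 9.6, "notebook": 12.0, "talk": 18.0, "poster": 24.0},
    "lines.linewidth": {"paper": 1.2, "notebook": 1.5, "talk": 2.25, "poster": 3.0},
    "xtick.major.size": {"paper": 4.8, "notebook": 6.0, "talk": 9.0, "poster": 12.0},
}


def scale_factors(row):
    """Express each context's value relative to the notebook context."""
    base = row["notebook"]
    return {name: round(value / base, 3) for name, value in row.items()}


factors = {param: scale_factors(row) for param, row in context_rows.items()}
for param, ratio in factors.items():
    print(param, ratio)
```

Each row yields the same ratios (0.8 for paper, 1.5 for talk, 2.0 for poster), so choosing a context amounts to picking one overall scale for fonts, lines, and ticks.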
In [69]:
context_styles = contexts.columns
fig = plt.figure()
for idx, context in enumerate(context_styles):
    plot_position = idx + 1
    with sns.plotting_context(context):
        ax = fig.add_subplot(2, 2, plot_position)
        violin = sns.violinplot(data=tips, x="time", y="total_bill", ax=ax)
        violin.set_title(context)
fig.set_tight_layout(True)
plt.show()
04-5 Drawing Graphs with pandas
In [70]:
fig, ax = plt.subplots()
tips['total_bill'].plot.hist(ax=ax)
plt.show()
In [71]:
fig, ax = plt.subplots()
tips[['total_bill', 'tip']].plot.hist(alpha=0.5, bins=20, ax=ax)
plt.show()
In [72]:
fig, ax = plt.subplots()
tips['tip'].plot.kde(ax=ax)
plt.show()
In [73]:
fig, ax = plt.subplots()
tips.plot.scatter(x="total_bill", y="tip", ax=ax)
plt.show()
In [76]:
fig, ax = plt.subplots()
tips.plot.hexbin(x="total_bill", y="tip", ax=ax)
plt.show()
In [77]:
fig, ax = plt.subplots()
tips.plot.hexbin(x="total_bill", y="tip", gridsize=10, ax=ax)
plt.show()
In [78]:
fig, ax = plt.subplots()
tips.plot.box(ax=ax)
plt.show()