6-1. Calculating summary statistics (agg)
shoo.
2023. 7. 26. 12:22
Sample data
In [1]: import pandas as pd
In [2]: titanic = pd.read_csv("data/titanic.csv")
In [3]: titanic.head()
Out[3]:
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S

[5 rows x 12 columns]
1. Aggregating statistics (summary statistics)
ex) What is the average age of the Titanic passengers?
In [4]: titanic["Age"].mean()
Out[4]: 29.69911764705882
Missing values are generally excluded, and by default the statistics operate across rows, giving one result per column.
ex) What are the median age and ticket fare of the Titanic passengers?
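The two defaults mentioned above (skipping missing values, working across rows) can be checked on a small frame. This is a minimal sketch; the toy values are made up for illustration and only stand in for the Titanic "Age" and "Fare" columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from titanic.csv)
df = pd.DataFrame({"Age": [22.0, np.nan, 30.0], "Fare": [10.0, 20.0, 30.0]})

# Default skipna=True: the NaN is ignored, so this is (22 + 30) / 2
mean_skipping_nan = df["Age"].mean()

# skipna=False: a single NaN makes the whole result NaN
mean_with_nan = df["Age"].mean(skipna=False)

# Default axis=0: aggregate down the rows, one value per column
column_means = df.mean()

# axis=1: aggregate across the columns, one value per row
row_means = df.mean(axis=1)

print(mean_skipping_nan)
print(mean_with_nan)
print(column_means)
print(row_means)
```

Passing skipna=False is useful when a missing value should be visible in the result rather than silently dropped.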
In [5]: titanic[["Age", "Fare"]].median()
Out[5]:
Age     28.0000
Fare    14.4542
dtype: float64
A statistic applied to multiple columns of a DataFrame (selecting two columns returns a DataFrame) is calculated for each numeric column.
Aggregating statistics can be calculated for multiple columns at the same time.
In [6]: titanic[["Age", "Fare"]].describe()
Out[6]:
              Age        Fare
count  714.000000  891.000000
mean    29.699118   32.204208
std     14.526497   49.693429
min      0.420000    0.000000
25%     20.125000    7.910400
50%     28.000000   14.454200
75%     38.000000   31.000000
max     80.000000  512.329200
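To see what these multi-column calls return, the sketch below runs median() and describe() on a small frame. The data and the extra "Name" column are hypothetical and only serve to show that non-numeric columns are left out of describe() by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from titanic.csv)
df = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 30.0, 40.0],
        "Fare": [10.0, 20.0, 30.0, 40.0],
        "Name": ["a", "b", "c", "d"],
    }
)

# A statistic on a two-column selection returns a Series indexed by column name
medians = df[["Age", "Fare"]].median()

# describe() summarizes only the numeric columns by default;
# "count" is the number of non-missing values in each column
summary = df.describe()

print(medians)
print(summary)
```

The differing "count" values in the describe() output are a quick way to spot which columns contain missing values, as with Age (714) versus Fare (891) in the Titanic data.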
Instead of the predefined statistics, specific combinations of aggregating statistics for given columns can be defined using the DataFrame.agg() method.
In [7]: titanic.agg(
   ...:     {
   ...:         "Age": ["min", "max", "median", "skew"],
   ...:         "Fare": ["min", "max", "median", "mean"],
   ...:     }
   ...: )
   ...:
Out[7]:
              Age        Fare
min      0.420000    0.000000
max     80.000000  512.329200
median  28.000000   14.454200
skew     0.389108         NaN
mean          NaN   32.204208
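The NaN cells in the output above appear wherever a statistic was not requested for a column. A minimal sketch on made-up data (the values are hypothetical, chosen so the results are easy to check by hand):

```python
import pandas as pd

# Hypothetical toy data (not from titanic.csv)
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [5.0, 10.0, 45.0]})

# Each column gets its own list of statistics
result = df.agg({"Age": ["min", "max"], "Fare": ["max", "mean"]})

# "max" was requested for both columns, so both cells are filled.
# "min" was requested only for Age and "mean" only for Fare,
# so the other cell in each of those rows is NaN.
print(result)
```

The result is a DataFrame whose rows are the union of all requested statistics, which is why skew has no Fare value and mean has no Age value in the Titanic example.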
This post is a study note; the source is the official pandas documentation (How to calculate summary statistics).