
6-1. Calculating key statistics (agg)

shoo. 2023. 7. 26. 12:22

Sample data

In [1]: import pandas as pd

In [2]: titanic = pd.read_csv("data/titanic.csv")

In [3]: titanic.head()
Out[3]: 
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S

[5 rows x 12 columns]

 

1. Aggregating statistics (summary statistics)

ex) What is the average age of the Titanic passengers?

In [4]: titanic["Age"].mean()
Out[4]: 29.69911764705882

 In general, these operations exclude missing values and, by default, work down the rows (axis=0), producing one result per column.
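
The note above about missing values and the default axis can be checked on a small example. This is only a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not taken from the Titanic file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Titanic file): one age is missing.
toy = pd.DataFrame({"Age": [22.0, np.nan, 30.0], "Fare": [7.0, 71.0, 8.0]})

# Default behaviour: the NaN is skipped, so this is (22 + 30) / 2.
mean_skipping_nan = toy["Age"].mean()

# With skipna=False the missing value propagates into the result.
mean_with_nan = toy["Age"].mean(skipna=False)

# Default axis=0 works down the rows: one value per column.
per_column = toy.mean()

# axis=1 works across the columns instead: one value per row.
per_row = toy.mean(axis=1)

print(mean_skipping_nan)
print(mean_with_nan)
print(per_column)
print(per_row)
```

Passing `skipna=False` is mainly useful when a missing value should be noticed instead of being silently ignored.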

 

ex) What are the median age and ticket fare of the Titanic passengers?

In [5]: titanic[["Age", "Fare"]].median()
Out[5]: 
Age     28.0000
Fare    14.4542
dtype: float64

A statistic applied to multiple columns of a DataFrame (selecting two columns returns a DataFrame) is calculated for each numeric column, and the result is a Series indexed by column name.
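
To make the return types concrete, here is a small sketch on made-up data (the `toy` frame and its `Name` column are assumptions for illustration). It also shows `numeric_only=True`, which restricts the calculation to numeric columns when the frame contains text columns as well.

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns.
toy = pd.DataFrame(
    {
        "Name": ["a", "b", "c"],
        "Age": [22.0, 38.0, 26.0],
        "Fare": [7.25, 71.2833, 7.925],
    }
)

# Selecting two columns with a list returns a DataFrame.
subset = toy[["Age", "Fare"]]

# Applying a statistic to it returns a Series, one value per column.
medians = subset.median()

# On the full frame, numeric_only=True leaves out the text column.
numeric_medians = toy.median(numeric_only=True)

print(type(subset).__name__, type(medians).__name__)
print(medians)
print(numeric_medians)
```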

 

Several aggregate statistics can be calculated for multiple columns at the same time.

In [6]: titanic[["Age", "Fare"]].describe()
Out[6]: 
              Age        Fare
count  714.000000  891.000000
mean    29.699118   32.204208
std     14.526497   49.693429
min      0.420000    0.000000
25%     20.125000    7.910400
50%     28.000000   14.454200
75%     38.000000   31.000000
max     80.000000  512.329200
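
In the output above, the count for Age (714) is lower than the count for Fare (891) because `count` only counts non-missing values. A minimal sketch on made-up data (the `toy` frame is an assumption, not the Titanic file) shows the same effect:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Age has one missing value, Fare has none.
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 30.0, 40.0],
        "Fare": [7.0, 71.0, 8.0, 10.0],
    }
)

summary = toy.describe()

# "count" is the number of non-missing values in each column.
print(summary.loc["count"])

# The rows of the summary are the predefined statistics.
print(list(summary.index))
```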

 

Instead of the predefined statistics, the DataFrame.agg() method can be used to define a specific combination of aggregate statistics for given columns.

In [7]: titanic.agg(
   ...:     {
   ...:         "Age": ["min", "max", "median", "skew"],
   ...:         "Fare": ["min", "max", "median", "mean"],
   ...:     }
   ...: )
   ...: 
Out[7]: 
              Age        Fare
min      0.420000    0.000000
max     80.000000  512.329200
median  28.000000   14.454200
skew     0.389108         NaN
mean          NaN   32.204208
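
The NaN cells in the result above appear wherever a statistic was requested for one column but not for the other (skew only for Age, mean only for Fare). The sketch below reproduces this on made-up data; the `toy` frame and the chosen statistics are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic file).
toy = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.0, 71.0, 9.0]})

# Each column gets its own list of statistics.
result = toy.agg(
    {
        "Age": ["min", "max"],    # no "mean" requested for Age
        "Fare": ["min", "mean"],  # no "max" requested for Fare
    }
)

# Statistics that were not requested for a column show up as NaN.
print(result)
```

If the same statistics are wanted for every column, a plain list such as `toy.agg(["min", "max"])` can be passed instead of a dictionary.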

 

 

This post is a personal study note; the source is the official pandas documentation (How to calculate summary statistics).