
4. Creating plots in pandas (plot)

by shoo. 2023. 7. 22.


      In [1]: import pandas as pd
      
      In [2]: import matplotlib.pyplot as plt
      In [3]: air_quality = pd.read_csv("data/air_quality_no2.csv", index_col=0, parse_dates=True)
      
      In [4]: air_quality.head()
      Out[4]: 
                           station_antwerp  station_paris  station_london
      datetime                                                           
      2019-05-07 02:00:00              NaN            NaN            23.0
      2019-05-07 03:00:00             50.5           25.0            19.0
      2019-05-07 04:00:00             45.0           27.7            19.0
      2019-05-07 05:00:00              NaN           50.4            16.0
      2019-05-07 06:00:00              NaN           61.9             NaN

      The index_col and parse_dates parameters of read_csv define the first (0th) column as the index of the resulting DataFrame and convert the dates in that column to Timestamp objects, respectively.
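
As a minimal sketch of what these two parameters do, the snippet below reads a few rows copied from the head() output above out of an in-memory string rather than from data/air_quality_no2.csv, so it runs without the dataset. The names sample_csv and sample are illustrative only.

```python
import io

import pandas as pd

# A few rows copied from the head() output above, held in memory
# so the example does not depend on data/air_quality_no2.csv.
sample_csv = io.StringIO(
    "datetime,station_antwerp,station_paris,station_london\n"
    "2019-05-07 02:00:00,,,23.0\n"
    "2019-05-07 03:00:00,50.5,25.0,19.0\n"
    "2019-05-07 04:00:00,45.0,27.7,19.0\n"
)

# index_col=0 makes the first column the index;
# parse_dates=True converts that index to datetime values.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=True)

print(type(sample.index).__name__)     # DatetimeIndex
print(type(sample.index[0]).__name__)  # Timestamp
print(list(sample.columns))
```

Without parse_dates=True the index would hold plain strings, and the time axis of the later plots would not be treated as dates.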

       

       

      1) Getting a quick visual check of the data

      In [5]: air_quality.plot()
      Out[5]: <AxesSubplot: xlabel='datetime'>
      
      In [6]: plt.show()

      For a DataFrame, pandas by default creates one line plot for each column with numeric data.
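
The "numeric columns only" behaviour can be checked on a small stand-in for air_quality. The sketch below assumes a headless Matplotlib backend (Agg) so it runs without a display, and it adds a hypothetical text column named comment that is not part of the original dataset.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for air_quality plus one text column (hypothetical).
df = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4],
        "station_london": [19.0, 19.0, 16.0],
        "comment": ["ok", "ok", "check"],  # non-numeric, so not plotted
    }
)

ax = df.plot()

# One line is drawn per numeric column; the text column is skipped.
print(len(ax.get_lines()))
plt.close("all")
```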

       

      2) Plotting only the column of the data table with the data from Paris

      In [7]: air_quality["station_paris"].plot()
      Out[7]: <AxesSubplot: xlabel='datetime'>
      
      In [8]: plt.show()

      To plot a specific column, combine a subset selection with the plot() method. Here air_quality["station_paris"] selects a single column, which is a Series.

      The plot() method therefore works on both Series and DataFrame.

       

      3) Visually comparing the values measured in London and Paris

      In [9]: air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
      Out[9]: <AxesSubplot: xlabel='station_london', ylabel='station_paris'>
      
      In [10]: plt.show()

      Apart from the default line plot produced by the plot function, several alternative plot types are available.

      In [11]: [
         ....:     method_name
         ....:     for method_name in dir(air_quality.plot)
         ....:     if not method_name.startswith("_")
         ....: ]
         ....: 
      Out[11]: 
      ['area',
       'bar',
       'barh',
       'box',
       'density',
       'hexbin',
       'hist',
       'kde',
       'line',
       'pie',
       'scatter']

       

      This code collects every method name on the air_quality.plot object that does not start with "_" and returns the names as a list.

      'area': area plot
      'bar': bar plot (vertical)
      'barh': bar plot (horizontal)
      'box': box plot
      'density': density plot (alias of 'kde')
      'hexbin': hexagonal binning plot
      'hist': histogram
      'kde': kernel density estimate plot
      'line': line plot (default)
      'pie': pie chart
      'scatter': scatter plot
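
Each name in this list can also be passed as the kind argument of plot(). The sketch below is not from the original post; it uses a small stand-in DataFrame and the headless Agg backend (both assumptions) to draw the same histogram through the accessor and through kind.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"station_paris": [25.0, 27.7, 50.4, 61.9], "station_london": [19.0, 19.0, 16.0, 23.0]}
)

ax_accessor = df.plot.hist(bins=5)       # accessor style: df.plot.<name>()
ax_kind = df.plot(kind="hist", bins=5)   # equivalent call using kind="<name>"

# Both calls draw the same number of histogram bars.
print(len(ax_accessor.patches) == len(ax_kind.patches))  # True
plt.close("all")
```

The accessor form works well with TAB completion, while the kind form is convenient when the plot type is chosen at run time.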

       

      ๋งŽ์€ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์—์„œ IPython๊ณผ Jupyter Notebook๊ณผ ๊ฐ™์ด TAB ๋ฒ„ํŠผ์„ ์‚ฌ์šฉํ•˜์—ฌ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๋ฉ”์„œ๋“œ์— ๋Œ€ํ•œ ๊ฐœ์š”๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, air_quality.plot. + TAB์ž…๋‹ˆ๋‹ค. 

       

      One of the options is DataFrame.plot.box(), which draws box plots. The box method can be applied to the air quality example data:

      In [12]: air_quality.plot.box()
      Out[12]: <AxesSubplot: >
      
      In [13]: plt.show()

       

      4) Drawing each column in a separate subplot

      In [14]: axs = air_quality.plot.area(figsize=(12, 4), subplots=True)
      
      In [15]: plt.show()

      The subplots argument of the plot functions draws each data column in its own subplot. The other built-in options of the pandas plot functions are worth reviewing as well.
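
To see what subplots=True returns, the sketch below runs the same call on a small stand-in DataFrame (an assumption, as is the headless Agg backend). The result is an array with one Matplotlib Axes per column, all belonging to a single Figure.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

df = pd.DataFrame(
    {
        "station_antwerp": [50.5, 45.0, 40.0],
        "station_paris": [25.0, 27.7, 50.4],
        "station_london": [19.0, 19.0, 16.0],
    }
)

axs = df.plot.area(figsize=(12, 4), subplots=True)

print(len(axs))  # 3, one Axes per column
# Each subplot can be customized on its own through its Axes.
for ax, column in zip(axs, df.columns):
    ax.set_ylabel(column)
plt.close("all")
```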

       

      5) Customizing, extending and saving the resulting plot

      In [16]: fig, axs = plt.subplots(figsize=(12, 4))
      
      In [17]: air_quality.plot.area(ax=axs)
      Out[17]: <AxesSubplot: xlabel='datetime'>
      
      In [18]: axs.set_ylabel("NO$_2$ concentration")
      Out[18]: Text(0, 0.5, 'NO$_2$ concentration')
      
      In [19]: fig.savefig("no2_concentrations.png")
      
      In [20]: plt.show()

      ํŒ๋‹ค์Šค์—์„œ ์ƒ์„ฑํ•œ ๊ฐ ํ”Œ๋กฏ ๊ฐ์ฒด๋Š” Matplotlib ๊ฐ์ฒด๋‹ค. Matplotlib๋Š” ๋‹ค์–‘ํ•œ ์˜ต์…˜์„ ์ œ๊ณตํ•˜์—ฌ ๊ทธ๋ž˜ํ”„๋ฅผ ์ปค์Šคํ„ฐ๋งˆ์ด์ง• ํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ํŒ๋‹ค์Šค์™€ Matplotlib ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ช…ํ™•ํžˆ ํ•จ์œผ๋กœ์จ Matplotlib์˜ ๋ชจ๋“  ๊ธฐ๋Šฅ์„ ๊ทธ๋ž˜ํ”„์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด ์ „๋žต์€ ์ด์ „ ์˜ˆ์‹œ์—์„œ ์‚ฌ์šฉ๋˜์—ˆ๋‹ค.

      fig, axs = plt.subplots(figsize=(12, 4))        # Create an empty Matplotlib Figure and Axes
      air_quality.plot.area(ax=axs)                   # Use pandas to put the area plot on the prepared Figure/Axes
      axs.set_ylabel("NO$_2$ concentration")          # Do any Matplotlib customization you like
      fig.savefig("no2_concentrations.png")           # Save the Figure/Axes using the existing Matplotlib method.
      plt.show()                                      # Display the plot
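
The same pattern can be sketched in a form that runs without the dataset or a display. The stand-in DataFrame, the extra title, the Agg backend and the in-memory buffer used instead of no2_concentrations.png are all assumptions made for illustration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"station_paris": [25.0, 27.7, 50.4], "station_london": [19.0, 19.0, 16.0]}
)

fig, ax = plt.subplots(figsize=(12, 4))      # empty Matplotlib Figure and Axes
returned = df.plot.area(ax=ax)               # pandas draws on the Axes it was given
ax.set_ylabel("NO$_2$ concentration")        # plain Matplotlib customization
ax.set_title("Hourly NO$_2$ (sample rows)")  # hypothetical extra customization

buffer = io.BytesIO()              # in-memory target instead of a file on disk
fig.savefig(buffer, format="png")  # same savefig method as in the example above

print(returned is ax)          # True
print(buffer.getvalue()[:4])   # b'\x89PNG'
plt.close(fig)
```

Because pandas returns the very Axes it was handed, any Matplotlib call made on ax before or after the pandas call affects the same plot.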

       

       

      Summary

      The .plot.* methods apply to both Series and DataFrames.
      By default, each column is drawn as a separate element (line, box plot, ...).
      Every plot created by pandas is a Matplotlib object.

       

       

       

      ๋ณธ ๋‚ด์šฉ์€ ๊ณต๋ถ€ ๊ธฐ๋ก์šฉ์œผ๋กœ ์ถœ์ฒ˜๋Š” ํŒ๋‹ค์Šค์˜ ๊ณต์‹๋ฌธ์„œ(How do I create plots in pandas?) ์ž…๋‹ˆ๋‹ค