
- 🗒️ Mistake notes (5)

[Python] set_index()
set_index() is a pandas DataFrame function that sets a specific column as the index. With it, you can replace the DataFrame's existing index with the chosen column.

    import pandas as pd

    # Create sample data
    data = {
        'City': ['New York', 'Los Angeles', 'Chicago'],
        'Population': [8623000, 3999000, 2716000]
    }
    df = pd.DataFrame(data)

    # Set the City column as the index
    df.set_index('City', inplace=True)
    print(df)

                 Population
    City
    New York        8623000
    Los Angeles     3999000
    Chicago         2716000

In the set_index() function, in..
2023. 9. 13.
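
As a complement to the excerpt above, here is a minimal sketch of calling set_index() without inplace=True, in which case pandas returns a new DataFrame and leaves the original untouched. The variable names indexed, chicago_population, and restored are illustrative and do not come from the original post.

```python
import pandas as pd

data = {
    "City": ["New York", "Los Angeles", "Chicago"],
    "Population": [8623000, 3999000, 2716000],
}
df = pd.DataFrame(data)

# Without inplace=True, set_index() returns a new DataFrame; df keeps its default index.
indexed = df.set_index("City")

# Rows can now be looked up by city name through the new index.
chicago_population = indexed.loc["Chicago", "Population"]

# reset_index() moves the index back into an ordinary column.
restored = indexed.reset_index()
```

Assigning the result to a new name keeps the original frame available, which can make step-by-step debugging easier than mutating it in place.
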
[Python] groupby()
groupby() is a function provided by the pandas library that groups data by a specific column and lets you run various operations (mean, sum, and so on) on each group.

Finding the number of orders per month

    import pandas as pd

    # Create sample data
    data = {
        'order_id': [1, 2, 3, 4, 5],
        'order_date': ['2023-01-01', '2023-01-02', '2023-02-01', '2023-02-02', '2023-03-01'],
    }
    df = pd.DataFrame(data)

    # Convert the order_date column to datetime type
    df['order_date'] = pd.to_datetime(df['order_date'])

    # Month information from the order_date column ..
2023. 9. 13.
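
The excerpt is cut off before the grouping step, so the following is only one plausible way to count orders per month with the same sample data, not necessarily what the original post does. The month column name and the monthly_counts variable are assumptions made for illustration.

```python
import pandas as pd

data = {
    "order_id": [1, 2, 3, 4, 5],
    "order_date": ["2023-01-01", "2023-01-02", "2023-02-01", "2023-02-02", "2023-03-01"],
}
df = pd.DataFrame(data)

# Parse the date strings so the .dt accessor becomes available.
df["order_date"] = pd.to_datetime(df["order_date"])

# Hypothetical helper column holding the month number (1 to 12).
df["month"] = df["order_date"].dt.month

# Group by month and count the order IDs in each group.
monthly_counts = df.groupby("month")["order_id"].count()
print(monthly_counts)
```

If the data spans several years, grouping by month number alone would merge the same month from different years; grouping on df["order_date"].dt.to_period("M") is one way to keep them apart.
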
[Python] info()
The info() function prints a summary of a pandas DataFrame. It reports the DataFrame's size, column names, data types, the number of non-null values, and so on.

    import pandas as pd

    # Create sample data
    data = {
        'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', None, 'Berlin']
    }
    df = pd.DataFrame(data)

    # Print DataFrame information
    df.info()

    RangeIndex: 4 entries, 0 to 3
    Data columns (total 3 columns):
     # Column Non-N..
2023. 9. 13.
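
Because info() prints its summary rather than returning it, a small sketch of capturing that text can be handy when the summary needs to be logged or inspected. This uses the buf parameter of DataFrame.info(); the names buffer, summary, and non_null_counts are illustrative only.

```python
import io

import pandas as pd

data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
    "City": ["New York", "Paris", None, "Berlin"],
}
df = pd.DataFrame(data)

# info() writes to stdout by default; passing buf redirects the text into a buffer.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# The non-null counts that info() reports can also be computed directly.
non_null_counts = df.notna().sum()
print(summary)
print(non_null_counts)
```

The City column contains one None, so its non-null count is one lower than the number of rows.
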
[Python] scatter()
scatter() is mainly used for data visualization. In Python's matplotlib library, the scatter() function creates a scatter plot.

    import matplotlib.pyplot as plt

    x = [1, 2, 3, 4, 5]
    y = [2, 3, 5, 7, 11]

    plt.scatter(x, y)
    plt.show()

The x and y coordinates of each point are the corresponding elements of the x and y lists.

Option descriptions
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edge..
2023. 9. 10.
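
To illustrate a few of the options listed in the signature above, here is a minimal sketch that sets s (marker size), c with cmap (color mapping), alpha (transparency), and marker. The specific values and the sizes list are arbitrary choices for demonstration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib

# Non-interactive backend so the sketch runs headless; omit this line for interactive use.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
sizes = [20, 40, 60, 80, 100]  # illustrative marker areas, in points squared

# s sets marker size, c colors each point by its y value through the chosen colormap,
# alpha sets transparency, and marker picks the marker shape.
points = plt.scatter(x, y, s=sizes, c=y, cmap="viridis", alpha=0.8, marker="o")
plt.xlabel("x")
plt.ylabel("y")

# Release the figure; in an interactive session, call plt.show() before closing.
plt.close()
```

scatter() returns a PathCollection, which is why the result can be kept in a variable and passed on, for example to plt.colorbar().
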
[Python] loc()
pandas' loc provides label-based data selection. That is, you can select data by index name or column label.

Basic form
DataFrame.loc[<row selector>, <column selector>]
<row selector>: specifies the rows to select. It can take several forms, such as a single label, a list of labels, a label slice, or a boolean array.
<column selector>: specifies the columns to select. Likewise, it is given as a single label, a list of labels, a label slice, and so on.

    # Select column 'A'
    df.loc[:, 'A']

    # Select columns 'A' and 'B'
    df.loc[:, ['A', 'B']]

    # Select rows with index 0 through 2 and all columns
    df.loc[0:2, :]

    # Select rows by condition and all columns (e.g. column A value > 5)
    df.loc[df['A'] > 5, :]

1...
2023. 9. 10.
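
The selections above assume an existing df, so here is a self-contained sketch with a small made-up DataFrame that runs each of them. The data values and result variable names are hypothetical and serve only to make the behavior visible.

```python
import pandas as pd

# Hypothetical sample data with a default integer index (0 to 3).
df = pd.DataFrame({"A": [1, 4, 6, 9], "B": [10, 20, 30, 40], "C": [0.1, 0.2, 0.3, 0.4]})

# A single column label returns a Series.
col_a = df.loc[:, "A"]

# A list of labels returns a DataFrame with just those columns.
cols_ab = df.loc[:, ["A", "B"]]

# Label slices include both endpoints, so 0:2 selects the rows labeled 0, 1, and 2.
first_rows = df.loc[0:2, :]

# A boolean Series keeps only the rows where the condition is True.
filtered = df.loc[df["A"] > 5, :]
```

Note that loc is accessed with square brackets rather than called like a function, and that its label slices are inclusive of the end label, unlike ordinary Python slicing.
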