✋ [Python] What Is Pandas groupby()?

Among the pandas methods, groupby() splits data into groups so that each independent group can be processed separately or so that per-group statistics can be computed.
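
As a rough illustration of that split-then-summarize idea, here is a minimal sketch on a tiny made-up table. The column names `group` and `value` and all the numbers are hypothetical and are not taken from the iris data.

```python
import pandas as pd

# Toy data (hypothetical values, not the iris dataset)
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Split: each group comes back as its own independent sub-DataFrame
for name, part in df.groupby("group"):
    print(name, len(part))

# Summarize: one statistic per group, combined into a single Series
group_means = df.groupby("group")["value"].mean()
print(group_means)
```

The loop is only there to show that each group can be handled on its own. For plain statistics, the last two lines are usually enough.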

The image above shows the iris dataset aggregated into per-species mean values with the groupby() method.

1. First, load the iris data with pandas and assign it to a variable.

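import pandas as pd

# Directory that contains iris.csv
data_path = '/content/drive/MyDrive/YS_edu/data/'
# names= supplies the column names; the file is assumed to have no header row
col_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "sepcise"]
iris = pd.read_csv(data_path + "iris.csv", names=col_names)
iris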
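
To try the same loading step without the file, the sketch below reads a small in-memory CSV instead. The two rows are only an illustrative stand-in for iris.csv, and `csv_text` and `sample` are names made up for this example. When `names` is passed and `header` is not, pandas treats the first line as data rather than as a header.

```python
import io
import pandas as pd

# Illustrative two-row stand-in for iris.csv (no header row)
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
col_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "sepcise"]

# The first line is kept as data because the column names come from names=
sample = pd.read_csv(io.StringIO(csv_text), names=col_names)
print(sample.shape)
print(list(sample.columns))
```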

2. Use the groupby() method to compute the mean of each column per species (the sepcise column).

agg_dic = {
    "sepal_length": "mean",
    "sepal_width": "mean",
    "petal_length": "mean",
    "petal_width": "mean",
}

iris_grouped = iris.groupby("sepcise").agg(agg_dic)
iris_grouped
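
When every column gets the same statistic, calling `mean()` directly on the grouped object should give the same table as the dictionary form. The sketch below checks this on a hypothetical three-row frame (`toy`, with made-up values) rather than on the real iris data. It assumes all non-key columns are numeric; if text columns are present, recent pandas versions may need `numeric_only=True`.

```python
import pandas as pd

# Hypothetical miniature of the iris table (made-up values)
toy = pd.DataFrame({
    "sepcise": ["a", "a", "b"],
    "sepal_length": [5.0, 6.0, 7.0],
    "sepal_width": [3.0, 3.5, 3.2],
})

# Dictionary form: pick the statistic column by column
via_agg = toy.groupby("sepcise").agg({"sepal_length": "mean", "sepal_width": "mean"})
# Shorthand: apply mean() to every non-key column
via_mean = toy.groupby("sepcise").mean()

print(via_agg.equals(via_mean))
```

The dictionary form is still the better fit when different columns need different statistics.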

3. Use the groupby() method to compute the maximum, minimum, and mean per species (the sepcise column).

agg_dic = {
    "sepal_length": ["min", "max", "mean"],
    "sepal_width": ["min", "max", "mean"],
    "petal_length": ["min", "max", "mean"],
    "petal_width": ["min", "max", "mean"],
}

iris_grouped = iris.groupby("sepcise").agg(agg_dic)
iris_grouped
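
With several statistics per column, the result has two header levels. The sketch below shows one way to pull values out of such a table, again on a hypothetical miniature frame; `toy`, `grouped`, `max_len`, and `len_stats` are names invented for this example.

```python
import pandas as pd

# Hypothetical miniature of the iris table (made-up values)
toy = pd.DataFrame({
    "sepcise": ["a", "a", "b"],
    "sepal_length": [5.0, 6.0, 7.0],
})
grouped = toy.groupby("sepcise").agg({"sepal_length": ["min", "max", "mean"]})

# One statistic: index with a (column, statistic) tuple
max_len = grouped[("sepal_length", "max")]
# All statistics of one original column: index with the top-level name
len_stats = grouped["sepal_length"]

print(max_len)
print(list(len_stats.columns))
```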

4-1. The columns of iris_grouped are a MultiIndex, so in this form they are awkward to add back to the original iris data as derived variables.

iris_grouped.columns
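
To see what such a column index looks like without the iris file, this minimal sketch builds the same kind of result from a hypothetical frame (`toy` and `grouped` are made-up names) and inspects its columns.

```python
import pandas as pd

# Hypothetical miniature of the iris table (made-up values)
toy = pd.DataFrame({
    "sepcise": ["a", "a", "b"],
    "sepal_length": [5.0, 6.0, 7.0],
})
grouped = toy.groupby("sepcise").agg({"sepal_length": ["min", "max", "mean"]})

# The column index has two levels: original column name, then statistic
print(isinstance(grouped.columns, pd.MultiIndex))
print(grouped.columns.nlevels)
# Each column label is a tuple of (column, statistic)
print(list(grouped.columns))
```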

4-2. Each label in a MultiIndex is a tuple, such as ("sepal_length", "mean"). The code below joins each tuple into a flat column name of the form sepal_length_mean.

iris_grouped.columns = ['_'.join(col).strip() for col in iris_grouped.columns]
iris_grouped
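
Once the column names are flat, the per-species statistics can be attached to every original row as derived variables. The sketch below is one possible way to do that with `merge`, shown on a hypothetical miniature frame; `toy`, `grouped`, and `merged` are made-up names, and this is not necessarily how the original post continued.

```python
import pandas as pd

# Hypothetical miniature of the iris table (made-up values)
toy = pd.DataFrame({
    "sepcise": ["a", "a", "b"],
    "sepal_length": [5.0, 6.0, 7.0],
})
grouped = toy.groupby("sepcise").agg({"sepal_length": ["min", "max", "mean"]})
# Flatten ("sepal_length", "mean") into "sepal_length_mean"
grouped.columns = ["_".join(col) for col in grouped.columns]

# Turn the group key back into a column, then join it onto each original row
merged = toy.merge(grouped.reset_index(), on="sepcise", how="left")
print(merged)
```

Every row of a given species then carries that species' min, max, and mean next to its own measurement.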

Thanks for reading!

Attribution, NonCommercial, NoDerivatives

TechTalk with KwangHyun
