[Python] How to Preprocess Data Quickly - (3) Using NumPy: np.where(), np.select()


xyz1 2022. 6. 5. 14:04

1. np.where()

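Use this when a single condition decides between two outcomes (if/else). It is much faster than apply. Passing .values can speed it up a little more, because the comparison then runs on the underlying NumPy array rather than the full pandas Series, so there is much less for the computation to handle.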

# 0 if fraud_reported is 'Y', otherwise 1
df['fraud_reported'] = np.where(df['fraud_reported'].values=='Y', 0, 1)
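As a minimal sketch of the comparison above, the snippet below builds a tiny DataFrame and checks that np.where() gives the same labels as a row-wise apply. The sample values and the names via_apply and via_where are made up for illustration; only the column name comes from this post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the post.
df = pd.DataFrame({"fraud_reported": ["Y", "N", "N", "Y"]})

# Row-wise version with apply (slow on large frames).
via_apply = df["fraud_reported"].apply(lambda v: 0 if v == "Y" else 1)

# Vectorized version: one comparison over the whole array, then one selection.
via_where = np.where(df["fraud_reported"].values == "Y", 0, 1)

# Both approaches produce the same labels.
same = bool((via_apply.values == via_where).all())
print(via_where.tolist(), same)
```

On a frame this small the speed difference is not visible; timing both versions on your own data (for example with %timeit) is the reliable way to confirm the gain.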

2. np.select()

Use this when a single column needs more than two conditions (outcomes).

conditions = [
    df['auto_make'].str.startswith('A'),
    df['auto_make'].isin(['Saab', 'Mercedes', 'Dodge']),
    df['auto_make'].isin(['Chevrolet', 'BMW', 'Jeep', 'Honda'])
]
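# To combine several tests into one entry of conditions, wrap each test in parentheses and join them with & (and) or | (or).
# For example, the second condition above can also be written as:
# ((df['auto_make'] == "Saab") | (df['auto_make'] == "Mercedes") | (df['auto_make'] == "Dodge"))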

choices = [
    'Type0',
    'Type1',
    'Type2'
]

# Rows that match none of the conditions above are set to Type3
df['auto_make'] = np.select(conditions, choices, default='Type3')
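The following is a small runnable sketch of the same np.select() pattern. The sample car makes and the auto_type output column are assumptions added for illustration, so that the original column stays intact. It also shows that joining equality tests with | gives the same mask as isin().

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the column name and rules follow the post.
df = pd.DataFrame({"auto_make": ["Audi", "Saab", "BMW", "Toyota", "Acura"]})

conditions = [
    df["auto_make"].str.startswith("A"),
    df["auto_make"].isin(["Saab", "Mercedes", "Dodge"]),
    df["auto_make"].isin(["Chevrolet", "BMW", "Jeep", "Honda"]),
]
choices = ["Type0", "Type1", "Type2"]

# Conditions are checked in order; the first True one determines the choice.
# Rows matching none of them fall back to `default`.
df["auto_type"] = np.select(conditions, choices, default="Type3")

# Equality tests joined with | (or) build the same mask as isin().
or_mask = (
    (df["auto_make"] == "Saab")
    | (df["auto_make"] == "Mercedes")
    | (df["auto_make"] == "Dodge")
)
masks_match = bool((or_mask == conditions[1]).all())

print(df["auto_type"].tolist(), masks_match)
```

Because the first matching condition wins, the order of the list matters whenever conditions can overlap.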

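More complex cases (strings, dictionaries, dates, other rows)
For strings, apply can be faster.
When you need to look values up in a Python dictionary, use the map method.
- ex) df['Category'].map(channel_dict)
For datetime data, use the dt accessor.
- ex) df['Date'].dt.days
Note that .dt.days is only available on timedelta data (for example, the difference between two datetime columns). Another option is to convert to a timedelta directly, which is faster than the dt accessor.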
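A hedged sketch of the dictionary lookup and the date handling described above. The data, the contents of channel_dict, and the start and end columns are hypothetical. Dividing by np.timedelta64(1, "D") is one possible reading of "converting to a timedelta", not necessarily the exact approach the post had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical data and lookup table; the names are illustrative only.
channel_dict = {"A": "online", "B": "store"}
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C"],
    "start": pd.to_datetime(["2022-06-01", "2022-06-01", "2022-06-03", "2022-06-05"]),
    "end": pd.to_datetime(["2022-06-05", "2022-06-02", "2022-06-10", "2022-06-05"]),
})

# Dictionary lookup: keys missing from the dict (here "C") become NaN.
df["channel"] = df["Category"].map(channel_dict)

# Subtracting two datetime columns gives a timedelta Series, which has .dt.days.
days_dt = (df["end"] - df["start"]).dt.days

# Same quantity on the raw NumPy arrays: divide by a one-day timedelta.
# The result is a float array rather than integers.
days_np = (df["end"].values - df["start"].values) / np.timedelta64(1, "D")

print(days_dt.tolist(), days_np.tolist())
```

Whether the NumPy version is noticeably faster depends on the data size, so it is worth timing both on your own frame.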

For more examples, see: https://velog.io/@jkl133/1000x-faster-data-manipulation-np.where-np.select
