Day 5

Chapter 06: apply

06 Applying functions with the apply() method

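apply() is a method that applies a user-written function to each row or each column of a DataFrame. The same work can be done with a for loop or map(), but on large datasets apply() is generally faster than an explicit for loop, so it is worth learning how to use it properly.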
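As a minimal sketch of the equivalence described above, the following runs the same squaring function three ways: with a for loop, with the built-in map(), and with apply(). The Series `s` and the helper `square` are illustrative names that do not appear in the original notebook, and the sketch only shows that the results match; it does not measure speed.

```python
import pandas as pd


def square(x):
    return x ** 2


s = pd.Series([10, 20, 30], name="a")

# 1) explicit for loop
loop_result = []
for value in s:
    loop_result.append(square(value))

# 2) built-in map()
map_result = list(map(square, s))

# 3) Series.apply() calls square once per element and returns a Series
apply_result = s.apply(square)

print(loop_result)
print(map_result)
print(apply_result.tolist())
```

All three produce the same values; apply() additionally keeps the index and name of the original Series.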

In [1]:

def my_sq(x):
    return x ** 2


def avg_2(x, y):
    """Return the average of two numbers.

    :param x: first number
    :param y: second number
    """
    return (x + y) / 2

In [2]:

import pandas as pd

df = pd.DataFrame({
    "a": [10, 20, 30],
    "b": [20, 30, 40]
})

df

Out[2]:

	a	b
0	10	20
1	20	30
2	30	40

In [3]:

df['a'] ** 2

Out[3]:

0    100
1    400
2    900
Name: a, dtype: int64

In [4]:

type(df['a'])

Out[4]:

pandas.core.series.Series

In [5]:

type(df.iloc[0])

Out[5]:

pandas.core.series.Series

In [6]:

df['a'].apply(my_sq)

Out[6]:

0    100
1    400
2    900
Name: a, dtype: int64

In [7]:

def my_exp(x, e):
    return x ** e

In [8]:

df['a'].apply(my_exp, e=2)

Out[8]:

0    100
1    400
2    900
Name: a, dtype: int64
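Extra arguments do not have to be passed by keyword. As a small sketch, assuming the same `my_exp` function, the `args` tuple of Series.apply() passes them positionally instead. The Series `s` is an illustrative stand-in for `df['a']`.

```python
import pandas as pd


def my_exp(x, e):
    return x ** e


s = pd.Series([10, 20, 30], name="a")

# keyword form: e=2 is forwarded to my_exp for every element
squared = s.apply(my_exp, e=2)

# positional form: extra arguments go in the `args` tuple
cubed = s.apply(my_exp, args=(3,))

print(squared.tolist())
print(cubed.tolist())
```

Note the trailing comma in `(3,)`: `args` must be a tuple even when it holds a single value.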

In [9]:

def print_me(x):
    print(x)

In [10]:

df.apply(print_me, axis=0)

0    10
1    20
2    30
Name: a, dtype: int64
0    20
1    30
2    40
Name: b, dtype: int64

Out[10]:

a    None
b    None
dtype: object

In [11]:

df.apply(print_me, axis=1)

a    10
b    20
Name: 0, dtype: int64
a    20
b    30
Name: 1, dtype: int64
a    30
b    40
Name: 2, dtype: int64

Out[11]:

0    None
1    None
2    None
dtype: object
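The printed output shows that with axis=1 each row arrives as a Series whose index is the column names, so a function can read values by label. A minimal sketch, assuming the same two-column frame; `row_avg` is a hypothetical helper that is not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})


def row_avg(row):
    # row is a Series indexed by column name, e.g. row["a"] and row["b"]
    return (row["a"] + row["b"]) / 2


# axis=1 -> the function is called once per row
result = df.apply(row_avg, axis=1)
print(result)
```

The result is a Series with one value per row, indexed like the original DataFrame.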

In [12]:

def avg_3(x, y, z):
    return (x + y + z) / 3

In [13]:

print(df.apply(avg_3))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 print(df.apply(avg_3))

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:9423, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
   9412 from pandas.core.apply import frame_apply
   9414 op = frame_apply(
   9415     self,
   9416     func=func,
   (...)
   9421     kwargs=kwargs,
   9422 )
-> 9423 return op.apply().__finalize__(self, method="apply")

File ~\anaconda3\Lib\site-packages\pandas\core\apply.py:678, in FrameApply.apply(self)
    675 elif self.raw:
    676     return self.apply_raw()
--> 678 return self.apply_standard()

File ~\anaconda3\Lib\site-packages\pandas\core\apply.py:798, in FrameApply.apply_standard(self)
    797 def apply_standard(self):
--> 798     results, res_index = self.apply_series_generator()
    800     # wrap results
    801     return self.wrap_results(results, res_index)

File ~\anaconda3\Lib\site-packages\pandas\core\apply.py:814, in FrameApply.apply_series_generator(self)
    811 with option_context("mode.chained_assignment", None):
    812     for i, v in enumerate(series_gen):
    813         # ignore SettingWithCopy here in case the user mutates
--> 814         results[i] = self.f(v)
    815         if isinstance(results[i], ABCSeries):
    816             # If we have a view on v, we need to make a copy because
    817             #  series_generator will swap out the underlying data
    818             results[i] = results[i].copy(deep=False)

TypeError: avg_3() missing 2 required positional arguments: 'y' and 'z'

In [14]:

def avg_3_apply(col):
    # apply() passes the whole column as one Series; take its three values by position
    x = col.iloc[0]
    y = col.iloc[1]
    z = col.iloc[2]
    return (x + y + z) / 3

In [17]:

print(df.apply(avg_3_apply, axis=0))

a    20.0
b    30.0
dtype: float64
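avg_3_apply works because the function now accepts the entire column, but it is tied to frames with exactly three rows. One possible length-independent variant is sketched below, assuming the same frame; `avg_apply` is a hypothetical name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})


def avg_apply(col):
    # col is an entire column (a Series), so sum() and len() work for any length
    return col.sum() / len(col)


# axis=0 -> the function is called once per column
result = df.apply(avg_apply, axis=0)
print(result)
```

For this particular calculation the built-in `df.mean()` gives the same values; the sketch is only meant to show how a column-wise function receives its input.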

In [18]:

df = pd.DataFrame({'a': [10, 20, 30],
                   'b': [20, 30, 40]})

print(df)

In [19]:

def avg_2(x, y):
    return (x + y) / 2


print(avg_2(df['a'], df['b']))

0    15.0
1    25.0
2    35.0
dtype: float64

In [20]:

import numpy as np


def avg_2_mod(x, y):
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2

In [21]:

print(avg_2_mod(df['a'], df['b']))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 print(avg_2_mod(df['a'], df['b']))

Cell In[20], line 5, in avg_2_mod(x, y)
      4 def avg_2_mod(x, y):
----> 5     if x == 20:
      6         return np.nan
      7     else:

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:1466, in NDFrame.__nonzero__(self)
   1464 @final
   1465 def __nonzero__(self) -> NoReturn:
-> 1466     raise ValueError(
   1467         f"The truth value of a {type(self).__name__} is ambiguous. "
   1468         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1469     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
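The error comes from the `if` statement: comparing a Series with `== 20` yields a boolean Series, not a single True or False, and `if` needs exactly one truth value. A minimal sketch of what happens; the names `a`, `mask`, and `raised` are illustrative.

```python
import pandas as pd

a = pd.Series([10, 20, 30], name="a")

# element-wise comparison -> boolean Series, one entry per element
mask = a == 20
print(mask.tolist())

# any() / all() reduce the mask to a single value that `if` can use
print(bool(mask.any()))
print(bool(mask.all()))

# using the mask directly in `if` reproduces the error
raised = False
try:
    if mask:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

This is why avg_2_mod works with scalars such as `avg_2_mod(10, 20)` but fails when given whole columns.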

In [22]:

avg_2_mod(10, 20)

Out[22]:

15.0

In [23]:

avg_2_mod_vec = np.vectorize(avg_2_mod)

In [24]:

print(avg_2_mod_vec(df['a'], df['b']))

[15. nan 35.]

In [25]:

@np.vectorize
def v_avg_2_mod(x, y):
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2


print(v_avg_2_mod(df['a'], df['b']))

[15. nan 35.]
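NumPy's documentation notes that np.vectorize is provided mainly for convenience rather than performance, since it is essentially a loop. As an alternative sketch that is not part of the original notebook, the same condition can be expressed on whole columns with np.where, assuming the same frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

# np.where(condition, value_if_true, value_if_false) works element-wise on arrays
result = np.where(df["a"] == 20, np.nan, (df["a"] + df["b"]) / 2)
print(result)
```

Here the comparison and the average are both computed for every row at once, and np.where picks between NaN and the average per element.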

In [26]:

# The numba library is designed to optimize Python code, especially mathematical calculations applied to arrays.

import numba


@numba.vectorize
def v_avg_2_numba(x, y):
    if int(x) == 20:
        return np.nan
    else:
        return (x + y) / 2

In [27]:

print(v_avg_2_numba(df['a'].values, df['b'].values))

[15. nan 35.]

[05] Do it! 데이터 분석을 위한 판다스 입문 (Do it! Introduction to Pandas for Data Analysis)