
Pandas Data Resampling

Updated 2022-07-15 10:37

Resampling converts a time series from one frequency to another. It comes in two forms: downsampling, which converts high-frequency data to a lower frequency, and upsampling, which does the opposite. The table below summarizes both:

Method	Description
Downsampling	Converts high-frequency (short-interval) data to a lower frequency (longer interval).
Upsampling	Converts low-frequency data to a higher frequency.

Pandas provides the resample() method for resampling time series data.

Downsampling

Use resample() to downsample data, for example from a daily frequency to a monthly one.

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2021',periods=100,freq='D')
ts = pd.Series(np.random.randn(len(rng)),index=rng)
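# Downsample to month-end bins, then aggregate each bin with the mean
# (pandas 2.2+ prefers the alias 'ME'; 'M' matches the output shown here)
print(ts.resample('M').mean())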

Output:

2021-01-31    0.210353
2021-02-28   -0.058859
2021-03-31   -0.182952
2021-04-30    0.205254
Freq: M, dtype: float64
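Because the example above uses random data, its output changes on every run. The following is a minimal sketch with deterministic values (0 to 99, chosen here purely for illustration) so the monthly aggregates can be checked by hand. It uses the 'MS' (month start) alias, which labels each bin by the first day of the month instead of the last; that choice is an assumption made to keep the sketch portable across pandas versions.

```python
import pandas as pd

# Deterministic daily data: the values 0..99 starting on 2021-01-01
rng = pd.date_range('2021-01-01', periods=100, freq='D')
ts = pd.Series(range(100), index=rng)

# Downsample to monthly bins and compute several aggregates at once
monthly = ts.resample('MS').agg(['count', 'sum', 'mean'])
print(monthly)

# Days per bin: January 31, February 28, March 31, April 10 (partial month)
print(monthly['count'].tolist())
```

Each row of the result describes one calendar month, and the last bin is partial because the series ends on 2021-04-10.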

If you only want the month to appear in the index, pass kind='period' as follows:

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2021',periods=100,freq='D')
ts = pd.Series(np.random.randn(len(rng)),index=rng)
# Downsample, aggregate with the mean, and label each bin with a monthly period
print(ts.resample('M', kind='period').mean())

Output:

2021-01   -0.153121
2021-02    0.136231
2021-03   -0.238975
2021-04   -0.309502
Freq: M, dtype: float64
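Newer pandas releases (2.2 onward, as far as I am aware) mark the kind argument of resample() as deprecated. One possible alternative, sketched here under that assumption, is to resample on the datetime index and then convert the result's index with to_period(). The 'MS' alias and the deterministic 0 to 99 values are illustrative choices, not part of the original example.

```python
import pandas as pd

# Deterministic daily data so the result is reproducible
rng = pd.date_range('2021-01-01', periods=100, freq='D')
ts = pd.Series(range(100), index=rng)

# Resample by month, then turn the timestamp labels into monthly periods
by_period = ts.resample('MS').mean().to_period('M')
print(by_period)

# The index now holds periods such as 2021-01 instead of full dates
print([str(p) for p in by_period.index])
```

The aggregated values are the same as with a timestamp index; only the labels change.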

Upsampling

Upsampling converts low-frequency data (long intervals) to a higher frequency. For example:

import pandas as pd
import numpy as np
# Generate a time series sampled every 3 days
rng = pd.date_range('1/1/2021', periods=20, freq='3D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print('Before upsampling:')
print(ts.head())
# Use asfreq() to convert to a daily frequency; the new rows are left as NaN
print('After upsampling:')
print(ts.resample('D').asfreq().head())

Output:

Before upsampling:
2021-01-01    0.608716
2021-01-04    1.097451
2021-01-07   -1.280173
2021-01-10   -0.175065
2021-01-13    1.046831
Freq: 3D, dtype: float64
After upsampling:
2021-01-01    0.608716
2021-01-02         NaN
2021-01-03         NaN
2021-01-04    1.097451
2021-01-05         NaN
Freq: D, dtype: float64
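To make the effect of upsampling easier to verify, here is a small sketch with four hand-picked values (an illustrative assumption) that counts how many rows and how many missing values the conversion produces.

```python
import pandas as pd

# Four observations, one every 3 days: Jan 1, Jan 4, Jan 7, Jan 10
rng = pd.date_range('2021-01-01', periods=4, freq='3D')
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=rng)

# Upsample to daily frequency; asfreq() does not fill the new rows
daily = ts.resample('D').asfreq()
print(daily)

n_rows = len(daily)                   # every day from Jan 1 to Jan 10
n_missing = int(daily.isna().sum())   # the days that had no observation
print(n_rows, n_missing)
```

The original four values stay at their own dates, and every newly created date holds NaN until it is filled explicitly.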

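Frequency Conversion

The asfreq() method converts data to a new frequency while keeping the original values wherever the new timestamps coincide with existing ones. It can also be called on its own, without resample(). For example: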

import pandas as pd

index = pd.date_range('1/1/2021', periods=6, freq='min')
series = pd.Series([0.0, None, 2.0, 3.0,4.0,5.0], index=index)
df = pd.DataFrame({'s':series})
print(df.asfreq("45s"))

Output:

                       s
2021-01-01 00:00:00  0.0
2021-01-01 00:00:45  NaN
2021-01-01 00:01:30  NaN
2021-01-01 00:02:15  NaN
2021-01-01 00:03:00  3.0
2021-01-01 00:03:45  NaN
2021-01-01 00:04:30  NaN
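asfreq() also accepts fill_value and method arguments that control what goes into the newly created rows. The sketch below reuses the same data to show both. One detail worth noticing: a NaN that already exists in the source is not treated as a gap, so with method='ffill' the row at 00:01:30 copies the NaN stored at 00:01:00.

```python
import pandas as pd

# Same data as above: one value per minute, with one pre-existing gap
index = pd.date_range('2021-01-01', periods=6, freq='min')
series = pd.Series([0.0, None, 2.0, 3.0, 4.0, 5.0], index=index)
df = pd.DataFrame({'s': series})

# Put a constant into every row created by the frequency change
const_filled = df.asfreq('45s', fill_value=9.0)
print(const_filled)

# Copy the value stored at the nearest earlier original timestamp
forward_filled = df.asfreq('45s', method='ffill')
print(forward_filled)
```

If pre-existing NaN values should be filled as well, a separate ffill() or fillna() call on the original data is one way to do it.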

Interpolation

As the examples above show, upsampling introduces missing values, which then need to be handled. Common approaches are:

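Method	Description
pad/ffill	Fills a missing value with the previous non-missing value.
backfill/bfill	Fills a missing value with the next non-missing value.
interpolate('linear')	Linear interpolation.
fillna(value)	Replaces missing values with the specified value.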

The following example handles the NaN values with forward fill (ffill):

import pandas as pd
import numpy as np
# Create a time series sampled every 3 days
rng = pd.date_range('1/1/2021', periods=20, freq='3D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print(ts.resample('D').asfreq().head())
# Fill the missing values with ffill
print(ts.resample('D').asfreq().ffill().head())

Output:

2021-01-01    0.555580
2021-01-02         NaN
2021-01-03         NaN
2021-01-04   -0.079324
2021-01-05         NaN
Freq: D, dtype: float64

# After forward fill; compare with the output above
2021-01-01    0.555580
2021-01-02    0.555580
2021-01-03    0.555580
2021-01-04   -0.079324
2021-01-05   -0.079324
Freq: D, dtype: float64
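The four approaches listed in the table can be compared side by side. This is a minimal sketch on three hand-picked values (0, 3, 6, an illustrative assumption) so that each result is easy to verify by eye.

```python
import pandas as pd

# Three observations, one every 3 days
rng = pd.date_range('2021-01-01', periods=3, freq='3D')
ts = pd.Series([0.0, 3.0, 6.0], index=rng)

# Upsample to daily: 0, NaN, NaN, 3, NaN, NaN, 6
daily = ts.resample('D').asfreq()

forward = daily.ffill()                       # repeat the previous value
backward = daily.bfill()                      # use the next value
linear = daily.interpolate(method='linear')   # straight line between known points
constant = daily.fillna(0.0)                  # replace every gap with 0.0

comparison = pd.DataFrame({
    'raw': daily,
    'ffill': forward,
    'bfill': backward,
    'linear': linear,
    'fillna': constant,
})
print(comparison)
```

Which method fits depends on the data: forward fill suits values that stay constant until the next reading, while linear interpolation suits quantities that change gradually.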
