numpy.memmap

class numpy.memmap(filename, dtype=<class 'numpy.ubyte'>, mode='r+', offset=0, shape=None, order='C')

Create a memory-map to an array stored in a *binary* file on disk.

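Memory-mapped files are used to access small segments of large files on disk without reading the entire file into memory. NumPy's memmap objects are array-like. This differs from Python's mmap module, which uses file-like objects.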
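
As a rough illustration of reading a small segment of a larger file, the sketch below creates a scratch file and then copies out a four-byte window. The file name `big.dat`, the size `n`, and the slice positions are arbitrary choices made for this example.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file; any binary file on disk would do.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
n = 1_000_000

# Create the file and write four bytes somewhere in the middle.
big = np.memmap(path, dtype=np.uint8, mode="w+", shape=(n,))
big[500_000:500_004] = [1, 2, 3, 4]
big.flush()
del big

# Reopen read-only and copy out only the slice of interest.
view = np.memmap(path, dtype=np.uint8, mode="r")
window = np.array(view[500_000:500_004])  # plain in-memory copy of 4 bytes
```

Only the pages that are actually touched need to be brought into memory by the operating system; the rest of the file stays on disk.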

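This subclass of ndarray interacts badly with some operations because it does not quite fit as a subclass. An alternative to using this subclass is to create the mmap object yourself, then create an ndarray directly with ndarray.__new__, passing the mmap object in its 'buffer=' parameter.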
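
The alternative described above might look like the following minimal sketch. The scratch file `raw.dat`, its contents, and the (3, 4) shape are assumptions made for the example and are not part of the memmap API.

```python
import mmap
import os
import tempfile

import numpy as np

# Hypothetical scratch file holding 12 float32 values.
path = os.path.join(tempfile.mkdtemp(), "raw.dat")
np.arange(12, dtype=np.float32).tofile(path)

with open(path, "r+b") as f:
    # Length 0 maps the whole file; the mapping outlives the file object.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)

# Build a plain ndarray (not a memmap subclass) on top of the mmap buffer.
arr = np.ndarray.__new__(np.ndarray, shape=(3, 4), dtype=np.float32, buffer=mm)

arr[0, 0] = 42.0  # writes through to the mapping
mm.flush()        # push the change to the file

plain_type = type(arr)
first_on_disk = np.fromfile(path, dtype=np.float32, count=1)[0]
```

With this approach the caller owns the mmap object and is responsible for its lifetime; it cannot be closed while an array still exports its buffer.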

This class may at some point be turned into a factory function that returns a view into an mmap buffer.

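Flush the memmap instance to write the changes to the file. Currently there is no API to close the underlying mmap. It is difficult to ensure that the resource is actually closed, because it may be shared between different memmap instances.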
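
A common pattern, sketched here under the assumption that no other array shares the mapping, is to call flush() and then drop the reference. The file name `flush_demo.dat` is made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "flush_demo.dat")

fp = np.memmap(path, dtype=np.int32, mode="w+", shape=(4,))
fp[:] = [1, 2, 3, 4]
fp.flush()  # write pending changes to the file

# There is no close(); dropping the last reference lets the mapping be released.
del fp

# Read the file back through ordinary file I/O.
on_disk = np.fromfile(path, dtype=np.int32)
```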

Parameters:

filename : str, file-like object, or pathlib.Path instance

The file name or file object to be used as the array data buffer.

dtype : data-type, optional

The data-type used to interpret the file contents. Default is uint8.

mode : {‘r+’, ‘r’, ‘w+’, ‘c’}, optional

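The file is opened in this mode:

‘r’	Open existing file for reading only.
‘r+’	Open existing file for reading and writing.
‘w+’	Create or overwrite existing file for reading and writing. If `mode == 'w+'` then `shape` must also be specified.
‘c’	Copy-on-write: assignments affect data in memory, but changes are not saved to disk. The file on disk is read-only.

Default is 'r+'.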

offset : int, optional

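In the file, array data starts at this offset. Since offset is measured in bytes, it should normally be a multiple of the byte-size of dtype. When mode != 'r', even positive offsets beyond the end of the file are valid; the file will be extended to accommodate the additional data. By default, memmap starts at the beginning of the file, even if filename is a file pointer fp and fp.tell() != 0.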
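
To make the byte-based offset concrete, the sketch below skips a header before mapping the array data. The 16-byte header and the file name `with_header.dat` are invented for the example; 16 is a multiple of the 4-byte float32 item size.

```python
import os
import tempfile

import numpy as np

# Hypothetical file layout: a 16-byte header followed by float32 data.
path = os.path.join(tempfile.mkdtemp(), "with_header.dat")
header = b"HDR0" * 4
payload = np.arange(6, dtype=np.float32)

with open(path, "wb") as f:
    f.write(header)
    f.write(payload.tobytes())

# offset is given in bytes, not in elements.
body = np.memmap(path, dtype=np.float32, mode="r", offset=len(header))
```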

shape : int or sequence of ints, optional

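The desired shape of the array. If mode == 'r' and the number of bytes remaining after offset is not a multiple of the byte-size of dtype, you must specify shape. By default, the returned array is 1-D, with the number of elements determined by the file size and data-type.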
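
The following sketch contrasts the inferred 1-D shape with an explicit shape, and shows a file whose size is not a multiple of the item size. File names and sizes are assumptions chosen for the example.

```python
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()

# 12 float32 values: 48 bytes, an exact multiple of the item size.
even_path = os.path.join(tmp, "even.dat")
np.arange(12, dtype=np.float32).tofile(even_path)

flat = np.memmap(even_path, dtype=np.float32, mode="r")                # shape inferred
grid = np.memmap(even_path, dtype=np.float32, mode="r", shape=(3, 4))  # shape given

# 10 bytes is not a multiple of 4, so shape has to be given for int32.
odd_path = os.path.join(tmp, "odd.dat")
with open(odd_path, "wb") as f:
    f.write(bytes(10))

two = np.memmap(odd_path, dtype=np.int32, mode="r", shape=(2,))  # maps the first 8 bytes
```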

Changed in version 2.0: The shape parameter can now be any integer sequence type; previously, types were limited to tuple and int.

order : {‘C’, ‘F’}, optional

Specify the order of the ndarray memory layout: row-major (C-style) or column-major (Fortran-style). This only has an effect if the shape is greater than 1-D. The default order is 'C'.
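
As a small illustration, the same six integers on disk can be read with either layout. The file name `order.dat` is hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical file containing the int32 values 0..5 in sequence.
path = os.path.join(tempfile.mkdtemp(), "order.dat")
np.arange(6, dtype=np.int32).tofile(path)

# Row-major: consecutive file values fill each row in turn.
c_view = np.memmap(path, dtype=np.int32, mode="r", shape=(2, 3), order="C")

# Column-major: consecutive file values fill each column in turn.
f_view = np.memmap(path, dtype=np.int32, mode="r", shape=(2, 3), order="F")
```

Here c_view reads as [[0, 1, 2], [3, 4, 5]] while f_view reads as [[0, 2, 4], [1, 3, 5]].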

See also

lib.format.open_memmap: Create or load a memory-mapped .npy file.
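
For comparison, a minimal sketch of the .npy route follows. Unlike a raw memmap, the .npy header records dtype and shape, so they do not have to be repeated when the file is reopened. The file name `array.npy` is made up for the example.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical output path.
path = os.path.join(tempfile.mkdtemp(), "array.npy")

# Create a memory-mapped .npy file and fill it.
out = open_memmap(path, mode="w+", dtype=np.float64, shape=(2, 3))
out[:] = np.arange(6).reshape(2, 3)
out.flush()
del out

# Reopen it memory-mapped; dtype and shape come from the .npy header.
back = np.load(path, mmap_mode="r")
```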

Notes

The memmap object can be used anywhere an ndarray is accepted. Given a memmap fp, isinstance(fp, numpy.ndarray) returns True.
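
A short sketch of this behaviour, using a hypothetical scratch file `as_array.dat`:

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "as_array.dat")

fp = np.memmap(path, dtype=np.float64, mode="w+", shape=(4,))
fp[:] = [1.0, 2.0, 3.0, 4.0]

is_array = isinstance(fp, np.ndarray)  # memmap is an ndarray subclass
total = float(np.sum(fp))              # NumPy functions accept it directly
base = np.asarray(fp)                  # base-class ndarray view of the same data
```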

Memory-mapped files cannot be larger than 2GB on 32-bit systems.

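When a memmap causes a file to be created or extended beyond its current size in the filesystem, the contents of the new part are unspecified. On systems with POSIX filesystem semantics, the extended part will be filled with zero bytes.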
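
The sketch below shows a file being extended when a writable memmap asks for more data than the file holds. The file name and sizes are assumptions for the example; because the contents of the new part are unspecified in general, the code inspects only the file size and the original bytes.

```python
import os
import tempfile

import numpy as np

# Hypothetical 4-byte file.
path = os.path.join(tempfile.mkdtemp(), "grow.dat")
with open(path, "wb") as f:
    f.write(bytes([1, 2, 3, 4]))

# Requesting 16 bytes in 'r+' mode extends the file to fit.
grown = np.memmap(path, dtype=np.uint8, mode="r+", shape=(16,))

size_after = os.path.getsize(path)
head = grown[:4].tolist()  # the original bytes are preserved
```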

Examples

>>> import numpy as np
>>> data = np.arange(12, dtype='float32')
>>> data.resize((3,4))

This example uses a temporary file so that doctest doesn't write files to your directory. You would use a 'normal' filename.

>>> from tempfile import mkdtemp
>>> import os.path as path
>>> filename = path.join(mkdtemp(), 'newfile.dat')

Create a memmap with dtype and shape that match our data:

>>> fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
>>> fp
memmap([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], dtype=float32)

Write data to the memmap array:

>>> fp[:] = data[:]
>>> fp
memmap([[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]], dtype=float32)

>>> fp.filename == path.abspath(filename)
True

Flush memory changes to disk so that they can be read back:

>>> fp.flush()

Load the memmap and verify that the data was stored:

>>> newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))
>>> newfp
memmap([[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]], dtype=float32)

Read-only memmap:

>>> fpr = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))
>>> fpr.flags.writeable
False

Copy-on-write memmap:

>>> fpc = np.memmap(filename, dtype='float32', mode='c', shape=(3,4))
>>> fpc.flags.writeable
True

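It's possible to assign to a copy-on-write array, but values are only written into the in-memory copy of the array, and not written to disk: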

>>> fpc
memmap([[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]], dtype=float32)
>>> fpc[0,:] = 0
>>> fpc
memmap([[  0.,   0.,   0.,   0.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]], dtype=float32)

The file on disk is unchanged:

>>> fpr
memmap([[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]], dtype=float32)

Offset into a memmap:

>>> fpo = np.memmap(filename, dtype='float32', mode='r', offset=16)
>>> fpo
memmap([  4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.], dtype=float32)

Attributes:

filename : str or pathlib.Path instance

Path to the mapped file.

offset : int

Offset position in the file.

mode : str

File mode.
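
The attributes can be inspected on any memmap instance. The sketch below passes a pathlib.Path, in which case filename is expected to come back as a Path as well; the file name `attrs.dat` is hypothetical.

```python
import pathlib
import tempfile

import numpy as np

# Hypothetical file holding 8 float32 values (32 bytes).
path = pathlib.Path(tempfile.mkdtemp()) / "attrs.dat"
np.arange(8, dtype=np.float32).tofile(path)

# Skip the first 16 bytes, i.e. the first four float32 values.
m = np.memmap(path, dtype=np.float32, mode="r", offset=16)

info = (type(m.filename).__name__, m.offset, m.mode)
```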

Methods

flush()

Write any changes in the array to the file on disk.