撰寫您自己的 ufunc#
建立新的通用函數#
在閱讀本文之前,先熟悉 Python C 擴展的基礎知識可能會有所幫助,方法是閱讀/略讀擴展和嵌入 Python 直譯器第 1 節以及如何擴展 NumPy中的教學文件
umath 模組是一個電腦產生的 C 模組,用於建立許多 ufunc。它提供了大量關於如何建立通用函數的範例。建立您自己的 ufunc 來使用 ufunc 機制也不困難。假設您有一個函數,想要在其輸入上逐元素操作。透過建立新的 ufunc,您將獲得一個可以處理以下事項的函數:
廣播 (broadcasting)
N 維迴圈
具有最小記憶體用量的自動類型轉換
可選的輸出陣列
建立您自己的 ufunc 並不困難。所有需要的是您想要支援的每種資料類型的 1 維迴圈。每個 1 維迴圈都必須具有特定的簽章,並且只能使用固定大小資料類型的 ufunc。用於建立新的 ufunc 以處理內建資料類型的函數呼叫如下所示。另一種機制用於註冊使用者定義資料類型的 ufunc。
在接下來的幾個章節中,我們將提供範例程式碼,這些程式碼可以輕鬆修改以建立您自己的 ufunc。這些範例是 logit 函數的逐步完整或複雜版本,logit 函數是統計建模中的常見函數。Logit 也很有趣,因為由於 IEEE 標準(特別是 IEEE 754)的魔力,以下建立的所有 logit 函數都會自動具有以下行為。
>>> logit(0)
-inf
>>> logit(1)
inf
>>> logit(2)
nan
>>> logit(-2)
nan
這非常棒,因為函數編寫者不必手動傳播 inf 或 nan。
非 ufunc 擴展範例#
為了比較並普遍啟發讀者,我們提供了一個簡單的 logit
C 擴展實作,它沒有使用 numpy。
為此,我們需要兩個檔案。第一個是包含實際程式碼的 C 檔案,第二個是用於建立模組的 setup.py
檔案。
#define PY_SSIZE_T_CLEAN #include <Python.h> #include <math.h> /* * spammodule.c * This is the C code for a non-numpy Python extension to * define the logit function, where logit(p) = log(p/(1-p)). * This function will not work on numpy arrays automatically. * numpy.vectorize must be called in python to generate * a numpy-friendly function. * * Details explaining the Python-C API can be found under * 'Extending and Embedding' and 'Python/C API' at * docs.python.org . */ /* This declares the logit function */ static PyObject *spam_logit(PyObject *self, PyObject *args); /* * This tells Python what methods this module has. * See the Python-C API for more information. */ static PyMethodDef SpamMethods[] = { {"logit", spam_logit, METH_VARARGS, "compute logit"}, {NULL, NULL, 0, NULL} }; /* * This actually defines the logit function for * input args from Python. */ static PyObject *spam_logit(PyObject *self, PyObject *args) { double p; /* This parses the Python argument into a double */ if(!PyArg_ParseTuple(args, "d", &p)) { return NULL; } /* THE ACTUAL LOGIT FUNCTION */ p = p/(1-p); p = log(p); /*This builds the answer back into a python object */ return Py_BuildValue("d", p); } /* This initiates the module using the above definitions. */ static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, "spam", NULL, -1, SpamMethods, NULL, NULL, NULL, NULL }; PyMODINIT_FUNC PyInit_spam(void) { PyObject *m; m = PyModule_Create(&moduledef); if (!m) { return NULL; } return m; }
要使用 setup.py file
,請將 setup.py
和 spammodule.c
放在同一個資料夾中。然後 python setup.py build
將建置要匯入的模組,或 python setup.py install
將模組安裝到您的 site-packages 目錄。
''' setup.py file for spammodule.c Calling $python setup.py build_ext --inplace will build the extension library in the current file. Calling $python setup.py build will build a file that looks like ./build/lib*, where lib* is a file that begins with lib. The library will be in this file and end with a C library extension, such as .so Calling $python setup.py install will install the module in your site-packages file. See the setuptools section 'Building Extension Modules' at setuptools.pypa.io for more information. ''' from setuptools import setup, Extension import numpy as np module1 = Extension('spam', sources=['spammodule.c']) setup(name='spam', version='1.0', ext_modules=[module1])
一旦 spam 模組匯入到 python 中,您就可以透過 spam.logit
呼叫 logit。請注意,上面使用的函數不能直接應用於 numpy 陣列。為此,我們必須對其呼叫 numpy.vectorize
。例如,如果在包含 spam 庫的檔案中開啟了 python 直譯器,或者已安裝了 spam,則可以執行以下命令
>>> import numpy as np
>>> import spam
>>> spam.logit(0)
-inf
>>> spam.logit(1)
inf
>>> spam.logit(0.5)
0.0
>>> x = np.linspace(0,1,10)
>>> spam.logit(x)
TypeError: only length-1 arrays can be converted to Python scalars
>>> f = np.vectorize(spam.logit)
>>> f(x)
array([ -inf, -2.07944154, -1.25276297, -0.69314718, -0.22314355,
0.22314355, 0.69314718, 1.25276297, 2.07944154, inf])
產生的 LOGIT 函數並不快! numpy.vectorize
只是簡單地迴圈遍歷 spam.logit
。loop
是在 C 層級完成的,但 numpy 陣列不斷被解析和重建。這是昂貴的。當作者將 numpy.vectorize(spam.logit)
與下面建構的 logit ufunc 進行比較時,logit ufunc 快了幾乎整整 4 倍。當然,較大或較小的加速是可能的,這取決於函數的性質。
單一 dtype 的 NumPy ufunc 範例#
為了簡化,我們給出一個單一 dtype 的 ufunc,即 'f8'
double
。與前一節一樣,我們先給出 .c
檔案,然後給出用於建立包含 ufunc 的模組的 setup.py
檔案。
程式碼中對應於 ufunc 實際計算的位置標記為 /\* BEGIN main ufunc computation \*/
和 /\* END main ufunc computation \*/
。這些行之間的程式碼是建立您自己的 ufunc 時必須更改的主要內容。
#define PY_SSIZE_T_CLEAN #include <Python.h> #include "numpy/ndarraytypes.h" #include "numpy/ufuncobject.h" #include "numpy/npy_3kcompat.h" #include <math.h> /* * single_type_logit.c * This is the C code for creating your own * NumPy ufunc for a logit function. * * In this code we only define the ufunc for * a single dtype. The computations that must * be replaced to create a ufunc for * a different function are marked with BEGIN * and END. * * Details explaining the Python-C API can be found under * 'Extending and Embedding' and 'Python/C API' at * docs.python.org . */ static PyMethodDef LogitMethods[] = { {NULL, NULL, 0, NULL} }; /* The loop definition must precede the PyMODINIT_FUNC. */ static void double_logit(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp n = dimensions[0]; char *in = args[0], *out = args[1]; npy_intp in_step = steps[0], out_step = steps[1]; double tmp; for (i = 0; i < n; i++) { /* BEGIN main ufunc computation */ tmp = *(double *)in; tmp /= 1 - tmp; *((double *)out) = log(tmp); /* END main ufunc computation */ in += in_step; out += out_step; } } /* This a pointer to the above function */ PyUFuncGenericFunction funcs[1] = {&double_logit}; /* These are the input and return dtypes of logit.*/ static const char types[2] = {NPY_DOUBLE, NPY_DOUBLE}; static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, "npufunc", NULL, -1, LogitMethods, NULL, NULL, NULL, NULL }; PyMODINIT_FUNC PyInit_npufunc(void) { PyObject *m, *logit, *d; import_array(); import_umath(); m = PyModule_Create(&moduledef); if (!m) { return NULL; } logit = PyUFunc_FromFuncAndData(funcs, NULL, types, 1, 1, 1, PyUFunc_None, "logit", "logit_docstring", 0); d = PyModule_GetDict(m); PyDict_SetItemString(d, "logit", logit); Py_DECREF(logit); return m; }
這是上述程式碼的 setup.py file
。與之前一樣,可以透過在命令提示字元下呼叫 python setup.py build
來建置模組,或透過 python setup.py install
安裝到 site-packages。模組也可以使用 python setup.py build_ext --inplace
放置到本機資料夾中,例如下面的 npufunc_directory
。
''' setup.py file for single_type_logit.c Note that since this is a numpy extension we add an include_dirs=[get_include()] so that the extension is built with numpy's C/C++ header files. Calling $python setup.py build_ext --inplace will build the extension library in the npufunc_directory. Calling $python setup.py build will build a file that looks like ./build/lib*, where lib* is a file that begins with lib. The library will be in this file and end with a C library extension, such as .so Calling $python setup.py install will install the module in your site-packages file. See the setuptools section 'Building Extension Modules' at setuptools.pypa.io for more information. ''' from setuptools import setup, Extension from numpy import get_include npufunc = Extension('npufunc', sources=['single_type_logit.c'], include_dirs=[get_include()]) setup(name='npufunc', version='1.0', ext_modules=[npufunc])
安裝完成後,可以匯入並按如下方式使用。
>>> import numpy as np
>>> import npufunc
>>> npufunc.logit(0.5)
np.float64(0.0)
>>> a = np.linspace(0,1,5)
>>> npufunc.logit(a)
array([ -inf, -1.09861229, 0. , 1.09861229, inf])
具有多個 dtype 的 NumPy ufunc 範例#
我們最終給出一個完整 ufunc 的範例,其中包含半精度浮點數、浮點數、雙精度浮點數和長雙精度浮點數的內部迴圈。與之前的章節一樣,我們先給出 .c
檔案,然後給出對應的 setup.py
檔案。
程式碼中對應於 ufunc 實際計算的位置標記為 /\* BEGIN main ufunc computation \*/
和 /\* END main ufunc computation \*/
。這些行之間的程式碼是建立您自己的 ufunc 時必須更改的主要內容。
#define PY_SSIZE_T_CLEAN #include <Python.h> #include "numpy/ndarraytypes.h" #include "numpy/ufuncobject.h" #include "numpy/halffloat.h" #include <math.h> /* * multi_type_logit.c * This is the C code for creating your own * NumPy ufunc for a logit function. * * Each function of the form type_logit defines the * logit function for a different numpy dtype. Each * of these functions must be modified when you * create your own ufunc. The computations that must * be replaced to create a ufunc for * a different function are marked with BEGIN * and END. * * Details explaining the Python-C API can be found under * 'Extending and Embedding' and 'Python/C API' at * docs.python.org . * */ static PyMethodDef LogitMethods[] = { {NULL, NULL, 0, NULL} }; /* The loop definitions must precede the PyMODINIT_FUNC. */ static void long_double_logit(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp n = dimensions[0]; char *in = args[0], *out = args[1]; npy_intp in_step = steps[0], out_step = steps[1]; long double tmp; for (i = 0; i < n; i++) { /* BEGIN main ufunc computation */ tmp = *(long double *)in; tmp /= 1 - tmp; *((long double *)out) = logl(tmp); /* END main ufunc computation */ in += in_step; out += out_step; } } static void double_logit(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp n = dimensions[0]; char *in = args[0], *out = args[1]; npy_intp in_step = steps[0], out_step = steps[1]; double tmp; for (i = 0; i < n; i++) { /* BEGIN main ufunc computation */ tmp = *(double *)in; tmp /= 1 - tmp; *((double *)out) = log(tmp); /* END main ufunc computation */ in += in_step; out += out_step; } } static void float_logit(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp n = dimensions[0]; char *in = args[0], *out = args[1]; npy_intp in_step = steps[0], out_step = steps[1]; float tmp; for (i = 0; i < n; i++) { /* BEGIN main ufunc computation */ tmp = *(float *)in; tmp /= 1 - tmp; *((float *)out) = logf(tmp); /* END main ufunc computation */ in += in_step; out += out_step; } } static void half_float_logit(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp n = dimensions[0]; char *in = args[0], *out = args[1]; npy_intp in_step = steps[0], out_step = steps[1]; float tmp; for (i = 0; i < n; i++) { /* BEGIN main ufunc computation */ tmp = npy_half_to_float(*(npy_half *)in); tmp /= 1 - tmp; tmp = logf(tmp); *((npy_half *)out) = npy_float_to_half(tmp); /* END main ufunc computation */ in += in_step; out += out_step; } } /*This gives pointers to the above functions*/ PyUFuncGenericFunction funcs[4] = {&half_float_logit, &float_logit, &double_logit, &long_double_logit}; static const char types[8] = {NPY_HALF, NPY_HALF, NPY_FLOAT, NPY_FLOAT, NPY_DOUBLE, NPY_DOUBLE, NPY_LONGDOUBLE, NPY_LONGDOUBLE}; static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, "npufunc", NULL, -1, LogitMethods, NULL, NULL, NULL, NULL }; PyMODINIT_FUNC PyInit_npufunc(void) { PyObject *m, *logit, *d; import_array(); import_umath(); m = PyModule_Create(&moduledef); if (!m) { return NULL; } logit = PyUFunc_FromFuncAndData(funcs, NULL, types, 4, 1, 1, PyUFunc_None, "logit", "logit_docstring", 0); d = PyModule_GetDict(m); PyDict_SetItemString(d, "logit", logit); Py_DECREF(logit); return m; }
這是上述程式碼的 setup.py
檔案。與之前一樣,可以透過在命令提示字元下呼叫 python setup.py build
來建置模組,或透過 python setup.py install
安裝到 site-packages。
''' setup.py file for multi_type_logit.c Note that since this is a numpy extension we add an include_dirs=[get_include()] so that the extension is built with numpy's C/C++ header files. Furthermore, we also have to include the npymath lib for half-float d-type. Calling $python setup.py build_ext --inplace will build the extension library in the current file. Calling $python setup.py build will build a file that looks like ./build/lib*, where lib* is a file that begins with lib. The library will be in this file and end with a C library extension, such as .so Calling $python setup.py install will install the module in your site-packages file. See the setuptools section 'Building Extension Modules' at setuptools.pypa.io for more information. ''' from setuptools import setup, Extension from numpy import get_include from os import path path_to_npymath = path.join(get_include(), '..', 'lib') npufunc = Extension('npufunc', sources=['multi_type_logit.c'], include_dirs=[get_include()], # Necessary for the half-float d-type. library_dirs=[path_to_npymath], libraries=["npymath"]) setup(name='npufunc', version='1.0', ext_modules=[npufunc])
安裝完成後,可以匯入並按如下方式使用。
>>> import numpy as np
>>> import npufunc
>>> npufunc.logit(0.5)
np.float64(0.0)
>>> a = np.linspace(0,1,5)
>>> npufunc.logit(a)
array([ -inf, -1.09861229, 0. , 1.09861229, inf])
具有多個引數/傳回值的 NumPy ufunc 範例#
我們的最後一個範例是一個具有多個引數的 ufunc。它是用於單一 dtype 資料的 logit ufunc 程式碼的修改版本。我們計算 (A * B, logit(A * B))
。
我們只給出 C 程式碼,因為 setup.py 檔案與單一 dtype 的 NumPy ufunc 範例中的 setup.py
檔案完全相同,除了該行
npufunc = Extension('npufunc', sources=['single_type_logit.c'], include_dirs=[get_include()])
被替換為
npufunc = Extension('npufunc', sources=['multi_arg_logit.c'], include_dirs=[get_include()])
C 檔案如下所示。產生的 ufunc 接受兩個引數 A
和 B
。它傳回一個元組,其第一個元素是 A * B
,第二個元素是 logit(A * B)
。請注意,它會自動支援廣播以及 ufunc 的所有其他屬性。
#define PY_SSIZE_T_CLEAN #include <Python.h> #include "numpy/ndarraytypes.h" #include "numpy/ufuncobject.h" #include "numpy/halffloat.h" #include <math.h> /* * multi_arg_logit.c * This is the C code for creating your own * NumPy ufunc for a multiple argument, multiple * return value ufunc. The places where the * ufunc computation is carried out are marked * with comments. * * Details explaining the Python-C API can be found under * 'Extending and Embedding' and 'Python/C API' at * docs.python.org. */ static PyMethodDef LogitMethods[] = { {NULL, NULL, 0, NULL} }; /* The loop definition must precede the PyMODINIT_FUNC. */ static void double_logitprod(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp n = dimensions[0]; char *in1 = args[0], *in2 = args[1]; char *out1 = args[2], *out2 = args[3]; npy_intp in1_step = steps[0], in2_step = steps[1]; npy_intp out1_step = steps[2], out2_step = steps[3]; double tmp; for (i = 0; i < n; i++) { /* BEGIN main ufunc computation */ tmp = *(double *)in1; tmp *= *(double *)in2; *((double *)out1) = tmp; *((double *)out2) = log(tmp / (1 - tmp)); /* END main ufunc computation */ in1 += in1_step; in2 += in2_step; out1 += out1_step; out2 += out2_step; } } /*This a pointer to the above function*/ PyUFuncGenericFunction funcs[1] = {&double_logitprod}; /* These are the input and return dtypes of logit.*/ static const char types[4] = {NPY_DOUBLE, NPY_DOUBLE, NPY_DOUBLE, NPY_DOUBLE}; static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, "npufunc", NULL, -1, LogitMethods, NULL, NULL, NULL, NULL }; PyMODINIT_FUNC PyInit_npufunc(void) { PyObject *m, *logit, *d; import_array(); import_umath(); m = PyModule_Create(&moduledef); if (!m) { return NULL; } logit = PyUFunc_FromFuncAndData(funcs, NULL, types, 1, 2, 2, PyUFunc_None, "logit", "logit_docstring", 0); d = PyModule_GetDict(m); PyDict_SetItemString(d, "logit", logit); Py_DECREF(logit); return m; }
具有結構化陣列 dtype 引數的 NumPy ufunc 範例#
此範例示範如何為結構化陣列 dtype 建立 ufunc。對於此範例,我們展示了一個微不足道的 ufunc,用於新增兩個具有 dtype 'u8,u8,u8'
的陣列。此過程與其他範例略有不同,因為呼叫 PyUFunc_FromFuncAndData
並未完全註冊自訂 dtype 和結構化陣列 dtype 的 ufunc。我們還需要呼叫 PyUFunc_RegisterLoopForDescr
以完成 ufunc 的設定。
我們只給出 C 程式碼,因為 setup.py
檔案與單一 dtype 的 NumPy ufunc 範例中的 setup.py
檔案完全相同,除了該行
npufunc = Extension('npufunc', sources=['single_type_logit.c'], include_dirs=[get_include()])
被替換為
npufunc = Extension('npufunc', sources=['add_triplet.c'], include_dirs=[get_include()])
C 檔案如下所示。
#define PY_SSIZE_T_CLEAN #include <Python.h> #include "numpy/ndarraytypes.h" #include "numpy/ufuncobject.h" #include "numpy/npy_3kcompat.h" #include <math.h> /* * add_triplet.c * This is the C code for creating your own * NumPy ufunc for a structured array dtype. * * Details explaining the Python-C API can be found under * 'Extending and Embedding' and 'Python/C API' at * docs.python.org. */ static PyMethodDef StructUfuncTestMethods[] = { {NULL, NULL, 0, NULL} }; /* The loop definition must precede the PyMODINIT_FUNC. */ static void add_uint64_triplet(char **args, const npy_intp *dimensions, const npy_intp *steps, void *data) { npy_intp i; npy_intp is1 = steps[0]; npy_intp is2 = steps[1]; npy_intp os = steps[2]; npy_intp n = dimensions[0]; uint64_t *x, *y, *z; char *i1 = args[0]; char *i2 = args[1]; char *op = args[2]; for (i = 0; i < n; i++) { x = (uint64_t *)i1; y = (uint64_t *)i2; z = (uint64_t *)op; z[0] = x[0] + y[0]; z[1] = x[1] + y[1]; z[2] = x[2] + y[2]; i1 += is1; i2 += is2; op += os; } } /* This a pointer to the above function */ PyUFuncGenericFunction funcs[1] = {&add_uint64_triplet}; /* These are the input and return dtypes of add_uint64_triplet. */ static const char types[3] = {NPY_UINT64, NPY_UINT64, NPY_UINT64}; static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, "struct_ufunc_test", NULL, -1, StructUfuncTestMethods, NULL, NULL, NULL, NULL }; PyMODINIT_FUNC PyInit_npufunc(void) { PyObject *m, *add_triplet, *d; PyObject *dtype_dict; PyArray_Descr *dtype; PyArray_Descr *dtypes[3]; import_array(); import_umath(); m = PyModule_Create(&moduledef); if (m == NULL) { return NULL; } /* Create a new ufunc object */ add_triplet = PyUFunc_FromFuncAndData(NULL, NULL, NULL, 0, 2, 1, PyUFunc_None, "add_triplet", "add_triplet_docstring", 0); dtype_dict = Py_BuildValue("[(s, s), (s, s), (s, s)]", "f0", "u8", "f1", "u8", "f2", "u8"); PyArray_DescrConverter(dtype_dict, &dtype); Py_DECREF(dtype_dict); dtypes[0] = dtype; dtypes[1] = dtype; dtypes[2] = dtype; /* Register ufunc for structured dtype */ PyUFunc_RegisterLoopForDescr(add_triplet, dtype, &add_uint64_triplet, dtypes, NULL); d = PyModule_GetDict(m); PyDict_SetItemString(d, "add_triplet", add_triplet); Py_DECREF(add_triplet); return m; }
傳回的 ufunc 物件是一個可呼叫的 Python 物件。它應放置在(模組)字典中,名稱與 ufunc 建立常式中使用的名稱相同。以下範例改編自 umath 模組
static PyUFuncGenericFunction atan2_functions[] = { PyUFunc_ff_f, PyUFunc_dd_d, PyUFunc_gg_g, PyUFunc_OO_O_method}; static void *atan2_data[] = { (void *)atan2f, (void *)atan2, (void *)atan2l, (void *)"arctan2"}; static const char atan2_signatures[] = { NPY_FLOAT, NPY_FLOAT, NPY_FLOAT, NPY_DOUBLE, NPY_DOUBLE, NPY_DOUBLE, NPY_LONGDOUBLE, NPY_LONGDOUBLE, NPY_LONGDOUBLE NPY_OBJECT, NPY_OBJECT, NPY_OBJECT}; ... /* in the module initialization code */ PyObject *f, *dict, *module; ... dict = PyModule_GetDict(module); ... f = PyUFunc_FromFuncAndData(atan2_functions, atan2_data, atan2_signatures, 4, 2, 1, PyUFunc_None, "arctan2", "a safe and correct arctan(x1/x2)", 0); PyDict_SetItemString(dict, "arctan2", f); Py_DECREF(f); ...