Cython wrapper for khash-sets/maps, efficient implementation of isin and unique

About:

Brings the functionality of khash to Python and Cython; it can be used seamlessly with numpy or pandas.

Numpy lacks the concept of a (hash-)set. This library fills that gap and implements unique and isin efficiently, both memory- and speed-wise compared to pandas' versions.

Python's set and dict have a large memory footprint. For some datatypes, khash reduces this overhead by a factor of 4-8.

The recommended way to install the library is via the conda package manager, using the conda-forge channel:

    conda install -c conda-forge cykhash

You can also install the library using pip. To install the latest release:

    pip install cykhash

To install the most recent version of the module, pass the URL of the source archive to pip install.

Attention: this requires python-dev on Linux/Mac and MSVC on Windows.

To build the library from source, Cython>=0.28 is required as well as a C build toolchain.

See ( ) for dependencies needed for development.

Quick start

Hash set and isin

Creating a hashset and using it in isin:

    # prepare data:
    >>> import numpy as np
    >>> a = np.arange(42, dtype=np.int64)   # example inputs, assumed here
    >>> b = np.arange(84, dtype=np.int64)
    >>> result = np.empty(b.size, dtype=np.bool_)

    >>> from cykhash import Int64Set_from_buffer, isin_int64
    >>> lookup = Int64Set_from_buffer(a) # create a hashset
    >>> isin_int64(b, lookup, result)    # running time O(b.size)
    >>> isin_int64(b, lookup, result)    # lookup is reused and not recreated

Finding unique in O(n) (compared to numpy's np.unique, which is O(n*log n)) and with a smaller memory footprint than pandas' pd.unique:

    # prepare input
    >>> import numpy as np
    >>> a = np.array([1, 2, 3, 3, 2, 1], dtype=np.int64)  # example input, assumed here
    >>> from cykhash import unique_int64
    >>> unique_buffer = unique_int64(a) # unique elements are exposed via the buffer protocol
    # can be converted to a numpy array without copying via
    >>> unique_array = np.ctypeslib.as_array(unique_buffer)

Maps and sets handle nan correctly (try it out with Python's dict/set):

    >>> from cykhash import Float64toInt64Map
    >>> my_map = Float64toInt64Map() # values are 64bit integers

Int64Set, Int32Set, Float64Set, Float32Set (and PyObjectSet) are implemented. They are more or less drop-in replacements for Python's set. Furthermore, given the Cython interface, efficient extensions of functionality are easy to add.
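The suggestion to try nan with Python's dict/set can be followed with the standard library alone. The sketch below shows only the built-in behaviour, without cykhash: Python's containers compare keys by identity first and then by equality, and nan is not equal to itself, so two separately created nan objects are treated as different keys.

```python
# Built-in containers only; cykhash is not used here.
nan_a = float("nan")
nan_b = float("nan")   # a second, distinct nan object

# nan never compares equal, not even to another nan
nans_equal = (nan_a == nan_b)

# both objects are kept, so the set ends up holding nan twice
nan_set = {nan_a, nan_b}

# a value stored under one nan object cannot be found via another one
nan_dict = {nan_a: 1}
found_with_other_nan = nan_b in nan_dict
found_with_same_object = nan_a in nan_dict   # succeeds only through the identity check

print(nans_equal, len(nan_set), found_with_other_nan, found_with_same_object)
```

This is the behaviour the README contrasts against when it says that cykhash maps and sets handle nan correctly.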
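The complexity claims (O(n) for unique, O(b.size) for isin with a reused lookup) follow from hash-set membership tests taking constant time on average. A minimal pure-Python sketch of that idea, with the built-in set standing in for a khash set; `unique_hashed` and `isin_hashed` are hypothetical helper names and not part of cykhash, and the sketch makes no claim about the order in which cykhash returns unique elements.

```python
def unique_hashed(values):
    """Return the distinct values in first-seen order with one pass over the input."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:      # average O(1) membership test
            seen.add(v)
            out.append(v)
    return out


def isin_hashed(queries, lookup):
    """For each query, report whether it is contained in the prebuilt set `lookup`."""
    return [q in lookup for q in queries]


a = [1, 2, 3, 3, 2, 1]
uniques = unique_hashed(a)

lookup = set(range(42))                   # built once, O(len(a))
first = isin_hashed(range(84), lookup)    # O(number of queries)
second = isin_hashed([41, 42], lookup)    # reuses the same lookup, no rebuild

print(uniques, sum(first), second)
```

A sort-based unique needs O(n*log n) comparisons, which is the difference the quick start points at; building the lookup once and reusing it is what keeps repeated isin calls linear in the number of queries.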