Fixing undefined symbol: PyUnicodeUCS2_Decode

When Python loads certain precompiled .so libraries, an error like the following often appears:

undefined symbol: PyUnicodeUCS2_Decode

The official Python documentation explains the cause as follows:

When importing module X, why do I get “undefined symbol: PyUnicodeUCS2*”?

You are using a version of Python that uses a 4-byte representation for Unicode characters, but some C extension module you are importing was compiled using a Python that uses a 2-byte representation for Unicode characters (the default).

If instead the name of the undefined symbol starts with PyUnicodeUCS4, the problem is the reverse: Python was built using 2-byte Unicode characters, and the extension module was compiled using a Python with 4-byte Unicode characters.
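The rule in the two paragraphs above can be read straight off the error message. The following is only a minimal sketch: the helper name `diagnose` and the regular expression are assumptions made for illustration and are not part of Python or of the FAQ.

```python
import re

# Matches the width-specific names that Python 2 C extensions reference,
# e.g. "PyUnicodeUCS2_Decode" or "PyUnicodeUCS4_FromUnicode".
_SYMBOL_RE = re.compile(r"PyUnicodeUCS([24])_\w+")


def diagnose(error_message):
    """Return (extension_width, interpreter_width) in bytes, or None.

    The undefined symbol names the width the extension was compiled for;
    the interpreter that failed to resolve it uses the other width.
    `diagnose` is a hypothetical helper, not a standard-library function.
    """
    match = _SYMBOL_RE.search(error_message)
    if match is None:
        return None
    extension_width = int(match.group(1))
    interpreter_width = 4 if extension_width == 2 else 2
    return extension_width, interpreter_width


print(diagnose("undefined symbol: PyUnicodeUCS2_Decode"))  # (2, 4)
```

For the error in this post, the extension was built for 2-byte Unicode and the interpreter uses 4-byte Unicode.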

This can easily occur when using pre-built extension packages. RedHat Linux 7.x, in particular, provided a “python2” binary that is compiled with 4-byte Unicode. This only causes the link failure if the extension uses any of the PyUnicode_*() functions. It is also a problem if an extension uses any of the Unicode-related format specifiers for Py_BuildValue() (or similar) or parameter specifications for PyArg_ParseTuple().

You can check the size of the Unicode character a Python interpreter is using by checking the value of sys.maxunicode:

>>> import sys
>>> if sys.maxunicode > 65535:
...     print 'UCS4 build'
... else:
...     print 'UCS2 build'

The only way to solve this problem is to use extension modules compiled with a Python binary built using the same size for Unicode characters.

 

Simple fix:

  • Check whether your Python was built with 4-byte or 2-byte Unicode
  • Rebuild and install Python with the same Unicode width as the .so library

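Recompile Python with the width the .so library expects: for the PyUnicodeUCS2 error shown here, configure with --enable-unicode=ucs2; for a PyUnicodeUCS4 error, configure with --enable-unicode=ucs4.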
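The two steps can be sketched as a small script. This is an illustration under stated assumptions, not a definitive tool: `unicode_width` and `configure_flag` are hypothetical helper names, and the `wanted` value is hard-coded for the PyUnicodeUCS2 case discussed in this post. On Python 3.3 and later, sys.maxunicode is always 1114111, so the check is only meaningful on older interpreters.

```python
import sys


def unicode_width(maxunicode=None):
    """Bytes per Unicode character, inferred from sys.maxunicode."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return 4 if maxunicode > 65535 else 2


def configure_flag(extension_width):
    """Configure option that builds Python 2 with the extension's width."""
    if extension_width not in (2, 4):
        raise ValueError("width must be 2 or 4")
    return "--enable-unicode=ucs%d" % extension_width


if __name__ == "__main__":
    current = unicode_width()
    wanted = 2  # assumption: the .so references PyUnicodeUCS2_* symbols
    if current == wanted:
        print("interpreter already matches the extension")
    else:
        print("rebuild Python with ./configure " + configure_flag(wanted))
```

Passing the width explicitly keeps the helpers easy to check without rebuilding anything, for example `configure_flag(4)` for the reverse case.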

 
