kwutil.util_json module

JSON utilities for debugging serializability and, in some cases, attempting to ensure it.

kwutil.util_json.debug_json_unserializable(data, msg='')

Raises an exception if the data is not serializable and prints information about it. This is a thin wrapper around find_json_unserializable().
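
The "thin wrapper" pattern described above can be sketched with the standard library alone. The names `find_bad_items` and `debug_unserializable` are hypothetical, the finder only inspects top-level dictionary items, and the choice of `TypeError` is an assumption; this illustrates the idea and is not kwutil's implementation.

```python
import json


def find_bad_items(data):
    """Hypothetical finder: yield top-level dict items json cannot encode."""
    for key, value in data.items():
        try:
            json.dumps({key: value})
        except TypeError:
            yield {'loc': [key], 'data': value}


def debug_unserializable(data, msg=''):
    """Hypothetical wrapper: raise if the finder reports any bad parts."""
    parts = list(find_bad_items(data))
    if parts:
        # The exception type is an assumption made for this sketch
        raise TypeError(msg + repr(parts))


try:
    debug_unserializable({'a': 1, 'b': {1, 2}}, msg='non-json data at: ')
except TypeError as ex:
    error_text = str(ex)
```

Keeping the search and the raising logic separate lets callers choose between inspecting the bad parts and failing fast.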

kwutil.util_json.ensure_json_serializable(dict_, normalize_containers=False, verbose=0, unhandled_policy='keep')

Attempt to convert common types (e.g. numpy) into something JSON compliant.

Converts numpy arrays and tuples into lists. Attempts to decode bytes as UTF-8, but skips them if this is not possible.

Parameters:
  • dict_ (List | Dict) – A data structure nearly compatible with json. (todo: rename arg)

  • normalize_containers (bool) – if True, normalizes dict containers to be standard python structures. Defaults to False.

  • unhandled_policy (str) – What to do if there isn’t a straightforward way to convert a value to a serializable structure. Can be “keep”, “error” or “stringify”.

Returns:

normalized data structure that should be entirely json serializable.

Return type:

Dict | List

Note

This was ported from kwcoco.util
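
The three `unhandled_policy` values described above can be illustrated with a small standard-library sketch. The function `to_jsonable` is hypothetical and handles only a few types; the real function covers more cases (e.g. numpy), and its exact error type is not stated here, so `TypeError` is an assumption.

```python
import json
import pathlib


def to_jsonable(obj, unhandled_policy='keep'):
    """Hypothetical converter mirroring the three documented policies."""
    if isinstance(obj, dict):
        return {k: to_jsonable(v, unhandled_policy) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists because json has no tuple type
        return [to_jsonable(v, unhandled_policy) for v in obj]
    if isinstance(obj, bytes):
        try:
            return obj.decode('utf8')
        except UnicodeDecodeError:
            return obj  # skip bytes that cannot be decoded
    if isinstance(obj, pathlib.PurePath):
        return str(obj)
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    # No straightforward conversion exists, so apply the policy
    if unhandled_policy == 'keep':
        return obj
    if unhandled_policy == 'stringify':
        return str(obj)
    if unhandled_policy == 'error':
        raise TypeError(f'cannot convert {type(obj).__name__} to json')
    raise ValueError(f'unknown unhandled_policy: {unhandled_policy!r}')


marker = object()
kept = to_jsonable([(1, 2), b'abc', marker])
stringified = to_jsonable([(1, 2), b'abc', marker], unhandled_policy='stringify')
text = json.dumps(stringified)
```

With “keep” the result may still fail to serialize, which is why a follow-up check for unserializable parts is useful.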

Example

>>> from kwutil.util_json import *  # NOQA
>>> import pathlib
>>> assert ensure_json_serializable([]) == []
>>> assert ensure_json_serializable({}) == {}
>>> data = [pathlib.Path('.')]
>>> assert ensure_json_serializable(data) == ['.']
>>> assert ensure_json_serializable(data) != data

Example

>>> # by default non-serializable objects are kept-as-is
>>> data = [[], {}, object(), (1, 2)]
>>> ensure_json_serializable(data)
>>> ensure_json_serializable(data, unhandled_policy='stringify')
>>> #ensure_json_serializable(data, unhandled_policy='pickle')
>>> import pytest
>>> with pytest.raises(Exception):
>>>     ensure_json_serializable(data, unhandled_policy='error')

Example

>>> # xdoctest: +REQUIRES(module:numpy)
>>> from kwutil.util_json import *  # NOQA
>>> import ubelt as ub
>>> import numpy as np
>>> data = ub.ddict(lambda: int)
>>> data['foo'] = ub.ddict(lambda: int)
>>> data['bar'] = np.array([1, 2, 3])
>>> data['foo']['a'] = 1
>>> data['foo']['b'] = (1, np.array([1, 2, 3]), {3: np.int32(3), 4: np.float16(1.0)})
>>> dict_ = data
>>> print(ub.urepr(data, nl=-1))
>>> assert list(find_json_unserializable(data))
>>> result = ensure_json_serializable(data, normalize_containers=True)
>>> print(ub.urepr(result, nl=-1))
>>> assert not list(find_json_unserializable(result))
>>> assert type(result) is dict

kwutil.util_json.find_json_unserializable(data, quickcheck=False)

Recurse through a JSON data structure and find any component that causes a serialization error. The location of each error is recorded as the recursion descends through the data structure.

Parameters:
  • data (object) – data that should be json serializable

  • quickcheck (bool) – if True, first check the entire data structure at once, assuming it is OK, before doing the Python-based recursive logic.

Returns:

A list of “bad part” dictionaries, each containing the items:

  • “data” – the value that caused the serialization error

  • “loc” – a list of keys/indexes that can be used to look up the location of the unserializable value. If an entry in “loc” is itself a list, it indicates the rare case where a key in a dictionary is causing the serialization error.

Return type:

List[Dict]

Note

This was ported from kwcoco.util
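
The recursive search and its “loc” bookkeeping can be sketched with the standard library as follows. The function `find_bad_parts` is hypothetical, and the `['.keys', key]` marker used for bad dictionary keys is an assumption, chosen only so that the final “loc” entry is a list whose second item is the offending key. It is a minimal sketch, not the library's implementation.

```python
import json


def find_bad_parts(data, loc=()):
    """Hypothetical recursive search recording the path to each bad value."""
    if isinstance(data, dict):
        for key, value in data.items():
            if not (key is None or isinstance(key, (str, int, float, bool))):
                # Bad key: the last loc entry is a list flagging this case
                yield {'loc': list(loc) + [['.keys', key]], 'data': key}
            yield from find_bad_parts(value, loc + (key,))
    elif isinstance(data, (list, tuple)):
        for index, value in enumerate(data):
            yield from find_bad_parts(value, loc + (index,))
    else:
        try:
            json.dumps(data)
        except TypeError:
            yield {'loc': list(loc), 'data': data}


data = [1, {'nest': [2, {'bar': {1, 2}}]}, {frozenset({'badkey'}): 3}]
parts = list(find_bad_parts(data))

# Follow a normal "loc" step by step to recover the offending value
curr = data
for step in parts[0]['loc']:
    curr = curr[step]
```

Here the first part points at the set stored under `'bar'`, and the second part reports the `frozenset` key.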

Example

>>> # xdoctest: +REQUIRES(module:numpy)
>>> from kwutil.util_json import *  # NOQA
>>> import ubelt as ub
>>> import numpy as np
>>> part = ub.ddict(lambda: int)
>>> part['foo'] = ub.ddict(lambda: int)
>>> part['bar'] = np.array([1, 2, 3])
>>> part['foo']['a'] = 1
>>> # Create a list with two unserializable parts
>>> data = [1, 2, {'nest1': [2, part]}, {frozenset({'badkey'}): 3, 2: 4}]
>>> parts = list(find_json_unserializable(data))
>>> print('parts = {}'.format(ub.urepr(parts, nl=1)))
>>> # Check expected structure of bad parts
>>> assert len(parts) == 2
>>> part = parts[1]
>>> assert list(part['loc']) == [2, 'nest1', 1, 'bar']
>>> # We can use the "loc" to find the bad value
>>> for part in parts:
>>>     # "loc" is a list of directions containing which keys/indexes
>>>     # to traverse at each descent into the data structure.
>>>     directions = part['loc']
>>>     curr = data
>>>     special_flag = False
>>>     for key in directions:
>>>         if isinstance(key, list):
>>>             # special case for bad keys
>>>             special_flag = True
>>>             break
>>>         else:
>>>             # normal case for bad values
>>>             curr = curr[key]
>>>     if special_flag:
>>>         assert part['data'] in curr.keys()
>>>         assert part['data'] is key[1]
>>>     else:
>>>         assert part['data'] is curr

Example

>>> # xdoctest: +SKIP("TODO: circular ref detect algo is wrong, fix it")
>>> from kwutil.util_json import *  # NOQA
>>> import pytest
>>> # Test circular reference
>>> data = [[], {'a': []}]
>>> data[1]['a'].append(data)
>>> with pytest.raises(ValueError, match="Circular reference detected at.*1, 'a', 1*"):
...     parts = list(find_json_unserializable(data))
>>> # Should be ok here
>>> shared_data = {'shared': 1}
>>> data = [[shared_data], shared_data]
>>> parts = list(find_json_unserializable(data))

class kwutil.util_json.Json

Bases: object

Similar to kwutil.Yaml, the Json class provides a set of helpers to make working with json easier.

Example

>>> from kwutil.util_json import Json
>>> import ubelt as ub
>>> unserializable_data = {
>>>     'a': 'hello world',
>>>     'b': ub.udict({'a': 3}),
>>>     'c': ub.Path('a/path/object'),
>>> }
>>> data = Json.ensure_serializable(unserializable_data)
>>> text1 = Json.dumps(data, backend='stdlib')
>>> # Coerce is idempotent and resolves the input to nested Python
>>> # structures.
>>> resolved1 = Json.coerce(data)
>>> resolved2 = Json.coerce(text1)
>>> resolved3 = Json.coerce(resolved2)
>>> assert resolved1 == resolved2 == resolved3 == data
>>> # with stdlib
>>> data2 = Json.loads(text1)
>>> assert data2 == data
>>> # with ujson
>>> # xdoctest: +REQUIRES(module:ujson)
>>> data2 = Json.loads(text1, backend='ujson')
>>> assert data2 == data
>>> # with orjson
>>> # xdoctest: +REQUIRES(module:orjson)
>>> data2 = Json.loads(text1, backend='orjson')
>>> assert data2 == data

static _load_filepointer(filepointer, backend='stdlib')

static load(file, backend='stdlib')

Load json from a filepointer or filepath.

Parameters:

file (Path | str | _io._IOBase) – a path to a file, or an open file pointer in bytes or str mode. Bytes mode is more efficient.
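
A loader that accepts either a path or an open file pointer, as described above, might look like the following standard-library sketch. The helper `load_json` is hypothetical and supports only the stdlib backend.

```python
import io
import json
import os
import tempfile


def load_json(file):
    """Hypothetical loader accepting a path or an open file object."""
    if isinstance(file, (str, os.PathLike)):
        # Open paths in bytes mode and let the decoder detect the encoding
        with open(file, 'rb') as fp:
            return json.load(fp)
    # Otherwise assume an open file object in bytes or str mode
    return json.load(file)


payload = b'["hello", {"from": "json"}]'
from_bytes = load_json(io.BytesIO(payload))
from_text = load_json(io.StringIO(payload.decode()))

with tempfile.TemporaryDirectory() as dpath:
    fpath = os.path.join(dpath, 'file.json')
    with open(fpath, 'wb') as fp:
        fp.write(payload)
    from_path = load_json(fpath)
```

All three inputs decode to the same Python structure.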

Example

>>> import kwutil
>>> import io
>>> # test loading from string or byte file pointers
>>> data = b'["hello", {"from": "json"}]'
>>> r1 = kwutil.Json.load(io.BytesIO(data), backend='stdlib')
>>> r2 = kwutil.Json.load(io.StringIO(data.decode()), backend='stdlib')
>>> # xdoctest: +REQUIRES(module:ujson)
>>> r3 = kwutil.Json.load(io.BytesIO(data), backend='ujson')
>>> r4 = kwutil.Json.load(io.StringIO(data.decode()), backend='ujson')
>>> # xdoctest: +REQUIRES(module:orjson)
>>> r3 = kwutil.Json.load(io.BytesIO(data), backend='orjson')
>>> r4 = kwutil.Json.load(io.StringIO(data.decode()), backend='orjson')
>>> assert r1 == r2 == r3 == r4

static loads(text, backend='stdlib')

Decode JSON from bytes or text.

static dump(data, fp, backend='stdlib', **kwargs)

Write json data to a file with a chosen backend.

Parameters:
  • data (dict | list | int | float | str) – json serializable data.

  • fp (PathLike | IO) – Where to write the data

  • backend (str) – stdlib, ujson, or orjson

  • **kwargs – additional arguments to pass to the specific backend.
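
As an illustration of a writer whose target may be a path or an open file, and which forwards extra keyword arguments to the backend, consider this standard-library sketch. The helper `dump_json` is hypothetical and uses only the stdlib `json` module.

```python
import io
import json
import os
import tempfile


def dump_json(data, fp, **kwargs):
    """Hypothetical writer: ``fp`` may be a path or a writable text file."""
    if isinstance(fp, (str, os.PathLike)):
        with open(fp, 'w') as file:
            json.dump(data, file, **kwargs)
    else:
        json.dump(data, fp, **kwargs)


# Extra keyword arguments are forwarded to json.dump unchanged
buffer = io.StringIO()
dump_json({'b': 1, 'a': 2}, buffer, sort_keys=True)
written = buffer.getvalue()

with tempfile.TemporaryDirectory() as dpath:
    fpath = os.path.join(dpath, 'out.json')
    dump_json([4, 5, 6], fpath)
    with open(fpath) as file:
        roundtrip = json.load(file)
```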

static dumps(data, backend='stdlib', **kwargs)

Convert json data to text with a chosen backend.

Parameters:
  • data (dict | list | int | float | str) – json serializable data.

  • backend (str) – stdlib, ujson, or orjson

  • **kwargs – additional arguments to pass to the specific backend.
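
Dispatching on the three documented backend names could be sketched as below. The function `dumps_with_backend` is hypothetical; `ujson` and `orjson` are optional third-party packages that are imported only when requested. Decoding the `orjson` result is a design choice of this sketch, made because `orjson.dumps` returns bytes rather than text.

```python
import json


def dumps_with_backend(data, backend='stdlib', **kwargs):
    """Hypothetical dispatcher over the three documented backend names."""
    if backend == 'stdlib':
        return json.dumps(data, **kwargs)
    if backend == 'ujson':
        import ujson  # optional dependency
        return ujson.dumps(data, **kwargs)
    if backend == 'orjson':
        import orjson  # optional dependency
        # orjson.dumps returns bytes, so decode to return text like the others
        return orjson.dumps(data, **kwargs).decode('utf8')
    raise KeyError(backend)


text = dumps_with_backend({'b': 1, 'a': [1, 2]}, backend='stdlib', sort_keys=True)
```

Because each backend accepts different keyword arguments, anything passed through `**kwargs` has to be valid for the backend that was selected.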

classmethod coerce(data, backend='stdlib', path_policy='existing_file_with_extension')

Example

>>> from kwutil.util_json import Json
>>> import ubelt as ub
>>> Json.coerce('[1, 2, 3]')
[1, 2, 3]
>>> fpath = ub.Path.appdir('kwutil/tests/util_json').ensuredir() / 'file.json'
>>> fpath.write_text(Json.dumps([4, 5, 6]))
>>> Json.coerce(fpath)
[4, 5, 6]
>>> Json.coerce(str(fpath))
[4, 5, 6]
>>> dict(Json.coerce('{"a": "b", "c": "d"}'))
{'a': 'b', 'c': 'd'}
>>> print(Json.coerce(None))
None
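
The behavior shown in the example above (JSON text, file paths, and already-parsed data all resolving to Python structures) can be approximated with the standard library. The helper `coerce_json` is hypothetical, and its rule for treating a string as a path is only a guess at what the default `path_policy` name suggests, not the library's actual logic.

```python
import json
import os
import tempfile


def coerce_json(data):
    """Hypothetical coerce: resolve text, paths, or parsed data to Python."""
    if isinstance(data, os.PathLike):
        with open(data, 'rb') as fp:
            return json.load(fp)
    if isinstance(data, str):
        # Assumed policy: treat a string as a path only if it names an
        # existing file that has an extension; otherwise parse it as json.
        if os.path.splitext(data)[1] and os.path.isfile(data):
            with open(data, 'rb') as fp:
                return json.load(fp)
        return json.loads(data)
    # Already-parsed structures (and None) pass through unchanged
    return data


from_text = coerce_json('[1, 2, 3]')
with tempfile.TemporaryDirectory() as dpath:
    fpath = os.path.join(dpath, 'file.json')
    with open(fpath, 'w') as fp:
        json.dump([4, 5, 6], fp)
    from_path = coerce_json(fpath)
again = coerce_json(from_text)
```

Passing already-parsed data through unchanged means a second call on a parsed list or dictionary returns it as-is.
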

classmethod find_unserializable(data, quickcheck=False)

Example

>>> import kwutil
>>> import ubelt as ub
>>> data = {
>>>     'a': 1,
>>>     'b': 2,
>>>     'c': ub.Path('/pathlib/object')
>>> }
>>> results = list(kwutil.Json.find_unserializable(data))
>>> print(f'results = {ub.urepr(results, nl=1)}')
results = [
    {'loc': ['c'], 'data': Path('/pathlib/object')},
]

classmethod ensure_serializable(dict_, normalize_containers=False, verbose=0, unhandled_policy='keep')

Example

>>> import kwutil
>>> import pathlib
>>> import ubelt as ub
>>> data = {
>>>     'a': 1,
>>>     'b': 2,
>>>     'c': pathlib.Path('/pathlib/object')
>>> }
>>> results = kwutil.Json.ensure_serializable(data)
>>> print(f'results = {ub.urepr(results, nl=1)}')
results = {
    'a': 1,
    'b': 2,
    'c': '/pathlib/object',
}

classmethod debug_unserializable(data, msg='')

Raises an exception if the data is not serializable and prints information about it. This is a thin wrapper around Json.find_unserializable().

Example

>>> import kwutil
>>> import ubelt as ub
>>> data = {
>>>     'a': 1,
>>>     'b': 2,
>>>     'c': ub.Path('/pathlib/object')
>>> }
>>> try:
>>>     kwutil.Json.debug_unserializable(data, 'obj had non-json data at: ')
>>> except Exception as ex:
>>>     print(f'Exception: {ex}')
Exception: obj had non-json data at: [
    {'loc': ['c'], 'data': Path('/pathlib/object')},
]