kwutil.process_context module

Defines the ProcessContext object, which is what mlops expects jobs to be wrapped in.

Todo

  • [ ] Make “most” telemetry opt-in

class kwutil.process_context.ProcessContext(name=None, type='process', args=None, config=None, extra=None, track_emissions=False, request_all_telemetry=True, request_most_telemetry=True, output_dpath=None, output_fpath=None)[source]

Bases: object

Context manager to track the context under which a result was computed.

This tracks things like the start and end time, the command line that can reproduce the process (assuming an appropriate environment), the configuration the process was run with, the details of the machine the process was run on, the power usage and carbon emissions of the process, and other information.

Parameters:
  • args (str | List[str]) – This should be the sys.argv or the command line string that can be used to rerun the process

  • config (Dict) – This should be a configuration dictionary (likely based on sys.argv)

  • name (str) – the name of this process

  • type (str) – The type of this process (usually keep the default of process)

  • request_all_telemetry (bool) – if False, all telemetry is disabled. This is forced to False if PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY is in the environment.

  • request_most_telemetry (bool) – if False, most telemetry is disabled. This is forced to False if PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY is in the environment.
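
To make the tracked fields concrete, here is a minimal standard-library sketch of a context that records a name, a type, the command line, the configuration, and start / stop timestamps. The class name MiniContext and the keys of its properties dictionary are hypothetical; this is not kwutil's implementation or its output format.

```python
import datetime
import sys


class MiniContext:
    """Hypothetical stand-in that records a few of the fields described above."""

    def __init__(self, name=None, type='process', args=None, config=None):
        # Assumption: fall back to the current command line when args is omitted.
        self.properties = {
            'name': name,
            'type': type,
            'args': list(sys.argv) if args is None else args,
            'config': {} if config is None else dict(config),
            'start_timestamp': None,
            'stop_timestamp': None,
        }

    def start(self):
        # Record the start time and return self so calls can be chained.
        self.properties['start_timestamp'] = self._timestamp()
        return self

    def stop(self):
        # Record the stop time and return the collected record.
        self.properties['stop_timestamp'] = self._timestamp()
        return self.properties

    @staticmethod
    def _timestamp():
        # ISO 8601 timestamp in UTC.
        return datetime.datetime.now(datetime.timezone.utc).isoformat()


record = MiniContext(
    name='demo', args=['demo.py', '--lr=0.1'], config={'lr': 0.1}
).start().stop()
print(record)
```

The chained start().stop() call mirrors the usage shown in the examples on this page; everything else about the sketch is illustrative.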

Note

This module provides telemetry, which records user-identifiable information. While useful, this raises ethical concerns about user privacy, and the people running this code have a right to know about it and to opt out. Notably, this module only records the information and does not send it anywhere. As such, recording by default (with the option to opt out) is reasonable, but any future work that sends this information anywhere must be disabled by default and require an explicit opt-in.

Note

There are two levels of telemetry.

  • Environment telemetry. These are things like the machine the code was run on. Use PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY=0 to opt out.

  • The start / stop / sys.argv / config objects are necessary for mlops to do anything, but they can leak information by containing system paths. Emissions tracking is also in this category. Use PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY to opt out.
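
The two opt-out variables can be read as selecting one of three telemetry levels. The sketch below illustrates that reading with a hypothetical helper, telemetry_level, which is not part of kwutil. It assumes that the mere presence of a variable counts as an opt-out, whatever its value; the actual check in kwutil may differ.

```python
import os


def telemetry_level(environ=None):
    """Hypothetical helper: map the opt-out variables to a telemetry level."""
    environ = os.environ if environ is None else environ
    # Assumption: presence of the variable is enough, regardless of its value.
    if 'PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY' in environ:
        return 'none'
    if 'PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY' in environ:
        return 'reduced'
    return 'full'


print(telemetry_level({}))
print(telemetry_level({'PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY': '0'}))
print(telemetry_level({'PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY': '1'}))
```

In practice the variables would be exported in the shell, or set in os.environ, before the ProcessContext is constructed.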

CommandLine

xdoctest -m kwutil.process_context ProcessContext

Example

>>> # xdoctest: +REQUIRES(module:psutil)
>>> from kwutil.process_context import *
>>> import rich
>>> import ubelt as ub
>>> # Adding things like disk info and tracking emission usage
>>> self = ProcessContext(track_emissions='offline')
>>> obj1 = self.start().stop()
>>> self.add_disk_info('.')
>>> #
>>> # Telemetry can be mostly disabled
>>> self = ProcessContext(track_emissions='offline', request_most_telemetry=False)
>>> obj2 = self.start().stop()
>>> self.add_disk_info('.')
>>> # Telemetry can be completely disabled
>>> self = ProcessContext(track_emissions='offline', request_all_telemetry=False)
>>> obj3 = self.start().stop()
>>> self.add_disk_info('.')
>>> rich.print('full_telemetry = {}'.format(ub.urepr(obj1, nl=3)))
>>> rich.print('some_telemetry = {}'.format(ub.urepr(obj2, nl=3)))
>>> rich.print('no_telemetry = {}'.format(ub.urepr(obj3, nl=3)))

Example

>>> # xdoctest: +REQUIRES(module:psutil)
>>> from kwutil.process_context import *
>>> # flush can measure intermediate progress
>>> self = ProcessContext(track_emissions=True)
>>> self.add_disk_info('.')
>>> obj1 = self.start().flush()
>>> obj1_orig = obj1.copy()
>>> obj2 = self.stop()

_infer_static_properties(func)[source]
_infer_dynamic_properties(func, args, kwargs)[source]
property is_running

Has the context object started and not yet been stopped?

property is_started

Has the context object ever started? This can still return True if it has stopped.

dump()[source]
write_invocation(invocation_fpath)[source]

Write a helper file that contains a locally reproducible invocation of this process.
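
As an illustration of what such a helper file might contain, the sketch below writes a small shell script that replays a command line. The function write_invocation_sketch and the script layout are assumptions made for this example; the file actually written by write_invocation may look different.

```python
import shlex
import sys
import tempfile
from pathlib import Path


def write_invocation_sketch(invocation_fpath, argv=None):
    """Hypothetical helper: save a shell script that reruns a command line."""
    argv = list(sys.argv) if argv is None else list(argv)
    # shlex.join quotes each argument so the line is safe to paste into a POSIX shell.
    command = shlex.join(argv)
    fpath = Path(invocation_fpath)
    fpath.write_text('#!/bin/bash\n' + command + '\n')
    return fpath


with tempfile.TemporaryDirectory() as dpath:
    fpath = write_invocation_sketch(
        Path(dpath) / 'invocation.sh',
        ['python', 'train.py', '--name', 'my run'],
    )
    written = fpath.read_text()
print(written)
```

Quoting with shlex.join keeps arguments that contain spaces intact when the script is rerun.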

_timestamp()[source]
_hostinfo()[source]
_osinfo()[source]
_pyinfo()[source]
_meminfo()[source]
_cpuinfo()[source]
_gpuinfo()[source]
_machine()[source]
start()[source]
flush()[source]
stop()[source]
_start_emissions_tracker()[source]
_flush_emissions_tracker()[source]
_stop_emissions_tracker()[source]
_device_info(device)[source]
add_device_info(device)[source]

Add information about a torch device that was used in this process.

Does nothing if telemetry is disabled.

Parameters:

device (torch.device) – torch device to add info about

Example

>>> # xdoctest: +REQUIRES(module:torch)
>>> from kwutil.process_context import *
>>> import torch
>>> import rich
>>> import ubelt as ub
>>> device = torch.device(0) if torch.cuda.is_available() else torch.device('cpu')
>>> # Adding things like disk info and tracking emission usage
>>> self = ProcessContext(track_emissions='offline')
>>> obj1 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> #
>>> # Telemetry can be mostly disabled
>>> self = ProcessContext(track_emissions='offline', request_most_telemetry=False)
>>> obj2 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> # Telemetry can be completely disabled
>>> self = ProcessContext(track_emissions='offline', request_all_telemetry=False)
>>> obj3 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> rich.print('full_telemetry = {}'.format(ub.urepr(obj1, nl=3)))
>>> rich.print('some_telemetry = {}'.format(ub.urepr(obj2, nl=3)))
>>> rich.print('no_telemetry = {}'.format(ub.urepr(obj3, nl=3)))

add_disk_info(path)[source]

Add information about a storage disk that was used in this process.

Does nothing if telemetry is disabled.
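
The fields this method records are not listed here. As a rough idea of the kind of information that can be gathered for a path, the following sketch uses only the standard library. The helper disk_info_sketch and its keys are hypothetical and may not match what add_disk_info stores.

```python
import os
import shutil


def disk_info_sketch(path):
    """Hypothetical helper: summarize the disk that holds ``path``."""
    path = os.path.abspath(path)
    # disk_usage returns a named tuple with total, used, and free sizes in bytes.
    usage = shutil.disk_usage(path)
    return {
        'path': path,
        'total_bytes': usage.total,
        'used_bytes': usage.used,
        'free_bytes': usage.free,
    }


info = disk_info_sketch('.')
print(info)
```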

kwutil.process_context.jsonify_config(config)[source]

Converts an object to a JSON-serializable config on a best-effort basis.
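
A best-effort conversion of this kind can be sketched as follows. The function jsonify_sketch is a hypothetical illustration, not the implementation of jsonify_config; in particular, falling back to the string form for unknown types is an assumption.

```python
import json
from pathlib import Path


def jsonify_sketch(obj):
    """Hypothetical best-effort conversion of ``obj`` to JSON-compatible types."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {str(key): jsonify_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [jsonify_sketch(value) for value in obj]
    # Assumption: anything else (for example a Path) is stored as its string form.
    return str(obj)


config = {'lr': 0.1, 'out': Path('runs/exp1'), 'dims': (3, 224, 224)}
text = json.dumps(jsonify_sketch(config))
print(text)
```

The result can be passed to json.dumps without a custom encoder, at the cost of losing the original types of the converted values.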

class kwutil.process_context.Reconstruction[source]

Bases: object

kwutil.process_context.main()[source]

Simple CLI to get hardware measurements that process context would provide.