kwutil.process_context module
Defines the ProcessContext object, which is what mlops expects jobs to
be wrapped in.
Todo
[ ] Make “most” telemetry opt-in
- class kwutil.process_context.ProcessContext(name=None, type='process', args=None, config=None, extra=None, track_emissions=False, request_all_telemetry=True, request_most_telemetry=True, output_dpath=None, output_fpath=None)
Bases: object
Context manager to track the context under which a result was computed.
This tracks things like the start and end time, the command line that can reproduce the process (assuming an appropriate environment), the configuration the process was run with, the machine details the process was run on, the power usage / carbon emissions of the process, and other information.
- Parameters:
args (str | List[str]) – This should be the sys.argv or the command line string that can be used to rerun the process
config (Dict) – This should be a configuration dictionary (likely based on sys.argv)
name (str) – the name of this process
type (str) – The type of this process (usually keep the default of process)
request_all_telemetry (bool) – if False, all telemetry is disabled. This is forced to False if PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY is in the environment.
request_most_telemetry (bool) – if False, most telemetry (environment details such as the machine the code was run on) is disabled. This is forced to False if PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY is in the environment.
Note
This module provides telemetry, which records user-identifiable information. While useful, it does raise ethical concerns about user privacy, and the people running this code have a right to know about it and to opt out. Notably, this module only records the information and does not send it anywhere. As such, having recording enabled by default is reasonable, but any future work that sends this information anywhere must be disabled by default and require an explicit opt-in.
Note
There are two levels of telemetry.
Environment telemetry. These are things like the machine the code was run on. Use PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY=0 to opt out.
The start / stop times, sys.argv, and config objects are necessary for mlops to do anything, but they can leak information because they may contain system paths. Emissions tracking is also in this category. Use PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY to opt out.
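The two opt-out levels can be pictured with a small sketch. The helper name `resolve_telemetry` and its `environ` parameter are hypothetical and are not part of kwutil. The sketch assumes that the mere presence of a variable disables the matching level, and that disabling all telemetry also disables the "most" level; the real implementation may check the variable's value or behave differently.

```python
import os


def resolve_telemetry(request_all=True, request_most=True, environ=None):
    """Hypothetical helper (not kwutil API): apply environment opt-outs."""
    environ = os.environ if environ is None else environ
    # Assumption: presence of the variable, regardless of value, opts out.
    if 'PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY' in environ:
        request_all = False
    if 'PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY' in environ:
        request_most = False
    # Assumption: with all telemetry off, the "most" level is off as well.
    if not request_all:
        request_most = False
    return request_all, request_most
```

Passing a plain dictionary as `environ` makes it possible to try the rule without touching the real process environment.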
CommandLine
kernprof -lvp kwutil -m xdoctest -m kwutil.process_context ProcessContext:0
Example
>>> # xdoctest: +REQUIRES(module:psutil)
>>> from kwutil.process_context import *
>>> import ubelt as ub
>>> import rich
>>> # Adding things like disk info and tracking emission usage
>>> self = ProcessContext(track_emissions='offline')
>>> obj1 = self.start().stop()
>>> self.add_disk_info('.')
>>> #
>>> # Telemetry can be mostly disabled
>>> self = ProcessContext(track_emissions='offline', request_most_telemetry=False)
>>> obj2 = self.start().stop()
>>> self.add_disk_info('.')
>>> # Telemetry can be completely disabled
>>> self = ProcessContext(track_emissions='offline', request_all_telemetry=False)
>>> obj3 = self.start().stop()
>>> self.add_disk_info('.')
>>> rich.print('full_telemetry = {}'.format(ub.urepr(obj1, nl=3)))
>>> rich.print('some_telemetry = {}'.format(ub.urepr(obj2, nl=3)))
>>> rich.print('no_telemetry = {}'.format(ub.urepr(obj3, nl=3)))
Example
>>> # xdoctest: +REQUIRES(module:psutil)
>>> # xdoctest: +REQUIRES(module:codecarbon)
>>> from kwutil.process_context import *
>>> # flush can measure intermediate progress
>>> self = ProcessContext(track_emissions='offline')
>>> self.add_disk_info('.')
>>> obj1 = self.start().flush()
>>> obj1_orig = obj1.copy()
>>> obj2 = self.stop()
- property is_running
Has the context object started and not yet been stopped?
- property is_started
Has the context object ever started? This can still return True if it has stopped.
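The difference between the two properties can be sketched with a tiny stand-in class. `MiniContext` and its `start_time` / `stop_time` attributes are hypothetical names chosen for illustration; this is not how kwutil necessarily stores its state.

```python
import time


class MiniContext:
    """Hypothetical stand-in for illustration; not the kwutil implementation."""

    def __init__(self):
        self.start_time = None
        self.stop_time = None

    @property
    def is_started(self):
        # True once start() has been called, even after stop().
        return self.start_time is not None

    @property
    def is_running(self):
        # True only between start() and stop().
        return self.is_started and self.stop_time is None

    def start(self):
        self.start_time = time.time()
        self.stop_time = None
        return self

    def stop(self):
        self.stop_time = time.time()
        return self
```

In this sketch a context that has been started and then stopped reports `is_started` as True and `is_running` as False, matching the descriptions above.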
- __call__(func)
Experimental use as a decorator.
CommandLine
kernprof -lvp -p kwutil -m xdoctest -m kwutil.process_context ProcessContext.__call__
Example
>>> # xdoctest: +REQUIRES(module:psutil)
>>> import ubelt as ub
>>> dpath = ub.Path.appdir('kwutil/test/process-context')
>>> #
>>> import kwutil
>>> self = kwutil.ProcessContext(output_dpath=dpath)
>>> def func():
>>>     ...
>>> _wrapper = self(func)
>>> _wrapper.context
>>> _wrapper()
Example
>>> # xdoctest: +REQUIRES(module:psutil)
>>> import kwutil
>>> import ubelt as ub
>>> dpath = ub.Path.appdir('kwutil/test/process-context')
>>> @kwutil.ProcessContext(output_dpath=dpath)
>>> def myfunc():
>>>     ...
>>> myfunc()
>>> print(f'myfunc.context.obj = {ub.urepr(myfunc.context.obj, nl=3)}')
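The decorator pattern used in these examples, where the wrapped function exposes its context as a `context` attribute, can be sketched with the standard library alone. `TimedContext` and the contents of its `obj` dictionary are hypothetical; the real ProcessContext records far more, and its wrapper may differ in detail.

```python
import functools
import time


class TimedContext:
    """Hypothetical context object; illustrative only, not kwutil code."""

    def __init__(self, name=None):
        self.name = name
        self.obj = {}

    def __call__(self, func):
        @functools.wraps(func)
        def _wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record information about the most recent call.
                self.obj = {
                    'name': self.name or func.__name__,
                    'duration': time.perf_counter() - start,
                }
        # Expose the context on the wrapper, like `myfunc.context` above.
        _wrapper.context = self
        return _wrapper


@TimedContext()
def myfunc():
    return 42
```

Attaching the context to the wrapper lets callers inspect what was recorded after the function returns, without changing the function's return value.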
- write_invocation(invocation_fpath)
Write a helper file that contains a locally reproducible invocation of this process.
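As a rough idea of what such a helper file could contain, the sketch below quotes an argv-style list into a shell command and writes it to a script. The function `write_invocation_sketch` and the bash-script layout are assumptions made for illustration; the file that kwutil actually writes may look different.

```python
import shlex
import tempfile
from pathlib import Path


def write_invocation_sketch(args, invocation_fpath):
    """Hypothetical helper; the real file format written by kwutil may differ."""
    # Accept either a command string or an argv-style list.
    command = args if isinstance(args, str) else shlex.join(args)
    fpath = Path(invocation_fpath)
    fpath.write_text('#!/bin/bash\n' + command + '\n')
    return fpath


# Usage: write the script into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as dpath:
    fpath = write_invocation_sketch(
        ['python', 'train.py', '--name', 'my run'],
        Path(dpath) / 'invocation.sh',
    )
    written = fpath.read_text()
```

Using `shlex.join` keeps arguments that contain spaces intact when the script is rerun from a shell.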
- add_device_info(device)
Add information about a torch device that was used in this process.
Does nothing if telemetry is disabled.
- Parameters:
device (torch.device) – torch device to add info about
Example
>>> # xdoctest: +REQUIRES(module:torch)
>>> from kwutil.process_context import *
>>> import ubelt as ub
>>> import torch
>>> import rich
>>> device = torch.device(0) if torch.cuda.is_available() else torch.device('cpu')
>>> # Adding things like disk info and tracking emission usage
>>> self = ProcessContext(track_emissions='offline')
>>> obj1 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> #
>>> # Telemetry can be mostly disabled
>>> self = ProcessContext(track_emissions='offline', request_most_telemetry=False)
>>> obj2 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> # Telemetry can be completely disabled
>>> self = ProcessContext(track_emissions='offline', request_all_telemetry=False)
>>> obj3 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> rich.print('full_telemetry = {}'.format(ub.urepr(obj1, nl=3)))
>>> rich.print('some_telemetry = {}'.format(ub.urepr(obj2, nl=3)))
>>> rich.print('no_telemetry = {}'.format(ub.urepr(obj3, nl=3)))