kwutil.copy_manager module

DEPRECATED.

CopyManager has moved to fsops_managers.

class kwutil.copy_manager.CopyManager(workers=0, mode='thread', eager=False, overwrite=False, skip_existing=False)[source]

Bases: _FilesystemOperationManager

Helper to execute multiple copy operations on a local filesystem.

Notes

It would be nice for this to support an rsync backend that could sync at the level of individual src/dst pairs. It is not clear whether this is feasible.
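As a rough illustration of the per-pair idea in the note above, the sketch below builds one rsync invocation per src/dst file pair. It is not part of kwutil: the helper names `rsync_commands` and `run_rsync_pairs` and the choice of the `-a` flag are assumptions made for illustration, and it handles file-to-file pairs only.

```python
import shutil
import subprocess


def rsync_commands(pairs, flags=('-a',)):
    """Build one rsync argv per (src, dst) file pair (hypothetical helper)."""
    return [['rsync', *flags, str(src), str(dst)] for src, dst in pairs]


def run_rsync_pairs(pairs):
    """Run the commands one after another; requires rsync on PATH."""
    if shutil.which('rsync') is None:
        raise RuntimeError('rsync executable not found on PATH')
    for cmd in rsync_commands(pairs):
        # check=True raises CalledProcessError if rsync exits non-zero
        subprocess.run(cmd, check=True)


# Only build the commands here; nothing is executed.
cmds = rsync_commands([
    ('src/file0.txt', 'dst/file0.txt'),
    ('src/file1.txt', 'dst/file1.txt'),
])
for cmd in cmds:
    print(' '.join(cmd))
```

Launching one process per pair is simple but pays a startup cost for every file, which is likely why the linked discussions look for ways to batch pairs.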

References

  • https://unix.stackexchange.com/questions/133995/rsyncing-multiple-src-dest-pairs

  • https://serverfault.com/questions/163859/using-rsync-as-a-queue

  • https://unix.stackexchange.com/questions/602606/rsync-source-list-to-destination-list

Todo

  • [ ] Add optional check that all src paths exist

  • [ ] Add optional check that all dst paths do not exist (unless overwrite=True or skip_existing=True)

  • [ ] Add optional check that no dst path is, or is inside of, a src dpath (this would make things ambiguous); the operation graph should be bipartite.

  • [ ] Add backend that uses a fast protocol like rsync (or write one in Rust)
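The first three checks in the list above could look roughly like the following. This is a standalone sketch using only the standard library, not kwutil code: the function name `check_copy_plan` and its return format (a list of problem strings) are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def check_copy_plan(pairs, overwrite=False, skip_existing=False):
    """
    Hypothetical pre-flight validation for a list of (src, dst) pairs.
    Returns a list of problem descriptions; an empty list means the plan passed.
    """
    # Make paths absolute without requiring them to exist
    pairs = [(Path(os.path.abspath(s)), Path(os.path.abspath(d)))
             for s, d in pairs]
    srcs = [s for s, _ in pairs]
    problems = []
    for src, dst in pairs:
        # Check 1: every src path exists
        if not src.exists():
            problems.append(f'missing src: {src}')
        # Check 2: no dst exists, unless the caller opted in to handling it
        if dst.exists() and not (overwrite or skip_existing):
            problems.append(f'dst already exists: {dst}')
        # Check 3: no dst is, or is inside of, any src
        for other in srcs:
            if dst == other or other in dst.parents:
                problems.append(f'dst {dst} is or is inside src {other}')
    return problems


# Demo in a throwaway directory
root = Path(tempfile.mkdtemp())
src_dpath = root / 'src'
src_dpath.mkdir()
(src_dpath / 'a.txt').touch()

good_plan = [(src_dpath / 'a.txt', root / 'dst' / 'a.txt')]
bad_plan = [
    (src_dpath / 'missing.txt', root / 'dst' / 'missing.txt'),  # src is missing
    (src_dpath, src_dpath / 'nested'),  # dst is inside a src directory
]
print(check_copy_plan(good_plan))
print(check_copy_plan(bad_plan))
```

The third check compares every dst against every src, which is quadratic in the number of jobs; that is fine for a sketch but would need a smarter structure for large plans.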

Example

>>> import ubelt as ub
>>> from kwutil.fsops_managers import CopyManager
>>> dpath = ub.Path.appdir('kwutil', 'tests', 'copy_manager')
>>> src_dpath = (dpath / 'src').ensuredir()
>>> dst_dpath = (dpath / 'dst').delete()
>>> src_fpaths = [src_dpath / 'file{}.txt'.format(i) for i in range(10)]
>>> for fpath in src_fpaths:
>>>     fpath.touch()
>>> # To use a copy manager, iterate through your source and
>>> # destination paths and submit them.
>>> copyman = CopyManager(workers=0)
>>> # by default it will do nothing
>>> # unless you specify eager=True or explicitly call run.
>>> for fpath in src_fpaths:
>>>     dst = fpath.augment(dpath=dst_dpath)
>>>     copyman.submit(fpath, dst)
>>> report = copyman.report()
>>> print(f'report = {ub.urepr(report, nl=1)}')
>>> copyman.run()

Example

>>> import ubelt as ub
>>> from kwutil.fsops_managers import CopyManager
>>> dpath = ub.Path.appdir('kwutil', 'tests', 'copy_manager')
>>> src_dpath = (dpath / 'src').ensuredir()
>>> dst_dpath = (dpath / 'dst').delete()
>>> src_fpaths = [src_dpath / 'file{}.txt'.format(i) for i in range(10)]
>>> for fpath in src_fpaths:
>>>     fpath.touch()
>>> copyman = CopyManager(workers=0)
>>> for fpath in src_fpaths:
>>>     dst = fpath.augment(dpath=dst_dpath)
>>>     copyman.submit(fpath, dst)
>>> copyman.run()
>>> assert len(dst_dpath.ls()) == len(src_dpath.ls())
>>> copyman = CopyManager(workers=0)
>>> for fpath in src_fpaths:
>>>     dst = fpath.augment(dpath=dst_dpath)
>>>     copyman.submit(fpath, dst)
>>> import pytest
>>> with pytest.raises(FileExistsError):
>>>     copyman.run()
>>> copyman = CopyManager(workers=0)
>>> for fpath in src_fpaths:
>>>     dst = fpath.augment(dpath=dst_dpath)
>>>     copyman.submit(fpath, dst, skip_existing=True)
>>> copyman.run()
Parameters:
  • workers (int) – number of parallel workers to use

  • mode (str) – thread, process, or serial

  • eager (bool) – if True, starts copying as soon as a job is submitted; otherwise it waits until run is called.

  • overwrite (bool) – if True will overwrite the file if it exists, otherwise it will error unless skip_existing is True. Defaults to False.

  • skip_existing (bool) – if True, jobs whose destination already exists are skipped by default. Default=False
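To make the interaction between these parameters concrete, here is a small standalone sketch of a deferred copy queue. It is not kwutil's implementation: the class `MiniCopyQueue` is hypothetical, it supports only threads, it treats workers=0 as serial execution, and it lets overwrite take precedence when both overwrite and skip_existing are set. All of these are assumptions.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class MiniCopyQueue:
    """Hypothetical stand-in for illustration; not kwutil's CopyManager."""

    def __init__(self, workers=0, eager=False, overwrite=False,
                 skip_existing=False):
        self.workers = workers
        self.eager = eager
        self.overwrite = overwrite
        self.skip_existing = skip_existing
        self._pending = []

    def _copy(self, src, dst):
        dst = Path(dst)
        if dst.exists() and not self.overwrite:
            if self.skip_existing:
                return 'skipped'
            raise FileExistsError(str(dst))
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        return 'copied'

    def submit(self, src, dst):
        if self.eager:
            # Eager mode: the copy happens at submit time
            return self._copy(src, dst)
        # Deferred mode: only record the job
        self._pending.append((src, dst))

    def run(self):
        jobs, self._pending = self._pending, []
        if self.workers == 0:
            # Assumption: zero workers means run serially in this thread
            return [self._copy(s, d) for s, d in jobs]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(self._copy, s, d) for s, d in jobs]
            return [f.result() for f in futures]


root = Path(tempfile.mkdtemp())
src = root / 'src.txt'
src.write_text('hello')
dst = root / 'out' / 'dst.txt'

queue = MiniCopyQueue(workers=2)
queue.submit(src, dst)
exists_before_run = dst.exists()  # False: nothing is copied until run()
first = queue.run()               # ['copied']

queue = MiniCopyQueue(workers=0, skip_existing=True)
queue.submit(src, dst)
second = queue.run()              # ['skipped']
```

With the defaults (overwrite=False, skip_existing=False), submitting the same pair a third time and calling run() raises FileExistsError in this sketch, which mirrors the behavior shown in the second example above.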

_operation_name = 'copy'
_unsubmitted_report()[source]

Build a report on the unsubmitted jobs.

_worker_func(dst, skip_existing, overwrite, follow_file_symlinks, follow_dir_symlinks, meta)
Parameters:
  • src (PathLike | str)

  • dst (PathLike | str)

  • overwrite (bool)

  • skip_existing (bool)

report()[source]
submit(src, dst, skip_existing=False, overwrite=None, follow_file_symlinks=False, follow_dir_symlinks=False, meta='stats')[source]
Parameters:
  • src (str | PathLike) – source file or directory

  • dst (str | PathLike) – destination file or directory

  • skip_existing (bool | None) – if True, jobs whose destination already exists are skipped. If None, then uses the class default. Default=None

  • overwrite (bool | None) – if True will overwrite the file if it exists, otherwise it will error unless skip_existing is True. If None, then uses the class default. Default=None.

  • follow_file_symlinks (bool) – If True and src is a link, the link will be resolved before it is copied (i.e. the data is duplicated), otherwise just the link itself will be copied.

  • follow_dir_symlinks (bool) – if True and src is a directory that contains symlinks to other directories, the contents of the linked directories are copied; otherwise only the links themselves are copied.

  • meta (str | None) – Indicates what metadata bits to copy. This can be ‘stats’ which tries to copy all metadata (i.e. like shutil.copy2()), ‘mode’ which copies just the permission bits (i.e. like shutil.copy()), or None, which ignores all metadata (i.e. like shutil.copyfile()).
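The three meta values map onto the shutil functions named in the description above. The sketch below shows that mapping for a single file. The helper `copy_file_with_meta` is hypothetical and is not how kwutil necessarily dispatches internally; the demo assumes a POSIX filesystem where permission bits and timestamps can be set.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def copy_file_with_meta(src, dst, meta='stats'):
    """Hypothetical helper mapping a meta value to a shutil function."""
    if meta == 'stats':
        # Data plus as much metadata as possible (mode bits, timestamps)
        return shutil.copy2(src, dst)
    elif meta == 'mode':
        # Data plus permission bits only
        return shutil.copy(src, dst)
    elif meta is None:
        # Data only
        return shutil.copyfile(src, dst)
    raise ValueError(f'unknown meta={meta!r}')


root = Path(tempfile.mkdtemp())
src = root / 'src.sh'
src.write_text('echo hi\n')
src.chmod(0o744)
old_time = 1_000_000_000
os.utime(src, (old_time, old_time))  # give the source an old timestamp
src_mode = stat.S_IMODE(src.stat().st_mode)

results = {}
for meta in ['stats', 'mode', None]:
    dst = root / f'dst_{meta}.sh'
    copy_file_with_meta(src, dst, meta=meta)
    st = dst.stat()
    # Record (permission bits, modification time) for each copy
    results[meta] = (stat.S_IMODE(st.st_mode), int(st.st_mtime))

for meta, (mode, mtime) in results.items():
    print(f'meta={meta!r}: mode={oct(mode)} mtime={mtime}')
```

Comparing the printed rows shows which attributes each option carries over: only the 'stats' copy keeps the old modification time, while the 'mode' copy keeps the permission bits but gets a fresh timestamp.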