Issue
I am new to Python and am trying to build a small script that collects images from all over the server. The images follow a naming scheme:
AMZ_1004.jpg
AMZ_1272.jpg
GOO_1.jpeg
GOO_2.png
I want the script to look through every directory and copy (not move) the files into the AMZ and GOO folders:
import shutil, os
goo_dst = '/home/usr2/Pictures/GOO'
amz_dst = '/home/usr2/Pictures/AMZ'
os.makedirs(goo_dst, exist_ok=1)
os.makedirs(amz_dst, exist_ok=1)
for root, dirs, files in os.walk('/'):
    for name in files:
        path = os.path.join(root, name)
        if name.startswith('GOO_') and (name.endswith('.jpg') or name.endswith('.jpeg') or name.endswith('.png')):
            shutil.copyfile(path, goo_dst)
        elif name.startswith('AMZ_') and name.endswith('.jpg'):
            shutil.copyfile(path, amz_dst)
The script runs OK. Is there a way to speed up the process?
The script runs on Arch Linux, if it matters.
Solution
The biggest optimization you can make to the script is to not start your search at the filesystem root.
Starting at / goes over many things that are not regular files (such as the virtual /dev and /proc trees) as well as over system folders where your files are unlikely to exist.
(You don't really expect any images to be under /bin or /usr/bin, right?)
Try to narrow down the real search path, for example to /var/www, which is where Apache content often resides.
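As a rough sketch of narrowing the search in Python: os.walk lets you prune the dirs list in place, so whole subtrees can be skipped. The names SEARCH_ROOTS, SKIP_DIRS and walk_pruned are made up for illustration, and the listed folders are only assumptions about where your images may or may not live.

```python
import os

# Hypothetical values for illustration: adjust both to your own server.
SEARCH_ROOTS = ['/var/www', '/home']
SKIP_DIRS = ['/proc', '/sys', '/dev', '/run']


def walk_pruned(roots, skip_dirs):
    """Yield (root, files) for every directory under roots, skipping skip_dirs."""
    skip = {os.path.abspath(d) for d in skip_dirs}
    for top in roots:
        for root, dirs, files in os.walk(top):
            # Editing dirs in place stops os.walk from descending into them.
            dirs[:] = [d for d in dirs if os.path.join(root, d) not in skip]
            yield root, files
```

Looping over walk_pruned(SEARCH_ROOTS, SKIP_DIRS) instead of os.walk('/') keeps the rest of the script unchanged.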
Another optimization might be to not use Python at all, and use a shell script directly instead:
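#!/bin/sh
GOO_DST='/home/usr2/Pictures/GOO'
AMZ_DST='/home/usr2/Pictures/AMZ'
mkdir -p "${GOO_DST}"
mkdir -p "${AMZ_DST}"
# The \( ... \) group makes -type f and -exec apply to all three patterns,
# not just the last one (-o binds more loosely than the implicit "and").
find / -type f \( -name 'GOO_*.jpg' -o -name 'GOO_*.jpeg' -o -name 'GOO_*.png' \) -exec cp {} "${GOO_DST}" \;
find / -type f -name 'AMZ_*.jpg' -exec cp {} "${AMZ_DST}" \;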
The find utility should give you faster results than manual traversal.
If you insist on using Python, at least move the path = os.path.join(root, name) line inside the if and elif branches, so the full path is only built for files that match.
This avoids some extra work on files that are not relevant (which is most files).
This is a tiny optimization, but it can still help.
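A minimal sketch of that change, not a drop-in replacement: the helper names pick_destination and collect_images are made up for illustration. It also swaps shutil.copyfile for shutil.copy, because copyfile expects a full destination file name while copy accepts a destination folder.

```python
import os
import shutil

GOO_EXTS = ('.jpg', '.jpeg', '.png')


def pick_destination(name, goo_dst, amz_dst):
    """Return the destination folder for a file name, or None if it does not match."""
    # str.endswith accepts a tuple, which replaces the chained "or" tests.
    if name.startswith('GOO_') and name.endswith(GOO_EXTS):
        return goo_dst
    if name.startswith('AMZ_') and name.endswith('.jpg'):
        return amz_dst
    return None


def collect_images(top, goo_dst, amz_dst):
    """Copy matching images found under top and return how many were copied."""
    os.makedirs(goo_dst, exist_ok=True)
    os.makedirs(amz_dst, exist_ok=True)
    copied = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            dst = pick_destination(name, goo_dst, amz_dst)
            if dst is None:
                continue  # irrelevant file: no path is built for it
            try:
                # The full path is built only now that the name matched.
                shutil.copy(os.path.join(root, name), dst)
            except shutil.SameFileError:
                continue  # the walk reached a file already in its destination
            copied += 1
    return copied
```

A call such as collect_images('/var/www', goo_dst, amz_dst) would then replace the original loop. Files with the same name in different folders still overwrite each other in the destination, just as in the original script.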
Another option would be to use multithreading to parallelize the search, but you would need to decide manually which part of the filesystem each thread searches.
If two threads go over the same folders, that wastes even more time. Also note that a multithreaded version of this script might use more CPU while running.
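One way the multithreaded search could look, as a sketch only: it assumes the folders passed in do not overlap, and the function names are made up for illustration. Whether it is actually faster depends on the disk and on Python's threading limits, so it is worth timing against the single-threaded version.

```python
import os
from concurrent.futures import ThreadPoolExecutor

GOO_EXTS = ('.jpg', '.jpeg', '.png')


def is_wanted(name):
    """Return True for file names that follow the GOO_ or AMZ_ scheme."""
    if name.startswith('GOO_'):
        return name.endswith(GOO_EXTS)
    return name.startswith('AMZ_') and name.endswith('.jpg')


def find_matches(top):
    """Walk one subtree and return the full paths of matching files."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if is_wanted(name):
                matches.append(os.path.join(root, name))
    return matches


def find_matches_parallel(tops, max_workers=4):
    """Search several subtrees at once, one task per subtree.

    Assumes the entries of tops are disjoint; overlapping folders
    would be walked (and reported) more than once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(find_matches, tops)
        return [path for chunk in results for path in chunk]
```

The returned paths can then be copied in a single loop, which keeps the copying itself free of any thread coordination.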
Answered By - Lev M.