Issue
As stated in the title, is there a better way to delete multiple files in Python? Currently, I am deleting them by looping over each file.
import os
files = ["test_file.txt", "test_failed.txt"]
for file in files:
    if os.path.exists(file):
        os.remove(file)
Solution
Let's put this in perspective.
We start by disassembling the for-loop into bytecode using the dis module:
In [23]: dis.dis('for f in files: os.remove(f)')
1 0 SETUP_LOOP 22 (to 24)
2 LOAD_NAME 0 (files)
4 GET_ITER
>> 6 FOR_ITER 14 (to 22)
8 STORE_NAME 1 (f)
10 LOAD_NAME 2 (os)
12 LOAD_METHOD 3 (remove)
14 LOAD_NAME 1 (f)
16 CALL_METHOD 1
18 POP_TOP
20 JUMP_ABSOLUTE 6
>> 22 POP_BLOCK
>> 24 LOAD_CONST 0 (None)
26 RETURN_VALUE
The only real "inefficiency" here (and a small one at that) is the repeated attribute lookup for os.remove. So let's get rid of that by first creating a local alias for it.
In [24]: rm = os.remove
Out[24]: <function posix.remove(path, *, dir_fd=None)>
In [25]: dis.dis('for f in files: rm(f)')
1 0 SETUP_LOOP 20 (to 22)
2 LOAD_NAME 0 (files)
4 GET_ITER
>> 6 FOR_ITER 12 (to 20)
8 STORE_NAME 1 (f)
10 LOAD_NAME 2 (rm)
12 LOAD_NAME 1 (f)
14 CALL_FUNCTION 1
16 POP_TOP
18 JUMP_ABSOLUTE 6
>> 20 POP_BLOCK
>> 22 LOAD_CONST 0 (None)
24 RETURN_VALUE
This saves one bytecode instruction (LOAD_METHOD) per file. :-/
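The comparison can be reproduced outside IPython. What follows is only a minimal sketch: the helper name loop_opnames is made up for illustration, and the opcode names and counts depend on the CPython version (newer releases no longer emit SETUP_LOOP, LOAD_METHOD or CALL_METHOD), so the difference may not be exactly one instruction on your interpreter.

```python
import dis


def loop_opnames(source):
    """Return the opcode names CPython generates for a source snippet."""
    # dis.get_instructions compiles the source text; nothing is executed,
    # so no files are touched.
    return [ins.opname for ins in dis.get_instructions(source)]


with_attr = loop_opnames("for f in files: os.remove(f)")
with_alias = loop_opnames("for f in files: rm(f)")

print(len(with_attr), "instructions with os.remove")
print(len(with_alias), "instructions with the rm alias")
# Opcodes that appear only in the attribute-lookup version:
print(sorted(set(with_attr) - set(with_alias)))
```

Either way, the saving is at most a handful of bytecode instructions per file.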
Generally, list comprehensions can be faster than for-loops. But when I tried both using a list of 10 empty but existing files:
In [15]: %timeit -n1 -r1 for f in files: os.remove(f)
71.3 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
compared with a list comprehension using a local alias
In [32]: %timeit -n1 -r1 [rm(f) for f in files]
71 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
there is practically no difference.
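A file can only be removed once, so every measurement needs freshly created files (presumably the reason for -n1 -r1 above). A rough sketch of the same comparison as a standalone script, assuming a scratch directory from tempfile; the helper names make_files, time_removal, with_loop and with_comprehension are hypothetical, and the numbers will vary with the file system and hardware:

```python
import os
import tempfile
import time


def make_files(directory, count):
    """Create `count` empty files in `directory` and return their paths."""
    paths = [os.path.join(directory, f"test_{i}.txt") for i in range(count)]
    for path in paths:
        open(path, "w").close()
    return paths


def time_removal(remove_all, count=10):
    """Time one call of remove_all(paths) on fresh files, in seconds."""
    with tempfile.TemporaryDirectory() as directory:
        paths = make_files(directory, count)
        start = time.perf_counter()
        remove_all(paths)
        elapsed = time.perf_counter() - start
        # Sanity check: every file must really be gone.
        assert not any(os.path.exists(p) for p in paths)
    return elapsed


def with_loop(paths):
    for path in paths:
        os.remove(path)


def with_comprehension(paths):
    rm = os.remove  # local alias, as in the answer
    [rm(path) for path in paths]


loop_time = time_removal(with_loop)
comp_time = time_removal(with_comprehension)
print(f"for-loop:           {loop_time * 1e6:.1f} µs")
print(f"list comprehension: {comp_time * 1e6:.1f} µs")
```

A single run like this is noisy; repeating it several times and comparing the spread gives a fairer picture than one pair of numbers.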
Measuring on a recent UNIX system (FreeBSD 12, UFS filesystem on a HDD, using %timeit in IPython):
os.path.exists() takes around 2 µs per file in a loop.
os.remove() takes around 7-10 µs per file in a loop.
Using os.stat directly instead of via exists does not make much of a difference.
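Since the existence check costs an extra file system call per file, one option is to drop it and handle the error from os.remove instead. This is a sketch of that alternative, not something measured in this answer; the function name remove_quietly is made up for illustration:

```python
import os
import tempfile


def remove_quietly(paths):
    """Remove each path, skipping files that do not exist.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            # The file was already gone; nothing to do.
            continue
        removed.append(path)
    return removed


# Usage: one existing file and one missing file in a scratch directory.
with tempfile.TemporaryDirectory() as directory:
    existing = os.path.join(directory, "test_file.txt")
    missing = os.path.join(directory, "test_failed.txt")
    open(existing, "w").close()
    result = remove_quietly([existing, missing])
    print(result == [existing])  # True
```

This also avoids the race where a file disappears between the exists check and the remove call.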
And in CPython, os.remove is a thin wrapper around the unlink(2) call (it is the same function as os.unlink), rather than remove(3). So most of its time is spent in file system operations, which are inherently really slow compared to a modern CPU.
So even rewriting this in C and making the system calls directly would probably not gain much.
Answered By - Roland Smith