linux - Why is using tail to copy a file so much slower than cp, and using awk twice as fast?
I'm trying to strip out the header line of a large CSV file. The first methods I tried (using tail and awk) work, but slowly compared to copying the entire file!
So, for fun, let's try a few silly but potentially didactically interesting methods of copying files.
Using cp:
$ time cp my_big_file.csv copy_of_my_big_file.csv
real 0m2.208s
user 0m0.002s
sys 0m2.171s
Using tail:
$ time tail -n+1 my_big_file.csv > copy_of_my_big_file.csv
real 0m44.506s
user 0m37.521s
sys 0m3.107s
Using awk:
$ time awk '{if (NR!=0) {print}}' my_big_file.csv > copy_of_my_big_file.csv
real 0m24.951s
user 0m20.336s
sys 0m2.869s
What accounts for such large discrepancies between using tail vs cp vs awk?
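cp copies the file block by block, without asking any questions about its contents. Almost the whole thing happens at kernel level, which is why the cp timing shows next to no user time (0.002s) and nearly all of it as sys time.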
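To make the "block by block" idea concrete, here is a minimal sketch, assuming a POSIX shell with dd and cmp available. dd only stands in for cp here so the block size is explicit; it is not a claim about how cp is implemented. The file names and the 1 MiB block size are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: copy a file in large fixed-size blocks without inspecting its contents.
# All file names below are hypothetical and live in a throwaway temp directory.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

src="$workdir/sample.csv"
dst="$workdir/sample_copy.csv"

# Build a small sample file: one header line plus 1000 data rows.
{
  echo "id,value"
  i=1
  while [ "$i" -le 1000 ]; do
    echo "$i,$((i * 2))"
    i=$((i + 1))
  done
} > "$src"

# Copy in 1 MiB blocks (block size chosen arbitrarily); dd never looks for newlines.
dd if="$src" of="$dst" bs=1048576 2>/dev/null

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp "$src" "$dst" && echo "block copy is identical"
```

Nothing in the copy step depends on where lines begin or end, so the program has almost nothing to do beyond handing buffers to the kernel.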
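tail reads the file line by line and filters it, then recreates the file line by line. Of course, the filesystem buffers both the reads and the writes, but this is still less efficient, because the data has to cross several layers (kernel space to user space and back), and so forth.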
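For the original goal of dropping the header line, the line-oriented tools only need a different starting line. The following is a small sketch, assuming a POSIX shell with tail and awk; the sample file and output names are invented for illustration, and it says nothing about speed on a large file.

```shell
#!/bin/sh
# Sketch: strip the first (header) line of a CSV with tail and with awk.
# All file names below are hypothetical and live in a throwaway temp directory.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

src="$workdir/sample.csv"
printf 'id,value\n1,10\n2,20\n3,30\n' > "$src"

# tail: start output at line 2, which skips the header.
tail -n +2 "$src" > "$workdir/no_header_tail.csv"

# awk: print every record whose record number (NR) is greater than 1.
awk 'NR > 1' "$src" > "$workdir/no_header_awk.csv"

# Both approaches should produce the same file, containing only the data rows.
cmp "$workdir/no_header_tail.csv" "$workdir/no_header_awk.csv"
cat "$workdir/no_header_tail.csv"
```

Both commands still have to find every newline in user space, which is the per-line work described above.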