linux - Why is using tail to copy a file so much slower than cp, and using awk twice as fast?


I'm trying to strip out the header line of a large CSV file. The first methods I tried (using tail, then awk) work, but they are very slow compared with copying the entire file!

So, for fun, let's try a few silly but potentially didactically interesting methods of copying files.
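For reference, a minimal sketch of the header stripping itself, assuming a tiny sample file. The names sample.csv, no_header_tail.csv and no_header_awk.csv are made up for illustration and are not the files timed in this question.

```shell
# Hypothetical sample file, created only for this illustration.
printf 'id,name\n1,alice\n2,bob\n' > sample.csv

# tail: start the output at line 2, which skips the header line.
tail -n +2 sample.csv > no_header_tail.csv

# awk: print every record whose record number (NR) is greater than 1.
awk 'NR > 1' sample.csv > no_header_awk.csv

# Both approaches should produce byte-identical output.
cmp no_header_tail.csv no_header_awk.csv && echo "outputs match"
```

Either command leaves the original file untouched and writes the header-free data to a new file.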

Using cp:

$ time cp my_big_file.csv copy_of_my_big_file.csv
real    0m2.208s
user    0m0.002s
sys     0m2.171s

Using tail:

$ time tail -n+1 my_big_file.csv > copy_of_my_big_file.csv
real    0m44.506s
user    0m37.521s
sys     0m3.107s

Using awk:

$ time awk '{if (NR!=0) {print}}' my_big_file.csv > copy_of_my_big_file.csv
real    0m24.951s
user    0m20.336s
sys     0m2.869s

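What accounts for such large discrepancies between using tail vs cp vs awk?

cp copies the file block by block at the filesystem level, without asking any questions about its contents. Almost all of the work happens at the kernel level, which is why nearly all of its run time shows up as sys rather than user.

tail reads the file line by line and filters it, then has to recreate the file line by line. Of course, the filesystem buffers in both the read and the write case, but this is less efficient, because the data has to cross several layers (kernel space to user space) back and forth.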
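The contrast described above can be reproduced on a small scale. This is only a sketch under stated assumptions: big.csv and the copy_*.csv names are hypothetical, the generated file is far smaller than the one timed in the question, and the actual timings will depend on the machine and on the tail and awk implementations installed.

```shell
# Hypothetical test file: 100000 short CSV lines (an assumption, not the original data).
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i ",row" i }' > big.csv

# cp: the data is copied in blocks and its contents are never inspected.
cp big.csv copy_cp.csv

# tail: the data passes through a user-space process before being written back out.
tail -n +1 big.csv > copy_tail.csv

# awk: every line is parsed as a record in user space before it is printed.
awk '{ print }' big.csv > copy_awk.csv

# All three copies should be byte-identical to the source.
cmp big.csv copy_cp.csv &&
  cmp big.csv copy_tail.csv &&
  cmp big.csv copy_awk.csv &&
  echo "all copies identical"
```

Prefixing each of the three copy commands with time, as in the question, shows how the run time is split between user and sys for each method.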

