Question How to write a lot of txt files in my hard drive in a less costly way.

Status
Not open for further replies.

claudiogc

Reputable
Jun 29, 2015
8
0
4,510
Hello!
So, I have this web crawler I made in Python and it will get more than 150000 texts files each one from a different link but from the same website and I will write them all in my very old hard drive.

My question is:
Which method is less stressful for the hard disk to write all these files? I was thinking about two.
1- Get 1000 text files at a time and write them each 1000. I choose 1000 because... I don't know.
2- Write them one by one.
 

kanewolf

Titan
Moderator
150,000 files needs to be broken up into multiple subdirectories. Windows (and linux) doesn't do well with too many files in one directory. Your 1000 file (or 2000 file) chunk probably should be a directory and then 150 siblings with the rest of your files. Directory listings, grep, find can have issues when you have too many files in a directory.
 

claudiogc

Reputable
Jun 29, 2015
8
0
4,510
150,000 files needs to be broken up into multiple subdirectories. Windows (and linux) doesn't do well with too many files in one directory. Your 1000 file (or 2000 file) chunk probably should be a directory and then 150 siblings with the rest of your files. Directory listings, grep, find can have issues when you have too many files in a directory.
150,000 files needs to be broken up into multiple subdirectories. Windows (and linux) doesn't do well with too many files in one directory. Your 1000 file (or 2000 file) chunk probably should be a directory and then 150 siblings with the rest of your files. Directory listings, grep, find can have issues when you have too many files in a directory.
So what is the maximum amount of files should I have in the same directory? I'm using linux. I read here ext4 supports an unlimited number of files per directory.
https://unix.stackexchange.com/questions/239146/linux-folder-size-limit

Also, I'll read and parse all this files with another python program. I'll not use ls command or something.
And about my hard disk which write method should I choose?
 
Last edited:
Also, I'll read and parse all this files with another python program. I'll not use ls command or something.
And about my hard disk which write method should I choose?
In that case you can write as many files as you like,windows only freaks out because it has to make icons for each file.
As kanewolf already said,the best way would be to use some cache,either some other disk or a ramdisk and use the append command to write the content of 1000 (or more) all at once.
If the text is that important you can make a subrutine that will check that everything got written correctly (and also you wouldn't be writing important data on a very old disk anyway) .
 

claudiogc

Reputable
Jun 29, 2015
8
0
4,510
Even if yop don't use "ls", you have to do some equivalent to read the files,. I would experiment. Run with 1, 2, 5, 10K files in a directory. See if there is any performance change. I would limit to 10K.
Thanks for your answer I changed my code and now each directory will contain 5000 files just to be sure.

But I think we change the subject a little bit. The original question was which method should I choose to write all these files in my hard disk. Which one would be less stressful for the hard disk.
 

kanewolf

Titan
Moderator
Thanks for your answer I changed my code and now each directory will contain 5000 files just to be sure.

But I think we change the subject a little bit. The original question was which method should I choose to write all these files in my hard disk. Which one would be less stressful for the hard disk.
How big are the TXT files? hundreds of bytes of hundreds of Kbytes? 100 byte files are very inefficient for I/O.

I/O speed can be a killer for performance.
 

claudiogc

Reputable
Jun 29, 2015
8
0
4,510
How big are the TXT files? hundreds of bytes of hundreds of Kbytes? 100 byte files are very inefficient for I/O.

I/O speed can be a killer for performance.
A few kbytes, the largest ones, a few dozens of kbytes. I believe but not sure not a single one gets more than 100kbytes. I'm asking this because my hard disk is old so I though this could degrade even more.
 
Status
Not open for further replies.