Question Number Crunching machine

Jan 16, 2023
I am doing research that involves running Python programs that iterate over large datasets. They cause memory errors and take hours to run (when they do complete). My machine is an older A10-9600P Radeon R5 with 12 GB of RAM. I do not do gaming. So:
  • Can I use a USB thumb drive as RAM to increase the RAM size?
  • What would be a low-end, mid-range, and high-end machine suited for number crunching (processor and RAM size) if I were to buy a new one?
  • Is there a service that would host a Python instance where I can run my analysis?
 
I am looking for permutations/combinations across multiple datasets that can have 100K records each. After each run the result is analyzed and reduced, because the resulting datasets reach billions of records. I understand I will not be able to do this as one big dataset, but by increasing what the machine can handle while crunching the data down at the same time, I'm hoping to find a workable middle ground.
 
First, see if there is actually anything wrong with your ram.
Run memtest86 or memtest86+
They boot from a usb stick and do not use windows.
You can download them here:
If you can run a full pass with NO errors, your ram should be ok.

Running several more passes will sometimes uncover an issue, but it takes more time.
Probably not worth it unless you really suspect a ram issue.

Do you have a budget for a new pc?

My understanding is that Python is largely single-threaded.
Run the CPU-Z bench on your PC and look at the single-thread performance rating.
It should be about 160; abysmal by today's standards.
http://valid.x86.fr/bench/tvr752

How much RAM would it take to hold the full complement of 100K records?
If RAM will not hold enough, use an SSD for storage.
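
To put rough numbers on that question, here is a back-of-the-envelope sketch. The 100K figure comes from the original post; the 16 bytes per result record is purely an assumption for illustration (real Python objects are usually larger), and `pairwise_result_bytes` is a made-up helper name.

```python
# Rough estimate of the memory needed to hold every pairing of two datasets.
RECORDS_PER_DATASET = 100_000   # from the original post
BYTES_PER_RESULT = 16           # ASSUMPTION: e.g. two 8-byte integer keys


def pairwise_result_bytes(n_a, n_b, bytes_per_record):
    """Bytes needed to keep all n_a * n_b pairings in memory at once."""
    return n_a * n_b * bytes_per_record


total = pairwise_result_bytes(
    RECORDS_PER_DATASET, RECORDS_PER_DATASET, BYTES_PER_RESULT
)
pairs = RECORDS_PER_DATASET ** 2
print(f"{pairs:,} pairs need about {total / 1e9:,.0f} GB")
```

Under those assumptions, pairing just two of the datasets already comes to far more than any desktop's RAM, so the inputs may fit but the full result will not.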

A simple $120 i3-12100 will score 657.
An LGA1700 motherboard will be about $100.
16GB of DDR4 RAM will be $40.
An Intel 660p 512GB PCIe SSD will be $40.
 
When you say memory errors, do you mean an out of memory exception or some other memory corruption error? The former can be addressed with more RAM, the latter would suggest something wrong with your system.

In terms of your workload, in layman's terms are you iterating through a dataset and determining all the different ways the data can be arranged?

Looking for permutations / combinations of multiple dataset that can have 100K records each.. so after each run the result is then analyzed & reduced because the resulting datasets reach billions of records.
It sounds like you're already attempting to manage your RAM use, but my first question when someone says they are running out of RAM is: do you have only the essential data required for what is currently being processed stored in memory? Is there any scope to move it to a database and work on a small snapshot at a time?
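
As a minimal sketch of the "only what is currently being processed" idea, the pairings can be generated one at a time and reduced on the fly instead of being stored. The `score` function and the choice to keep the top 5 are hypothetical stand-ins for whatever the real analysis does, and the small ranges stand in for the real datasets.

```python
import heapq
import itertools


def score(pair):
    """Hypothetical scoring rule, a stand-in for the real analysis."""
    a, b = pair
    return (a * b) % 1000


def top_pairs(dataset_a, dataset_b, keep=5):
    """Stream every pairing and keep only the `keep` best-scoring ones."""
    # product() holds the two inputs in memory but yields pairs one at a
    # time, so the full set of pairings is never stored.
    pairs = itertools.product(dataset_a, dataset_b)
    return heapq.nlargest(keep, pairs, key=score)


# Small stand-in datasets; real ones would be read from disk or a database.
dataset_a = range(1, 501)
dataset_b = range(1, 501)
best = top_pairs(dataset_a, dataset_b, keep=5)
print(best)
```

This does not make the run any faster, but memory use stays roughly at the size of the inputs plus the handful of results kept, no matter how many pairings there are.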

What would be a lower end, middle, and high end machine that would be suited for number crunching- Processor and RAM size - if I were to buy a new one ?
We need more information to make any recommendations:

  1. What is your budget?
  2. Is your workload multi-threaded or is it single threaded? If it is currently single threaded, can it be multi-threaded?
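
On the second question: if the work can be cut into independent slices, one common way to use more than one core from Python is the standard library's `concurrent.futures.ProcessPoolExecutor`. The sketch below assumes that is possible for this workload; `count_hits`, `split`, and `parallel_count` are hypothetical names, and the divisible-by-7 test is only a placeholder for the real per-record work.

```python
from concurrent.futures import ProcessPoolExecutor


def count_hits(chunk):
    """Hypothetical per-slice work: count records meeting some condition."""
    return sum(1 for value in chunk if value % 7 == 0)


def split(records, n_chunks):
    """Cut a list into roughly equal consecutive slices."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_count(records, workers=4):
    """Run count_hits on each slice in its own process and add the results."""
    chunks = split(records, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_hits, chunks))
```

In a script, call `parallel_count(...)` from under an `if __name__ == "__main__":` guard; on Windows in particular the worker processes re-import the script, and the guard stops them from starting pools of their own. Whether this helps depends on each slice being large enough to outweigh the cost of sending the data to the workers.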

Modern hardware is a vast improvement over what you have, so it won't be difficult to cut your completion time to well below an hour.
 
Thank you for your information and effort. I did not consider more RAM. I assumed the best option is to buy a new machine. Eons ago I knew a 286 was slower than a 386. Today I have no clue with all the processors and combinations. I also have no clue as to cost which is why I figured I'd ask what a low-med-high machine would be and then see what the cost would be.

As for the threading, my work is Python and "R" programs in the Anaconda environment, which from what I read is single-threaded.