Andrés Torrubia calls his machine “Osado” (Spanish for “daring”). The name fits: this beast has no fewer than six NVIDIA GeForce RTX 3090 graphics cards, which turns the computer into a small supercomputer with brutal computing power.
The first thing one might think is that he uses those six graphics cards to mine cryptocurrencies, but he actually uses the PC for a very particular field: research in artificial intelligence and deep learning. This is the story of how that “monster” was built.
Deep learning is very gluttonous
Andrés Torrubia (antor) is an old acquaintance of this house: we have interviewed him and he has collaborated with us on various topics related to artificial intelligence. He was also a special guest on two episodes of Captcha (1×03 and 2×04), the video podcast we created at Xataka in 2018.
Andrés, a telecommunications engineer by training, defines himself as “Bruce Wayne by day (co-founder of the Institute of Artificial Intelligence (IIA) and Medbravo) and deep learning Batman by night”.
Not a bad description: he has been taking part in artificial intelligence competitions for years, competing against the big technology giants and often beating them, as happened in a competition organized by Alibaba in 2019 that he won with his team, called “sanchinarro” (a residential area of the Hortaleza district, in Madrid).
This work requires significant computing resources, and here Andrés decided to go it alone. The entrepreneur told us in a telephone interview that “except for my MSX, my Atari and the laptops I have used, I manage to build my own machines, and I have assembled all my PCs”.
That meant that when he began to work in the field of deep learning (DL for short), he built a machine for that kind of task. “As a result of the artificial intelligence competitions and other work matters I ended up building machines to train DL, because DL is gluttonous and insatiable when it comes to resources. The more the merrier.”
He started by building a machine with a GTX 1080, which he later expanded. First he replaced that card with a 1080 Ti, then added a second 1080 Ti, and later ended up replacing those two cards with the newer RTX 2080 Ti. That computer, which he also used for his home theater system, was called “Cinebuntu.”
“DL is gluttonous and insatiable when it comes to resources. The more the merrier”
As he explained to us, “until recently I had three RTX 2080 Ti cards alongside an AMD Ryzen Threadripper 1950X, which has 16 cores and 32 threads, plus 128 GB of RAM.”
In its day the Threadripper (especially the first generation) gave compatibility problems if you installed 8 memory modules, but nowadays it no longer does, and on the Threadripper PRO memory access with 8 modules is even faster because it uses 8 channels.
Pairing those graphics cards with a more suitable processor had a catch: EPYC and Threadripper PRO chips could not be bought through regular channels, and used to be available only in IBM machines. Fortunately, Threadripper PRO chips are now available to the enthusiast market.
While the end consumer can normally get processors that support four RAM modules (“sticks”, as he called them), the ideal would be to move to the latest Threadripper chips, which support eight “sticks” and have fewer limitations, or even to make the leap to EPYC or Threadripper PRO, which have “zero limitations”.
Those limitations of the most modern Threadripper chips, he told us, center especially on how they handle PCIe lanes: the consumer Threadripper chips “get fewer lanes”, but as he stressed, “if you want to install two or even three graphics cards you have to go to a Threadripper for the PCIe lanes”. What is the final configuration?
- CPU: AMD Threadripper 1950X
- Graphics cards: 6 x NVIDIA GeForce RTX 3090
- Memory: 256 GB of RAM
- Storage: 10 TB on M.2 NVMe SSD drives
Don’t forget the fire extinguisher
The choice of the RTX 3090 was not trivial. In fact, Andrés’s interest was focused not so much on the cards’ raw power (though that too) as on the fact that each of them has 24 GB of GDDR6X memory.
He was clear about it, and especially stressed that although the drivers were initially still somewhat green, today you can already enjoy “good” drivers that take advantage of the 3090 in DL. “In games it is seriously fast, but the improvement in DL is 50% and even 100% in certain workloads compared with the 2080 Ti”.
For him, he insisted, “the reason for choosing the 3090 is memory, which is key for DL. I could do what I’m doing with the 3080 and wait longer, but the problem was memory. With the 2080 Ti I had 11 GB, and now I make a brutal leap to 24 GB”.
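To give a rough sense of why the jump from 11 GB to 24 GB matters, the sketch below estimates how many parameters fit on a card during training. It is only a back-of-the-envelope calculation under assumptions that do not come from the interview: plain fp32 training with the Adam optimizer (weights, gradients and two optimizer buffers, 16 bytes per parameter), 1 GB counted as 10^9 bytes, and no memory reserved for activations, which in practice take a large share.

```python
# Assumption (not from the interview): fp32 training with Adam.
# Per parameter: 4 bytes weights + 4 bytes gradients + 8 bytes Adam moments.
BYTES_PER_PARAM = 4 + 4 + 8


def max_params_billion(vram_gb, bytes_per_param=BYTES_PER_PARAM):
    """Upper bound on trainable parameters (in billions) that fit in VRAM.

    Ignores activations, framework overhead and batch size, so real
    limits are noticeably lower than this figure.
    """
    total_bytes = vram_gb * 1e9          # treat 1 GB as 10^9 bytes
    return total_bytes / bytes_per_param / 1e9


rtx_2080_ti = max_params_billion(11)     # 11 GB card mentioned above
rtx_3090 = max_params_billion(24)        # 24 GB card mentioned above
print(f"2080 Ti: ~{rtx_2080_ti:.2f}B params, 3090: ~{rtx_3090:.2f}B params")
```

Under those assumptions the ceiling a bit more than doubles, which is the kind of headroom that decides whether a model can be trained at all rather than how fast it trains.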
“I had a fire extinguisher and I put it next to the machine just in case”
The build of that beast left plenty of good anecdotes, but one stands out: when he started, he began by installing three of the cards and, knowing how much heat they give off, he told us, “I had a fire extinguisher and I put it next to the machine just in case”.
A machine like this poses other difficulties: “there are no cases in which you can mount six graphics cards, but what I did see is what all the cryptocurrency mining rigs were doing”.
Independent miners mount those rigs in open chassis, with a basic frame that supports the graphics cards and the rest of the components.
That was when the “bricomania” (DIY) phase arrived: “I discovered the concept of the 20×20 profile. It’s like with windows: if you want, you can buy 20×20 profiles and have them cut and sent to you in segments”, and then assemble them into a “closet” that holds the whole machine neatly.
Andrés had to make the mounts for the power supplies (from STL files, very popular in 3D printing) and for the graphics cards. “I got rid of the Noctua ‘skyscraper’ that used to cool the processor and installed a liquid cooling system, not because it was thermally necessary, but because I couldn’t fit any card above where the skyscraper was, and now I can.”
Assembling such a machine has nothing to do with assembling a PC
The idea was good, but it didn’t save him from trouble. There is, as he explained, “a fundamental difference from crypto. In that field people connect the cards with USB risers, but in my case I needed risers that are ‘cablacos’ (hefty ribbon cables, similar to old IDE hard drive cables) spanning the full width of the PCIe slot.”
“When it comes to physically securing the card they get in the way, and fitting them all is quite a job,” he continued. The wide cable on sale did not fit in the closet, which made it clear to him that he had to be careful with the measurements of that cabinet-chassis.
Another key factor is power consumption: “each of these cards draws 350 W at peak. If you install four,” he went on, running the numbers, “that’s 1,400 W, and if you also add the Threadripper and a few other things, you add another 200.”
“The PCI bus is very sophisticated, but it is used in a limited way on the PC.”
“A single 1,600 W power supply is going to fall a bit short.” His solution: install two power supplies instead of one, to have headroom and thus “be able to add two more cards.”
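The arithmetic behind that decision can be written out as a minimal sketch. The 350 W, 200 W and 1,600 W figures are the ones quoted above; the function names are purely illustrative, and the calculation assumes worst-case peak draw with no safety margin, so a real build would want extra headroom.

```python
import math

GPU_PEAK_W = 350         # per-card peak, as quoted
REST_OF_SYSTEM_W = 200   # Threadripper "and something else", as quoted
PSU_CAPACITY_W = 1600    # the single power supply he mentions


def peak_draw_w(num_gpus, gpu_w=GPU_PEAK_W, rest_w=REST_OF_SYSTEM_W):
    """Worst-case power draw of the whole machine in watts."""
    return num_gpus * gpu_w + rest_w


def psus_needed(num_gpus, psu_w=PSU_CAPACITY_W):
    """Minimum number of identical power supplies, with no safety margin."""
    return math.ceil(peak_draw_w(num_gpus) / psu_w)


for cards in (4, 6):
    print(cards, "cards:", peak_draw_w(cards), "W peak,",
          psus_needed(cards), "PSU(s) at minimum")
```

With four cards the machine already sits exactly at the 1,600 W rating, and with six it needs 2,300 W, which is why a second power supply was the only way to add the last two cards.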
There he hit the next obstacle: he did not know of any “consumer motherboards, or not-so-consumer motherboards, that have six PCI slots. For Threadripper there aren’t any, and the one I had only took four”.
It was then that “I discovered there is a madman in Germany, an electronics engineer. I know about this, I know the PCI bus: it is very sophisticated, but in a PC it is used in a limited way. It has a lot of features, including multiplexing.” That was the solution to his problem.
Searching for that term, he ended up finding “a guy who sold PCIe splitters: he sold you boards that split a PCIe x16 into four PCIe x4. Your motherboard and your BIOS don’t even have to support it.” His board is an ASRock and it does support it; it has an AMI BIOS and is built for AMD Threadripper.
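As a rough illustration of what splitting an x16 slot into four x4 links means for each card, the sketch below computes theoretical link bandwidth. It assumes the links run at PCIe 3.0 speeds (8 GT/s per lane with 128b/130b encoding), which is what the Threadripper 1950X platform offers; the interview does not give these figures, and real throughput is lower than the theoretical value.

```python
# Assumption: PCIe 3.0 signalling (8 GT/s per lane, 128b/130b encoding).
GIGATRANSFERS_PER_S = 8.0
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(lanes):
    """Theoretical one-direction bandwidth of a PCIe 3.0 link in GB/s."""
    gigabits = lanes * GIGATRANSFERS_PER_S * ENCODING_EFFICIENCY
    return gigabits / 8  # bits to bytes


full_slot = link_bandwidth_gb_s(16)   # an unsplit x16 slot
split_link = link_bandwidth_gb_s(4)   # one of the four x4 links

print(f"x16: {full_slot:.2f} GB/s, x4: {split_link:.2f} GB/s per card")
```

Each card behind the splitter gets a quarter of the slot's bandwidth. How much that matters depends on how often a given training job moves data between host memory and the GPU.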
There a new problem appeared: “I plugged in the cards but only got five, not six; when I plugged in the sixth, the machine would not boot.” He kept working with those five cards until he solved the problem, which turned out to be a BIOS conflict.
“This uses a ton of power, but of course, you can’t complain”
Another challenge of running such a machine is energy consumption. “This uses a ton of power, but of course, you can’t complain.” To ease that burden, he has already commissioned a consumption study with a view to installing solar panels: those carrying out the study size the project according to consumption, which will take care of that part as well.
Building your own home supercomputer pays off: the graphics version of the DGX100 with 384 GB of video memory costs about 200,000 euros, he told us. His machine is not so different: it is certainly inferior in performance, perhaps reaching 40% of what that DGX100 delivers, but he has invested 15,000 euros, not 200,000.
The end result is, he assures us, spectacular. He did some light testing with games. “Windows detects the cards and it works, for example, for playing Cyberpunk 2077, but it doesn’t take advantage of the six cards because it doesn’t have multi-GPU support. With the 3090 you don’t really need it, though: you can play at full speed in 4K, something the 2080 Ti couldn’t do.”
“With five cards the machine flies through deep learning tasks”
For his deep learning work, however, he uses them with Linux. “With five it flies,” he explained, highlighting how NVIDIA is clearly ahead of AMD in this area. “For games there may be room for debate, and in ray tracing NVIDIA seems better, but in DL there is no contest: NVIDIA is way ahead in software. At AMD it’s experimental, and that’s putting it generously. On the RTX cards you can even cap the power draw so that they do not consume more than 200 W.”
In fact he has had to limit that consumption “because once, when I was charging the electric car and went to turn on the washing machine, the power limiter tripped: I have 5 kW.” As a curiosity, he told us, “the function of the solar panels, apart from spending less, is to increase the available power during daylight hours; that is, I have 5.5 kW plus whatever they are generating for me”.
In fact he wants to go further: “with the inverter I want to use an API and ask it in real time how much is being consumed, so that it measures how much available power I have and, if it sees that I am running too tight, notifies the computer to reduce consumption”.
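A minimal sketch of how that idea might look is shown below. It is not his implementation: the inverter API is left out entirely and its readings are passed in as plain numbers, the 3,600 W household load and the 100 W floor are hypothetical values, and only the 5 kW contracted power, the 350 W card peak and the roughly 200 W for the rest of the PC come from the article. The per-card cap uses the power-limit option of the `nvidia-smi` tool (`-pl`, with `-i` selecting the card), which normally requires administrator rights.

```python
import subprocess

CONTRACTED_W = 5000   # contracted household power, as quoted
REST_OF_PC_W = 200    # CPU and other components, as quoted earlier
GPU_MAX_W = 350       # per-card peak from the article
GPU_MIN_W = 100       # hypothetical floor; each card reports its own minimum


def per_gpu_cap_w(solar_w, household_w, num_gpus):
    """Split the remaining electrical headroom evenly across the GPUs."""
    headroom = CONTRACTED_W + solar_w - household_w - REST_OF_PC_W
    cap = headroom // num_gpus
    # Clamp to the range the cards can accept; if even the floor is too
    # much, the jobs themselves would have to be paused (not handled here).
    return max(GPU_MIN_W, min(GPU_MAX_W, cap))


def power_limit_commands(num_gpus, watts):
    """Build one nvidia-smi power-limit command per card."""
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)]
            for i in range(num_gpus)]


def apply_limits(commands):
    """Run the commands; kept separate so the sketch can be dry-run."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical evening scenario: no sun, car charger plus washing machine.
cap = per_gpu_cap_w(solar_w=0, household_w=3600, num_gpus=6)
commands = power_limit_commands(6, cap)
print(cap, commands[0])
```

In this hypothetical scenario the cap comes out at 200 W per card, the same figure he mentions. A real version would call something like this in a loop each time the inverter reports new readings.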
“The speed is great, but the really important thing is that now I can also do six experiments at the same time.”
In the end, he explained, the improvement in speed matters, but “the important thing is that now I can also run six experiments at the same time, or do something called cross-validation: you typically have to train the same network five times, and this way I can do it concurrently (and there is one card free in case I want to do something else)”.
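The five-folds-on-six-cards arrangement he describes can be sketched roughly as follows. This is an illustration rather than his setup: the training script name `train.py` and its `--fold` flag are hypothetical, the fold split is a plain interleaved k-fold, and each process is pinned to one card through the `CUDA_VISIBLE_DEVICES` environment variable. The launch function is defined but not called here.

```python
import os
import subprocess
import sys

NUM_GPUS = 6
NUM_FOLDS = 5


def fold_indices(num_samples, num_folds):
    """Yield (train, validation) index lists for plain k-fold validation."""
    indices = list(range(num_samples))
    for k in range(num_folds):
        validation = indices[k::num_folds]      # every num_folds-th sample
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


def launch_fold(fold, gpu, script="train.py"):  # script name is hypothetical
    """Start one training process that can only see a single GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.Popen(
        [sys.executable, script, "--fold", str(fold)], env=env)


# One fold per card: folds 0..4 go to GPUs 0..4, leaving one card idle.
assignment = {fold: fold for fold in range(NUM_FOLDS)}
free_gpus = sorted(set(range(NUM_GPUS)) - set(assignment.values()))
print("fold -> GPU:", assignment, "free:", free_gpus)
```

Because each fold trains an independent copy of the network, the five runs need no communication between cards, so they finish in roughly the time of a single run.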
A good example is transformer-based language models, which as he explained “are very gluttonous with memory. You can’t fit large models in, you can’t even load them”, but this type of machine lets him work with fairly large models without problems. Now, on to winning more deep learning competitions.