

- You don’t need to be rude.
- My original comment was in reply to someone looking for this type of information; the conversation then continued.
- Disengage: I don’t want to deal with it today, frankly; I don’t have time for rude people.

Don’t DM me without permission, please.



I like when it insists I’m using escape characters in my text when I absolutely am not, and I have to convince a machine that I didn’t type a certain string of characters because, on its end, those are absolutely the characters it received.
The other day I argued with a bot for 10 minutes that I had typed a right caret and not the HTML escape sequence that renders as a right caret. Then I realized I was arguing with a bot, went outside for a bit, and finished my project without the slot machine.
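For what it’s worth, the mix-up there is easy to reproduce: a literal right caret and its HTML escape sequence render the same in a browser but are different strings on the wire. A quick standard-library check (just an illustrative sketch):

```python
import html

raw = ">"          # the literal right caret I actually typed
escaped = "&gt;"   # the HTML escape sequence that renders as a right caret

print(raw == escaped)          # False -- different strings on the wire
print(html.unescape(escaped))  # ">"   -- unescaping recovers the literal caret
print(html.escape(raw))        # "&gt;" -- escaping goes the other way
```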


Yes, precisely.
If you’re trying to use large models, you need more VRAM than consumer-grade Nvidia products can supply. Without system RAM sharing, the models error out and start repeating themselves, or just crash and need to be restarted.
This can be worked around with CPU inferencing, but that’s much slower.
An 8b model will run fine on an RTX 30-series card; a 70b model absolutely will not. BUT you can do CPU inferencing with the 70b model if you don’t mind the wait.
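A rough back-of-the-envelope check of why the 70b model doesn’t fit (the 1.2× overhead factor is my own assumption for KV cache and runtime buffers, not a measured number):

```python
def estimate_model_memory_gb(params_billions: float, bits_per_weight: int,
                             overhead: float = 1.2) -> float:
    """Rough estimate of memory needed to hold a model's weights.

    overhead is a fudge factor for KV cache, activations and runtime
    buffers; real usage varies with context length and backend.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# 8B model at 4-bit quantization: ~4.8 GB -> fits on an 8 GB consumer GPU
print(f"8B  @ 4-bit: {estimate_model_memory_gb(8, 4):.1f} GB")

# 70B model at 4-bit quantization: ~42 GB -> beyond any single consumer card,
# so it's system RAM plus CPU inferencing (slow) or multiple GPUs
print(f"70B @ 4-bit: {estimate_model_memory_gb(70, 4):.1f} GB")
```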


8b-parameter models are relatively fast on 3rd-gen RTX hardware with at least 8 GB of VRAM; CPU inferencing is slower and requires boatloads of RAM, but it’s doable on older hardware. These really aren’t designed to run on consumer hardware, but the 8b model should do fine on relatively powerful consumer hardware.
If you have something that would’ve been a high-end gaming rig 4 years ago, you’re good.
If you wanna be more specific, check Hugging Face, they have charts. If you’re using Linux with Nvidia hardware you’ll be better off doing CPU inferencing (there’s a quick sketch of both paths below the links).
Edit: Omg y’all, I didn’t think I needed to include my sources, but this is quite literally a huge issue on Nvidia. Nvidia works fine on Linux, but you’re limited to whatever VRAM is on your video card, with no RAM sharing. Y’all can disagree all you want, but those are the facts. That’s why AMD and CPU inferencing are more reliable and allow for higher context limits. They are not faster, though.
Sources for nvidia stuff https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/618
https://forums.developer.nvidia.com/t/shared-vram-on-linux-super-huge-problem/336867/
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/758
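If you want to see the GPU/CPU split in practice, here’s a minimal sketch using llama-cpp-python with a quantized GGUF model. The file path and layer count are placeholders (my own assumptions), and the library has to be built with CUDA support for the offload to do anything:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical local GGUF file -- swap in whatever quantized model you actually have.
MODEL_PATH = "models/llama-8b-q4_k_m.gguf"

# n_gpu_layers=0 keeps everything on the CPU (slow, but only needs system RAM).
# A positive number offloads that many transformer layers to VRAM; -1 offloads
# as many as will fit. On an 8 GB card a 4-bit 8b model usually fits entirely.
llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,   # set to 0 to force pure CPU inferencing
    n_ctx=4096,        # context window; bigger costs more memory
)

out = llm("Q: What is CPU inferencing? A:", max_tokens=64)
print(out["choices"][0]["text"])
```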


p l a n t i n g g r a s s o n t h e a s t e r o I d s
If your distribution’s maintainers have your package in their repos, it will generally only be 3-5 clicks in the GUI package manager or 1-2 lines at the terminal (rough sketch below).
Flatpak solved the compatibility and library issues, becoming huge in the process. AppImage is basically like an .exe for Windows.
This has been possible for over 20 years, and with the more recent changes to WINE most (MOST, not ALL) Windows apps will work fine, but you really shouldn’t be trying to use Windows apps unless there’s no other option.
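To make the “1-2 lines at the terminal” point concrete, here’s a toy Python sketch that just looks for a known package manager and prints the one-liner you’d run. The package name is an example; on a real system you’d type the command directly:

```python
import shutil

PACKAGE = "gimp"  # example package; swap in whatever you actually want
COMMANDS = {
    "apt":     f"sudo apt install {PACKAGE}",            # Debian/Ubuntu
    "dnf":     f"sudo dnf install {PACKAGE}",             # Fedora
    "pacman":  f"sudo pacman -S {PACKAGE}",               # Arch
    "flatpak": "flatpak install flathub org.gimp.GIMP",   # distro-agnostic Flatpak
}

for tool, command in COMMANDS.items():
    if shutil.which(tool):  # is this package manager on the PATH?
        print(f"Found {tool}; the one-liner would be:\n  {command}")
        break
else:
    print("No known package manager found; an AppImage might be the easier route.")
```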