Follow (Blogs/Podcasts/Sites)

Papers

AI Tools

Providers

Local Inference

Platforms

Inference

Services

Models

GPU Sharing

Well, I'll try to give a more generic answer: vGPU virtualizes the GPU by time-sharing its graphics and compute resources, and it needs a license plus a special driver in the hypervisor as well as in the guest OS. vGPU allows a lot of flexibility in how fractions of the GPU are assigned to users, from the full GPU for a single user down to 12/16/24 fragments, one per user.
Compared to MIG, vGPU can have less predictable and sometimes longer latency, because a user/job may have to wait for its time slice before it gets the GPU's resources again.
MIG assigns a fixed fraction of all GPU resources to a user/tenant/job, but is much less flexible when it comes to changing the fraction size per user/job. MIG is only available on high-end datacenter GPUs and can partition the GPU between one and up to seven users/jobs/instances, but to change the assignment, all jobs have to be stopped and the GPU reconfigured and effectively reset.
So basically, with MIG you trade flexibility for lower, more predictable latency and resource isolation, while vGPU gives you easier, more flexible manageability at the cost of predictability.
Hope this helps as some guidance.
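For illustration, here is a minimal sketch (not part of the quoted answer) that uses the nvidia-ml-py (pynvml) bindings to check whether MIG mode is enabled on each GPU and to list the MIG instances it has been partitioned into. Exact behavior depends on driver version and GPU support, so treat it as a starting point rather than a reference implementation.

```python
# Minimal sketch: enumerate GPUs and their MIG instances via nvidia-ml-py (pynvml).
# Assumes `pip install nvidia-ml-py` and an NVIDIA driver new enough to expose MIG queries.
import pynvml

pynvml.nvmlInit()
try:
    for gpu_index in range(pynvml.nvmlDeviceGetCount()):
        gpu = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        name = pynvml.nvmlDeviceGetName(gpu)

        try:
            current_mode, _pending_mode = pynvml.nvmlDeviceGetMigMode(gpu)
        except pynvml.NVMLError:
            # GPUs without MIG support (e.g. non-datacenter cards) raise here.
            print(f"GPU {gpu_index} ({name}): MIG not supported")
            continue

        if current_mode != pynvml.NVML_DEVICE_MIG_ENABLE:
            print(f"GPU {gpu_index} ({name}): MIG disabled (whole GPU, or shared via vGPU time-slicing)")
            continue

        # MIG is on: walk the (up to 7) MIG device slots and report memory per instance.
        print(f"GPU {gpu_index} ({name}): MIG enabled")
        for slot in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, slot)
            except pynvml.NVMLError:
                continue  # empty slot, no MIG instance configured here
            mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"  MIG slot {slot}: {mem.total // (1024 ** 2)} MiB total memory")
finally:
    pynvml.nvmlShutdown()
```

Note that this only reads the current partitioning; actually creating or resizing MIG instances requires stopping the running jobs and reconfiguring the GPU, as described above.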

GPU Telemetry

Tools

Kubernetes

Performance Optimization

NVIDIA

NVIDIA CUDA

Use Cases

Infrastructure

Research

Jupyter Services

AI Image & Video

ML System Design

NimbleCore

Companies

Edge AI