Inbox
https://ggml.ai/
https://github.com/ggml-org
https://github.com/danbev/learning-ai
https://ggerganov.com/
https://journaliststudio.google.com/pinpoint/collections
https://www.chatbase.co/
https://slurm.schedmd.com/
https://metaflow.org/
https://flyte.org/
https://www.kubeflow.org/
https://adalflow.sylph.ai/index.html
https://github.com/kserve/kserve
https://marimo.io/
https://dlthub.com/
https://github.com/meta-pytorch/torchtune
https://github.com/NexaAI/nexa-sdk
https://pytorch.org/blog/torchtune-fine-tune-llms/
https://llm-d.ai/blog/llm-d-v0.3-expanded-hardware-faster-perf-and-igw-ga
https://developer.nvidia.com/nccl
https://github.com/run-llama/semtools
https://triton-lang.org/main/index.html
https://daisytuner.com/
https://docs.aws.amazon.com/eks/latest/best-practices/karpenter.html
https://karpenter.sh/
https://www.pixeltable.com/
https://knative.dev/docs/
https://neocloud.tools/gpu-picker/
https://chalk.ai/
https://www.swebench.com/
https://github.com/NVIDIA/KAI-Scheduler
https://dstack.ai/
https://roocode.com/
https://www.requesty.ai/
https://spacelift.io/
https://inferencemax.semianalysis.com/
https://github.com/InferenceMAX/InferenceMAX
https://docs.skypilot.co/en/latest/docs/index.html
https://github.com/vllm-project/semantic-router
https://github.com/vllm-project/production-stack
An LLM gateway providing model access, fallbacks, and spend tracking across 100+ LLMs, all in the OpenAI format.
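A gateway like this typically exposes an OpenAI-compatible endpoint, so any OpenAI SDK client can be pointed at it by overriding the base URL. A minimal Python sketch; the gateway URL, key, and model name are placeholders, not tied to any specific product:

```python
# Minimal sketch: calling an OpenAI-compatible LLM gateway with the openai SDK.
# The base_url, api key, and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",  # hypothetical gateway endpoint
    api_key="GATEWAY_API_KEY",            # key issued by the gateway, not the upstream provider
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # the gateway maps this name to a configured upstream model
    messages=[{"role": "user", "content": "Summarize what an LLM gateway does."}],
)
print(response.choices[0].message.content)
```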
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
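A minimal sketch of launching a GPU job with SkyPilot's Python API, assuming the sky package exposes Task/Resources/launch as in its docs; the accelerator type, cluster name, and commands are placeholders:

```python
# Minimal sketch, assuming SkyPilot's Python API (sky.Task / sky.Resources / sky.launch).
# Accelerator, cluster name, and commands are placeholders.
import sky

task = sky.Task(
    setup="pip install torch",         # runs once when the cluster is provisioned
    run="python train.py --epochs 1",  # the actual job command
)
task.set_resources(sky.Resources(accelerators="A100:1"))

# SkyPilot picks a cloud/region (or Kubernetes) with available capacity and launches the job.
sky.launch(task, cluster_name="demo-train")
```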
https://www.llmwatch.com/
https://pytorch.org/blog/
https://cme295.stanford.edu/
https://github.com/harvard-edge/cs249r_book
https://tuhdo.github.io/os01/
https://github.com/codecrafters-io/build-your-own-x
https://madaidans-insecurities.github.io/guides/linux-hardening.html
https://use-the-index-luke.com/
https://app.codecrafters.io/courses/redis/overview
https://cstack.github.io/db_tutorial/
https://linkedin.github.io/school-of-sre/
https://github.com/linkedin/school-of-sre
https://nakabonne.dev/posts/write-tsdb-from-scratch/
https://github.com/aphyr/distsys-class
https://makelinux.github.io/kernel/map/
https://onnx.ai/onnx/intro/concepts.html#
https://blog.dailydoseofds.com/p/4-strategies-for-multi-gpu-training?
https://github.com/omerbsezer/Fast-Kubernetes
https://github.com/karpathy/LLM101n
https://stanford-cs329s.github.io/
https://arxiv.org/abs/2510.08731
https://github.com/karpathy/nanochat
https://sebastianraschka.com/llms-from-scratch/
https://huggingface.co/learn/diffusion-course/unit0/1
https://huggingface.co/docs/diffusers/main/en/index
https://bbycroft.net/llm
https://www.bentoml.com/blog/nvidia-data-center-gpus-explained-a100-h200-b200-and-beyond
https://www.bentoml.com/blog/which-inference-platform-is-right-for-enterprise-ai
https://www.bentoml.com/blog/deepseek-ocr-contexts-optical-compression-explained
https://www.bentoml.com/llm/getting-started/choosing-the-right-gpu
https://www.bentoml.com/llm-perf/
https://www.bentoml.com/blog/announcing-llm-optimizer
https://bentoml.com/llm/inference-optimization/kv-cache-offloading
https://bentoml.com/llm/
https://bentoml.com/llm/inference-optimization/llm-performance-benchmarks
https://www.doubleword.ai/resources/behind-the-stack-ep-1-what-should-i-be-observing-in-my-llm-stack
https://www.bentoml.com/blog/3x-faster-llm-inference-with-speculative-decoding
https://www.bentoml.com/blog/amd-data-center-gpus-mi250x-mi300x-mi350x-and-beyond
https://newsletter.semianalysis.com/p/the-gpu-cloud-clustermax-rating-system-how-to-rent-gpus
https://www.baseten.co/resources/guide/the-baseten-inference-stack/
https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo
CS230 Deep Learning
A hands-on course for real AI Engineers
The AI Engineering Playbook
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
SkyPilot: An Intercloud Broker for Sky Computing
GPU-Enabled Platforms on Kubernetes
https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/
https://cgnarendiran.github.io/blog/kv-caching-mla-is-attention-all-you-really-need/
https://cgnarendiran.github.io/blog/hnsw-graph-based-vector-search/
https://cgnarendiran.github.io/blog/lora-efficient-fine-tuning-llms/
https://alidarbehani.com/2025/08/24/beyond-gpus-mastering-ultra-scale-llm-training/
https://alidarbehani.com/2025/08/28/beyond-gpus-mastering-ultra-scale-llm-training-part-2/
https://newsletter.semianalysis.com/p/amazons-ai-resurgence-aws-anthropics-multi-gigawatt-trainium-expansion
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
https://writings.stephenwolfram.com/2024/08/whats-really-going-on-in-machine-learning-some-minimal-models/
https://www.cloudskillsboost.google/course_templates/537
https://www.cloudskillsboost.google/course_templates/538
https://www.cloudskillsboost.google/course_templates/543
https://developer.download.nvidia.com/GTC/PDF/1083_Wang.pdf
https://bluewaters.ncsa.illinois.edu/liferay-content/document-library/Documentation%20Documents/Workshops/Advanced%20User%20Workshop%20Oct%202014/NCSA02_Fundamental_CUDA_Optimization.pdf
https://triton-inference-server.github.io/pytriton/0.7.0/
https://github.com/triton-inference-server/pytriton/blob/v0.2.5/examples/huggingface_bert_jax/server.py
https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton-inference-server-2580/user-guide/docs/model_navigator/docs/inference_deployment/pytriton/deployment.html
https://clickhouse.com/blog/breaking-free-from-rising-observability-costs-with-open-cost-efficient-architectures
https://www.uber.com/en-FI/blog/building-ubers-data-lake-batch-data-replication-using-hivesync/
https://medium.com/@isanghao/io-bound-or-compute-bound-in-ai-c9c541cd6696
https://medium.com/@oril_/transforming-mobile-development-with-backend-driven-ui-c65df97baa79
https://developer.nvidia.com/blog/nvidia-blackwell-ultra-sets-new-inference-records-in-mlperf-debut/
https://developer.nvidia.com/blog/inside-nvidia-blackwell-ultra-the-chip-powering-the-ai-factory-era/?ncid=so-link-929079&linkId=100000379792615
https://developer.nvidia.com/blog/smart-multi-node-scheduling-for-fast-and-efficient-llm-inference-with-nvidia-runai-and-nvidia-dynamo/
https://mlcommons.org/2025/09/deepseek-inference-5-1/
https://medium.com/@piyushkashyap045/tokens-and-embeddings-5d65c7543dea
https://finance.yahoo.com/news/redis-acquire-real-time-data-130000238.html
https://www.spaceo.ai/case-study/ai-agent-cost-optimization/
https://www.uber.com/en-IT/blog/from-predictive-to-generative-ai/
https://stytch.com/blog/best-authentication-services/
https://www.unite.ai/what-every-data-scientist-should-know-about-graph-transformers-and-their-impact-on-structured-data/
https://tamerlan.dev/how-i-manage-my-dotfiles-using-gnu-stow/
https://veitner.bearblog.dev/gpu-l2-cache-persistence/
https://www.duckbillgroup.com/blog/figmas-300k-daily-aws-bill-isnt-the-scandal-you-think-it-is/
https://medium.com/@knish5790/fine-tuning-large-language-models-llms-in-2025-623567db84e9
https://www.philschmid.de/agents-2.0-deep-agents
https://medium.com/fresha-data-engineering/the-good-the-bad-and-the-automq-5aa7a8748e71
https://medium.com/@sahilkatiyar2024/inside-the-mind-of-a-cnn-architecture-explained-simply-7b1168a628c7
https://oaqlabs.com/2025/10/12/kernel-level-gpu-optimization-for-transformer-attention-a-technical-deep-dive/
https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades#check_the_status_of_node_upgrades
https://developers.openai.com/codex/cli/
https://contact.runpod.io/hc/en-us/articles/39403705226003-Help-to-setup-Google-Cloud-s-Artifact-Registry-GAR-with-RunPod
https://www.linkedin.com/posts/bevinpavithran_research-papers-for-ai-engineering-1-tokenization-activity-7381285969099964416-p7JV/
https://www.linkedin.com/posts/brianna-bentler-2397a11a3_artificial-intelligence-index-report-2025-ugcPost-7384073672547868672-r9ue/
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://developer.download.nvidia.com/triton/vLLM-x-Triton-meetup-External.pdf
https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/
https://developer.nvidia.com/blog/maximizing-deep-learning-inference-performance-with-nvidia-model-analyzer
https://developer.nvidia.com/blog/optimizing-and-serving-models-with-nvidia-tensorrt-and-nvidia-triton/
https://developer.nvidia.com/blog/serving-ml-model-pipelines-on-nvidia-triton-inference-server-with-ensemble-models/
https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
https://caseymuratori.com/blog_0024
https://www.anthropic.com/engineering/contextual-retrieval
https://softwaremill.com/triton-inference-server-tips-and-tricks/
https://softwaremill.com/ml-engineer-comparison-of-pytorch-tensorflow-jax-and-flax/
https://gonzoml.substack.com/p/deep-learning-frameworks
https://www.uber.com/en-FI/blog/open-source-and-in-house-how-uber-optimizes-llm-training/
Real-time Data Infrastructure at Uber
https://www.wevolver.com/article/asic-vs-fpga
https://www.lattepanda.com/blog-323098.html
https://docs.nats.io/nats-concepts/overview/compare-nats
The Deep Learning Compiler: A Comprehensive Survey
MLIR: A Compiler Infrastructure for the End of Moore’s Law
Training Compute-Optimal Large Language Models
Introducing LLaMA: A foundational, 65-billion-parameter large language model
Understanding RAG vs fine-tuning
How to train a new language model from scratch using Transformers and Tokenizers
Attention is All You Need
Constitutional AI: Harmlessness from AI Feedback
Scaling laws for neural language models
Introducing Agent Skills
https://www.bentoml.com/blog/chatgpt-usage-limits-explained-and-how-to-remove-them
https://www.elastic.co/search-labs/blog/gpu-inference-elastic-semantic-search
https://clickhouse.com/blog/netflix-petabyte-scale-logging
https://huggingface.co/spaces/transformers-community/Transformers-tenets
https://www.anthropic.com/news/skills
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
https://medium.com/@derrickchwong/a-deep-dive-into-zero-downtime-blue-green-kubernetes-cluster-upgrade-e812e34a3431
https://www.theregister.com/2025/10/20/aws_outage_amazon_brain_drain_corey_quinn/
https://github.com/openai/openai-cookbook/blob/main/articles/what_makes_documentation_good.md
https://www.scmp.com/business/article/3329450/alibaba-cloud-claims-slash-nvidia-gpu-use-82-new-pooling-system
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
The Free Transformer
https://arxiv.org/html/2408.13296v1?_bhlid=5926ce7fdaa222ff4e49b0ba7b48c6da865ef4bf
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities
https://pytorch.org/blog/introducing-pytorch-monarch/
https://www.stateof.ai/
https://docs.ray.io/en/latest/data/data.html
https://neptune.ai/blog/instruction-fine-tuning-fundamentals
https://neptune.ai/blog/instruction-fine-tuning-evaluation-and-advanced-techniques-for-efficient-training
https://developer.nvidia.com/blog/train-an-llm-on-an-nvidia-blackwell-desktop-with-unsloth-and-scale-it/
https://medium.com/@minh.hoque/understanding-kv-caching-in-transformers-729271c9b74a
https://bornlex.github.io/posts/gpt-mha/
https://bornlex.github.io/posts/positional-embedding/
https://bornlex.github.io/posts/triton1/
https://horace.io/brrr_intro.html
https://damek.github.io/random/basic-facts-about-gpus/
https://www.uber.com/en-IT/blog/requirement-adherence-boosting-data-labeling-quality-using-llms/
https://simonwillison.net/2025/Oct/16/claude-skills/
https://github.com/BSVogler/k8s-runpod-kubelet
https://www.densify.com/kubernetes-autoscaling/kubernetes-affinity/
https://habr.com/ru/companies/yandex/articles/674902/
https://www.uber.com/en-FI/blog/enabling-deep-model-explainability-with-integrated-gradients/
https://blog.streambased.io/p/the-9-ways-to-move-data-kafka-iceberg
https://clickhouse.com/blog/log-compression-170x
https://llm-d.ai/blog/kvcache-wins-you-can-see
https://psinghal.me/posts/03-vllm-semantic-router/
https://blog.vllm.ai/2025/10/27/semantic-router-modular.html
https://blog.vllm.ai/2025/10/28/Kimi-K2-Accuracy.html
https://blog.vllm.ai/2025/10/26/sleep-mode.html
https://www.mooncake.dev/
https://www.tempo.new/
kodekloud.com - DevOps courses
https://nodes.io/
https://acquire.com/
https://imgproxy.net/
https://maze.co/
https://zrok.io/
https://syncable.dev/
https://www.coderabbit.ai/
https://github.com/vllm-project/aibrix
https://nocodb.com/
https://rustfs.com/en/
https://clerk.com/
https://www.liquid.ai/
https://starwatcher.ai/
https://portkey.ai/
https://www.clay.com/
https://www.eraser.io/
https://streamyard.com/
https://streamlabs.com/
https://www.singlestore.com/
https://www.coreweave.com/
https://tracto.ai/
https://datasaur.ai/
https://www.hyperstack.cloud/
https://zamurovic.com/
whichllm.together.ai - Which LLM is best for my use case?
deepwiki.org - Which repo would you like to understand?
gitdiagram.com - Repository to diagram
docs.nvidia.com - NVIDIA NeMo Framework / Tarred Datasets
https://github.com/NVIDIA/NeMo/blob/stable/scripts/speech_recognition/convert_to_tarred_audio_dataset.py
Dify - Open-source LLM app platform
Build, evaluate, and deploy LLM apps with visual workflows, RAG, agents, datasets, and observability.
llm platform, workflow builder, agents, rag, open-source, evaluation, monitoring
DBOS - Transactional serverless runtime
Database-as-OS model providing durable workflows and ACID transactions for cloud applications (a minimal workflow sketch follows this entry).
serverless, workflows, transactions, durability, distributed systems
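A minimal sketch of the durable-workflow idea, assuming the dbos Python package's decorator API (@DBOS.workflow / @DBOS.step) roughly as shown in its docs; configuration, database setup, and DBOS.launch() are omitted, and the step names are illustrative only:

```python
# Minimal sketch, assuming the dbos package's decorator API (@DBOS.workflow / @DBOS.step).
# Configuration/database setup is omitted; names below are illustrative only.
from dbos import DBOS

DBOS()  # initialize the runtime (real apps pass a config and call DBOS.launch())

@DBOS.step()
def reserve_inventory(order_id: str) -> str:
    # Each step's completion is recorded, so a crashed workflow resumes here, not from scratch.
    return f"reserved:{order_id}"

@DBOS.step()
def charge_customer(order_id: str) -> str:
    return f"charged:{order_id}"

@DBOS.workflow()
def checkout(order_id: str) -> list[str]:
    # The workflow is durable: step results are persisted and replayed on recovery.
    return [reserve_inventory(order_id), charge_customer(order_id)]
```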
Flowise - Drag-and-drop LLM workflow builder
Open-source visual builder for creating LLM workflows, agents, and chatbots with no-code interface.
no-code, workflow builder, agents, chatbots, open-source, visual builder
Zep - Memory and vector store for LLM apps
Long-term session memory, embeddings, and semantic search to power RAG and personalized assistants (a toy recall sketch follows this entry).
memory, vector database, embeddings, semantic search, rag, llm memory
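To make the memory-plus-semantic-search idea concrete, here is a toy, self-contained sketch; it is hypothetical helper code, not Zep's actual client API. Messages are "embedded" with a trivial bag-of-words vector and recalled by cosine similarity; a real system would use a neural embedding model and a vector store.

```python
# Toy illustration of session memory + semantic recall (NOT Zep's API; all names are hypothetical).
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Trivial bag-of-words "embedding"; real systems use a neural embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SessionMemory:
    def __init__(self) -> None:
        self.messages: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.messages.append((text, embed(text)))

    def recall(self, query: str, k: int = 2) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.messages, key=lambda m: cosine(qv, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = SessionMemory()
memory.add("User prefers vegetarian recipes")
memory.add("User is planning a trip to Lisbon in May")
print(memory.recall("what food should I suggest?"))
```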
Maze - User research and testing platform
Conduct user research and usability testing with remote, unmoderated testing tools and analytics.
user research, usability testing, analytics, remote testing, user experience
zrok - Secure tunneling and sharing platform
Zero-trust networking for secure sharing and tunneling with built-in authentication and access controls.
tunneling, secure sharing, zero-trust, networking, access control
Syncable - Real-time data synchronization
Real-time data synchronization platform for building collaborative applications with offline support.
data sync, real-time, collaborative apps, offline support, synchronization
CodeRabbit - AI-powered code review
AI-powered code review platform that provides intelligent feedback and suggestions for pull requests.
code review, ai assistant, pull requests, code quality, automated feedback
AIBrix - vLLM inference infrastructure toolkit
Open-source, cloud-native building blocks for deploying and scaling vLLM-based LLM inference on Kubernetes (a minimal vLLM sketch follows this entry).
llm inference, performance optimization, vllm, inference acceleration, model serving
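AIBrix builds on vLLM for the serving layer; a minimal offline-inference sketch using vLLM's Python API, where the model name is only an example:

```python
# Minimal vLLM offline-inference sketch (the model name is only an example).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain KV caching in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```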
WorkOS - Enterprise authentication platform
Enterprise-grade authentication and user management APIs for developers building B2B applications.
enterprise auth, sso, user management, b2b apis, authentication
NocoDB - Open-source Airtable alternative
Open-source no-code database platform that turns any database into a smart spreadsheet interface.
no-code database, airtable alternative, open-source, spreadsheet interface, database management
RustFS - Rust-based distributed object storage
High-performance, S3-compatible distributed object storage written in Rust with a focus on safety and performance.
object storage, s3-compatible, rust, distributed storage, performance
Clerk - Authentication and user management
Complete authentication and user management platform for React, Next.js, and modern web applications.
authentication, user management, react, nextjs, web apps
Liquid AI - Foundation models and neural networks
Advanced AI research company developing foundation models and neural network architectures.
foundation models, neural networks, ai research, machine learning, deep learning
Roo Code - AI coding agent in your editor
Open-source AI coding agent (VS Code extension) that plans, writes, and refactors code across files using your choice of models.
ai coding agent, vs code extension, open-source, autonomous coding, developer tools
StarWatcher - AI-powered GitHub analytics
AI-powered analytics platform for tracking GitHub repositories, stars, and open-source project insights.
github analytics, repository tracking, open-source insights, ai analytics, project monitoring
Portkey - AI gateway and LLM operations
AI gateway platform for managing, monitoring, and scaling LLM applications with observability features.
ai gateway, llm operations, monitoring, observability, model management
Clay - Data enrichment and automation
Data enrichment and automation platform for sales and marketing teams with AI-powered insights.
data enrichment, sales automation, marketing tools, ai insights, lead generation
Eraser - Documentation and diagrams
Collaborative platform for creating technical documentation, diagrams, and architectural designs.
documentation, diagrams, collaboration, technical writing, architecture design
StreamYard - Live streaming platform
Browser-based live streaming platform for creating professional broadcasts and webinars.
live streaming, webinars, broadcasting, online events, content creation
Streamlabs - Streaming software and tools
Comprehensive streaming software suite with alerts, overlays, and monetization tools for creators.
streaming software, content creation, alerts, overlays, monetization
SingleStore - Distributed database platform
High-performance distributed database for real-time analytics and transactional workloads.
distributed database, real-time analytics, high performance, transactional, data processing
Lightning AI - Machine learning platform
End-to-end machine learning platform for training, deploying, and scaling AI models with PyTorch (a minimal PyTorch Lightning sketch follows this entry).
machine learning, pytorch, model training, ai platform, model deployment
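Lightning AI maintains PyTorch Lightning; a minimal training-loop sketch with it, assuming the lightning package's LightningModule/Trainer API, where the toy model and random data are illustrative only:

```python
# Minimal PyTorch Lightning sketch (toy model and random data are illustrative only).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L

class TinyRegressor(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

data = TensorDataset(torch.randn(128, 8), torch.randn(128, 1))
trainer = L.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
trainer.fit(TinyRegressor(), DataLoader(data, batch_size=16))
```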
CoreWeave - GPU cloud computing
Specialized GPU cloud platform optimized for AI, machine learning, and high-performance computing workloads.
gpu cloud, ai computing, machine learning infrastructure, high performance computing, cloud gpu
Tracto - AI workflow automation
AI-powered workflow automation platform for streamlining business processes and task management.
workflow automation, ai automation, business processes, task management, process optimization
Datasaur - Data labeling platform
Collaborative data labeling platform for machine learning projects with AI-assisted annotation tools.
data labeling, machine learning, annotation tools, ai-assisted, collaborative platform
Hyperstack - GPU cloud platform
GPU cloud infrastructure platform designed for AI, machine learning, and compute-intensive applications.
gpu cloud, ai infrastructure, machine learning, compute platform, cloud computing
Character.AI - Create and chat with AI characters
Platform to create and interact with AI characters for assistance, entertainment, and roleplay experiences.
ai characters, chatbots, roleplay, consumer ai, conversational ai