🔬 Focus: AI Infrastructure · CUDA Kernels · LLM Inference · HPC Systems
🌱 Currently: Building high-throughput inference pipelines and GPU-first systems
🤝 Open to: AI infrastructure, performance engineering, research collaboration, and open-source work
I build AI infrastructure and GPU-first high-performance systems with C++/CUDA, Python, and Go, with a focus on GPU operator optimization and high-performance systems engineering in practice.
- 🔥 GPU Kernel Engineering — CUDA/Triton kernels for FlashAttention, GEMM, quantization, and memory-aware operator design
- 🧠 AI Inference Systems — lightweight LLM runtimes, KV Cache, W8A16/FP8 quantization, and inference path optimization
- ⚡ High-Performance Computing — simulation, rendering, and image-processing pipelines tuned for throughput and scalability
- 🌐 Real-time Systems — RTC signaling, streaming applications, and digital human platforms with system-level integration
Currently: inference acceleration, kernel fusion, and end-to-end GPU system design.
Featured Projects — Start here for the quickest overview of my work in CUDA kernels, inference systems, HPC simulation, and production-facing applications.
If you want a quick read on my technical focus and representative work, start with the four projects below.
Best entry points for collaboration, hiring conversations, and technical review.
- Flagship CUDA kernel library covering GEMM, FlashAttention, Conv2D, SpMV, and FP8 quantization.
- Compact LLM inference engine focused on W8A16 quantization, KV Cache, and practical runtime design.
- Million-particle GPU simulation exploring direct N², Barnes-Hut, and CUDA-OpenGL interop.
- 3D digital human platform combining real-time rendering, interaction, and behavior control.
- Modern C++17/CUDA kernel library for elementwise ops, GEMM, FlashAttention, Conv2D, SpMV, and FP8 quantization.
- Stepwise CUDA SGEMM optimization from naive loops to Tensor Core kernels, reaching 40% of cuBLAS.
- Triton fusion kernels for RMSNorm+RoPE, Gated MLP, and FP8 GEMM with auto-tuning.
- CUDA kernel playground for FlashAttention, FP16/INT8 GEMM, and Tensor Core inference primitives.
- Lightweight LLM runtime with W8A16 quantization, KV Cache, and practical multi-sampling support.
- Educational CUDA inference engine with seven GEMM optimization stages, reaching 72% of cuBLAS.
- WebGPU micro inference engine implementing Conv2d, kernel fusion, Im2Col, and MNIST classification.
- Real-time multi-model vision stack combining YOLO, DETR, OWL-ViT, BLIP, and WebSocket streaming.
- CUDA ray tracer featuring Phong shading, path tracing, BVH acceleration, and warp-divergence tuning.
- Million-particle CUDA simulation covering direct N², Barnes-Hut, spatial hashing, and OpenGL interop.
- Real-time WebGPU fluid simulation with 10K particles, compute shaders, and visual trail effects.
- CUDA image-processing library covering convolution, morphology, geometric transforms, and pipeline stages.
- DAG-based heterogeneous image pipeline with multi-stream scheduling and pinned-memory pools.
- 3D digital human platform integrating real-time rendering, voice interaction, behavior control, and emotion FSM.
- Minimal WebRTC demo with Go signaling, room management, and peer-to-peer media delivery.
- End-to-end encrypted note sync with AES-256, mnemonic recovery, and real-time collaboration.
- Browser-based memory training app with N-back, spaced reinforcement, adaptive difficulty, and PWA support.
Background in communications and information engineering.
Engineering experience across medical imaging, RTC systems, and genomic-scale data workflows.
Reach out if you're building AI infrastructure, inference acceleration, GPU systems, or performance-critical tooling.
Open to technical collaboration, engineering roles, research discussions, and thoughtful open-source work.


