Learning the DeepSeek-V3 technical report

An example of a Distill-style blog post and its main elements

Architecture

MLA

MoE

MTP

Infrastructure

pre-training stage

training framework

post-training stage

RL

SFT
