Ruihan Zhu
learning DeepSeek V3 technical report
Architecture
MLA
MoE
MTP
Infrastructure
pre-training stage
training framework
FP8 mixed precision
DualPipe framework
post-training stage
RL
SFT