About me
Hi! This is Luning Wang (王麓宁)’s homepage. I’m currently working on Large Recommendation Models & Generative Recommendation at ByteDance Douyin Group as a full-time algorithm engineer.
My past work focused mainly on the infrastructure and efficiency optimization of Large Language Models. I have experience optimizing LLMs with techniques such as quantization, KV-cache compression, and speculative decoding. I’ve also worked on LLM systems, where I gained experience with vLLM and parallel computing. In addition, I spent some time at UofM exploring applications of LLMs in biomedical engineering.
I casually post notes on new works and random thoughts on my blog; I hope they can serve as a reference if you’re interested in related topics. I’m open to discussion and collaboration, so feel free to drop me an email or reach out on LinkedIn!
🎓 Education
- [08/2024~05/2026] M.S. Department of Electrical and Computer Engineering, University of Michigan
- [09/2020~06/2024] B.Eng. Department of Electronic Engineering, Tsinghua University
💻 Work Experience
- [06/2026~Now] ByteDance (Douyin), AI Algorithm Engineer.
- [04/2025~08/2025] Huawei (Noah’s Ark Lab), AI System Engineer (Intern).
- [02/2024~06/2024] Infinigence AI, AI Algorithm Engineer (Intern).
- [09/2023~01/2024] ByteDance (TikTok), AI Algorithm Engineer (Intern).
See my CV for more details of my work.
📚 Research Experience
- [01/2025~10/2025] Independent Researcher at Dept of ECE, UofM
- Collaborators: Chenwei Wu, Zitao Shuai, Zhengxu Tang, Jun-En Ding, Hsin-Ling Hsu.
- [09/2022~06/2024] Undergrad Research Assistant at NICS-EFC lab, Dept of EE, THU
- Advisors: Prof. Yu Wang, Dr. Xuefei Ning
- Project Supervisor: Dr. Shiyao Li
📝 Selected Publications
Here are some of my representative works:
- [ACL’25] MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation. Hsin-Ling Hsu*, Cong-Tinh Dao*, Luning Wang, et al. [pdf]
- [NeurIPS ENLSP Workshop’24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios. Luning Wang, et al. [pdf] [github]
- [ICML’24] Evaluating Quantized Large Language Models. Shiyao Li, Xuefei Ning, Luning Wang, et al. [pdf] [github]
- [NeurIPS ENLSP Workshop’23] LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment. Shiyao Li, Xuefei Ning, Ke Hong, Tengxuan Liu, Luning Wang, et al. [pdf]
See my Google Scholar for the full list of my publications.
⚙️ Academic Services
- [09/2025~12/2025] Paper reviewer for the ICLR’26 main conference.
- [02/2025~03/2025] Paper reviewer for the ICLR’25 Workshop on Reasoning and Planning for LLMs.
