Zhiqi Huang
I am an M.Phil. student in Information Architecture at Waseda University (早稲田大学), where I work on geometry-aware generative modeling at the intersection of computer graphics, computer vision, and generative AI. My research studies how 3D representations—including meshes, PBR materials, 3D Gaussian Splatting, and geometry/depth/normal priors—can serve not only as final assets, but also as structural intermediate representations for 3D-aware video/world models, simulation-ready environments, and controllable digital humans.
Before starting my master’s, I spent three years at 4399 Games building cross-platform real-time rendering systems for shipped mobile and PC titles. This industry experience shaped my interest in generative systems that are controllable, efficient, and compatible with practical graphics and simulation pipelines.
My recent work includes SIE3D, a framework for generating text-controllable 3D avatars from a single image, and a first-author manuscript on geometry-aware PBR material generation from long text. Going forward, I am interested in extending controllable 3D generation from static assets to dynamic world models for embodied interaction, including robot manipulation, AR/VR, interactive simulation, and game-scale virtual worlds.
I am currently seeking Research Assistant (RA) opportunities and 2027 Ph.D. opportunities in graphics, vision, generative AI, and embodied/world-model research. I received my B.E. in Software Engineering from Sun Yat-sen University (中山大学) in 2021.
🔥 News
- [Jun. 2026] I am seeking Research Assistant (RA) opportunities in geometry-aware generative modeling, 3D-aware video/world models, embodied simulation, and efficient 3D generation.
- [2027 Intake] I am interested in Ph.D. opportunities in graphics, vision, generative AI, and world models for embodied interaction.
- [Apr. 2026] A first-author manuscript on geometry-aware PBR material generation from long text is currently under review.
- [Jan. 2026] My first-author paper “SIE3D” was accepted to ICASSP 2026.
🔭 Research Direction
I view 3D not only as an output modality, but also as a useful interface for grounding generative models in space, material, lighting, and action. My long-term goal is to build generative systems that connect text, images, videos, and embodied agents through controllable 3D representations. I am especially interested in methods that improve spatial consistency, physical plausibility, editability, and downstream usability without relying on extremely large-scale training from scratch.
Current and future interests include:
- Geometry-aware generative modeling: injecting geometry, depth, normal, material, and multi-view priors into 3D, video, and world models.
- 3D-aware video/world models: generating spatially consistent future visual states for embodied interaction, robot manipulation, and interactive simulation.
- Simulation-ready assets and digital twins: creating relightable PBR materials, controllable 3D objects, and scene assets for games, AR/VR, and robotic simulation.
- Controllable digital humans and avatars: identity-preserving, language-controllable 3D Gaussian avatars for gaming, VR, and interactive agents.
- Efficient generative systems: parameter-efficient adaptation, consumer-grade GPU inference, and practical evaluation protocols for multimodal 3D generation.
🎯 Application Scenarios
- Embodied simulation and robot manipulation: using geometry-aware generation to produce spatially consistent objects, scenes, and future visual rollouts for policy learning, evaluation, and debugging.
- Interactive games and AR/VR: generating controllable avatars, relightable PBR assets, and editable virtual environments from multimodal instructions.
- 3D-aware video and world modeling: using 3D structure as an intermediate layer to make image/video generation more consistent across views, time, lighting, and user interaction.
📝 Publications
- SIE3D: Single-Image Expressive 3D Avatar Generation via Semantic Embedding and Perceptual Expression Loss
  - IEEE ICASSP 2026 (Accepted) - First Author, Corresponding Author
  - Authors: Zhiqi Huang, Dulongkai Cui, Jinglu Hu
  - A framework for generating text-controllable 3D avatars from a single image using semantic embedding fusion and perceptual expression supervision. This work connects identity-preserving 3D generation with natural-language control, targeting controllable digital humans for games, VR, and interactive systems on consumer-grade GPUs.
  - Links: Project Page, Paper (arXiv:2509.24004), Code (GitHub)
📄 Manuscripts Under Review
- First-author manuscript on geometry-aware PBR material generation from long text
  - Currently under review
  - A method for generating relightable PBR materials for 3D meshes from long-form descriptions by grounding material semantics in local geometry. The work targets simulation-ready assets with stronger semantic alignment, multi-view consistency, physical plausibility, and efficient inference on consumer-grade GPUs.
🎓 Education
- Waseda University (早稲田大学), Fukuoka, Japan
  - M.Phil. in Information Architecture (English-taught program)
  - Apr. 2025 - Mar. 2027 (expected)
  - Research focus: geometry-aware generative modeling, 3D generation, and world models for embodied interaction
  - Current GPA: 3.8 / 4.0
- Sun Yat-sen University (中山大学), Guangzhou, China
  - B.E. in Software Engineering
  - 2017 - 2021
  - GPA: 3.7 / 4.0
💼 Industry Experience
- 4399 Games, Guangzhou, China
  - Senior Graphics Engineer
    - 2023 - 2024
    - Led a rendering team of 3-5 engineers and owned the rendering roadmap for “Era of Conquest” on mobile and PC.
    - Drove graphics development for new projects including “Catch & Build: Land of Pals” across mobile, PC, and web.
  - Graphics Engineer
    - 2021 - 2023
    - Built and optimized the real-time rendering pipeline for “Era of Conquest”, focusing on cross-platform shader optimization, physically based rendering, and performance profiling.
💻 Technical Skills
- Languages: C++, Python, C#, GLSL/HLSL
- Graphics & Engines: Vulkan, OpenGL, Unity, Real-time Rendering, Physically Based Rendering (PBR)
- ML & 3D: PyTorch, 3D Gaussian Splatting, Diffusion Models, CLIP/LongCLIP, Mesh Processing, Multimodal Generation
- Research Practices: parameter-efficient fine-tuning, multi-view rendering/evaluation, simulation-ready asset pipelines
🗣️ Languages
- Chinese: Native (Mandarin & Cantonese)
- English: Professional working proficiency (TOEFL iBT: 90; English-taught master’s program)
🏆 Honors & Awards
- Outstanding Student Scholarship (Third Prize), Sun Yat-sen University, 2018-2019
