cv
Basics
| Name | Yuantian Huang | 
| Research interest | Image Editing and Generation, Computer Graphics, Computer Vision | 
| Url | https://sky24h.github.io/ | 
| Email | huang_yuantian (at) cyberagent.co.jp | 
Publications
-  ● International Conference (Peer-reviewed)
 1. H. Liu, X. Yang, T. Akiyama, Y. Huang, Q. Li, S. Kuriyama, T. Taketomi, ''TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation'', International Conference on Learning Representations, (ICLR Oral), 2025. PDF, Website
 2. Y. Huang, S. Iizuka, and K. Fukui, ''Training-Free Zero-Shot Semantic Segmentation with LLM Refinement'', The British Machine Vision Conference (BMVC) 2024, 2024. PDF, Website
 3. Y. Huang, S. Iizuka, and K. Fukui, ''Diffusion-based Semantic Image Synthesis from Sparse Layouts'', Computer Graphics International (CGI) 2023, 2023. PDF, Website
 4. T. Okada, Y. Huang, G. Hao, S. Iizuka, and K. Fukui, ''Low-Level Feature Aggregation Networks for Disease Severity Estimation of Coffee Leaves'', 18th International Conference on Machine Vision and Applications (MVA), 2023. Website
 5. Y. Huang, S. Iizuka, and K. Fukui, ''Free-View Expressive Talking Head Video Editing'', IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP, h5-index: 123, top 3% paper), 2023. PDF, Website
-  ● Domestic Conference
 1. Y. Huang, S. Iizuka, and K. Fukui, ''Free-View Expressive Talking Head Video Editing'', Visual Computing 2023, 2023. (Invited talk)
 2. Y. Huang, S. Iizuka, and K. Fukui, ''Diffusion-based Semantic Image Synthesis from Sparse Layouts'', The 26th Meeting on Image Recognition and Understanding, 2023. (Non-peer-reviewed, poster)
 3. Y. Huang, S. Iizuka, E. Simo-Serra, and K. Fukui, ''High-quality Multi-domain Artwork Generation from Semantic Layouts'', The 24th Meeting on Image Recognition and Understanding, 2021. (Peer-reviewed, short oral)
 4. Y. Huang, S. Iizuka, and K. Fukui, ''Controllable Artwork Synthesis via Two-stage Adversarial Networks'', The 23rd Meeting on Image Recognition and Understanding, 2020. (Peer-reviewed, short oral)
Work
-  2024.04 - now, Research Engineer, CyberAgent AI Lab. R&D on image editing and generation, computer graphics, and computer vision.
Education
-  2021.04 - 2024.03, PhD, Computer Science, University of Tsukuba, Japan. Doctoral Thesis: Controllable Visual Content Synthesis with Deep Generative Models. Supervisors: Assoc. Prof. Satoshi Iizuka, Prof. Kazuhiro Fukui
-  2017.10 - 2021.03, Research Student -> Master, Computer Science, University of Tsukuba, Japan. Master's Thesis: Controllable Multi-domain Semantic Artwork Synthesis. Supervisors: Assoc. Prof. Satoshi Iizuka, Prof. Kazuhiro Fukui
-  2013.09 - 2017.06 
Awards
-  2024.03.25, Department Chair's Award, University of Tsukuba. Awarded by the Chair of the Department of Computer Science at the University of Tsukuba for outstanding performance during the Ph.D. program.
-  2023.06.04, Top 3% Paper Recognition, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023. Recognized as one of the top 3% of all accepted papers at ICASSP 2023.
-  2021 - 2024, Scholarship: "SPRING: Support for Pioneering Research Initiated by the Next Generation", Japan Science and Technology Agency (JST). A program that provides financial support (living and research allowances) to selected Ph.D. students.
Skills
| Languages | |
| Chinese : Native speaker | |
| Japanese : Fluent | |
| English : Fluent | 
| Programming Languages | |
| Python : ★★★★★ | |
| C++ : ★★★☆☆ | 
| Deep Learning Frameworks | |
| PyTorch : ★★★★★ | |
| TensorFlow : ★★★☆☆ | 
| Tools & Platforms | |
| Linux : ★★★★★ | |
| Docker : ★★★★☆ | |
| GCP : ★★★★☆ | 
Interests
| Video Games | |
| Simulation | |
| Strategy | |
| RPG | 
| History | 
| Coding | |
| Implementing new ideas | |
| Automation of daily work | |
| Implementing SOTA models | 
| Sports | |
| Roller Skating | |
| Hiking | 
Projects
URL: https://sky24h.github.io/projects/
| ● Research | |
| 1. Online Demo for the paper "Free-View Expressive Talking Head Video Editing" | |
| 2. Online Demo for the paper "Controllable Multi-domain Semantic Artwork Synthesis" | |
| 3. Online Demo for the paper "Training-Free Zero-Shot Semantic Segmentation with LLM Refinement" | 
| ● Fun | |
| 1. A serverless GPU application that uses AnimateDiff to run a Text-to-Video task | |
| 2. A serverless GPU application that uses Stable Diffusion XL to run a Text-to-Image task | |
| 3. One-shot face animation using webcam, capable of running in real time. | |
| 4. Simple implementation using ChatGPT (and GPT-4) API, deployed as a Telegram Bot. |