Anxhelo Diko
PhD Student in Computer Science
A highly motivated, results-oriented Computer Vision PhD student with a deep passion for advancing artificial intelligence. My research focuses on building multimodal representations and understanding human activities, two key challenges for autonomous agents and for AI more broadly. I have extensive experience with multimodal large language models for video captioning and question answering, and a keen interest in view-invariant video representation learning. I am particularly committed to exploring how to bridge the gap between representations of different modalities while preserving the unique characteristics of each. Beyond research, I have a strong engineering foundation built through both academic and industry experience. Proficient in Python, C++, and CUDA, I excel at rapidly prototyping and implementing new ideas. I am eager to apply my skills and knowledge to cutting-edge research and development in this fast-moving field.