Anxhelo Diko

PhD Student in Computer Science

VisionLab Research Group, Sapienza University of Rome

Biography

A highly motivated and results-oriented Computer Vision PhD student with a deep passion for advancing the field of artificial intelligence. My research focuses on building multimodal representations and understanding human activities, addressing key challenges for autonomous agents and AI in general. I have extensive experience with multimodal large language models for video captioning and question answering, and a keen interest in view-invariant video representation learning. I am particularly committed to exploring how to bridge the gap between representations of different modalities while preserving their unique characteristics. In addition to my research expertise, I have a strong engineering foundation honed through academic and industry experience. Proficient in Python, C++, and CUDA, I excel at rapidly prototyping and implementing new ideas. I am eager to apply my skills and knowledge to cutting-edge research and development in this field.

Interests

Artificial Intelligence
Computer Vision
Machine Learning
Deep Learning
Human Activity Understanding
Representation Learning
AI for Medicine

Education

PhD in Computer Science, 2021 - ongoing
Sapienza University
MSc in Computer Science, 2018 - 2020
Sapienza University
BSc in Business Computer Science (a.k.a. Data Science), 2015 - 2018
University of Tirana

Skills

Machine Learning

5+ Years

Deep Learning

5+ Years

Computer Vision

4+ Years

Research

3+ Years

Programming

5+ Years

Experience

Research Scientist (internship)

Meta Reality Labs

August 2025 – January 2026 Burlingame, California

Research on elastic and distributed VJEPA world models that can be deployed on diverse compute environments, targeting cross-device communication.

Applied Scientist II Intern

Amazon Prime Video UK

March 2025 – August 2025 London, UK

Conducting research in the following areas of computer vision:

Multimodal Foundation Models
Video Quality Assessment
Sound Quality Assessment
Very Long Contextual Understanding

Computer Vision Research Scientist

Huawei Research Center Helsinki

March 2024 – September 2024 Helsinki, Finland

Conducting research in the following areas of computer vision:

Multimodal Large Language Models
Long-term Video Understanding
Dense Video Captioning
Temporal Event Localization
Question Answering from Videos

Research Fellow

Sapienza University of Rome

March 2021 – Present Rome, Italy

Main responsibilities include:

Designing and implementing computer vision solutions for gait analysis through RGB videos that assist physicians in diagnosing patients with mobility disorders.
Designing and implementing machine learning approaches for medical image processing.
Designing and implementing machine learning approaches for resource optimization for post-intervention patients.

Machine Learning Specialist

MedLea srls

March 2020 – July 2021 Rome, Italy

Main responsibilities include:

Designing and implementing machine learning algorithms for respiratory diagnosis and prognosis by applying classification, regression, and segmentation techniques on CT images and patient medical history. The implemented solutions would provide MedLea with a suite of algorithms that could analyze patient data for different respiratory problems.
Deploying machine learning models.
Designing and implementing a parallel, scalable GPU ray-tracing algorithm that discretizes 3D mesh representations of geometries into volumetric representations. The implemented algorithm would cut the computational costs of the services offered by MedLea by 30% in the preparation phase.
Managing MedLea's computing infrastructure.

Machine Learning Intern

PaperClicks

January 2018 – July 2018 Rome, Italy

Main responsibilities include:

Designing and implementing a machine learning algorithm for optimizing affiliate marketing campaigns, enabling automated profits.

Accomplishments

Advanced C++ developer

Udemy Nov 2021

See certificate

Publications

Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan Sun, Ioannis Patras (2025). ReWind: Understanding Long Videos with Instructed Learnable Memory (CVPR2025).

PDF Code

Anxhelo Diko, Antonino Furnari, Luigi Cinque, Giovanni Maria Farinella (2025). LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces.

PDF Code

Anxhelo Diko, Danilo Avola, Bardh Prenkaj, Federico Fontana, Luigi Cinque (2024). S-GEAR: Semantically Guided Representation Learning for Action Anticipation (ECCV2024).

PDF Code

Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque (2023). ReViT: Enhancing vision transformers with residual attention (Journal of Pattern Recognition).

PDF Code

Danilo Avola, Luigi Cinque, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Alessio Mecca, Daniele Pannone, Claudio Piciarelli (2021). MS-Faster R-CNN: Multi-stream backbone for improved Faster R-CNN object detection and aerial tracking from UAV images.

PDF

See all publications

Collaboration (inter-disciplinary)

Emanuele Agrimi, Anxhelo Diko, Daniele Carlotti, Andrea Ciardiello, Manash Borthakur, Stefano Giagu, Simone Melchionna, Cecilia Voena (2023). COVID-19 therapy optimization by AI-driven biomechanical simulations.

PDF

Danilo Avola, Irene Cannistraci, Marco Cascio, Luigi Cinque, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Romeo Lanzini, Maurizio Mancini, Alessio Mecca, Daniele Pannone (2022). A novel GAN-based anomaly detection and localization method for aerial video surveillance at low altitude.

PDF

Danilo Avola, Luigi Cinque, Angelo Di Mambro, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Marco Raoul Marini, Alessio Mecca, Daniele Pannone (2021). Low-altitude aerial video surveillance via one-class SVM anomaly detection from textural features in UAV images.

PDF

Manash Pratim Borthakur, Sauro Succi, Fabio Sterpone, Franck Pérot, Anxhelo Diko, Simone Melchionna (2021). In-silico analysis of airflow dynamics and particle transport within a human nasal cavity.

PDF

Contact

Contact Me

diko@di.uniroma1.it
393497921839
Via Sorrento, Roma, Lazio 00177
Building 11, A side, 21th
