MUHAMMAD RAMEEZ UR RAHMAN

PhD

cycle: XXXVI


supervisor: Prof. Fabio Galasso

Thesis title: Capitalizing on Self-supervision and Pre-trained Models in Computer Vision

This thesis addresses the challenge of advancing computer vision tasks when labeled data are limited, by capitalizing on the knowledge already encoded in pre-trained models. It covers three tasks (classification, regression, and segmentation) and presents a framework for each that aims to overcome the limits imposed by data scarcity and task-specific methodologies.

The first focus is Unsupervised Domain Adaptation (UDA) in visual recognition, which bridges disparate visual domains for robust real-world performance. Existing UDA approaches typically require manual adaptation to a specific backbone architecture, so they become outdated as architectures evolve. To circumvent this limitation, this thesis proposes Adversarial Branch Architecture Search for UDA (ABAS). ABAS addresses the lack of target labels with a data-driven ensemble approach to model selection, and it explores auxiliary adversarial branches to drive domain alignment. Extensive validation on standard visual recognition datasets shows that ABAS robustly improves modern UDA techniques across diverse domains.

For regression, the thesis studies collaborative human pose forecasting, an understudied problem in which the correlated motion patterns of interacting individuals can be exploited to improve performance. The work revisits prevalent single-person practices and tailors them to the collaborative setting. In particular, it integrates frequency input representations, space-time separable interaction encodings, and fully learnable interaction adjacencies into a Graph Convolutional Network (GCN) framework.
Furthermore, a novel initialization procedure for the spatial interaction parameters enhances both performance and stability, yielding a substantial improvement over state-of-the-art methods on benchmark datasets.

Lastly, the thesis tackles semantic segmentation in autonomous driving scenarios, leveraging event cameras for low-latency operation in challenging lighting conditions. We introduce OVOSE, the first open-vocabulary semantic segmentation approach explicitly tailored for event-based data. OVOSE leverages knowledge distillation from pre-trained image-based models and synthetic event data to enhance segmentation performance. Additionally, we propose a novel dissimilarity network to recalibrate the mask loss, mitigating the effects of sub-optimal reconstructions and enabling precise fine-tuning of the segmentation model. With this approach, OVOSE outperforms conventional image-based models as well as state-of-the-art unsupervised domain adaptation methods for event-based semantic segmentation.

In summary, this thesis unifies these methodologies under the common goal of leveraging pre-trained models and limited labels to achieve strong performance across diverse domains. By addressing specific challenges in classification, regression, and segmentation, the proposed frameworks contribute towards advancing computer vision in real-world applications.

Publications

11573/1686540 - 2023 - Best Practices for 2-Body Pose Forecasting
Rahman, Muhammad Rameez Ur; Scofano, Luca; De Matteis, Edoardo; Flaborea, Alessandro; Sampieri, Alessio; Galasso, Fabio - 04b Conference paper in proceedings volume
conference: IEEE Conference on Computer Vision and Pattern Recognition (Vancouver, Canada)
book: IEEE Conference on Computer Vision and Pattern Recognition Workshops

11573/1617762 - 2022 - Adversarial Branch Architecture Search for Unsupervised Domain Adaptation
Robbiano, Luca; Rahman, Muhammad Rameez Ur; Galasso, Fabio; Caputo, Barbara; Carlucci, Fabio Maria - 04b Conference paper in proceedings volume
conference: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (Waikoloa, HI, USA)
book: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) - (978-1-6654-0915-5)

11573/1657716 - 2022 - Efficient and Refined Deep Convolutional Features Network for the Crack Segmentation of Solar Cell Electroluminescence Images
Wang, C.; Chen, H.; Zhao, S.; Rahman, M. R. U. - 01a Journal article
journal: IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING (IEEE / Institute of Electrical and Electronics Engineers Incorporated) - issn: 0894-6507 - wos: WOS:000875885900007 (9) - scopus: 2-s2.0-85136108259 (11)

11573/1657578 - 2022 - Boosting RGB-D salient object detection with adaptively cooperative dynamic fusion network
Zhu, Jinchao; Zhang, Xiaoyu; Fang, Xian; Rahman, Muhammad Rameez Ur; Dong, Feng; Li, Yuehua; Yan, Siyu; Tan, Panlong - 01a Journal article
journal: KNOWLEDGE-BASED SYSTEMS (Butterworth Heinemann Publishers) pp. 109205 - issn: 0950-7051 - wos: WOS:000827395000007 (4) - scopus: 2-s2.0-85133801192 (8)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma