Siladittya Manna

I am a Senior Research Fellow at the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata. My research interests include self-supervised learning, computer vision, and medical image analysis.

Email  |  CV  |  Github  |  Google Scholar  |  dblp  |  Orcid  |  Twitter  |  Medium

Bio

I am pursuing my PhD at the Indian Statistical Institute, Kolkata, where I am advised by Prof. Umapada Pal and Dr. Saumik Bhattacharya. Before joining ISI, Kolkata, I completed my B.Tech and M.Tech (Dual Degree) at the Indian Institute of Engineering Science and Technology, Shibpur, Howrah, where I was advised by Dr. Ankita Pramanik for my Master's thesis.

Recent Updates

  • New! [July 2023:] One paper accepted to IEEE Transactions on Artificial Intelligence.
  • New! [Apr 2023:] One paper accepted to ICDAR 2023 for oral presentation.

Publications

    2023
    DySTreSS: Dynamically Scaled Temperature in Self-Supervised Contrastive Learning
    Siladittya Manna, Soumitri Chattopadhyay, Rakesh Dey, Saumik Bhattacharya, Umapada Pal,
    Under Review
    arXiv

    We focus on improving the performance of the InfoNCE loss in SSL by studying the effect of the temperature hyper-parameter. We propose a cosine similarity-dependent temperature scaling function to effectively optimize the distribution of the samples in the feature space. We further analyze the uniformity and tolerance metrics to investigate the optimal regions in the cosine similarity space for better optimization. Additionally, we offer a comprehensive examination of the behavior of local and global structures in the feature space throughout the pre-training phase as the temperature varies.
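
    For illustration, here is a minimal PyTorch sketch of an InfoNCE-style loss whose temperature depends on the pairwise cosine similarity instead of being a constant. The specific scaling function below (a cosine-shaped interpolation between tau_min and tau_max) is a placeholder, not the function proposed in the paper.

        # Sketch only: InfoNCE with a similarity-dependent temperature.
        # The dynamic_temperature() mapping below is illustrative.
        import torch
        import torch.nn.functional as F

        def dynamic_temperature(sim, tau_min=0.07, tau_max=0.2):
            # Map cosine similarity in [-1, 1] to a temperature in [tau_min, tau_max].
            return tau_min + 0.5 * (tau_max - tau_min) * (1.0 + torch.cos(torch.pi * sim))

        def infonce_dynamic_tau(z1, z2):
            # z1, z2: L2-normalized embeddings of two augmented views, shape (N, D).
            z = torch.cat([z1, z2], dim=0)                 # (2N, D)
            sim = z @ z.t()                                # pairwise cosine similarities
            logits = sim / dynamic_temperature(sim.detach())
            logits.fill_diagonal_(float('-inf'))           # drop self-similarities
            n = z1.size(0)
            # The positive for each sample is its other augmented view.
            targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
            return F.cross_entropy(logits, targets)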

    Self-Supervised Representation Learning for Knee Injury Diagnosis from Magnetic Resonance Data
    Siladittya Manna, Saumik Bhattacharya, Umapada Pal,
    IEEE Transactions on Artificial Intelligence
    Paper / Code / arXiv

    We propose a self-supervised learning (SSL) approach for learning the spatial anatomical representations from the frames of magnetic resonance (MR) video clips for the diagnosis of knee medical conditions. The pretext model learns meaningful, context-invariant spatial representations. The downstream task in our paper is a class-imbalanced multi-label classification. To the best of our knowledge, this work is the first of its kind in showing the effectiveness and reliability of self-supervised learning algorithms in imbalanced multi-label classification tasks on MR scans.
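
    For concreteness, a minimal PyTorch sketch of the downstream setup described above: a pre-trained encoder feeding a multi-label head trained with a class-weighted BCE loss to counter the imbalance. The backbone, label set, and weights below are placeholders rather than the configuration used in the paper.

        # Sketch only: fine-tuning a (self-supervised) pre-trained encoder for
        # class-imbalanced multi-label classification. Labels/weights are placeholders.
        import torch
        import torch.nn as nn
        from torchvision.models import resnet18

        encoder = resnet18(weights=None)     # load self-supervised weights here instead
        encoder.fc = nn.Identity()           # expose the 512-d feature vector
        head = nn.Linear(512, 3)             # e.g. three knee conditions (placeholder)

        # pos_weight > 1 up-weights rare positive labels (illustrative values).
        criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0, 4.0, 3.0]))

        frames = torch.randn(8, 3, 224, 224)            # a batch of MR frames
        labels = torch.randint(0, 2, (8, 3)).float()    # multi-label targets
        loss = criterion(head(encoder(frames)), labels)
        loss.backward()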

    MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
    Siladittya Manna, Umapada Pal, Saumik Bhattacharya,
    Under Review
    arXiv

    We propose a novel loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect. We further improve the naïve loss function by removing the effect of the positive-positive repulsion and incorporating the upper bound of the negative-pair repulsion. Unlike existing methods, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present a closed-form expression for the parameter gradient flow and compare the behavior of self-supervised contrastive frameworks using the Hessian eigenspectrum to analytically study their convergence. The proposed method outperforms SOTA self-supervised contrastive frameworks on benchmark datasets such as CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet. After 200 pre-training epochs with ResNet-18 as the backbone, the proposed model achieves an accuracy of 86.36%, 58.18%, 80.50%, and 30.87% on the CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet datasets, respectively, and surpasses the SOTA contrastive baseline by 1.93%, 3.57%, 4.85%, and 0.33%, respectively. The proposed framework also achieves a state-of-the-art Top-1 linear evaluation accuracy of 78.4% (200 epochs) and 65.22% (100 epochs) on the ImageNet100 and ImageNet1K datasets, respectively.
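
    As a rough illustration of the binary-classification view (not the exact MIO loss, which additionally removes the positive-positive repulsion and bounds the negative-pair repulsion), pairwise similarities can be trained with a sigmoid/BCE objective where the two views of an image are labeled positive and everything else negative:

        # Sketch: contrastive pre-training cast as binary classification over pairs.
        # The MIO-specific refinements described above are not reproduced here.
        import torch
        import torch.nn.functional as F

        def binary_contrastive_loss(z1, z2, tau=0.1):
            # z1, z2: L2-normalized embeddings of two views, shape (N, D).
            n = z1.size(0)
            z = torch.cat([z1, z2], dim=0)
            sim = (z @ z.t()) / tau                       # (2N, 2N) pair logits
            labels = torch.zeros_like(sim)
            idx = torch.arange(n, device=z.device)
            labels[idx, idx + n] = 1.0                    # two views of the same image
            labels[idx + n, idx] = 1.0                    # form a positive pair
            mask = ~torch.eye(2 * n, dtype=torch.bool, device=z.device)  # drop self-pairs
            return F.binary_cross_entropy_with_logits(sim[mask], labels[mask])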

    SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation
    Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal,
    17th International Conference on Document Analysis and Recognition (ICDAR), 2023 (Oral)
    Paper / Code

    With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain, making data annotation a tedious task. We address this challenge using self-supervision: unlike the few existing self-supervised document segmentation approaches, which use text mining and textual labels, we use a completely vision-based approach in pre-training, without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn document object representation and localization in a self-supervised framework, before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs on par with, if not better than, the existing methods and the supervised counterparts.
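
    As a rough sketch of the pseudo-layout idea (binarize, dilate so that words merge into blocks, keep connected-component bounding boxes), the snippet below uses OpenCV; it only illustrates the concept and is not the generation pipeline used in SelfDocSeg.

        # Sketch: deriving pseudo-layout boxes from a raw document image with
        # classical image processing. Kernel size and area threshold are placeholders.
        import cv2
        import numpy as np

        def pseudo_layout_boxes(image_path, kernel=(5, 25), min_area=500):
            gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            # Otsu threshold with inversion so that ink becomes foreground.
            _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
            # A kernel wider than it is tall merges characters and words into blocks.
            dilated = cv2.dilate(binary, np.ones(kernel, np.uint8), iterations=2)
            n, _, stats, _ = cv2.connectedComponentsWithStats(dilated)
            boxes = []
            for i in range(1, n):                  # label 0 is the background
                x, y, w, h, area = stats[i]
                if area >= min_area:
                    boxes.append((x, y, w, h))     # pseudo ground-truth box
            return boxes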

    2022
    SWIS: Self-Supervised Representation Learning for Writer Independent Offline Signature Verification
    Siladittya Manna, Soumitri Chattopadhyay, Saumik Bhattacharya, Umapada Pal,
    IEEE International Conference on Image Processing (ICIP), 2022 (Oral)
    Paper / Slides / arXiv

    We use a decorrelation-based loss to learn decorrelated stroke features from signature images for writer-independent signature verification.
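
    The general shape of a decorrelation-based objective is sketched below (the cross-correlation of standardized features is pushed toward the identity); the precise loss used in SWIS is given in the paper.

        # Sketch: a generic feature-decorrelation loss over two views of a signature.
        import torch

        def decorrelation_loss(z1, z2, off_diag_weight=5e-3):
            # z1, z2: embeddings of two views, shape (N, D).
            z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
            z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
            c = (z1.t() @ z2) / z1.size(0)                   # (D, D) cross-correlation
            on_diag = (torch.diagonal(c) - 1).pow(2).sum()   # keep each feature informative
            off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelate features
            return on_diag + off_diag_weight * off_diag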

    SURDS: Self-Supervised Attention-guided Reconstruction and Dual Triplet Loss for Writer Independent Offline Signature Verification
    Soumitri Chattopadhyay, Siladittya Manna, Saumik Bhattacharya, Umapada Pal,
    26th International Conference on Pattern Recognition (ICPR), 2022 (Oral)
    Paper / Video / arXiv

    We first learn representations from signature image patches using an image reconstruction network with an encoder-decoder architecture, augmented by a 2D spatial attention mechanism. Next, we use a dual-triplet loss-based framework for the writer-independent signature verification task.
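
    A minimal sketch of a dual-triplet objective is given below: two triplet-margin terms sharing the same anchor, one against a skilled forgery of the same writer and one against a signature from a different writer. The exact pairing strategy and margins used in SURDS are described in the paper.

        # Sketch: two triplet terms combined into one loss. Margins are placeholders.
        import torch
        import torch.nn.functional as F

        def dual_triplet_loss(anchor, positive, neg_forgery, neg_other_writer,
                              margin1=0.5, margin2=0.5):
            # All inputs are embedding batches of shape (N, D).
            t1 = F.triplet_margin_loss(anchor, positive, neg_forgery, margin=margin1)
            t2 = F.triplet_margin_loss(anchor, positive, neg_other_writer, margin=margin2)
            return t1 + t2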

    PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection
    Siladittya Manna, Dipayan Das, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda,
    IEEE Transactions on Emerging Topics in Computing, Volume 11, Issue 2, Pages 474-484
    Paper

    We present a novel Parallelized LSM (PLSM) architecture that incorporates a spatio-temporal read-out layer and semantic constraints on the model output. To the best of our knowledge, such a formulation is the first of its kind in the literature, and it offers a computationally lighter alternative to traditional deep-learning models. Additionally, we present a comprehensive algorithm for the implementation of parallelizable, GPU-compatible SNNs and LSMs. We apply the PLSM model to classify unintentional/accidental video clips using the Oops dataset. The experimental results on detecting unintentional action in video show that our proposed model outperforms a self-supervised model and a fully supervised traditional deep-learning model.
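
    The kind of vectorized, GPU-friendly reservoir update that a parallelized LSM relies on can be sketched as below with leaky integrate-and-fire neurons; the neuron counts, constants, and read-out are placeholders, not the PLSM configuration from the paper.

        # Sketch: a vectorized leaky integrate-and-fire reservoir over a spike train.
        import torch

        def run_reservoir(inp_spikes, n_neurons=512, tau=0.9, v_thresh=1.0):
            # inp_spikes: (T, B, n_in) binary spike trains; move tensors to CUDA for GPU use.
            T, B, n_in = inp_spikes.shape
            w_in = torch.randn(n_in, n_neurons) * 0.1          # fixed random input weights
            w_rec = torch.randn(n_neurons, n_neurons) * 0.05   # fixed random recurrent weights
            v = torch.zeros(B, n_neurons)                      # membrane potentials
            spikes = torch.zeros(B, n_neurons)
            states = []
            for t in range(T):
                v = tau * v + inp_spikes[t] @ w_in + spikes @ w_rec
                spikes = (v >= v_thresh).float()               # neurons that fire this step
                v = v * (1.0 - spikes)                         # reset fired neurons
                states.append(spikes)
            return torch.stack(states)                         # (T, B, n_neurons) for the read-out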

    Self-supervised representation learning for detection of ACL tear injury in knee MR videos
    Siladittya Manna, Saumik Bhattacharya, Umapada Pal,
    Pattern Recognition Letters, Volume 154, Pages 37-43
    Paper / arXiv

    We propose a self-supervised learning approach to learn transferable features from MR video clips by enforcing the model to learn anatomical features. The pretext task models are designed to predict the correct ordering of the jumbled image patches that the MR video frames are divided into. To the best of our knowledge, none of the supervised learning models performing the injury classification task from MR video provides any explanation for its decisions, which makes our work the first of its kind on MR video data.
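
    The patch-ordering pretext task can be sketched as follows: each frame is cut into a grid of patches, shuffled with one of a fixed set of permutations, and the network predicts which permutation was applied. The 2x2 grid below is a placeholder for the configuration used in the paper.

        # Sketch: building a (shuffled patches, permutation index) training pair.
        import itertools
        import random
        import torch

        PERMUTATIONS = list(itertools.permutations(range(4)))   # all orderings of a 2x2 grid

        def make_jigsaw_sample(frame):
            # frame: (C, H, W) tensor with H and W divisible by 2.
            C, H, W = frame.shape
            patches = frame.unfold(1, H // 2, H // 2).unfold(2, W // 2, W // 2)
            patches = patches.reshape(C, 4, H // 2, W // 2).permute(1, 0, 2, 3)  # (4, C, h, w)
            label = random.randrange(len(PERMUTATIONS))
            shuffled = patches[list(PERMUTATIONS[label])]        # apply the chosen ordering
            return shuffled, label   # the model is trained to predict `label` from `shuffled`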

    2021
    Interpretive self-supervised pre-training: boosting performance on visual medical data
    Siladittya Manna, Saumik Bhattacharya, Umapada Pal,
    Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), 2021 (Oral)
    Paper

    In this work, we propose a novel loss function and derive its asymptotic lower bound. We show mathematically that the contrastive loss function asymptotically treats each sample as a separate class and works by maximizing the distance between any two samples, which helps in learning better representations. Through exhaustive experiments, we demonstrate that self-supervised pre-training with the proposed loss function surpasses fully supervised baselines on downstream tasks.
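
    The "each sample is its own class" view can be made concrete with a small sketch: embeddings are scored against one prototype per training image and trained with ordinary cross-entropy over instance indices. This only illustrates the asymptotic interpretation above, not the loss proposed in the paper.

        # Sketch: instance discrimination as an N-way classification problem.
        import torch
        import torch.nn.functional as F

        n_instances, dim = 1000, 128
        prototypes = F.normalize(torch.randn(n_instances, dim), dim=1)  # one "class" per image

        def instance_discrimination_loss(z, instance_ids, tau=0.07):
            # z: (B, dim) embeddings of augmented views; instance_ids: (B,) long image indices.
            z = F.normalize(z, dim=1)
            logits = z @ prototypes.t() / tau       # scores against every instance "class"
            return F.cross_entropy(logits, instance_ids)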


Miscellany
    Hands-On Tutorial Session, MIDA 2023

    Jointly Organized by Dept. of IT & CFSD, SMIT, Sikkim and IDEAS-TIH, ISI, Kolkata

Blog Posts
    Train FasterRCNN faster with 16-bit precision in Detectron2
    Guide to fine-tuning a Pre-trained model for Object Detection tasks with Faster RCNN using Detectron2
    The Art of Loading Custom Objects and Functions in Keras
    Weighted Categorical Cross-Entropy Loss in Keras
    Weighted Binary Cross-Entropy Loss in Keras
    Installing detectron2 in 9 easy steps
    How to get Model Summary in PyTorch
    How to plot multiple 2D Series in 3D (Waterfall plot) in Matplotlib
    Creating a TF Dataset using a Data Generator
    Using forward_hooks to Extract Intermediate Layer Outputs from a Pre-trained ResNet Model in PyTorch
    Plotting Grouped Bar Chart in Matplotlib

    Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website — use the github code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.