
Transformers: A Powerful Tool for Image Analysis and Generation

Level: Intermediate
Length: 7 hours
Format: In-Person Lecture
Intended Audience: Students, researchers, and engineers from academia and industry with a solid background in deep learning architectures such as CNNs and LSTMs who want to broaden and deepen their knowledge of state-of-the-art architectures.

Description: Compared to the tremendous success that transformer-based models such as GPT-4, Bard, or Llama have had in text analysis, machine translation, and de-novo text generation, it took longer for transformers to enter the scene of image analysis and image generation. The reasons are their high demands in terms of compute and data. More successful approaches to training transformers have led to a change that will likely be as impactful as the introduction of CNNs for image classification. First (still limited) "image foundation models" have already been published, and in medical image analysis, too, several attempts are being made to parallel the semantic understanding and emergence seen in large language models. In this course, we explain the thought model and the elementary mathematics of the attention mechanism underlying transformers; a minimal illustrative sketch follows below. You will learn in theory and explore in hands-on work the reason for the transformers' modeling capacity, and you will understand why this capacity creates the need for larger training datasets. The course traces the development of transformers for image analysis tasks and shows ways to pre-train transformers on weak or unlabeled data. It concludes with examples of applications in medical image analysis tasks.
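The following is a minimal NumPy sketch of the single-head scaled dot-product self-attention mentioned above; all names, shapes, and data here are illustrative assumptions made for this page, not course material.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # Project the same token sequence into queries, keys, and values.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        return softmax(scores) @ V

    # Illustrative shapes: 16 tokens (e.g. flattened image patches), width 64.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(16, 64))
    Wq, Wk, Wv = (rng.normal(size=(64, 64)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)  # (16, 64)

The softmax couples every token with every other token, which accounts for both the large modeling capacity and the quadratic compute cost that, together with the weak inductive bias compared to CNNs, contributes to the need for larger training datasets.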
This is an interactive course and participants will need to bring their own laptops.

Learning Outcomes: This course will enable you to:
- describe the benefits transformer architectures offer over CNNs, as well as their limitations
- reproduce the elementary building blocks (self- and cross-attention) of transformer architectures
- sketch approaches for modifying transformers to perform image analysis tasks as well as generative modeling efficiently
- understand and reproduce the implementation of a simple Visual Transformer network
- give examples of transformers applied to medical image analysis

Instructor(s): Hans Meine is a senior scientist who has been using machine learning for image analysis since 2002 and has focused on various medical applications at Fraunhofer MEVIS since 2011. Since early 2016, he has been organizing the internal training and coaching of Fraunhofer MEVIS staff in deep learning methodologies, and he now leads the "Image and Data Analysis" competence area, which incorporates both image and non-image data. His team scored top positions in the "Liver and Tumor Segmentation" challenges at ISBI and MICCAI 2017 using deep learning. Felix Thielke has been researching machine learning applications for medical image analysis at Fraunhofer MEVIS since 2019. He studied computer science with a focus on AI, cognition, and robotics at the University of Bremen, where he also gained years of experience in image analysis and deep learning through successful participation in the international RoboCup competitions. He is currently conducting his PhD research on deep-learning-based solutions to support the diagnosis and treatment of liver diseases. Markus Wenzel

Event: SPIE Medical Imaging 2025
Course Held: 16 February 2025

Issued on

March 3, 2025

Expires on

Does not expire