INSTITUTE COURSES




Cindy Xiong Bearfield  (Georgia Institute of Technology, United States)   \    Mehul Bhatt  (Örebro University, Sweden)   \    Emmanuelle Dietz  (Airbus Hamburg, Germany)   \    Thomas Eiter  (Vienna University of Technology, Austria)   \    Paul Hemeren  (University of Skövde, Sweden)   \    Árni Kristjánsson  (University of Iceland, Iceland)    \    Vasiliki Kondyli  (Jagiellonian University, Poland)    \    Oliver Kutz  (Free University of Bozen-Bolzano, Italy)    \    Antonio Lieto  (University of Salerno, Italy)    \    Clayton Lewis  (University of Colorado - Boulder, United States)    \    Alessandra Russo  (Imperial College London, United Kingdom)    \    Jakob Suchan  (Constructor University Bremen, Germany)    \    Ilaria Tiddi  (Vrije Universiteit Amsterdam, Netherlands)    \    Barbara Tversky  (Stanford University + Columbia University, United States)     

INSTITUTE 2024   /   HANDBOOK

Complete course details, including information about all aspects of Institute 2024, can be found in the Institute Brochure (PDF).
Information about the Lectures and Tutorials by institute faculty is included below:


I.  LECTURES


Designs to Support Better Visual Data Communication

Prof. Cindy Xiong Bearfield
GEORGIA INSTITUTE OF TECHNOLOGY, United States   /   

Well-chosen data visualizations can lead to powerful and intuitive processing by a viewer, both for visual analytics and data storytelling. When badly chosen, visualizations leave important patterns opaque or misunderstood. So how can we design an effective visualization? I will share several empirical studies demonstrating that visualization design can influence viewer perception and interpretation of data, referencing methods and insights from cognitive psychology. I leverage these study results to design natural language interfaces that recommend the most effective visualization to answer user queries and help them extract the ‘right’ message from data. I then identify two challenges in developing such an interface. First, human perception and interpretation of visualizations is riddled with biases, so we need to understand how people extract information from data. Second, natural language queries describing takeaways from visualizations can be ambiguous and thus difficult to interpret and model, so we need to investigate how people use natural language to describe a specific message. I will discuss ongoing and future efforts to address these challenges in the real world, providing concrete guidelines for visualization tools that help people more effectively explore and communicate data.



Visuospatial Commonsense
- On Neurosymbolic Reasoning and Learning about Space and Motion

Prof. Mehul Bhatt
Örebro University, SWEDEN   /   

This talk addresses computational cognitive vision and perception at the interface of (spatial) language, (spatial) logic, (spatial) cognition, and artificial intelligence. Summarising recent works, I present general methods for the semantic interpretation of dynamic visuospatial imagery with an emphasis on the ability to (neurosymbolically) perform abstraction, reasoning, and learning with cognitively rooted structured characterisations of commonsense knowledge pertaining to space and motion. I will particularly highlight:

The presented works, demonstrated against the backdrop of applications in autonomous driving, cognitive robotics, visuoauditory media, and cognitive psychology, are intended to serve as a systematic model and general methodology integrating diverse, multi-faceted AI methods pertaining to Knowledge Representation and Reasoning, Computer Vision, and Machine Learning towards realising practical, human-centred computational visual intelligence. I will conclude by highlighting a bottom-up interdisciplinary approach, at the confluence of Cognition, AI, Interaction, and Design Science, that is necessary to better appreciate the complexity and spectrum of varied human-centred challenges for the design and (usable) implementation of (explainable) artificial visual intelligence solutions in diverse human-system interaction contexts.



Hybrid Answer Set Programming and its Use for Visual Question Answering

Prof. Thomas Eiter
Vienna University of Technology, AUSTRIA   /   

Visual Question Answering (VQA) is concerned with answering a question, posed in natural language, about a visual scene shown in an image or possibly also in a video sequence. It is a challenging task that requires processing multi-modal input and reasoning capabilities to obtain the correct answer, and it enables applications in a range of areas such as medicine, assistance for blind people, surveillance, and education. Neuro-symbolic approaches tackle the problem by employing a modular architecture in which components based on subsymbolic and symbolic AI, such as deep neural networks and symbolic reasoning engines, take care of different subtasks such as object recognition, language parsing, scene representation, and inference. Answer Set Programming (ASP), a well-known approach to declarative problem solving, is a versatile formalism for realizing the symbolic reasoning component. In this lecture, we consider ASP for addressing VQA. To this end, we discuss extensions of ASP towards subsymbolic AI, so-called hybrid AI, comprising both reasoning and learning. Furthermore, we discuss challenges that VQA poses to ASP for future research, in order to unleash its potential for developing transparent and explainable VQA.
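
To make the division of labour concrete, the following minimal sketch (not taken from the lecture) assumes the clingo Python bindings and a hypothetical scene representation already produced by a neural perception module; a few ASP rules then answer a question such as "Is there a cube left of a sphere?".

```python
# Minimal sketch of ASP-based VQA reasoning (assumes the clingo Python API;
# in a real pipeline the scene facts would come from a neural object detector).
import clingo

# Hypothetical symbolic scene representation produced by perception.
scene_facts = """
object(o1). shape(o1, cube).   pos(o1, 2).
object(o2). shape(o2, sphere). pos(o2, 7).
"""

# Encoding of the question "Is there a cube left of a sphere?".
question_rules = """
left_of(A, B) :- pos(A, XA), pos(B, XB), XA < XB, A != B.
answer(yes)   :- shape(A, cube), shape(B, sphere), left_of(A, B).
answer(no)    :- not answer(yes).
"""

ctl = clingo.Control()
ctl.add("base", [], scene_facts + question_rules)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print("Answer:", m))  # prints the answer atoms
```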



Human Action recognition, cognition, and computational models in relation to formal and cognitive foundations for human-centered computing

Dr. Paul Hemeren
University of Skövde, SWEDEN   /   

This lecture will describe key questions and results concerning human attention to visual point-light action displays that include hand movement, walking, and social movement. The major role of local and global movement processing is demonstrated, showing the clear differences between low-level kinematics and global form features. This also motivates the necessity of using different kinds of stimuli to probe human action attention in relation to kinematics and semantics. On the theoretical side, the contributions of visual familiarity and visual orientation to action recognition and discrimination have implications for fields such as cognitive neuroscience, cognitive psychology, and neurophysiology. On the practical side, this knowledge could contribute to computational models of biological motion, such as models for visual learning and action recognition, familiarity-based attention models, and action recognition for human-robot interaction. It can also inform better interface designs for interactive games and for augmented and virtual reality-based systems.
Understanding human action recognition by using point-light displays of biological motion allows us to compare the accuracy of computational models in relation to human cognitive and perceptual factors. This comparison can be used to demonstrate some of the modality factors in human action recognition, as well as the possible relationship between modality factors and levels of action and event perception. The lecture will present findings about different levels of action and event perception, along with direct comparisons between computational models and human cognition and perception using point-light displays of biological motion. A key question is then how to evaluate the similarities and differences between human processing and computational models. To what extent should AI development using multimodal computation in human-machine interaction be concerned with the relation between processes and results? What role should this comparison (computational models versus human cognition) have in understanding human cognition?


Priming of probabilistic visual templates

Prof. Árni Kristjánsson
University of Iceland, ICELAND   /   

Attentional priming has a dominating influence on vision, speeding visual search, releasing items from crowding, reducing masking effects, and during free-choice, primed targets are chosen over unprimed ones. Many accounts postulate that templates stored in working memory control what we attend to and mediate the priming. But what is the nature of these templates (or representations)? Analyses of real-world visual scenes suggest that tuning templates to exact color or luminance values would be impractical since those can vary greatly because of changes in environmental circumstances and perceptual interpretation. Tuning templates to a range of the most probable values would be more efficient. Recent evidence does indeed suggest that the visual system represents such probability, gradually encoding statistical variation in the environment through repeated exposure to input statistics. This is consistent with evidence from neurophysiology and theoretical neuroscience as well as computational evidence of probabilistic representations in visual perception. I argue that such probabilistic representations are the unit of attentional priming and that priming of, say, a repeated single-color value simply involves priming of a distribution with no variance. This "priming of probability" view can be modelled within a Bayesian framework where priming provides contextual priors. Priming can therefore be thought of as learning of the underlying probability density function of the target or distractor sets in a given continuous task.
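
As a hypothetical illustration of this Bayesian reading of "priming of probability" (not part of the lecture itself), the sketch below treats the target-colour template as a Gaussian prior that is sharpened by each primed trial through a standard conjugate update; the variable names and numerical values are assumptions.

```python
# Hypothetical sketch: priming as Bayesian updating of a colour template.
# A Gaussian prior over target hue is updated after each primed trial
# (conjugate normal-normal update with known observation noise).

def update_template(prior_mean, prior_var, observed_hue, noise_var=25.0):
    """Return the posterior mean and variance after one primed target."""
    k = prior_var / (prior_var + noise_var)        # Kalman-style gain
    post_mean = prior_mean + k * (observed_hue - prior_mean)
    post_var = (1.0 - k) * prior_var
    return post_mean, post_var

# Start with a broad template (high variance) over hue (0-360 degrees).
mean, var = 180.0, 400.0
for hue in [150, 152, 149, 151, 150]:              # repeated, similar targets
    mean, var = update_template(mean, var, hue)
    print(f"template mean={mean:5.1f}, variance={var:6.1f}")
# Repetition narrows the template: the prior concentrates on probable values.
```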


The effect of environmental complexity on everyday visual attention

Dr. Vasiliki Kondyli
Jagiellonian University - Kraków, POLAND   /   

Visuospatial attention is critical in many everyday activities, especially those involving embodied multimodal interaction between humans and their surrounding environment. Driving, cycling, or navigating an urban environment are complex activities that require maintaining situational awareness of the surroundings while, at the same time, planning and executing control actions (steering, braking). To investigate the dynamic environments in which these everyday activities take place, we extract visuospatial characteristics including clutter, geometry, and motion, and construct a cognitive model of visuospatial complexity. In a series of behavioural studies, conducted in the real world and in virtual environments, we combine qualitative and quantitative methods to explore the effect of visuospatial complexity on visual attention during these continuous everyday activities. The findings reveal a critical effect of visuospatial complexity on high-level visual processing, where an increase in complexity leads to a substantial increase in change blindness. However, the results also show mitigation strategies that participants employ in response to the load, adjusting their focus and avoiding non-productive forms of attentional elaboration. These outcomes have implications for driving education, driving assistance technologies, as well as the design of immersive media.
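
Purely as an illustrative sketch (the feature set and weights below are assumptions for exposition, not the lecture's actual model), a visuospatial complexity score for a scene segment could be formed as a weighted combination of the extracted characteristics:

```python
# Hypothetical sketch: combining extracted scene characteristics into a
# single visuospatial complexity score (weights are illustrative only).

FEATURE_WEIGHTS = {
    "clutter": 0.4,        # e.g. edge/feature density in the view
    "geometry": 0.25,      # e.g. structural irregularity of the layout
    "motion": 0.35,        # e.g. number and speed of moving objects
}

def visuospatial_complexity(features: dict) -> float:
    """Weighted sum of normalised (0-1) scene features."""
    return sum(FEATURE_WEIGHTS[name] * value for name, value in features.items())

# A cluttered, dynamic urban segment vs. a quiet suburban one.
busy_junction = {"clutter": 0.9, "geometry": 0.6, "motion": 0.8}
quiet_street = {"clutter": 0.2, "geometry": 0.3, "motion": 0.1}
print(visuospatial_complexity(busy_junction))  # 0.79
print(visuospatial_complexity(quiet_street))   # 0.19
```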


Predictive modeling and human cognition: Theory and application

Prof. Clayton Lewis
University of Colorado - Boulder, UNITED STATES   /   

The unexpected success of predictive Large Language Models on a wide range of tasks adds support to the idea that prediction is a fundamental cognitive process. The fact that these models have no in-built structures, or even intended provisions for creating structures as we think of them, suggests that Harold Garfinkel's skeptical rejection of structures and rules in communication and cognition can be seen in a new, constructive light. On the other hand, the possibility that structures of more or less familiar kinds emerge during training also needs consideration. Possible applications of LLMs and allied technologies in supporting new interaction techniques, including enhanced support for people with disabilities, also deserve attention.



Using logic to represent and combine concepts

Prof. Oliver Kutz
Free University of Bozen-Bolzano, ITALY   /   

We will discuss a number of approaches to modelling concepts with the help of logic, and illustrate how such approaches are capable (or not) of capturing certain psychological effects observed in human use of concepts. We will then introduce in more detail the recent framework of weighted description logics, also called perceptron or 'tooth' logic, and illustrate some of the features and benefits of such logics in modelling and reasoning with concepts and prototype concepts.
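
To convey the intuition behind the perceptron ('tooth') operator, the following plain-Python sketch (an informal illustration, not an implementation of the actual logic) tests concept membership by checking whether the summed weights of satisfied sub-concepts reach a threshold; the features and weights are invented for the example.

```python
# Illustrative sketch of a threshold ("perceptron") concept: an individual
# belongs to the concept if the summed weights of the simpler concepts it
# satisfies reach a threshold. Features and weights below are made up.

def tooth_concept(weighted_concepts, threshold):
    """Return a membership test for a weighted-threshold concept."""
    def is_member(individual):
        score = sum(w for concept, w in weighted_concepts if concept(individual))
        return score >= threshold
    return is_member

# A prototype-style "Bird" concept: no single feature is strictly necessary.
bird = tooth_concept(
    [(lambda x: x["has_feathers"], 2.0),
     (lambda x: x["flies"], 1.0),
     (lambda x: x["lays_eggs"], 1.0)],
    threshold=3.0,
)

penguin = {"has_feathers": True, "flies": False, "lays_eggs": True}
bat = {"has_feathers": False, "flies": True, "lays_eggs": False}
print(bird(penguin))  # True  (2.0 + 1.0 >= 3.0)
print(bird(bat))      # False (1.0 < 3.0)
```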




Cognitive Design for AI Systems with Human-Like Reasoning

Prof. Antonio Lieto
University of Salerno, ITALY   /   

Commonsense reasoning is one of the main open problems in the field of Artificial Intelligence (AI) while, on the other hand, it seems to be a very intuitive and default reasoning mode in humans and other animals. In this lecture I will show, via two different case studies concerning commonsense categorization and knowledge invention tasks, how cognitively inspired heuristics can help (both in terms of efficiency and efficacy) in the realization of intelligent artificial systems able to reason in a human-like fashion, with results comparable to human-level performance.



Neuro-Symbolic AI and its Role in Robust and Interpretable AI-driven Decision-Making

Prof. Alessandra Russo
Imperial College London, UNITED KINGDOM   /   

AI has recently seen rapid advances. But to bring positive transformative change in domains such as healthcare, AI technologies need to be transparent and interpretable by humans, whilst capable of processing large quantities of (unstructured) data. Learning interpretable models from data is one of the main open challenges of AI. Symbolic Machine Learning, a field of Machine Learning, offers algorithms and systems for learning interpretable models that explain data in the context of a given domain knowledge. In this lecture, I will give an overview of state-of-the-art symbolic machine learning systems capable of learning different classes of interpretable models for solving real-world problems from (structured) data, in a manner that is data efficient, scalable, and robust to noise. I will then present neuro-symbolic architectures that integrate such systems with machine learning to learn complex interpretable knowledge from multi-modal (unstructured) data. Finally, I will present a number of applications in domains such as healthcare and security.




Neurosymbolic Learning:
On Generalising Relational Visuospatial and Temporal Structure

Prof. Jakob Suchan
Constructor University, GERMANY   /   

We present recent and emerging research aimed at developing a general framework for structured spatio-temporal learning from multimodal human behavioural stimuli. The framework and its underlying general, modular methods serve as a model for integrating (neural) visuo-auditory processing with (semantic) relational learning foundations, with applications (primarily) in the behavioural sciences. Furthermore, the lecture will situate neurosymbolic learning within the broader context of cognitive vision and perception research aimed at developing general methods for commonsense reasoning with cognitively rooted structured characterisations of knowledge pertaining to space and motion.


Knowledge Engineering methods for Hybrid Human-Artificial Intelligence

Prof. Ilaria Tiddi
Vrije Universiteit Amsterdam, NETHERLANDS   /   

Hybrid Human-Artificial Intelligence is a rapidly growing field aiming at creating collaborative systems in which humans and intelligent machines cooperate in mixed teams towards shared goals. In this lecture, we will discuss how symbolic AI techniques (knowledge graphs and semantic technologies) can help solve the main challenges of hybrid human-AI collaboration, in combination with the most popular subsymbolic (machine learning) methods. We will start by learning how to model information using the RDF/RDFS/OWL languages and store it as knowledge graphs, then introduce methods to reason over and query such graphs, and finally discuss how ontologies and knowledge engineering methods can be used to design Hybrid Intelligence applications.
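
As a minimal, self-contained illustration of this modelling-and-querying workflow (using the rdflib Python library; the vocabulary and data are invented for the example and not part of the tutorial materials), one might build and query a tiny knowledge graph like this:

```python
# Minimal sketch: modelling a mixed human-AI team as an RDF graph and
# querying it with SPARQL (assumes the rdflib library; example vocabulary).
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/hi#")   # hypothetical example namespace
g = Graph()
g.bind("ex", EX)

# A tiny RDFS schema: robots and humans are both agents.
g.add((EX.Agent, RDF.type, RDFS.Class))
g.add((EX.Robot, RDFS.subClassOf, EX.Agent))
g.add((EX.Human, RDFS.subClassOf, EX.Agent))

# Instance data: a mixed human-AI team sharing a goal.
g.add((EX.r2, RDF.type, EX.Robot))
g.add((EX.alice, RDF.type, EX.Human))
g.add((EX.r2, EX.collaboratesWith, EX.alice))
g.add((EX.r2, EX.hasGoal, Literal("fetch medication")))

# SPARQL query: which agents collaborate, and on what goal?
query = """
SELECT ?robot ?partner ?goal WHERE {
    ?robot ex:collaboratesWith ?partner ;
           ex:hasGoal ?goal .
}
"""
for row in g.query(query, initNs={"ex": EX}):
    print(row.robot, row.partner, row.goal)
```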


How Graphics and Gesture Work

Prof. Barbara Tversky
Stanford University and Columbia University, UNITED STATES   /   

There are many ways to think and communicate. Language is quite popular. But babies gesture before they speak and visualizations of thought predate written language by millennia. We will show the many ways that these media use things in space, actions in space, and place in space spontaneously and naturally to think and communicate.




II. TUTORIALS


Reasoning in Cognitive Argumentation

Dr. Emmanuelle Dietz
Airbus Hamburg, Germany   /   

Cognitive Argumentation is a computational framework for dialectic argumentation-based reasoning, built on a theoretical framework of argumentation in AI and grounded in cognitive principles from Cognitive Science. Starting from the observation that humans often deviate from classical logic when reasoning in everyday life, these extra-logical patterns are formalized as cognitive principles in Cognitive Argumentation. We will also show an integration of Cognitive Argumentation into the cognitive architecture ACT-R, where the argumentation process is guided by the context through the spreading activation of chunks, bridging to lower levels of cognition.
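
For orientation, the spreading-activation mechanism referred to here is the standard ACT-R activation equation (stated below for context; the tutorial's specific integration may differ): a chunk's total activation is its base-level activation plus context-weighted associative strength from the sources currently in the goal buffer.

```latex
% Standard ACT-R activation of a chunk i:
%   base-level activation plus spreading activation from context sources j.
A_i = B_i + \sum_{j} W_j \, S_{ji}
% A_i    : total activation of chunk i
% B_i    : base-level activation (recency and frequency of use)
% W_j    : attentional weight of context source j
% S_{ji} : strength of association from source j to chunk i
```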



Spatial Cognition and AI:
Methods for In-The-Wild Behavioural Research in Visual Perception

CoDesign Lab EU
Cognition. AI. Interaction. Design.

The tutorial on Spatial Cognition and Artificial Intelligence addresses the confluence of empirically-based behavioural research in the cognitive and psychological sciences with computationally-driven analytical methods rooted in artificial intelligence and machine learning. This confluence is addressed in the backdrop of human behavioural research concerned with naturalistic, in-the-wild, embodied multimodal interaction. The tutorial presents:

The main technical focus of the tutorial is to provide a high-level demonstration of general AI-based computational methods and tools that can be used for multimodal human behavioural studies. Special attention is given to visuospatial, visuo-locomotive, and visuo-auditory cognitive experiences in the context of application areas such as architecture and built environment design, narrative media design, product design, cognitive media studies, and autonomous cognitive systems (e.g., robotics, autonomous vehicles). Presented methods are rooted in foundational research in artificial intelligence, spatial cognition and computation, spatial informatics, human-computer interaction, and design science. The tutorial utilises case studies to demonstrate the application of the foundational methods and practical tools. This will also involve practical examples from large-scale experiments in domains such as evidence-based architecture design, communication and media studies, and cognitive film studies.