Charting trajectories of human thought using large language models

VECTOR is a computational framework that casts a participant's verbal reports as a geometric trajectory through a cognitive map representation, revealing how thoughts flow from one idea to the next.


The "How": The Computational Pipeline

The framework transforms speech into a geometric trajectory through three primary steps:

  1. Segmentation: The narrative is first parsed into "utterances": sentence-like units that each express a single narrative concept (e.g., “Cinderella lives with her stepsisters”). This step identifies "mental event boundaries", the points where the person’s thought transitions from one idea to the next.

  2. Vector Embedding: Each utterance is mapped into a high-dimensional (1536D) semantic space using a pretrained LLM embedding model (such as OpenAI’s text-embedding-3-small, whose default output is 1536-dimensional). While this embedding captures general semantic content, it lacks the "contextualising information" about how humans actually relate concepts in a specific task.

  3. Concept Decoding: This is the core alignment step. The high-dimensional embedding vectors are transformed into a low-dimensional, sparse "schema space". This space is built around task-specific "schema events" (8 for Cinderella, 11 for daily routines). By mapping the embeddings onto these human-meaningful events, the framework creates a "map" that mirrors human cognitive organisation.

The geometric trajectory is formed by the sequence of these points in the schema space. The path shows how a participant "navigates" from one conceptual state (e.g., "the ball") to the next (e.g., "midnight") over time.
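
The decoding and trajectory steps can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the three event names, the 4-D toy embeddings (standing in for 1536-D vectors), the hand-picked `WEIGHTS` and `BIAS`, and the use of a linear map followed by a softmax as the decoder. The softmax is only a stand-in for the sparse decoding described above, and this is not the paper's actual decoder.

```python
import math

# Hypothetical schema events (the note mentions 8 for Cinderella; 3 here for brevity).
SCHEMA_EVENTS = ["stepsisters", "ball", "midnight"]

# Assumed decoder parameters: one weight row per schema event.
# In practice these would be fitted to data; the values here are made up.
WEIGHTS = [
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
]
BIAS = [0.0, 0.0, 0.0]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_to_schema(embedding, weights, bias):
    """Project one utterance embedding onto the schema events."""
    scores = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


def dominant_event(point):
    """Name of the schema event with the highest probability."""
    best = max(range(len(point)), key=lambda i: point[i])
    return SCHEMA_EVENTS[best]


# Toy 4-D embeddings, one per utterance, in narrative order.
utterance_embeddings = [
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 1.0, 0.0, 0.0],
    [0.0, 0.2, 1.0, 0.0],
]

# The trajectory is the ordered list of schema-space points.
path = [decode_to_schema(e, WEIGHTS, BIAS) for e in utterance_embeddings]
dominant_events = [dominant_event(p) for p in path]
print(dominant_events)
```

Each element of `path` is one point in the toy schema space, and reading off the dominant event at each step gives the order in which the narrative visits the schema events.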

What Kind of Speech?

The narratives are not entirely free-form; they are elicited with fixed task framings designed to evoke specific mental structures.

Fixed Themes: All 1,100 participants were asked to provide narratives on two specific themes: the story of Cinderella and typical daily routines.

Constraint: These themes "condition" participants to structure their knowledge in a predictable sequence (a "schema"), which provides a "ground truth" for researchers to compare different people’s thought paths.

Format: Participants provided typed narratives of at least 100 words.

Experimental Design and Setup

The experiment was designed to capture the "flow" of thought in a way that mirrors spoken language.

Participants: 1,100 human volunteers were recruited via Prolific Academic.

Unique Interface: To simulate the sequential nature of speech and thought, the researchers used an unconventional typing interface where participants could only see the current word on the screen. The word disappeared as soon as the participant pressed the space or return key.

Data Captured: The setup allowed researchers to measure inter-word response times (RTs). They found that people slowed down at the "utterance boundaries" identified by the VECTOR framework, which supports the interpretation that the model's "jumps" in the geometric trajectory correspond to real pauses in human thinking.
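
The boundary comparison can be sketched as follows. The timestamps, the utterance start positions, and the helper names `inter_word_rts` and `split_by_boundary` are all hypothetical; the sketch assumes one timestamp per word, recorded when the space or return key was pressed, and it compares simple means rather than reproducing the paper's statistical analysis.

```python
from statistics import mean


def inter_word_rts(submit_times_ms):
    """RT of word i (i >= 1) is the gap since word i-1 was submitted."""
    return [
        later - earlier
        for earlier, later in zip(submit_times_ms, submit_times_ms[1:])
    ]


def split_by_boundary(rts, utterance_start_indices):
    """Separate RTs of words that open a new utterance from all other RTs."""
    starts = set(utterance_start_indices)
    boundary, within = [], []
    for offset, rt in enumerate(rts):
        word_index = offset + 1  # rts[0] belongs to word 1; word 0 has no RT
        if word_index in starts:
            boundary.append(rt)
        else:
            within.append(rt)
    return boundary, within


# Made-up key-press times (milliseconds) for an 8-word narrative.
submit_times_ms = [0, 400, 800, 1900, 2300, 2700, 4000, 4400]
# Assumed segmentation: new utterances begin at words 3 and 6.
utterance_start_indices = [3, 6]

rts = inter_word_rts(submit_times_ms)
boundary_rts, within_rts = split_by_boundary(rts, utterance_start_indices)
boundary_mean = mean(boundary_rts)
within_mean = mean(within_rts)
print(boundary_mean, within_mean)
```

In this toy data the words that open an utterance take about three times as long as the others, which is the kind of slowdown the note describes.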

Validation: The researchers tested the "meaningfulness" of these trajectories by checking for alignment (do different people follow similar paths?) and momentum (do the thoughts move forward in a directed way?).
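
This note does not record how the paper computes alignment or momentum, so the sketch below uses simple stand-in measures to make the two ideas concrete. As assumptions: alignment is taken to be the mean cosine similarity between corresponding points of two equal-length trajectories, and forward movement is scored as the fraction of steps that advance the expected schema position. The function names and toy paths are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment(path_a, path_b):
    """Stand-in alignment: mean cosine similarity of step-matched points."""
    if len(path_a) != len(path_b):
        raise ValueError("paths must have the same number of steps")
    return sum(cosine(a, b) for a, b in zip(path_a, path_b)) / len(path_a)


def expected_position(point):
    """Probability-weighted index of the schema event for one point."""
    return sum(index * weight for index, weight in enumerate(point))


def forward_fraction(path):
    """Stand-in momentum: share of steps that move later in the schema."""
    positions = [expected_position(p) for p in path]
    steps = list(zip(positions, positions[1:]))
    if not steps:
        return 0.0
    return sum(1 for before, after in steps if after > before) / len(steps)


# Toy trajectories over 3 schema events (one-hot points for clarity).
ordered = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shuffled = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

self_alignment = alignment(ordered, ordered)
cross_alignment = alignment(ordered, shuffled)
ordered_forward = forward_fraction(ordered)
shuffled_forward = forward_fraction(shuffled)
print(self_alignment, cross_alignment, ordered_forward, shuffled_forward)
```

A narrative that follows the schema in order scores high on both measures, while one that visits the same events out of order aligns poorly with it and moves forward on only some of its steps.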

source: https://arxiv.org/abs/2509.14455

#nlp #embeddings #semantic #thoughts #cognitive