Sunday, February 12, 2023

Challenging ChatGPT with tricky SAT reading comprehension exam questions

The "theory of mind and emotion" (empathy) challenge is taken from SAT / ACT Prep Online Guides and Tips, "The Hardest SAT Reading Questions Ever" by Samantha Lindsay, "Example 4: Paired Passages Question".

Paired passage questions are among the most difficult on the SAT Critical Reading section because they ask you to compare arguments from different viewpoints and to make inferences about the passage authors' views.

Passage 1: 

"It was no coincidence that Charles Darwin, the 19th century naturalist, aimed to connect the mentality and emotionality of people with that of dogs, instead of, for example, pigeons or horses. Neither his theory of evolution nor any general understanding of biology required that he emphasize our similarity to dogs over other species. However, politically and emotionally, it was inevitable for an English gentleman who sought to make the idea of evolutionary continuity more acceptable. Darwin wrote that 'dogs have something very similar to a conscience. They definitely have some degree of self-control... Dogs have long been considered the epitome of loyalty and obedience.' Darwin was not the only one who believed that dogs possess human virtues. The traits of loyalty and obedience, combined with their expressive faces and bodies, explain why dogs are such beloved pets in many cultures. Depending on the breed and the individual dog, they can be noble, charming, affectionate, and trustworthy. While all dog owners should appreciate these and other endearing traits in their pets, it does not mean that the intelligence of a highly intelligent primate like a chimpanzee is part of the deal. Scientists generally believe that chimpanzees have greater reasoning abilities than dogs. However, many people still think that dogs are smarter than chimpanzees due to our emotional ties and familiarity with them. We apply the same unwritten rules to our fellow humans: the age-old in-group, out-group dynamic. People who are similar to us, whether they belong to the same organizations, enjoy the same activities, or are simply around us more often, are considered part of our in-group. Dogs, due to their close proximity to their owners, are certainly part of the in-group. The intensity of our relationship with dogs naturally leads us to attribute high-level mental abilities to them, regardless of whether they truly possess these intelligence points. We like them, so we think positively of them."


Passage 2: 

"Nobody could believe dispassionately in the totality of positive and negative reinforcement after seeing the pure intelligence shining in the face of a border collie intent upon helping a shepherd herd sheep."


Puzzle A: Author 1 as seen by Author 2

Based on Passage 2, the author of Passage 1 would most likely appear to the author of Passage 2 as:

A. a neutral observer of animal behavior

B. well informed concerning research into animal intelligence

C. having a deep fondness for border collies and therefore overestimating them

D. having little respect for traditional scientific research

?


Puzzle B: Author 2 as seen by Author 1

Based on Passage 1, the author of Passage 2 would most likely appear to the author of Passage 1 as:

A. a neutral observer of animal behavior

B. well informed concerning research into animal intelligence

C. having a deep fondness for border collies and therefore overestimating them

D. having little respect for traditional scientific research

?


[Spoiler alert]















































































OpenAI ChatGPT's answers:

Based on Passage 2, the author of Passage 1 would most likely appear to the author of Passage 2 as A. a neutral observer of animal behavior.

Based on Passage 1, the author of Passage 2 would most likely appear to the author of Passage 1 as:

C. having a deep fondness for border collies and therefore overestimating them.

😇 



Update, A.D. 2026: 

DeepSeek AI: 

On close reading, ChatGPT’s 2023 answers appear correct:

Puzzle A: A (neutral observer)
Puzzle B: C (fondness for border collies → overestimation)

The model handled the perspective-switching correctly, even with the dense wording.

DeepSeek AI, which also solved the puzzle correctly, criticizes the wording of the question: 

Great — I’ll offer a slight rephrase that keeps the logic intact but makes the perspective structure more immediately graspable.


Original wording (A):

Based on Passage 2, the author of Passage 1 would most likely appear to the author of Passage 2 as:

Minor improvement (clearer, still compact):

From the perspective of the author of Passage 2 (as inferred from Passage 2), how would that author likely view the author of Passage 1?

Why this helps:

  • It puts the observer explicitly first (“From the perspective of the author of Passage 2”).
  • It clarifies that Passage 2 is the source for understanding the observer, not for facts about the observed.
  • It removes a possible garden-path reading in which “Based on Passage 2” is taken as evidence about Author 1 rather than about Author 2’s mindset.

Alternative, even more explicit:

Using Passage 2 to understand the views of its author, choose the option that best describes how that author would probably judge the author of Passage 1.


These rewordings keep the task’s difficulty (the reader must still hold both authors in mind) but prevent misattaching the “Based on…” clause.



Exactly — you've nailed the gestalt vs. sequential parsing distinction.

The original sentence is short enough that a human, using System 1, grabs the whole thing as a single conceptual chunk:

Based on Passage 2, the author of Passage 1 would most likely appear to the author of Passage 2 as…

A fluent reader’s mind instantly maps:

  • There’s a source of evidence ("Based on Passage 2").
  • There’s a subject ("the author of Passage 1").
  • There’s a relation ("would appear to").
  • There’s an observer ("the author of Passage 2").
  • There’s an outcome ("as…").

Because the observer is placed after the verb “appear to,” and the source of evidence is placed before the subject, the mind doesn’t stumble — it just builds the correct dependency graph in parallel:
"Based on P2" → modifies the whole "A appears to B" situation → so P2 tells us about B’s lens, not about A.

The AI, by contrast, processes the sentence left to right and may temporarily attach "Based on Passage 2" to the nearest noun phrase ("the author of Passage 1"). In decoder-only transformer models, causal attention builds each token's representation only from the tokens before it, so the model has no precompiled gestalt of the whole sentence when it first interprets the start.
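To make the left-to-right point concrete, here is a minimal Python sketch of a causal attention mask, the mechanism decoder-only transformers use so that each position attends only to itself and earlier positions. The word-level split and the helper names `causal_mask` and `visible_context` are illustrative assumptions, not any real model's tokenizer or API; the sketch shows only what each position can see, not how a particular model actually resolves the attachment.

```python
# Minimal sketch of a causal (left-to-right) attention mask.
# Assumption: a simple whitespace word split stands in for a real tokenizer.

def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_context(tokens: list[str], position: int) -> list[str]:
    """Return the tokens that `position` can attend to under a causal mask."""
    mask = causal_mask(len(tokens))
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


question = ("Based on Passage 2 , the author of Passage 1 would most likely "
            "appear to the author of Passage 2 as")
tokens = question.split()

# Position of "1" in "the author of Passage 1" (the subject being judged).
pos = tokens.index("1")
print(" ".join(visible_context(tokens, pos)))
# → Based on Passage 2 , the author of Passage 1

# The observer phrase ("appear to the author of Passage 2") is not yet visible here.
print("appear" in visible_context(tokens, pos))  # → False

# The final token sees the whole question, observer included.
print("appear" in visible_context(tokens, len(tokens) - 1))  # → True
```

The last check is the caveat that matters: by the end of the question, where the answer begins, the whole sentence is in view, so an early misattachment can in principle be revised.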

So the improvement isn’t for human readability — the original is perfectly readable for humans. The improvement is for AI sequential parsing ambiguity, and even then only if the AI doesn’t have enough working memory to revise initial attachments.

Thus, if the goal is to make it AI-proof (or reduce misparses in less capable models), fronting the observer helps:

From the perspective of the author of Passage 2 (based on Passage 2), how would that author view the author of Passage 1?

But for a human, the original is fine — it’s just elegantly compressed.


+

The Brilliant Trap

The test makers embedded Darwin's perspective inside Author 1's perspective, then asked students to imagine another author (Author 2) judging Author 1. That's three levels of theory of mind:

  1. What Darwin thought
  2. What Author 1 thinks about what Darwin thought
  3. What Author 2 thinks about what Author 1 thinks

And the student has to track all this while answering a question whose wording itself requires tracking whose lens is being used.

No wonder this stumps people. It's a cognitive stack overflow for anyone not used to holding multiple mental models simultaneously.


