<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://emanuele.marconato.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://emanuele.marconato.github.io/" rel="alternate" type="text/html" /><updated>2026-05-13T08:52:32+00:00</updated><id>https://emanuele.marconato.github.io/feed.xml</id><title type="html">Ema’s Personal Page</title><subtitle>&quot;All our knowledge begins with the senses, proceeds then to the understanding, and ends with reason.&quot; - Immanuel Kant</subtitle><author><name>Emanuele Marconato</name></author><entry><title type="html">We have an accepted paper at ICML 2026!</title><link href="https://emanuele.marconato.github.io/blog/update-icml/" rel="alternate" type="text/html" title="We have an accepted paper at ICML 2026!" /><published>2026-04-25T19:34:30+00:00</published><updated>2026-04-25T19:34:30+00:00</updated><id>https://emanuele.marconato.github.io/blog/update-icml</id><content type="html" xml:base="https://emanuele.marconato.github.io/blog/update-icml/"><![CDATA[<h3 id="logit-distance-bounds-representational-similarity">Logit Distance Bounds Representational Similarity</h3>
<p>“
For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. [2025] that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models’ identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice. As a consequence, KL-based distillation can match a teacher’s predictions while failing to preserve linear representational properties, such as linear-probe recoverability of human-interpretable concepts. In distillation experiments on synthetic and image datasets, logit-distance distillation yields students with higher linear representational similarity and better preservation of the teacher’s linearly recoverable concepts.
” <br />
https://arxiv.org/pdf/2602.15438</p>]]></content><author><name>Emanuele Marconato</name></author><category term="blog" /><category term="Published" /><summary type="html"><![CDATA[Logit Distance Bounds Representational Similarity “ For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. [2025] that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models’ identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice. As a consequence, KL-based distillation can match a teacher’s predictions while failing to preserve linear representational properties, such as linear-probe recoverability of human-interpretable concepts. In distillation experiments on synthetic and image datasets, logit-distance distillation yields students with higher linear representational similarity and better preservation of the teacher’s linearly recoverable concepts. ” https://arxiv.org/pdf/2602.15438]]></summary></entry><entry><title type="html">I will start an MSCA Postdoc at KU!</title><link href="https://emanuele.marconato.github.io/blog/MSCA/" rel="alternate" type="text/html" title="I will start an MSCA Postdoc at KU!" 
/><published>2026-02-28T19:34:30+00:00</published><updated>2026-02-28T19:34:30+00:00</updated><id>https://emanuele.marconato.github.io/blog/MSCA</id><content type="html" xml:base="https://emanuele.marconato.github.io/blog/MSCA/"><![CDATA[<h3 id="ai-cure-an-identifiable-and-causal-understanding-for-shortcut-reduction">AI-CURE: An Identifiable and Causal Understanding for shortcut REduction</h3>

<p>I am happy to share that in September 2026 I will join the CoCaLa group at the University of Copenhagen for a Marie Skłodowska-Curie Postdoctoral Fellowship. I will be working side by side with Prof. Sebastian Weichwald. Expect many turbulent updates on shortcuts, identifiability, and causal abstractions!</p>]]></content><author><name>Emanuele Marconato</name></author><category term="blog" /><category term="news" /><summary type="html"><![CDATA[AI-CURE: An Identifiable and Causal Understanding for shortcut REduction]]></summary></entry></feed>