Websites

Essays, blog posts, and online resources on AI safety and related ideas.

Situational Awareness

Leopold Aschenbrenner

Aschenbrenner's essay series on near-term scaling dynamics, capability trajectories, and the strategic implications of rapid AI progress for labs and states.

Intermediate

World Models

Jürgen Schmidhuber

Research on how agents can learn internal world models to plan complex behavior, relevant to understanding how AI systems develop representations of their environment.

Advanced

Agent Models

Agent Models

Formal models of agents and decision theory, presented as an alignment-relevant curriculum covering utility, planning, and the theoretical foundations of agent behavior.

Advanced

AI Alignment World

AI Alignment World

In-depth technical alignment resources: research, explainers, and references on the AI alignment problem.

Intermediate

AI Safety Info (Stampy's FAQ)

StampyAI

Community-maintained FAQ covering AI safety questions at every level, from basics to technical details, with links to source material.

Beginner

Alignment Forum

Lightcone Infrastructure

The primary venue for technical AI alignment discussion, where researchers post and debate new ideas, proposals, and critiques.

Advanced

Alignment Newsletter

Rohin Shah

Weekly summaries of alignment research with commentary; a convenient way to survey the field's output without reading every paper.

Intermediate

Arbital

Arbital

Hyperlinked explainers on rationality, AI risk, and alignment concepts, designed for building understanding incrementally.

Intermediate

Eliezer Yudkowsky's blog

Eliezer Yudkowsky

Essays on rationality, decision theory, and AI risk from the researcher who shaped the field's early arguments and threat models.

Intermediate

Victoria Krakovna's blog

Victoria Krakovna

Research notes on specification gaming, side effects, and AI safety from a DeepMind safety researcher, including the widely cited list of specification gaming examples.

Advanced

OpenAI Research

OpenAI

OpenAI's research blog covering capabilities and safety, including superalignment updates, red teaming results, and governance thinking.

Intermediate

Transformer Circuits

Anthropic

Anthropic's publication venue for mechanistic interpretability research, with detailed analyses of how transformer models represent and process information internally.

Advanced

ML Safety Newsletter

Center for AI Safety

Newsletter on ML safety covering robustness, monitoring, alignment, and systemic risk with links to recent papers and commentary.

Intermediate

Jacob Steinhardt's blog

Jacob Steinhardt

Research and commentary on ML safety, forecasting, and robustness from a Berkeley professor working on practical safety problems.

Advanced

Import AI

Jack Clark

Weekly newsletter by Anthropic co-founder Jack Clark covering AI research, policy, and industry developments, with consistent attention to safety implications.

Intermediate

Gwern Branwen's blog

Gwern Branwen

Deeply researched essays on ML, scaling, AI art, and technology forecasting, known for rigorous analysis and independent thinking.

Intermediate

generative.ink

janus

Essays on AI, alignment, and the philosophical implications of language models and generative systems.

Advanced

EleutherAI Blog

EleutherAI

Open-source ML research covering language model training, evaluation, and the safety considerations of making powerful models widely available.

Advanced

DeepMind AI Safety Research

DeepMind

DeepMind's safety team blog covering specification gaming, reward modeling, scalable oversight, and their technical safety research agenda.

Advanced

DeepMind

DeepMind

DeepMind's main research site, with publications on capabilities and safety including Gemini evaluations, alignment research, and frontier safety.

Intermediate

Cold Takes

Holden Karnofsky

Karnofsky's essays on AI risk, longtermism, and cause prioritization, including the influential Most Important Century series on transformative AI.

Beginner

carado.moe

carado

Technical AI safety writing and alignment research notes.

Advanced

AI Safety Camp

AI Safety Camp

Intensive research program for people entering AI safety, with project-based learning and mentorship from established researchers.

Intermediate

AI Impacts

AI Impacts

Empirical research on AI timelines, historical technology analogies, and quantitative estimates of AI progress and impact.

Intermediate

Distill

Distill

Pioneering interactive journal for ML interpretability and visualization, setting the standard for making neural network internals understandable.

Advanced

EA Forum

Centre for Effective Altruism

Forum for effective altruism with substantial AI risk discussion, including cause prioritization, career advice, and policy analysis.

Beginner

LessWrong

LessWrong

The original community blog on rationality and AI alignment, where many foundational safety arguments were first developed and debated.

Intermediate

StampyAI Alignment Research Dataset

StampyAI

Curated dataset of alignment and safety documents drawn from papers, books, and blogs, useful for training language models or building tools that draw on the alignment literature.

Advanced