X Open-Sourced Their Algorithm. Here's What the Code Actually Reveals.

X open-sourced their algorithm architecture but kept the weight values proprietary. The code structure reveals which signals the system tracks—and points to strategic priorities most creators miss.


On January 20, 2026, X (formerly Twitter) open-sourced their recommendation algorithm architecture. I analyzed the codebase and documentation to understand how content actually gets ranked in the For You feed.

X released the system architecture but kept the actual engagement weight values proprietary. The params module that contains specific multipliers for likes, replies, and retweets was excluded from the open-source release.

However, the code structure itself reveals which signals the algorithm tracks and how it categorizes them—and it's not what most creators optimize for.

What The Code Actually Reveals

The X algorithm source code shows that the system tracks 15 different engagement types, each with its own probability prediction:

Engagement signals tracked (from README.md):

P(favorite), P(reply), P(repost), P(quote), P(click),
P(profile_click), P(video_view), P(photo_expand), P(share),
P(dwell), P(follow_author), P(not_interested), P(block_author),
P(mute_author), P(report)

What's NOT in the code: The actual numerical weight values. The scoring module references a params configuration that was excluded from the open-source release.

What IS in the code: The architecture shows conversation-driving signals (replies, author responses, profile engagement, dwell time) are tracked as independent signal types, separate from passive signals (likes, video views). The structural separation suggests conversation signals receive higher priority.
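The structural split described above can be sketched as a simple grouping in Python. The active/passive/negative category assignments here are an illustrative reading of the architecture, not labels from the release itself:

```python
# Illustrative grouping of the 15 tracked engagement signals from README.md.
# The active/passive/negative split reflects the structural separation in the
# open-source architecture; assigning each signal to a bucket is our reading,
# not an official categorization.
ACTIVE_SIGNALS = {
    "reply", "quote", "click", "profile_click",
    "share", "dwell", "follow_author",
}
PASSIVE_SIGNALS = {
    "favorite", "repost", "video_view", "photo_expand",
}
NEGATIVE_SIGNALS = {
    "not_interested", "block_author", "mute_author", "report",
}

ALL_SIGNALS = ACTIVE_SIGNALS | PASSIVE_SIGNALS | NEGATIVE_SIGNALS
assert len(ALL_SIGNALS) == 15  # matches the 15 types listed in README.md
```

Treating the categories as disjoint sets makes the separation concrete: each signal is predicted independently, and the buckets only matter when weights are applied downstream.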

Why Architecture Matters More Than Specific Weights

Even without specific numerical values, the code architecture reveals fundamental priorities:

Conversation signals are processed as distinct types. The system architecture tracks 15 engagement types independently, with clear structural separation between:

  • Active engagement (replies, profile visits, conversation clicks, dwell time)
  • Passive engagement (likes, retweets, video views)

The Grok-based transformer predicts engagement probability for each signal type independently. This means the algorithm doesn't treat "engagement" as a single metric—it evaluates whether you'll reply differently from whether you'll like.

Author participation signals are tracked distinctly. The code has specific handling for when tweet authors respond to replies, suggesting this behavior is weighted differently from standard replies.

How It Actually Works

X's Phoenix recommendation system uses a two-stage pipeline. Think of it like a restaurant kitchen: first, you decide what ingredients you have available (retrieval), then you cook them into the final dish (ranking).

Stage 1: Finding Candidate Posts

The system starts with millions of potential posts and narrows them down to a few thousand using two sources:

Thunder (In-Network): Posts from people you follow. This is your core feed—recent content from accounts you've chosen to see.

Phoenix (Out-of-Network): Discovered content from accounts you don't follow. The system finds these by comparing your engagement history to posts you've never seen. If you engage with security content frequently, Phoenix surfaces security posts from new accounts.

Here's what makes this interesting: new accounts get discovered through Phoenix. You don't need followers to get reach. You need engagement patterns that match what other users care about.

The system retrieves about 5,000 candidate posts. Thunder (followed accounts) typically provides 60%, Phoenix (discovery) provides 40%.
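The two-source blend above can be sketched in a few lines. The 60/40 split and the 5,000-candidate budget are the approximate figures just quoted; the function and its interface are hypothetical, since the real pipeline's internals aren't public:

```python
def blend_candidates(thunder_posts, phoenix_posts, budget=5000,
                     in_network_share=0.6):
    """Merge in-network (Thunder) and out-of-network (Phoenix) candidates.

    Hypothetical helper illustrating the approximate 60/40 split described
    in the release; the actual retrieval interfaces are not open-sourced.
    """
    n_thunder = int(budget * in_network_share)
    n_phoenix = budget - n_thunder
    return thunder_posts[:n_thunder] + phoenix_posts[:n_phoenix]

candidates = blend_candidates(
    thunder_posts=[f"followed_{i}" for i in range(4000)],
    phoenix_posts=[f"discovered_{i}" for i in range(4000)],
)
assert len(candidates) == 5000  # ~3,000 in-network + ~2,000 discovery
```

The point of the sketch: roughly 2,000 of your 5,000 feed slots are reserved for accounts you don't follow, which is the structural opening new creators exploit.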

Stage 2: Ranking the Candidates

This is where the weights matter—and where X kept the values proprietary.

X uses a Grok-based transformer model (adapted from xAI's Grok-1 LLM architecture) to score each post. The model looks at your engagement history and predicts: "What's the probability this user will engage with this post?"

It doesn't read the tweet. It doesn't analyze the text. It doesn't look at images or hashtags.

From the code (recsys_model.py):

from typing import NamedTuple
import jax

class RecsysModelOutput(NamedTuple):
    logits: jax.Array  # [B, num_candidates, num_actions]

It analyzes:

  • Your past behavior (what you've engaged with historically)
  • The author's track record (how often their posts generate engagement)
  • Post metadata (timestamp, post type, author ID)

Then it predicts probabilities for all 15 tracked engagement types: P(favorite), P(reply), P(repost), P(dwell), P(profile_click), and so on.

Each probability gets multiplied by its engagement weight (excluded from release). The final score determines where the post appears in your feed.

Scoring formula (from README.md):

Weighted Score = Σ (weight_i × P(action_i))
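The formula maps directly to a weighted sum over the predicted probabilities. A minimal sketch follows; the weight values are made up for illustration, since the real params module is exactly what X withheld:

```python
def weighted_score(probs: dict, weights: dict) -> float:
    """Weighted Score = sum(weight_i * P(action_i)) over tracked actions."""
    return sum(weights[action] * p for action, p in probs.items())

# Hypothetical weights -- the real values were excluded from the release.
# Note that negative signals (like reports) plausibly carry negative weight.
WEIGHTS = {"favorite": 1.0, "reply": 13.5, "repost": 2.0, "report": -75.0}

probs = {"favorite": 0.30, "reply": 0.05, "repost": 0.10, "report": 0.001}
score = weighted_score(probs, WEIGHTS)  # 0.30 + 0.675 + 0.20 - 0.075 = 1.10
```

Even with invented numbers, the structure shows why a modest reply probability can outweigh a high like probability: the multiplier, not the raw count, decides the score.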

The algorithm doesn't care about your tweet's content quality. It cares about predicted engagement probability. A thoughtful analysis gets the same treatment as a 10-word joke—both are scored based on whether the model thinks you'll engage, not whether the content is good.

This explains why the architecture prioritizes conversation. Conversation signals sustained interest. Likes signal... someone tapped a heart icon while scrolling.

Why New Accounts Struggle

The algorithm treats new and established accounts differently because of the cold start problem—a technical challenge where the system has no historical data to predict engagement.

Think of it like a restaurant review site. A restaurant with 500 reviews gets ranked confidently. A new restaurant with 0 reviews? The algorithm doesn't know if it's good or terrible, so it defaults to conservative estimates and low visibility.

For new accounts (0-30 days):

The transformer can't predict engagement confidently, so it regresses to "platform average" (what a typical user might do). This produces lower initial scores and limited organic reach.

But new accounts have one advantage: engagement velocity matters more. The algorithm monitors reply speed. A new account with 8 replies in 10 minutes significantly outranks one with 8 replies over 4 hours.

Why? Fast engagement velocity correlates with content quality. If people are rushing to reply immediately, the algorithm infers the post is valuable.
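A velocity metric like the one described can be sketched as a count of replies inside an early window. The window size and metric are hypothetical; the release confirms that reply speed is monitored but not how it is computed:

```python
def reply_velocity(reply_timestamps_minutes, window_minutes=60):
    """Count replies arriving within the first `window_minutes` after posting.

    Hypothetical metric illustrating why 8 replies in 10 minutes can
    outrank 8 replies spread over 4 hours; the actual velocity handling
    is not exposed in the open-source release.
    """
    return sum(1 for t in reply_timestamps_minutes if t <= window_minutes)

fast = reply_velocity([1, 2, 3, 5, 6, 8, 9, 10])              # 8 in 10 min
slow = reply_velocity([10, 40, 90, 120, 150, 180, 210, 240])  # 8 over 4 hrs
assert fast > slow  # same total replies, very different early velocity
```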

For established accounts (30+ days clean history):

The transformer has 30+ days of observed behavior. It predicts engagement confidently, which generates higher baseline scores. Engagement velocity matters less because popular accounts naturally accumulate slower responses as their audience scales.

Real-world impact:

Two accounts posting identical content with identical engagement patterns will score differently because the algorithm predicts engagement probability based on historical data.

New Account (5 days old):

  • Limited historical data
  • Algorithm predicts conservatively (low engagement probability)
  • Lower baseline scores
  • Limited initial reach

Established Account (30+ days):

  • Sufficient historical data
  • Algorithm predicts confidently (based on observed patterns)
  • Higher baseline scores
  • Broader initial reach

Same engagement pattern. Different algorithmic treatment based on historical data confidence.
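One standard way to model this behavior is shrinkage toward a prior: with little history, the prediction regresses to the platform average. This is a hedged sketch of the effect, not the model's actual mechanism, which lives inside the closed Grok transformer:

```python
def predicted_engagement(observed_rate, n_days, platform_avg=0.02,
                         prior_strength=30):
    """Shrink an account's observed engagement rate toward the platform
    average when history is short.

    Illustrative only -- the release does not expose how the transformer
    handles cold starts; the 30-day prior_strength mirrors the ~30-day
    threshold creators report.
    """
    w = n_days / (n_days + prior_strength)
    return w * observed_rate + (1 - w) * platform_avg

new_acct = predicted_engagement(observed_rate=0.10, n_days=5)
established = predicted_engagement(observed_rate=0.10, n_days=60)
assert new_acct < established  # identical behavior, lower predicted score
```

The shape of the curve matches the "training wheels" effect described later: as `n_days` grows, the prediction converges on the account's real engagement rate.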

Strategy Adjustments for New Accounts

If you're starting fresh, the algorithm disadvantage is real but counterable:

1. Prioritize rapid replies (first hour critical)

New accounts are judged on engagement speed because engagement speed correlates with post quality. Fast engagement velocity matters disproportionately when the algorithm lacks historical data.

2. Build followers through Phoenix (discovery) first

The algorithm shows your posts to people outside your follower base if replies come quickly, dwell time is high, and reply quality is strong. Once you hit 50-100 followers, Thunder (followed accounts) kicks in as a second discovery channel.

3. Avoid content that requires credibility

New accounts score lower on announcements ("I built X, check it out"), authority claims ("Here's what I've learned..."), and promotional content. Higher-converting content: questions ("What's your experience?"), disagreement ("I think everyone's wrong about..."), and process content ("Here's what I'm experimenting with...").

4. Engage with established accounts strategically

Replying to very popular accounts gets you lost among hundreds of other replies with minimal visibility.

Replying to medium-sized accounts (100-500 followers) with thoughtful responses makes you visible among fewer competing replies. If the account owner responds to your reply (author participation signal), your profile gains visibility among their engaged audience.

The 30-day threshold:

Most accounts see algorithm treatment shift around day 30-45 once the system has enough observed behavior. Conservative estimates → confident estimates. Lower baseline scores → standard baseline scores. Discovery friction → frictionless discovery.

This is why new accounts often see a sudden engagement cliff at 30 days if they haven't built momentum—the algorithm's training wheels come off.

What Doesn't Work Anymore

Based on the code architecture, these tactics are low-ROI:

Optimizing For Likes

Why it's low-value: Likes are tracked as passive engagement signals. The code structure suggests they receive lower weight than active conversation signals.

Strategy: Attention-grabbing but shallow content—inspirational quotes, feel-good statements, consensus opinions.

Problem: Given how the algorithm categorizes engagement types, a high like count doesn't translate into reach.

Short, Punchy Announcements

Why it's low-value: Dwell time (time spent reading) is tracked separately from simple views.

Strategy: Brief update tweets, product launches, quick thoughts.

Problem: Users scroll past in seconds. No dwell time accumulated, minimal depth signals.

High Posting Frequency

Why it's low-value: The code includes an author diversity penalty to prevent feed domination.

Strategy: Multiple tweets per day to "stay top of mind."

Problem: The algorithm applies progressive discounts to multiple posts from the same creator in one feed. Quality over quantity is structurally enforced.
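A progressive discount like this is typically implemented as a per-author multiplier that decays with each additional post from the same creator in one ranked feed. The decay factor below is hypothetical; the release confirms the penalty exists but not its value:

```python
def apply_author_diversity_penalty(scored_posts, decay=0.5):
    """Discount repeated posts from the same author within one ranked feed.

    `decay` is a hypothetical factor -- the open-source release confirms an
    author diversity penalty exists but excludes the actual parameters.
    """
    seen = {}
    penalized = []
    for author, score in scored_posts:
        k = seen.get(author, 0)          # how many of this author's posts
        penalized.append((author, score * decay ** k))  # already placed
        seen[author] = k + 1
    return penalized

feed = [("alice", 10.0), ("alice", 10.0), ("bob", 8.0), ("alice", 10.0)]
result = apply_author_diversity_penalty(feed)
# alice's second and third posts are progressively discounted; bob's is not
```

With a 0.5 decay, alice's third post scores a quarter of her first: posting more doesn't add feed slots, it dilutes them.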

Hashtag Stuffing

Why it's low-value: The algorithm doesn't analyze text content or hashtags.

Strategy: Adding #AI #Tech #Development #Innovation to every tweet.

Problem: The Grok-based transformer uses zero hand-engineered features—no text parsing, no image analysis, no hashtag detection. It learns purely from engagement behavior patterns.

Video Content Without Conversation

Why it's low-value: Video completion is tracked but categorized as passive engagement.

Strategy: "Video-first" content approach without driving replies or discussion.

Problem: Unless video drives conversation (replies, discussion), it generates passive engagement signals only.

What Works Now

The algorithm architecture rewards conversation, not broadcasting. Here's what actually drives reach:

Conversation Hooks

Strategy: End tweets with questions, controversial takes, or incomplete thoughts that invite replies.

Examples:

  • "Am I wrong about this?"
  • "What's your experience been?"
  • "Here's where it gets interesting..."
  • "Three approaches exist. Which do you use?"

Why it works: Replies are categorized as active engagement. The architecture treats them as stronger signals than passive metrics like likes.

Replying To Your Replies

Strategy: Respond to replies in the first hour. Even short responses count.

"Exactly."
"Interesting—tell me more."
"I hadn't considered that angle."

Why it works: Author replies are tracked as a distinct signal type. The code has specific handling for creator participation in their own threads, suggesting this behavior is weighted separately.

Long-Form Threads

Strategy: Multi-tweet threads that take 2+ minutes to read.

Why it works: Dwell time is tracked separately from views. The algorithm measures how long users spend reading, not just whether they scrolled past. Depth signals matter.

Debate-Worthy Positions

Strategy: Take clear, defensible stances that prompt profile visits and follow-up engagement.

"Most AI tools are productivity theater."
"Open-source algorithms don't actually improve transparency."
"The best security advice is advice no one follows."

Why it works: Profile clicks and conversation clicks are both tracked as active engagement signals. When users click through to your profile, it signals high-intent interest.

Incomplete Hooks That Require Click-Through

Strategy: Start with a hook, require clicking into the thread to see the full argument.

Example:

X open-sourced their algorithm on January 20, 2026.

What I found surprised me.

The architecture reveals what actually drives reach: 🧵

Why it works: Conversation clicks signal curiosity-driven exploration. Users who expand threads are actively seeking depth, not passively scrolling.

Strategic, Spaced Posting (Author diversity optimization)

Strategy: 1-2 tweets per day maximum. Focus on quality over quantity.

Why it works: Avoid the author diversity penalty. Each tweet gets full algorithmic consideration instead of competing with your own content for feed slots.

The New Strategy

Here's how content structure needs to change.

Before (Optimized For Likes)

I built a framework for AI automation.

• Automates research
• Generates reports
• Integrates with Ghost CMS
• Open source

Check it out: [link]

Expected signals:

  • Hook: Low dwell time (~10 seconds to scan)
  • Bullets: Easy to skim, no depth
  • CTA: Passive link drop
  • Expected engagement: High likes (passive), low replies (active)

After (Optimized For Conversation)

X open-sourced their algorithm. The architecture revealed something surprising.

The system tracks 15 engagement types but categorizes them differently.

Conversation signals (replies, author responses, profile clicks) are tracked separately from passive signals (likes, views).

Do you optimize for conversation depth or vanity metrics?

Full breakdown: [link]

Expected signals:

  • Hook: Incomplete, prompts curiosity
  • Controversial framing: Invites debate
  • Direct question: Prompts replies (active engagement)
  • Author replies: Opportunity to participate (distinct signal type)
  • Dwell time: Takes 30+ seconds to parse

Same information. Different structure. Different signal profile.

Five Tactical Changes

Based on the algorithm architecture:

1. End Every Tweet With A Question

Questions drive replies. Replies drive reach.

"What's your take?"
"Has anyone else seen this?"
"Am I missing something obvious?"

The question doesn't need to be profound. It needs to be answerable.

2. Respond To Replies Within The First Hour

Early engagement velocity matters. The algorithm monitors how quickly replies arrive and how actively the author participates.

Set notifications. Block time. Treat reply management as content creation, not audience service.

A tweet that gets 10 replies in 10 minutes outperforms a tweet that gets 100 replies over 24 hours.

3. Write Longer Threads (2+ Minute Dwell Time)

Long-form content outranks short viral quips because dwell time is tracked as an independent signal type.

Structure threads as:

  1. Hook (incomplete information)
  2. Data (surprising insight)
  3. Implication (counterintuitive claim)
  4. Question (invite debate)
  5. Link (full breakdown)

This format maximizes both dwell time and reply likelihood.

4. Take Controversial Positions

Consensus opinions generate likes. Controversial positions generate replies.

"Everyone's wrong about X."
"The conventional advice doesn't work."
"Here's why the popular tool is overrated."

Controversy drives profile visits (active signal), conversation clicks (active signal), and extended debate (reply chains with author participation).

5. Post Less Frequently (1-2x Per Day)

High posting frequency triggers the author diversity penalty. Your later posts compete with your earlier posts for the same feed slots.

Quality over quantity. One strategic tweet that generates substantial conversation and author participation creates more algorithmic value than multiple mediocre tweets with minimal engagement.

The architecture structurally rewards depth over volume.

What This Reveals About Social Algorithms

X's decision to open-source their algorithm is rare for a major platform, but the architecture itself reveals something deeper about how platforms think about engagement.

Likes are engagement theater. They're visible, public, and emotionally satisfying. They make users feel validated. But architecturally, they're categorized as passive signals.

Conversation is active engagement. Replies, dwell time, and profile visits are tracked separately as active signals. They predict deeper interest and sustained platform usage.

Platforms optimize for what keeps users on the platform, not what makes users feel good.

The architectural separation between passive and active signals isn't arbitrary. It's a direct statement about what kind of content platforms want to promote: content that creates discussion, not content that generates approval.

This explains why:

  • Inspirational quotes get likes but no reach
  • Controversial takes get fewer likes but massive reach
  • Long-form threads outperform short quips
  • Active creators outperform passive broadcasters

The algorithm doesn't reward you for being likeable. It rewards you for being engaging.

Transparency Limitations

While X open-sourced the architecture, several elements remain closed:

  • Actual numerical weight values (params module excluded)
  • Internal parameters of the Grok transformer model
  • Training data composition
  • Exact threshold values for penalties
  • Fine-tuning methodology

What we know: System architecture, signal types tracked, how signals are categorized (active vs. passive).

What we don't know: Specific multiplier values, exact formulas for penalties, absolute scoring thresholds.

This is strategic opacity. X reveals enough to help creators optimize (focus on conversation signals) but not enough to enable systematic gaming (can't reverse-engineer exact point values).

Still, the architecture alone is actionable. You don't need specific multipliers to understand that conversation signals are prioritized over passive engagement.

Key Insights From The Open Source Release

The algorithm's public release in January 2026 exposed something fascinating: X values authenticity signals over growth hacking. The system rewards genuine engagement that predicts long-term user retention, not viral mechanics.

1. The algorithm learns from behavior, not content

Since the transformer uses zero hand-engineered features (no text analysis, image detection, hashtag parsing), it evaluates posts purely through who engages and how. Thoughtful analysis gets the same score as a 10-word joke if engagement probability is identical. The only content differentiation comes from engagement patterns.

2. Established accounts have built-in amplification, but new accounts have speed advantages

The cold start problem is real (established accounts score higher with identical engagement), but it expires around day 30 when the algorithm has sufficient historical data. Until then, engagement velocity is disproportionately important for new creators.

3. Author replies are tracked as a distinct signal type

The code has specific handling for when creators respond to replies in their own threads. This architectural choice suggests X treats creator participation differently from general replies—likely as a stronger signal of sustained discussion value.

4. The algorithm explicitly punishes high-frequency posting

The author diversity penalty prevents accounts from dominating the feed, which protects against spam and ensures content diversity. This is a deliberate choice—X could have removed this, but didn't.

Conclusion

X's algorithm architecture reveals a clear priority: conversation signals over passive metrics.

The architectural decisions are explicit:

  • Active engagement signals (replies, author responses, profile clicks, dwell time) are tracked separately from passive signals
  • Conversation-driving behavior is categorized distinctly in the code
  • The Grok transformer predicts engagement probability for each signal type independently

If you're still optimizing for likes, you're optimizing for a signal the architecture categorizes as passive engagement.

The strategy shift is architectural, not speculative:

Focus on signals the code prioritizes:

  • Start conversations (replies = active engagement)
  • Respond to replies (author participation = distinct signal type)
  • Drive profile clicks (high-intent interest = active signal)
  • Build dwell time (depth = separate tracking)

The architecture reveals priorities. Optimize for what the system is designed to detect.


The Intelligence Adjacent framework is free and open source. If this helped you, consider joining as a Lurker (free) for methodology guides, or becoming a Contributor ($5/mo) for implementation deep dives and to support continued development.

