How VCs Are Using AI to Find Deep-Tech Deals 18 Months Before Formation

The best deep-tech deals don't come from cold inbound. They don't come from accelerator demo days. They emerge from university basements, DOE national labs, and PhD thesis defenses, years before anyone raises a Series A.

By the time a company appears on Crunchbase, the seed round has closed, the lead investor has a board seat, and the rest of the cap table is locked. The signal you're reading is 18 months old. You're not sourcing deals — you're reading history.

The forward-looking funds have figured this out. They've moved sourcing upstream, into the research layer. And increasingly, they're using AI to do it at scale.

18 months: average lag from lab output to Crunchbase listing

5,200+: new ArXiv papers published per day across CS, physics, and bio

Of high-impact papers that eventually yield a funded startup

The Problem: Deep Tech Doesn't Come Through the Front Door

Consumer startups start with distribution. A founder builds a product, runs ads, gets users, raises money. The signal is visible early — social following, App Store rank, growth charts. Investors can track it in near real-time.

Deep tech works differently. A fusion energy company starts as a plasma physics paper. A novel materials startup begins as a PhD thesis on metallic glass. A quantum computing fund manager should be reading Physical Review Letters, not TechCrunch.

The formation sequence looks like this: paper → patent → lab spin-out → angel round → stealth seed → public seed → Series A. By the time the Series A hits AngelList, the best investors have already been on the cap table for 18 to 36 months.

This creates a structural advantage for funds that can operate at the research layer. The problem is that the research layer is enormous, fast-moving, and written in technical language that requires domain expertise to parse. That's the bottleneck AI is now eliminating.

The Old Way: Conferences, Cold Email, and Luck

Ask any deep-tech GP how they sourced their best deal, and you'll get a variant of the same story: a professor introduction, a conference hallway conversation, a tip from a postdoc they'd met once at a workshop in 2019.

The traditional playbook involves:

Manual paper reading. A partner or associate reads ArXiv daily in their thesis area. Maybe 50–100 papers a week. Maybe. Anything outside their exact domain gets missed — and deep-tech's most interesting opportunities often come from intersections between fields (synthetic biology + materials science, photonics + AI inference, etc.).

Conference attendance. NeurIPS, ICLR, SPIE, MRS, ASM. Important signal, but expensive and biased toward whoever is attending. The most promising research often comes from smaller workshops, regional conferences, or labs that don't travel internationally.

Professor networks. Top funds have cultivated relationships with academic entrepreneurs and research office staff at MIT, Stanford, Caltech, and CMU. Warm intros still matter. But this covers only a fraction of global output, and it entirely misses the research emerging from China, Singapore, Germany, and Israel.

The result: deal sourcing is narrow, geographically biased, and bottlenecked by the bandwidth of a few senior people.

The AI Approach: Systematic Coverage at Research Velocity

The emerging approach treats research outputs — papers, patents, lab announcements, grant awards — as structured data. Machine learning can read a paper and estimate its commercial viability far faster than a human, across many more papers than any human team.

The four core capabilities:

1. Automated Paper Scanning

ArXiv publishes roughly 5,200 papers per day. Add bioRxiv, medRxiv, SSRN, and IEEE Xplore, and you're looking at close to 10,000 new documents daily. A machine can read all of them. It can apply scoring heuristics, such as a novel technique, a clear application domain, a university with a strong technology transfer office (TTO) track record, and authors with prior commercialization history, and surface the top 0.5% for human review.
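The scoring-and-triage step can be sketched in a few lines. This is a minimal illustration, not Conduit's actual method: the `Paper` fields, the equal weighting of the four heuristics, and the `top_fraction` helper are all assumptions introduced here.

```python
from dataclasses import dataclass
import math


@dataclass
class Paper:
    # Hypothetical record; each flag mirrors one heuristic named in the text.
    title: str
    novel_technique: bool
    clear_application: bool
    strong_tto: bool               # university with a strong TTO track record
    prior_commercialization: bool  # authors have commercialized before


def heuristic_score(p: Paper) -> int:
    """Count how many of the four heuristics a paper satisfies (0 to 4)."""
    return sum([p.novel_technique, p.clear_application,
                p.strong_tto, p.prior_commercialization])


def top_fraction(papers: list[Paper], fraction: float) -> list[Paper]:
    """Return the highest-scoring `fraction` of papers (at least one)."""
    if not papers:
        return []
    keep = max(1, math.ceil(len(papers) * fraction))
    ranked = sorted(papers, key=heuristic_score, reverse=True)
    return ranked[:keep]
```

At roughly 10,000 documents a day, `fraction=0.005` would leave about 50 papers for human review each day. A production system would likely replace the boolean flags with learned scores, but the triage shape stays the same.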

2. Patent Signal Monitoring

Patent filings are often the first hard signal that a research group is moving toward commercialization. A new patent from a university lab — particularly one with a narrow, commercially-oriented claim rather than a broad academic claim — suggests a startup is being organized. AI can monitor USPTO and international filings, correlate them with prior research output, and flag when a research trajectory shows both novelty and commercial intent.
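One simple way to correlate filings with prior research output is to match patent inventors against paper authors. The sketch below assumes the filings and papers have already been ingested into hypothetical `PatentFiling` and `PaperRecord` records; it calls no real patent-office API, and exact name matching is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRecord:
    # Hypothetical, already-ingested record; no real API is called here.
    title: str
    authors: tuple[str, ...]


@dataclass(frozen=True)
class PatentFiling:
    # Hypothetical record for one filing.
    title: str
    inventors: tuple[str, ...]
    assignee: str


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Jane  Doe' matches 'jane doe'."""
    return " ".join(name.lower().split())


def correlate(filing: PatentFiling, papers: list[PaperRecord]) -> list[PaperRecord]:
    """Return prior papers sharing at least one author with the filing's inventors."""
    inventors = {normalize(n) for n in filing.inventors}
    return [p for p in papers
            if inventors & {normalize(a) for a in p.authors}]
```

Name matching this crude will confuse researchers who share a name and miss those who publish under initials, so a real pipeline would need proper author disambiguation before trusting the overlap.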

3. Lab and PI Tracking

The same principal investigator who published a breakthrough paper in 2023 may be three months from spinning out a company today. Tracking PIs across their publication record, grant awards (NIH, DARPA, DOE), and any prior commercialization activity creates a forward-looking picture of who's likely to spin something out — and when.
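As a rough sketch of PI tracking, the signals for each investigator can be grouped into a chronological timeline and then queried for recent activity. The `Event` record and its three `kind` values are assumptions made for illustration; the article does not describe a concrete data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    # Hypothetical signal record: kind is "paper", "grant", or "patent".
    pi: str
    kind: str
    when: date


def timelines(events: list[Event]) -> dict[str, list[Event]]:
    """Group events by PI, each list sorted oldest to newest."""
    grouped: dict[str, list[Event]] = defaultdict(list)
    for e in events:
        grouped[e.pi].append(e)
    return {pi: sorted(evs, key=lambda e: e.when) for pi, evs in grouped.items()}


def signal_kinds_since(timeline: list[Event], cutoff: date) -> set[str]:
    """Distinct kinds of signal a PI has produced on or after `cutoff`."""
    return {e.kind for e in timeline if e.when >= cutoff}
```

A PI whose recent window contains papers, grants, and patents together may deserve a closer look; which combination actually predicts a spin-out is a judgment call this sketch does not make.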

4. Commercial Viability Scoring

Not all research is investable. The scoring question is: does this paper describe a defensible technical advance, in a large market, that could be commercialized within a VC time horizon? AI can evaluate this across four dimensions — market size, IP novelty, team reputation, and commercial viability — and produce a ranked list that prioritizes analyst time.
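The article names the four dimensions but not how they are combined. A minimal sketch, assuming each dimension is scored 0 to 100 and the overall score is a weighted average (equal weights by default), might look like this. The function names and the weighting scheme are hypothetical.

```python
from typing import Optional

DIMENSIONS = ("market_size", "ip_novelty", "team_reputation", "commercial_viability")


def composite_score(subscores: dict[str, float],
                    weights: Optional[dict[str, float]] = None) -> float:
    """Weighted average of the four 0-100 subscores; equal weights by default."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    missing = [d for d in DIMENSIONS if d not in subscores]
    if missing:
        raise ValueError(f"missing subscores: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(subscores[d] * weights[d] for d in DIMENSIONS) / total_weight


def rank(papers: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (paper_id, score) pairs, highest score first."""
    scored = [(pid, composite_score(s)) for pid, s in papers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, subscores of 90, 90, 70, and 70 average to 80 under equal weights. A fund with a strong view on, say, IP defensibility could pass its own `weights` to tilt the ranking toward that dimension.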

The key insight: AI doesn't replace the judgment call. It eliminates the coverage problem. A human can read 100 papers a week. AI can process 10,000 a day. The job of the human analyst shifts from triage to conviction.

A Real Example: From Paper to Funded Startup

Consider how this plays out in practice using Conduit's scoring system.

Case Study

In Q3 2024, Conduit flagged a cluster of papers from a Stanford lab working on solid-state electrolytes for lithium-sulfur batteries. The papers scored 82/100 on commercial viability — driven by strong IP novelty and a high market size score in the EV battery supply chain.

The lead PI had one prior commercialization: a company acquired by a Tier 1 automotive supplier in 2019. The patent filings were narrow and filed with a top university licensing firm, suggesting the lab was actively preparing for spin-out.

Conduit's Time to First Startup Tracker estimated a 6–12 month window before external investors would see a formal raise. That's the window that matters.

Eighteen months later, the company raised a $22M seed round led by a well-known climate tech fund. Funds that acted on the early signal had first-mover access to the cap table.

This is not an edge case. It's the repeatable version of how the best deep-tech deals have always been sourced — compressed from a lucky conference encounter into a systematic, daily workflow.

What This Means for How You Run Sourcing

The operational shift is significant. Instead of organizing your sourcing workflow around people you know, you organize it around signals you track.

A modern deep-tech sourcing stack looks like:

Daily input layer: 5+ research databases scanned automatically. Papers scored and triaged. Only the top 1–3% surfaced for human review.

Deal tracking layer: A pipeline of labs and PIs in your conviction areas, updated continuously as new papers and patents emerge. Think of it as a CRM for research groups.

Memo generation: For any paper scoring above your threshold, a one-page investment memo — market context, comparable bets, commercialization path, suggested next steps — generated automatically and ready for partner discussion.

Relationship timing: The system flags when a lab's signal suggests a 6–12 month window before formal fundraising. That's when you reach out to the PI. Not when the deck hits your inbox.
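The memo and timing rules in this stack reduce to two threshold checks per pipeline row. The sketch below assumes a hypothetical `LabEntry` record whose score and time-to-raise estimate come from upstream models; the default memo threshold of 75 is an arbitrary placeholder, not a value from the article.

```python
from dataclasses import dataclass


@dataclass
class LabEntry:
    # Hypothetical pipeline row; both numbers would come from upstream models.
    lab: str
    score: float            # 0-100 composite score
    months_to_raise: float  # estimated months until a formal raise


def next_actions(entry: LabEntry, memo_threshold: float = 75.0) -> list[str]:
    """Map one pipeline row to the workflow steps it triggers."""
    actions = []
    if entry.score >= memo_threshold:
        actions.append("generate_memo")
    # The article's outreach window: 6 to 12 months before formal fundraising.
    if 6 <= entry.months_to_raise <= 12:
        actions.append("reach_out_to_pi")
    return actions
```

Keeping the two checks independent means a lab can trigger outreach without yet clearing the memo bar, which leaves the final call with the analyst.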

The funds that build this workflow gain two durable edges: earlier access to the deals that matter, and more systematic coverage of the global research base. Both compound over time.

The Bottom Line

Deep-tech investing has always rewarded patience and proprietary access. AI doesn't change the thesis — it industrializes the sourcing motion that the best funds have always run manually.

The question isn't whether to add an AI layer to your sourcing stack. It's whether you want to be the fund that figured this out in 2026 or the fund that reads about the deals it missed on Crunchbase in 2028.

See It Working on Real Research

Conduit scans ArXiv, bioRxiv, IEEE Xplore, and patent filings daily. See what your deal flow looks like with AI sourcing upstream.
