
How to pick your firm's first AI workflow without getting it wrong

The hardest decision in an AI install at an Indian law firm is which workflow to build first. Here is the framework we use, the three criteria that matter, and the workflows we steer firms away from in week one.

Rohan Malik
Founder, Matter Labs
6 min read

TL;DR

The first AI workflow at a law firm is the one that proves the rest are worth doing. Pick wrong and the partners lose interest by week three. Pick right and the firm asks for the next two by week six. This post lays out the framework we use to pick. It comes down to three criteria: high frequency, low strategic stakes, and an existing corpus.

Why this matters more than people think

When an Indian law firm decides to install AI, the first decision is which workflow to build. That decision is usually made by a managing partner in a 20-minute conversation, often with a vendor, often based on which capability sounds most impressive in the demo.

This is the wrong way to make the decision. The first workflow is not just a workflow. It is the proof that the firm should keep going. Everyone in the firm (partners, associates, ops) is watching the first install to decide whether they should believe in this. If the first install lands well, the firm asks for two more in a month. If it does not, the entire AI initiative goes back into the freezer for a year.

The choice of workflow is the single biggest determinant of outcome.

The three criteria

We use three criteria. They are not negotiable. We have tested the framework across 14 installs.

Criterion one: high frequency

The workflow must apply to a document or task that the firm produces at least 20 times a month, ideally 50 or more.

Why: low frequency means low signal. If the firm does five of something a quarter, the workflow has to be perfect on the first attempt. If the firm does 200 of something a quarter, the workflow can be 80% right and the firm will use it. The economic case is also stronger.

This is why we usually do not start with judgment-writing or with strategy documents. They are high-value, but they are low-frequency, which means the workflow has too few iterations to learn from.

Criterion two: low strategic stakes

The workflow must apply to documents or tasks where the cost of an AI mistake is low and the human review path is well-defined.

Why: every AI workflow has a non-zero error rate, especially in the first month. If the cost of an error is "client trust is permanently damaged" or "we lose the case", the firm will not actually rely on the workflow. They will keep doing the work the old way. The workflow becomes a parallel process, not a replacement, and the productivity gain never materialises.

This is why we usually do not start with court submissions or with strategic advice memos. They are high-frequency at large firms, but the cost of error is too high for the workflow to actually be trusted in week one.

Criterion three: an existing corpus

The workflow must apply to a document type where the firm has at least 20 past examples in a usable format.

Why: AI workflows that are tuned to the firm's voice need the firm's voice as input. Without a corpus, the workflow output sounds generic, and partners reject it as "not how we write." With a corpus of 20 to 50 prior examples, the workflow output is indistinguishable from the firm's house style.

This is why we usually do not start with practice areas the firm has just launched. The firm's first matter in a new area has no corpus to draw from. We start in the firm's bread-and-butter area.

The workflows that match all three

For a typical Indian commercial litigation firm, the workflows that satisfy all three criteria are:

Section 138 NI Act reply notices. High frequency (most firms do 20 to 50 a month), low strategic stakes (mostly defensive procedural work), strong corpus (the firm has hundreds of prior replies in its archive).

NDA drafting. High frequency (corporate and transactional work generates these constantly), low strategic stakes (NDAs are routine), strong corpus.

Hearing transcripts for routine matters. High frequency (every hearing), low strategic stakes (the transcript is a working tool, not a final document), and the corpus is the firm's audio archive.

First-draft replies to routine notices (S.91 CrPC objections, S.34 Arb Act notices, S.151 CPC applications). High frequency in litigation firms, mostly procedural, strong corpus.

Routine status updates to clients. High frequency, low stakes (always partner-reviewed before sending), strong corpus (the firm has thousands of past status emails).

If your firm has any of these as a top-ten time consumer, that is your first workflow.

The workflows we steer firms away from in week one

These are common requests that fail one or more of the criteria.

Drafting full court briefs. Low frequency, very high stakes. Bad first install.

General legal research. Most research questions are bespoke, so there is little repeatable structure; a firm-specific corpus is hard to define; and hallucination risk is high. We do this in month two or three, after the firm trusts the framework.

Client-facing chatbots. Low frequency of the same query, high reputational risk if the bot says something wrong. Almost never the right first install.

Judgment summarisation. Low frequency, no firm-specific corpus (judgments are public), and partners disagree on what a good summary looks like. Bad first install.

"AI legal assistant" for the whole firm. This is a vendor pitch, not a workflow. Almost always wrong.

What the diagnostic looks like

When a firm engages us for a teardown, the first three days look like this.

Day 1. We sit with the managing partner and the operations head. We list every document type the firm produces. We mark the top ten by frequency. We do not look at strategic value yet.

Day 2. We sit with two senior associates and review the corpus for each top-ten document type. We tag the corpus by quality (which prior drafts are good, which are not, which are firm-house-style, which are partner-specific). We end with a list of document types ranked by usable-corpus depth.

Days 3 to 5. We score each top-ten document type against all three criteria. We produce a written memo with the top two or three candidates, ranked, with the reasoning. We send it to the partners.
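The scoring in days 3 to 5 is, at bottom, a checklist. Here is a minimal sketch of that ranking in Python. This is an illustration, not our actual tooling: the class, field names, and the sample data are hypothetical; the thresholds (at least 20 instances a month, at least 20 usable prior examples) are the ones stated above.

```python
from dataclasses import dataclass

# Thresholds from the three criteria: at least 20 instances a month
# and a corpus of at least 20 usable prior examples.
FREQ_MIN = 20
CORPUS_MIN = 20

@dataclass
class DocType:
    name: str
    monthly_volume: int   # how often the firm produces this document
    corpus_size: int      # usable prior examples in the archive
    high_stakes: bool     # is the cost of an AI error high?

def score(doc: DocType) -> int:
    """One point per criterion met. A first workflow should score 3;
    anything lower gets deferred to a later phase."""
    points = 0
    if doc.monthly_volume >= FREQ_MIN:   # criterion one: high frequency
        points += 1
    if not doc.high_stakes:              # criterion two: low stakes
        points += 1
    if doc.corpus_size >= CORPUS_MIN:    # criterion three: existing corpus
        points += 1
    return points

# Hypothetical sample data for two document types.
candidates = [
    DocType("S.138 reply notice", monthly_volume=35,
            corpus_size=400, high_stakes=False),
    DocType("Full court brief", monthly_volume=3,
            corpus_size=60, high_stakes=True),
]
ranked = sorted(candidates, key=score, reverse=True)
```

Under these assumptions the reply notice scores 3 and the court brief scores 1, which matches where each lands in the lists above.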

The decision is the firm's. We have a recommendation in the memo, but the decision is the partners'. Most firms agree with our recommendation. Some do not. Both are fine.

What "right" looks like in week three

If the first workflow is right, week three at the firm looks like this:

  • Associates open the workflow voluntarily, without being prompted.
  • The rejection rate is below 30% (associates accept most outputs with light editing).
  • Partners ask "can we do something similar for X?", where X is the next document type.
  • The managing partner stops calling it "AI" and starts calling it "the workflow" or by its name.

If those four things are happening at week three, the install is on track and the firm is ready for workflow number two. If they are not, we slow down, diagnose what is breaking, and fix it before adding more.

If you want to know which document types at your firm score highest against the three criteria, book a teardown and we will run the diagnostic with you.

Frequently asked

What if the firm already knows which workflow it wants?

We listen, but we do not always agree. About a third of firms come in with a strong preference (often hearing transcription or research) that turns out to be the wrong place to start. The criteria matter more than the preference. We talk it through and either we change their mind or they change ours. Rarely do we just install what they asked for.
