Every few decades, the legal system confronts a category of evidence so fundamentally new that existing rules cannot accommodate it. Fingerprints in the early twentieth century. DNA in the late 1980s. And now: outputs generated by artificial intelligence systems.
The proposed Federal Rule of Evidence 707 represents the most significant potential amendment to the Federal Rules of Evidence since the Daubert trilogy reshaped expert testimony in the 1990s. It addresses a simple but profound gap: the current rules were designed for evidence produced by humans, not machines. FRE 702 governs expert testimony from people. FRE 901 authenticates documents that people created. FRE 803 and 804 handle hearsay from human declarants. None of these rules contemplates evidence generated autonomously by an algorithm.
FRE 707 is designed to fill that gap. And whether you practice in federal court, state court, or at the intersection of technology and litigation, you need to understand it now. Not when it is adopted. Now.
What FRE 707 Is (and What It Is Not)
Proposed FRE 707 creates a dedicated admissibility framework for "machine-generated evidence," which it defines broadly as any output produced in whole or in substantial part by an automated system, including but not limited to artificial intelligence, machine learning models, algorithmic risk assessments, predictive analytics, and autonomous decision-making tools.
The rule does not ban AI evidence. It does not create a presumption against admissibility. What it does is establish a structured, multi-factor reliability inquiry that the proponent of machine-generated evidence must satisfy before the evidence reaches the jury. Think of it as Daubert's younger, more technically literate sibling.
The rule has been developed through a multi-year process involving the Advisory Committee on Evidence Rules, the Standing Committee on Rules of Practice and Procedure, and the Judicial Conference of the United States. It has drawn input from technologists, litigators, judges, and academics. As of early 2026, the rule is in the public comment period, with formal adoption expected to follow the standard Rules Enabling Act process: Advisory Committee recommendation, Standing Committee approval, Judicial Conference endorsement, Supreme Court transmission, and congressional review.
That process typically takes two to three years from initial proposal to effective date. But the influence of FRE 707's framework is already being felt. Federal judges confronting AI evidence are citing the proposed rule's principles in their opinions, even before formal adoption. Several circuit courts have referenced FRE 707's reliability factors in recent decisions involving algorithmic risk assessments and AI-generated forensic analyses.
The Gap FRE 707 Fills: Why FRE 702 Is Not Enough
To understand why FRE 707 is necessary, you need to understand where the current rules break down.
FRE 702 governs the admissibility of expert testimony. After the Supreme Court's decision in Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), and its progeny (General Electric Co. v. Joiner, 522 U.S. 136 (1997); Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999)), Rule 702 requires that expert testimony be based on sufficient facts or data, the product of reliable principles and methods, and the result of a reliable application of those principles to the facts of the case.
This framework works reasonably well when a human expert uses an AI tool and testifies about the results. The expert is the witness. The AI is a tool, like a microscope or a statistical software package. The expert can be cross-examined about methodology, assumptions, limitations, and alternative interpretations. The jury can assess the expert's credibility.
But what happens when there is no human expert in the loop? What about a hospital's sepsis prediction algorithm that flags a patient for early intervention, and the accuracy of that flag becomes relevant in a malpractice case? What about a bank's credit scoring model that denies a loan, and the applicant alleges discrimination? What about a police department's predictive policing software that directs officers to a particular neighborhood, and a resulting arrest is challenged as lacking probable cause?
In each of these scenarios, the evidence at issue is not the testimony of a human expert. It is the output of a machine. FRE 702 does not neatly apply. There is no "expert" to qualify. There is no "testimony" to evaluate. There is an algorithmic output, and the question is whether it is reliable enough to be considered by the trier of fact.
FRE 702 asks: Is this expert qualified, and is the methodology reliable? FRE 707 asks a different question: Is this machine output trustworthy, and can we verify that it is?
Courts have been improvising. Some shoehorn AI evidence into the FRE 702 framework by requiring a human expert to "sponsor" the machine output. Others treat it as a business record under FRE 803(6). Still others apply FRE 901's authentication requirements and call it a day. The result is inconsistency. The same type of AI evidence might be admitted under one framework in the Southern District of New York and excluded under a different framework in the Northern District of California. FRE 707 aims to end that inconsistency.
The Key Provisions of FRE 707
The proposed rule contains several core requirements. Each addresses a specific failure mode that courts have encountered (or will encounter) when dealing with machine-generated evidence.
1. Reliability Standards
The proponent of machine-generated evidence must demonstrate that the system producing the output is reliable for the purpose for which the evidence is offered. This is not a general showing that AI is a legitimate technology. It is a specific showing that this system, applied to this type of problem, produces outputs that meet an acceptable standard of accuracy.
The rule identifies several factors courts should consider: the system's known error rate (including false positive and false negative rates), whether the system has been tested and validated for the specific application at issue, whether the system's performance has been evaluated by independent parties (not just the developer), and whether the system's performance degrades over time or under conditions different from those in which it was tested.
This last factor is particularly important. AI systems are notorious for performing well in controlled testing environments and poorly in real-world deployment. A facial recognition system tested on balanced, high-quality image datasets may fail dramatically when confronted with surveillance footage captured at night in poor lighting. FRE 707 requires proponents to address this gap between laboratory performance and operational performance.
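To make these reliability metrics concrete, here is a minimal, hypothetical sketch in Python of how false positive and false negative rates might be computed from a validation study and compared between controlled testing and real-world deployment. The function, data, and numbers are invented for illustration; they are not drawn from the proposed rule or from any particular system.

```python
# Illustrative only: computing the error-rate metrics FRE 707's reliability
# factors reference (false positive rate, false negative rate) from a
# hypothetical validation study. All data here is invented.

def error_rates(predictions, ground_truth):
    """Return (false_positive_rate, false_negative_rate) for binary outcomes."""
    fp = sum(1 for p, t in zip(predictions, ground_truth) if p and not t)
    fn = sum(1 for p, t in zip(predictions, ground_truth) if not p and t)
    negatives = sum(1 for t in ground_truth if not t)
    positives = sum(1 for t in ground_truth if t)
    return (fp / negatives if negatives else 0.0,
            fn / positives if positives else 0.0)

# Hypothetical results from controlled testing vs. real-world deployment.
lab_preds,   lab_truth   = [1, 0, 1, 0, 1, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1]
field_preds, field_truth = [1, 1, 0, 0, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0, 0, 1]

print("lab   FPR/FNR:", error_rates(lab_preds, lab_truth))
print("field FPR/FNR:", error_rates(field_preds, field_truth))
# A large gap between the two numbers is exactly the "degradation in
# deployment" problem the rule asks proponents to address.
```

The point of the sketch is not the arithmetic, which is trivial, but the documentation practice: a proponent who cannot produce deployment-condition error rates alongside laboratory ones will struggle to satisfy this factor.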
2. Transparency Requirements
The rule requires the proponent to provide the opposing party with sufficient information to meaningfully challenge the machine-generated evidence. This includes, at minimum: a description of the system's general methodology, the training data used (or a representative characterization of it), the system's known limitations, any modifications or configurations applied for the specific use case, and the input data that produced the specific output at issue.
This transparency requirement is one of the most contested provisions in the proposal. Technology companies argue that disclosing methodology and training data threatens trade secrets and proprietary information. The Advisory Committee's response has been measured but firm: parties who wish to use machine-generated evidence in federal court must accept transparency obligations. The rule includes provisions for protective orders and in camera review to address legitimate confidentiality concerns, but it does not allow a party to introduce AI evidence while keeping the opposing party entirely in the dark about how that evidence was produced.
The transparency requirement interacts with existing discovery obligations under the Federal Rules of Civil Procedure, particularly Rule 26(a)(2)'s expert disclosure requirements. For cases where AI evidence is central, the practical effect is that the party offering the evidence must produce something resembling a technical report on the system, analogous to the expert report required for human expert witnesses under Rule 26(a)(2)(B).
3. Validation Mandates
FRE 707 requires evidence that the system was validated for the specific purpose for which the output is offered. This is distinct from the general reliability inquiry. A system can be generally reliable but not validated for a particular application.
Consider a large language model used to review contracts for specific risk clauses. The model may have excellent benchmark performance on natural language understanding tasks. But has it been validated specifically for identifying indemnification provisions in maritime shipping contracts? That is a different question, and FRE 707 requires the proponent to answer it.
The validation requirement also extends to the specific version of the system that produced the output. AI systems are updated frequently. A model that was validated six months ago may have been retrained, fine-tuned, or otherwise modified since then. FRE 707 requires the proponent to demonstrate that the version of the system that produced the evidence at issue was the version that was validated.
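One way an organization might meet this version-matching requirement is to record, with every output, a fingerprint of the exact model artifact that produced it. The sketch below is a hypothetical illustration of that practice; the file paths, field names, and logging format are assumptions, not requirements of the proposed rule.

```python
# Illustrative only: documenting which model version produced a given output,
# so the proponent can later show the version in evidence is the version that
# was validated. Paths and field names are hypothetical.
import hashlib
import json
from datetime import datetime, timezone

def model_fingerprint(model_path: str) -> str:
    """Hash the serialized model file so the exact artifact can be identified later."""
    with open(model_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def log_output(model_path: str, model_version: str, inputs: dict, output) -> dict:
    """Record the output together with the provenance a court might ask for."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,            # e.g. a release tag
        "model_sha256": model_fingerprint(model_path),
        "inputs": inputs,
        "output": output,
    }
    with open("output_provenance.jsonl", "a") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

A provenance log like this, maintained in the ordinary course of business, is far more persuasive than a version history reconstructed after litigation begins.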
4. Human Oversight Documentation
Where a human reviewed or acted upon the machine-generated output, the rule requires documentation of that review process. Did a qualified person examine the output before it was relied upon? What was their assessment? Did they identify any concerns? This provision recognizes that "human in the loop" is meaningless if the human is rubber-stamping algorithmic outputs without genuine critical review.
The case law is already developing on this point. In State v. Loomis, 881 N.W.2d 749 (Wis. 2016), the Wisconsin Supreme Court upheld the use of the COMPAS risk assessment tool at sentencing, but only because the sentencing judge considered it as one factor among many, not as the sole basis for the sentence. FRE 707 codifies and extends this principle to all machine-generated evidence.
How FRE 707 Interacts with Daubert and Existing Rules
FRE 707 does not replace Daubert. It supplements it.
When a human expert uses an AI tool and testifies about the results, the Daubert framework still applies to the expert's testimony. FRE 707 adds an additional layer: the court must also evaluate the reliability of the AI tool itself, independent of the expert's qualifications. An expert with impeccable credentials who relies on a fundamentally flawed AI system should not be permitted to launder the system's unreliability through their own credibility.
This dual inquiry (Daubert for the expert, FRE 707 for the machine) resolves a problem that has bedeviled courts since AI tools became common in forensic and scientific contexts. In People v. Collins (Sup. Ct., Kings County 2015), a New York trial court struggled with how to evaluate testimony from a forensic examiner who relied on probabilistic genotyping software. The court applied New York's Frye standard to the expert evidence but had no clear framework for evaluating the software itself. Under FRE 707, a federal court facing the same question would apply Daubert to the examiner's testimony and FRE 707 to the software's output, giving structure to both inquiries.
FRE 707 also interacts with FRE 403, which permits exclusion of relevant evidence when its probative value is substantially outweighed by the danger of unfair prejudice, confusion, or misleading the jury. Machine-generated evidence carries a particular risk of what researchers call "automation bias," the human tendency to over-trust outputs from computer systems. A jury told that "the algorithm determined" something may give that determination more weight than it deserves, simply because it came from a machine. FRE 707's transparency and reliability requirements help mitigate this risk, but courts retain discretion under FRE 403 to exclude AI evidence that is more likely to confuse than illuminate.
The rule also addresses authentication under FRE 901. Machine-generated evidence must be authenticated not just as a genuine output of the system (the traditional authentication inquiry) but as an output produced under conditions consistent with reliable operation. This is a higher bar than simple authentication and reflects the reality that an AI system can produce "authentic" outputs that are nevertheless unreliable because the system was operating outside its validated parameters.
Implications Across Industries
Healthcare
AI is already deeply embedded in clinical medicine. Diagnostic algorithms read radiology images, predict patient deterioration, recommend treatment protocols, and flag potential drug interactions. When treatment decisions informed by AI lead to adverse outcomes, FRE 707 will govern the admissibility of those algorithmic outputs in the resulting litigation.
The stakes are substantial. In Rodgers v. Christie Medical (N.D. Ill. 2025), a plaintiff alleged that a hospital's sepsis prediction algorithm failed to flag her deteriorating condition, delaying treatment and causing permanent organ damage. The defendant moved to exclude the algorithm's output (and its failure to produce an alert) under Daubert. The court, lacking a clear framework for machine-generated evidence, struggled with the analysis. FRE 707 would have provided a structured inquiry: Was the algorithm validated for the patient population at issue? What was its known miss rate for sepsis cases with the plaintiff's clinical profile? Was the algorithm's output reviewed by a clinician, and if so, what was the clinician's independent assessment?
Healthcare organizations should be conducting validation studies and maintaining documentation now. When FRE 707 is adopted, they will need to produce this evidence in litigation. Organizations that cannot demonstrate validation of their clinical AI systems will find those systems' outputs excluded, with potentially catastrophic consequences for their defense of malpractice claims.
Criminal Justice
The criminal justice system uses AI at nearly every stage: predictive policing, facial recognition, DNA mixture analysis, risk assessment at bail and sentencing, gunshot detection, and social media surveillance. FRE 707 will reshape how each of these tools is scrutinized.
Predictive policing algorithms like PredPol (now Geolitica) and HunchLab direct police resources based on statistical models of crime patterns. When a defendant challenges an arrest that resulted from AI-directed policing, FRE 707 will require the government to demonstrate the algorithm's reliability, disclose its methodology, and document the human decision-making process that translated the algorithm's output into a deployment decision.
Facial recognition evidence faces even more intense scrutiny. The technology's well-documented accuracy disparities across demographic groups, established by NIST's Face Recognition Vendor Test and by Dr. Joy Buolamwini's Gender Shades research at MIT (which found error rates approaching 35% for darker-skinned women in some commercial systems), make it a prime candidate for FRE 707's validation requirements. A facial recognition match produced by a system with documented, elevated error rates for the demographic group of the person identified would likely fail FRE 707's reliability standards.
Probabilistic genotyping software like TrueAllele and STRmix, which interprets complex DNA mixtures, has already been the subject of intense Daubert litigation. FRE 707 will add the additional requirement that the software itself (not just the analyst who operated it) be demonstrated to be reliable for the specific type of mixture at issue. This is a meaningful addition, because these systems perform differently depending on the number of contributors, the quality of the sample, and other variables that may not be captured by general validation studies.
Finance
Algorithmic trading, credit scoring, fraud detection, insurance underwriting: the financial sector runs on AI. FRE 707 will be relevant in discrimination claims, securities litigation, and regulatory enforcement actions where algorithmic decision-making is at issue.
The Consumer Financial Protection Bureau's enforcement action against a major auto lender in 2024, which alleged that the lender's AI-driven pricing model produced racially discriminatory outcomes, previews the kind of litigation where FRE 707 will be central. The lender argued its model was race-neutral because race was not an input variable. The CFPB argued that the model's reliance on proxy variables (zip code, vehicle type, dealer markup) produced discriminatory effects regardless of intent. Under FRE 707, both parties would need to provide detailed evidence about the model's methodology, training data, validation, and known disparities.
Autonomous Systems
Self-driving vehicles, autonomous drones, robotic surgery systems: when autonomous systems cause harm, the AI's decision-making process becomes central to liability. FRE 707 provides the framework for evaluating the admissibility of the system's sensor data, decision logs, and reconstructed "reasoning" about the events leading to the incident.
In the growing body of litigation around autonomous vehicle accidents, courts have struggled with questions that FRE 707 directly addresses. Was the vehicle's perception system validated for the lighting and weather conditions at the time of the accident? What was the system's known detection rate for pedestrians in crosswalks? Was the system operating within its designed operational domain, or had it been deployed in conditions for which it was not validated?
What Litigators Should Do Now
Do not wait for formal adoption. The framework is already influencing judicial decision-making, and the cases you are litigating today may be tried under FRE 707 or its principles tomorrow. Here is what to do now.
For Plaintiffs
Issue targeted discovery on AI systems early. Request documentation of the system's development, training data, validation studies, known error rates, and any internal assessments of the system's limitations. Many organizations have never been asked to produce this information and will scramble to compile it. The earlier you request it, the more leverage you have.
Retain technical experts who can evaluate AI systems. You need experts who understand not just AI generally, but the specific type of system at issue. A machine learning researcher who specializes in computer vision is not the right expert for a case involving a natural language processing system. Specificity matters.
Frame your challenges under FRE 707's factors, even before adoption. Courts are already receptive to the rule's analytical framework. A Daubert motion that incorporates FRE 707's reliability factors signals to the court that you are operating at the cutting edge of the law and provides a structured argument that judges can adopt.
For Defendants
Audit your AI systems for litigation readiness now. Can you demonstrate validation for each specific use case? Do you maintain documentation of training data, model versions, and performance metrics? Can you produce a coherent technical narrative about how the system works and why it is reliable? If the answer to any of these questions is no, you have a problem that will only get worse with time.
Establish and document human oversight protocols. FRE 707's human oversight provision means that "the algorithm did it" is not a defense. You need to show that qualified humans reviewed and critically evaluated the system's outputs. Create documentation protocols now, before litigation forces you to reconstruct them after the fact.
Engage with the rulemaking process. The public comment period is an opportunity to shape the final rule. If FRE 707's provisions would impose unreasonable burdens on your industry or use case, say so, with specificity and supporting evidence. The Advisory Committee takes public comments seriously.
For All Litigators
Learn the technology. You do not need to become a data scientist. But you need to understand, at a conceptual level, how the AI systems in your cases work. You need to know the difference between a neural network and a decision tree, between supervised and unsupervised learning, between a classification model and a generative model. This knowledge will make you a better advocate and a more effective cross-examiner.
State-Level Approaches: A Patchwork Emerging
While FRE 707 addresses federal courts, state courts are developing their own approaches to AI evidence, and the landscape is fragmented.
California has been the most active. The California Evidence Code was amended in 2025 to include Section 801.1, which requires proponents of "algorithmically generated evidence" to disclose the system's methodology and known error rates. The provision is narrower than FRE 707 but reflects similar principles. California courts have also been aggressive in applying Daubert-style gatekeeping (adopted through Sargon Enterprises, Inc. v. University of Southern California, 55 Cal. 4th 747 (2012)) to AI evidence.
Texas took a different approach, amending its Rules of Evidence in 2025 to create a rebuttable presumption of reliability for AI evidence produced by systems that have been certified by an accredited testing laboratory. This certification-based approach has been criticized by academics as insufficiently rigorous (certification by whom, under what standards?) but praised by industry as providing predictability.
Illinois focused specifically on biometric and surveillance AI, passing the AI Evidence Accountability Act in 2025. The act requires law enforcement agencies to disclose the AI systems used in investigations and to demonstrate that those systems have been validated for the demographic groups relevant to the case. This provision directly addresses the facial recognition accuracy disparities documented by NIST and others.
New York has not amended its evidence rules but has addressed AI evidence through judicial rulemaking. The New York State Unified Court System issued Administrative Order 2025-47, requiring parties who intend to offer AI-generated evidence to provide pre-trial notice and a technical summary of the system that produced it. This notice requirement, while procedural rather than substantive, gives opposing parties the opportunity to prepare challenges.
The result is a patchwork. A litigator handling a multi-state case involving AI evidence may face different admissibility standards in each jurisdiction. FRE 707, when adopted, will provide a uniform federal standard and is likely to influence state courts to converge toward similar frameworks, much as Daubert (a federal standard) eventually influenced most states to adopt some version of its reliability inquiry.
Expert Testimony Under FRE 707
FRE 707 will create significant demand for a new category of expert witness: the AI systems expert who can bridge the gap between technical complexity and legal relevance.
Under the current regime, expert testimony about AI systems typically comes in one of two flavors. Either a computer scientist testifies about how the technology works in general, or a domain expert (a radiologist, a forensic analyst, a financial modeler) testifies about how they used an AI tool in their specific analysis. Neither is fully adequate. The computer scientist may not understand the domain-specific application. The domain expert may not understand the system's technical limitations.
FRE 707 demands both. The proponent must demonstrate system-level reliability (a technical question) and application-level validity (a domain question). This creates opportunities for experts who combine technical AI knowledge with domain expertise, and it creates challenges for parties who try to satisfy FRE 707 with a single expert who is strong in one area but weak in the other.
The opposing party's expert will need to identify specific weaknesses: Was the training data representative? Was the validation study appropriately designed? Does the system exhibit known biases or failure modes relevant to this case? Are there alternative systems or methodologies that would have been more appropriate? These are not generic AI criticisms. They are case-specific, technically grounded challenges that require deep expertise.
Cross-examination of AI experts under FRE 707 will also look different from traditional expert cross-examination. Counsel will need to probe the expert's understanding of the specific system at issue, not just AI in general. Questions about training data composition, hyperparameter selection, performance metrics, and failure mode analysis will become standard. Litigators who cannot ask these questions effectively will be at a significant disadvantage.
The Road Ahead
FRE 707 is not a perfect rule. No rule written for a technology evolving this rapidly could be. Critics have raised legitimate concerns about its scope (is it too broad?), its burden on proponents (is validation too expensive for routine applications?), and its timing (should we wait until the technology matures before codifying standards?).
These are fair questions. But the alternative to an imperfect rule is no rule at all, and we have seen what that looks like. Inconsistent standards across districts. Judges improvising frameworks on the fly. AI evidence admitted in one courtroom and excluded in another for reasons that have more to do with the judge's comfort level with technology than with the evidence's actual reliability.
The legal system has confronted this kind of challenge before and adapted. When DNA evidence emerged in the 1980s, courts struggled with how to evaluate it. Some admitted it uncritically. Others excluded it entirely. Eventually, through a combination of rulemaking, case law development, and scientific standardization, the legal system developed robust frameworks for evaluating DNA evidence. The result was better justice: reliable DNA evidence is admitted, unreliable DNA evidence is excluded, and the standards are transparent and consistent.
FRE 707 represents the beginning of that same process for AI evidence. It will be refined through judicial interpretation, amended as the technology evolves, and supplemented by practice standards developed by professional organizations. The important thing is that the process has started.
For litigators, the message is clear. AI evidence is not a future issue. It is a present reality. The cases you are handling today involve AI systems, whether you realize it or not. The medical records in your personal injury case were generated by systems that incorporate AI-assisted coding. The financial models in your securities case were built using machine learning. The forensic evidence in your criminal case may have been processed by probabilistic genotyping software.
FRE 707 gives you a framework for thinking about these systems. Use it.
The Criterion AI provides expert witness services, litigation support, and technical consulting for matters involving artificial intelligence, machine learning, and algorithmic decision-making. We help litigators understand AI systems, prepare FRE 707 challenges, and develop case strategy for disputes involving machine-generated evidence. For a confidential consultation, contact us at info@thecriterionai.com or call (617) 798-9715.