At the core of many court cases is a timeline of events and actions backed by evidence. What many attorneys need is a precise, relevant virtual timeline built from the digital footprints everyone leaves in the modern age. In the past this work was done by a small army of people. Today, smartphones, smart locks, cameras, routers, social media, smartwatches, and similar devices produce and collect an avalanche of data. Filtering it for the relevant information with human labor alone is harder than finding a needle in a haystack, even on a large budget. Worse, the opposing side often overproduces in response to production requests, copying and delivering irrelevant data to “bankrupt” the other side. With the volume of discovery data growing exponentially, and with many cases won or lost on who does e-discovery better, what can a legal team practically do?
The need to develop a concise, provable virtual timeline is driving the adoption of Artificial Intelligence (AI) in e-discovery. The key advantages AI provides are the ability to distill terabytes of data, make connections between important pieces of data, and build a virtual timeline in a fraction of the time and effort required by human review paired with non-AI e-discovery software.
Finding relevant data in a discovery dump is the first task an e-discovery team faces. This is extremely difficult because information arrives in a wide variety of formats: text, photos, video, audio, log files, paper files, and more.
For text-based information, traditional non-AI software searches documents for keywords. This simplistic approach produces many false positives and false negatives that a human must then manually review and sort through. AI approaches instead let teams automatically classify documents based on a much smaller set of documents (typically called “training data”) that have been manually marked as relevant or not. This can shrink the review burden from reading 100,000 documents to looking at only 100.
What’s more, with Natural Language Processing (NLP), a subfield of AI, e-discovery teams can browse adjacent concepts in a document production. This matters because sometimes it isn’t known exactly what to search for. For example, in a car accident case, an attorney might want to find information about issues that could have caused the accident, such as medical or mechanical problems. Manually searching terabytes of documents for possible keywords is tedious, expensive, and likely to miss the actual cause. With NLP, an attorney can automatically see the possible causes (e.g. prior DUIs) mentioned in the documents actually connected to that accident.
AI can also take large volumes of text and automatically organize it into a spreadsheet under the proper column headings, which is very useful for establishing a virtual timeline quickly. For example, given the sentences “This morning I ate breakfast” and “I didn’t sleep until 2am”, and a spreadsheet with the columns “time” and “action”, it would automatically produce tabular data like this:
| time | action |
|---|---|
| This morning | Ate breakfast |
| 2am | Went to sleep |
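The classification step described above, where a small hand-labeled set is used to sort a much larger production, can be sketched with a multinomial Naive Bayes text classifier. This is a minimal illustration under stated assumptions, not how any particular e-discovery product works: the `NaiveBayesRelevance` class, the training documents, and the labels are all hypothetical, and real review platforms may use different and more sophisticated models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, digits, and apostrophes;
    # a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesRelevance:
    """Hypothetical multinomial Naive Bayes classifier with Laplace smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per label (for priors)
        self.word_counts = {label: Counter() for label in self.labels}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for label in self.labels:
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(self.doc_counts[label] / n_docs)
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (self.totals[label] + v))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical hand-labeled training set for a car accident case.
training_docs = [
    "brakes failed on the highway before the crash",
    "driver had two prior dui convictions",
    "mechanic noted worn brake pads last month",
    "lunch menu for the office party",
    "quarterly newsletter and holiday schedule",
    "please rsvp for the team offsite",
]
training_labels = ["relevant"] * 3 + ["irrelevant"] * 3

model = NaiveBayesRelevance().fit(training_docs, training_labels)
print(model.predict("brake pads were worn before the crash"))  # relevant
print(model.predict("holiday party menu attached"))  # irrelevant
```

Naive Bayes is used here only because it is short and needs nothing beyond the Python standard library. In practice the trained model would be run over the unreviewed documents so that humans read only those it scores as likely relevant.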
This works even when the source sentences are structured very differently from one another. Another useful capability AI brings is sentiment analysis: for example, it can quickly surface all negative communication about a topic. With AI, uncovering the truth becomes faster and easier than ever, even with large volumes of documents.
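Sentiment analysis of the kind mentioned above can be illustrated with a simple lexicon-based scorer that flags negative messages mentioning a topic. This is a hedged sketch: the `LEXICON` weights, the `negative_about` helper, and the sample messages are hypothetical, and summing word weights is a much cruder technique than a trained sentiment model.

```python
import re

# Hypothetical, tiny sentiment lexicon for illustration only.
# Negative weights mark negative words, positive weights mark positive ones.
LEXICON = {
    "angry": -2, "furious": -3, "hate": -3, "unsafe": -2, "broken": -2,
    "late": -1, "problem": -1,
    "great": 2, "thanks": 1, "fine": 1, "safe": 2, "happy": 2,
}


def sentiment_score(text):
    """Sum the lexicon weights of the words in text; below zero reads as negative."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def negative_about(messages, topic, threshold=-1):
    """Return messages that mention topic and score at or below threshold."""
    topic = topic.lower()
    return [m for m in messages
            if topic in m.lower() and sentiment_score(m) <= threshold]


messages = [
    "I hate how broken the brakes are, it feels unsafe",
    "Thanks, the brakes are fine now",
    "The delivery was late again",
    "Great weekend, happy to be home",
]
print(negative_about(messages, "brakes"))
```

Because it only adds up word weights, this scorer ignores negation (“not unsafe” still scores as negative) and sarcasm, which is where learned models tend to do better.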
On the production side, non-relevant Personally Identifiable Information (PII) sometimes needs to be redacted before documents are handed over. Doing this manually is error-prone and expensive. Fortunately, AI can do it automatically and reliably with just a few clicks, even for PII that doesn’t match search strings (e.g. Social Security numbers formatted in different ways). In addition, responses to discovery requests sometimes have to be generated. To bankrupt the other side, AI text generators can be used to create realistic, legally correct, but voluminous text answers that force the opposition to spend attorney time without the producing side spending its own. AI has created a spy vs. spy arms race that is forcing everyone to use AI for e-discovery.
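To show why varied formats make PII redaction tricky, here is a rule-based sketch that redacts Social Security numbers written in several common layouts. It is plainly not AI: it catches only the formats encoded in the regular expression, which is the limitation that AI-based redaction is meant to overcome. `SSN_PATTERN` and `redact_ssns` are hypothetical names, and the pattern will also flag any other standalone 9-digit number.

```python
import re

# Rule-based sketch, not an AI model. Matches 9-digit SSN-like numbers
# written as 123-45-6789, 123 45 6789, 123.45.6789, or 123456789.
# The \b word boundaries keep it from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"\b\d{3}[-.\s]?\d{2}[-.\s]?\d{4}\b")


def redact_ssns(text, placeholder="[REDACTED SSN]"):
    """Replace every SSN-like match in text with placeholder."""
    return SSN_PATTERN.sub(placeholder, text)


doc = "SSN 123-45-6789; alt 123 45 6789; raw 123456789; phone 5551234567"
print(redact_ssns(doc))
# SSN [REDACTED SSN]; alt [REDACTED SSN]; raw [REDACTED SSN]; phone 5551234567
```

The 10-digit phone number survives because no 9-digit run inside it is bounded by non-digits. An SSN written out in words or split across a line break would slip past this pattern, which is the kind of case a learned PII detector is meant to catch.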
With security cameras, traffic cameras, and especially smartphone cameras everywhere, a visual footprint is created for nearly everyone nearly all the time. This has produced tens of thousands of hours of photos and video that e-discovery teams have had to sift through by actually viewing the content. To solve this problem, automatic face recognition, scene recognition (e.g. finding a certain person on a ski slope), and text recognition (e.g. license plates) have become essential tools for finding relevant video and photos in minutes. This automated visual search also works for custom images such as tattoos, jewelry, and clothing, even with bad angles, poor quality, or partial images. Objects and people can also be tracked across video. This is useful, for example, for finding footage proving that a neighbor poisoned a tree without having to watch a week’s worth of Nest Cam video. AI makes it possible to find the needle in the proverbial haystack in a fraction of the time and expense of human review.
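The paragraph above describes AI-based recognition and tracking. A far simpler, non-AI idea behind skipping hours of unchanged footage is frame differencing, a classic motion-detection technique: flag only the frames that differ noticeably from the one before. This is a sketch under stated assumptions, not face recognition or object tracking. The frames are hypothetical tiny grayscale grids, decoding real video (for example with OpenCV) is omitted, and the `threshold` value is arbitrary.

```python
# Each frame is a 2D list of grayscale pixel values (0-255). Decoding frames
# from an actual video file is omitted to keep the sketch self-contained.

def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-size frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def frames_with_motion(frames, threshold=10.0):
    """Return indices of frames that differ from the previous frame by more
    than threshold, i.e. candidate moments worth a human look."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]


# Hypothetical 4x4 frames: a static scene, then a bright object appears
# in the middle for one frame.
still = [[50] * 4 for _ in range(4)]
moved = [[50, 50, 50, 50],
         [50, 200, 200, 50],
         [50, 200, 200, 50],
         [50, 50, 50, 50]]
video = [still, still, moved, still, still]
print(frames_with_motion(video))  # [2, 3]
```

Both the appearance (frame 2) and the disappearance (frame 3) of the object are flagged. A reviewer would jump to those moments instead of watching the whole recording, and a recognition or tracking model could then be run only on the flagged stretches.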