A transcript is a written document created during a deposition, trial, arbitration, or other legal proceeding that records all the questions and answers by the attorney, self-represented party, or witness.
Transcripts are typically produced by a certified court reporter. There are a number of different file types and formats.
The most familiar file type for the vast majority of attorneys is a PDF because it is a readable, printable version of the transcript that is always made available. However, most deposition review software, including Case Builder, depends upon having the exact text of the transcript.
Case Builder requires a file format that can reliably provide this information in a consistent format.
Transcripts
Accepted file formats
- The transcript files accepted by Case Builder are commonly referred to as “ASCII files.”
- Transcript files must have a .txt extension.
Page numbers
- All pages (except the first and last pages) must have a page number at the beginning or end of the page.
- Page numbers must be consecutively numbered (1, 2, 3, 4, and so on). However, the transcript is not required to start on page 1.
- The page number should be on its own line, or on the same line as the first line of the page. If the page number is on the same line as the first line number, the page and line numbers must be separated with a colon (:).
- The page number can be preceded by the word Page or by up to four leading 0s (e.g., 0001). If the word Page precedes the page number, there may also be any number of whitespace characters (or none) preceding the word Page. Otherwise, there may be 0, 2, 3, or at least 25 whitespace characters preceding the page number.
2
1 P R O C E E D I N G S
2 (10:03 a.m.)
3 CHIEF JUSTICE ROBERTS: We'll hear argument
4 first this morning in Case 12536, McCutcheon v. The
5 Federal Election Commission.
6 Ms. Murphy.
An example of a transcript with page numbers on their own line. Each page number is preceded by leading spaces. Note that we expect the same number of characters on each page-number line: a two-digit page number has one fewer leading space than a single-digit page number, and a three-digit page number has two fewer.
Page 2
1 P R O C E E D I N G S
2 (10:03 a.m.)
3 CHIEF JUSTICE ROBERTS: We'll hear argument
4 first this morning in Case 12536, McCutcheon v. The
5 Federal Election Commission.
6 Ms. Murphy.
An example of a transcript with page numbers preceded by whitespace and the word “Page.”
002
1 P R O C E E D I N G S
2 (10:03 a.m.)
3 CHIEF JUSTICE ROBERTS: We'll hear argument
4 first this morning in Case 12536, McCutcheon v. The
5 Federal Election Commission.
6 Ms. Murphy.
An example of a transcript with no leading spaces before the page number. The page numbers are instead preceded by no more than three zeros.
00001:01 SUPREME COURT OF THE STATE OF CALIFORNIA
02 COUNTY OF SAN FRANCISCO
03 Case No. CGC-16-553972
An example of a transcript with page numbers on the same line as the first line number of the page. The page numbers are preceded by up to four leading 0s. All page numbers in this transcript will have 5 digits (e.g., 00001, 00010, 00100, and so on).
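The page number rules above can be sketched as a small check. This is only an illustration of the rules as written in this article, not Case Builder's actual parser; the function names `parse_page_number` and `pages_are_consecutive` and the regular expressions are hypothetical.

```python
import re

# Hypothetical patterns written from the rules in this article.
# "Page" form: any amount of whitespace (or none) before the word Page.
PAGE_WORD = re.compile(r"^\s*Page\s+0{0,4}([1-9]\d*)\s*$")
# Page number on its own line, with up to four leading 0s.
BARE_NUMBER = re.compile(r"^([ \t]*)0{0,4}([1-9]\d*)\s*$")
# Page number and first line number on the same line, separated by a colon.
SAME_LINE = re.compile(r"^([ \t]*)0{0,4}([1-9]\d*):(\d+)\s")


def indent_allowed(width):
    """Without the word Page, allow 0, 2, 3, or at least 25 leading spaces."""
    return width in (0, 2, 3) or width >= 25


def parse_page_number(line):
    """Return the page number if the line looks like a page-number line."""
    line = line.rstrip("\r\n")
    match = PAGE_WORD.match(line)
    if match:
        return int(match.group(1))
    match = BARE_NUMBER.match(line) or SAME_LINE.match(line)
    if match and indent_allowed(len(match.group(1))):
        return int(match.group(2))
    return None


def pages_are_consecutive(numbers):
    """Pages must increase by one, but need not start at page 1."""
    return all(b == a + 1 for a, b in zip(numbers, numbers[1:]))
```

Running a check like this over a file before upload can flag pages whose numbering breaks the pattern, although a line that passes here may still be rejected for other reasons.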
Transcript rules for line numbers
- There must be at least 19 and no more than 27 numbered lines per page. A numbered line has a line number preceding the page content for that line.
- If there are no line numbers on a small number of pages (this is most commonly found on the first few pages and the last few pages), Case Builder will automatically attempt to add line numbers.
- After each numbered line, an additional unnumbered line may be used. These unnumbered lines are most commonly used on cover pages and are referred to in Case Builder error messages as “sublines.”
- On each page, line numbers must start with 1 and be consecutively numbered. Case Builder’s transcript processor may attempt to fill in a small number of line numbers when line numbers are missing.
- There must be at least one whitespace character between the line number and the page content for the line. Consistent spacing between line numbers and page content on all numbered lines is required.
- Line numbers may be preceded by whitespace. Lines 1-9 will typically have an additional space preceding them on each line. For example, if there are two whitespace characters before lines 10-25, there would be three whitespace characters before lines 1-9.
- The spacing before the line numbers must be consistent on all pages.
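As a rough illustration of the line number rules above, the sketch below checks one page at a time. It assumes the page has already been split out and the page-number line removed; `check_page_lines` and its pattern are hypothetical names, and the check does not cover the spacing-consistency rules.

```python
import re

# Hypothetical pattern: optional leading whitespace, a one- or two-digit
# line number, then whitespace or end of line. Unnumbered lines
# ("sublines") do not match and are skipped.
NUMBERED_LINE = re.compile(r"^\s*(\d{1,2})(?:\s|$)")


def check_page_lines(page_lines):
    """Return a list of problems found in one page's lines."""
    numbers = []
    for line in page_lines:
        match = NUMBERED_LINE.match(line)
        if match:
            numbers.append(int(match.group(1)))

    problems = []
    # At least 19 and no more than 27 numbered lines per page.
    if not 19 <= len(numbers) <= 27:
        problems.append(f"{len(numbers)} numbered lines (expected 19 to 27)")
    # Line numbers must start with 1 and be consecutive.
    if numbers and numbers != list(range(1, len(numbers) + 1)):
        problems.append("line numbers do not run 1, 2, 3, ... without gaps")
    return problems
```

An empty list means the page passed both checks. Because Case Builder may fill in a small number of missing line numbers on its own, a problem reported here is a prompt to review the page rather than proof that processing will fail.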
Rules for timestamp information
- Timestamp information may be contained on spoken lines. The timestamp information will be displayed in the application after processing.
- All timestamp information must be either left-aligned or right-aligned.
- Left-aligned timestamps are located before the line number on all or some of the lines.
- Right-aligned timestamps are located after the page content. The timestamp information will be a consistent number of characters from the beginning of the line (e.g., starting on column 64).
- Format must be HH:MM:SS. Seconds are optional. AM/PM are optional. Leading zeros are optional.
Note: There is a common issue in transcripts produced by court reporters with timestamp information. When a time is contained in the spoken text, there will often be a formatting issue where a new line is created in the wrong location.
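The timestamp format can be sketched as a regular expression. This is a minimal illustration, not the pattern Case Builder uses: `parse_timestamp` is a hypothetical helper, and it assumes that "leading zeros are optional" applies to every field and that AM/PM may be written with or without periods.

```python
import re

# Hypothetical pattern for HH:MM:SS with optional seconds and AM/PM.
TIMESTAMP = re.compile(
    r"^(\d{1,2}):(\d{1,2})(?::(\d{1,2}))?\s*([AaPp]\.?[Mm]\.?)?$"
)


def parse_timestamp(text):
    """Return (hour, minute, second, meridiem), or None if not a timestamp.

    second and meridiem are None when they are absent from the text.
    """
    match = TIMESTAMP.match(text.strip())
    if not match:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2))
    second = int(match.group(3)) if match.group(3) else None
    meridiem = match.group(4)
    if meridiem:
        meridiem = meridiem.replace(".", "").upper()
    # Reject values that cannot be a time of day.
    if hour > 23 or minute > 59 or (second is not None and second > 59):
        return None
    return hour, minute, second, meridiem
```

A helper like this is meant for the timestamp column only. Times that appear inside the spoken text, such as a videographer announcing the time, are page content and are where the line-break issue described in the note tends to occur.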
Common issues
- Transcripts with missing information, such as a line number.
- Transcripts with a line break in the wrong place (often found after a videographer has announced the time).
- A page within a page (two page numbers appear on the same page).
- Transcripts copied and pasted out from PDF to TXT. This will cause a number of formatting issues that will require significant remediation.
Transcript options not currently supported
- Page headers and footers (e.g., with phone numbers or court reporter information) must be omitted. These are commonly found on the same line as the page number or at the beginning or end of the page.
- Transcripts with omitted pages (e.g., for confidentiality reasons).
Converting a PTX to a TXT
Thomson Reuters makes and distributes a free product, E-Transcript Bundle Reader, for viewing PTX files. E-Transcript Bundle Reader is only available for Windows PCs.
PTX files can be quickly converted to TXT in E-Transcript Bundle Reader by exporting as a text document.
Converting a PDF to TXT
There is no single way to convert a PDF to a TXT document in a supported format because PDFs vary in a number of ways.
- Some PDFs will have an attached file that can be downloaded and uploaded directly.
- Some PDFs will have a text layer. Depending on the way that the text is structured, it may be easy to extract (copy) the text out of the PDF and put it into a supported format.
If you require assistance with a PDF import, you can contact discodesk@csdisco.com to discuss your options.
Videos
A deposition video can be uploaded to Case Builder with a matching transcript.
Video length
Videos that are longer than four hours without a provided sync file will fail to process. We currently have a four-hour limit for individual video processing, so longer videos must be split before upload to be viewable and synchronized. If you require assistance with video splitting, you can contact discodesk@csdisco.com to discuss your options.
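As a quick way to plan a split, the arithmetic can be sketched as follows. This covers only the case where no sync file is provided, and `parts_needed` is a hypothetical helper rather than anything built into Case Builder.

```python
import math

# Four-hour limit for individual video processing, in seconds.
MAX_SECONDS = 4 * 60 * 60


def parts_needed(duration_seconds):
    """Return the minimum number of parts so each is four hours or less."""
    if duration_seconds <= MAX_SECONDS:
        return 1
    return math.ceil(duration_seconds / MAX_SECONDS)
```

For example, a nine-hour video would need at least three parts. Where the cuts fall is a separate decision, and breaks in the deposition are often a natural place to split.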
Video transcript syncing
When a video is uploaded in Case Builder, it is synchronized with the transcript. A user can provide their own sync file in smi format, in which case there must be one smi file per video. If no sync file is provided, Case Builder will attempt to synchronize the transcript and video file on its own.