I have tested Adobe Acrobat Reader DC (v.2023.003.20269) on macOS Monterey 12.6.8 and Ventura 13.5. This is the current version as of 2023-08-18. The same PDF files were used on both versions of macOS.
What I discovered on both platforms is that the Adobe product may generate an empty text file, whether the PDF came from Pages, LibreOffice Writer, or TeXShop. It may split the text into one word per line, or it may get the word extraction right but with random concatenation of words. This is a bug for Adobe to fix.
Worse, on macOS Monterey, one PDF saved as text produced concatenated words mixed with individual words, each followed by a trailing space and a carriage return (^M). Although one can remove the carriage returns, it does raise the question of what Adobe is doing injecting carriage returns on a UNIX machine where linefeeds are the norm. This particular PDF was generated by TeXShop and appeared normally in Acrobat Reader and Preview.
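For anyone stuck with such a file, here is a minimal Swift sketch of removing the carriage returns. The function name stripCarriageReturns is my own invention for illustration, and it assumes you have already read the text file into a String. It works on Unicode scalars because Swift treats a carriage return plus linefeed pair as a single Character.

```swift
// Hypothetical helper (not part of any Adobe or Apple API):
// drops every carriage return (U+000D) and keeps everything else,
// so "\r\n" line endings become plain "\n".
func stripCarriageReturns(_ text: String) -> String {
    var cleaned = String.UnicodeScalarView()
    for scalar in text.unicodeScalars where scalar != "\r" {
        cleaned.append(scalar)
    }
    return String(cleaned)
}

let sample = "one \r\ntwo \r\n"
print(stripCarriageReturns(sample) == "one \ntwo \n")  // prints "true"
```

This only addresses the stray carriage returns. It does nothing about the concatenated words.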
So I got fed up with this nonsense and wrote a brief Swift script that generates a correct text file from the PDF it ingests, with no concatenated words and none of the Adobe misdeeds. It worked perfectly with all of the PDFs I had tested against Adobe's product. It can process multiple PDFs given on the command line into their text file equivalents.
#!/usr/bin/swift
/*
Script to read one or more PDFs provided on the command line and extract
their text to file[s] in the same location with the ".txt" extension. Works
on PDF documents correctly where Adobe Acrobat Reader DC mangles the result.
Tested: Ventura 13.5 (Swift 5.8.1), Monterey 12.6.8 (Swift 5.7.2)
Compiled: swiftc -Osize -o pdf2text pdf2text.swift -framework Foundation -framework AppKit -framework PDFKit
Usage: ./pdf2text.swift ~/Desktop/foo.pdf ~/Desktop/bar.pdf
Author: VikingOSX, 2023-08-18, Apple Support Communities, No warranties of any kind.
*/
import Foundation
import AppKit
import PDFKit
func readPDF(urlpath: URL) -> String? {
    // PDFDocument(url:) is nil if the file cannot be opened as a PDF, and
    // .string is optional, so avoid force-unwrapping, which would crash here.
    return PDFDocument(url: urlpath)?.string
}
let fileManager = FileManager.default
let inputArgs = CommandLine.arguments.dropFirst().map { URL(fileURLWithPath: $0).absoluteURL }
inputArgs.forEach { elem in
    guard fileManager.fileExists(atPath: elem.path) else {
        let notFound = NSString(string: elem.path).abbreviatingWithTildeInPath
        print("Error: \(notFound): File not found.")
        return // the .forEach equivalent of continue
    }
    // print("File: \(elem.path)")
    guard let text = readPDF(urlpath: elem) else {
        print("Error: \(elem.lastPathComponent): unable to extract text.")
        return
    }
    // write the outfile text contents to the same location as the PDF
    let outfile = elem.deletingPathExtension().appendingPathExtension("txt")
    do {
        try text.write(to: outfile, atomically: true, encoding: .utf8)
    } catch {
        print("Error: unable to write text file.")
        return
    }
    let tildeFile = NSString(string: outfile.path).abbreviatingWithTildeInPath
    print("Written: \(tildeFile)")
}
exit(EXIT_SUCCESS)
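To see in isolation how the script decides where each text file goes, here is a small sketch of just the path derivation. The helper name textFileURL and the example path are hypothetical, made up for illustration. The two URL methods are the same ones the script calls.

```swift
import Foundation

// Hypothetical helper mirroring the script's approach:
// drop the existing extension, then append "txt".
func textFileURL(for pdf: URL) -> URL {
    return pdf.deletingPathExtension().appendingPathExtension("txt")
}

// Example path is an assumption; the file does not need to exist.
let example = textFileURL(for: URL(fileURLWithPath: "/Users/me/Desktop/foo.pdf"))
print(example.path)  // prints "/Users/me/Desktop/foo.txt"
```

Because only the extension changes, the text file always lands next to its PDF, and an existing file with the same name would be overwritten.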
If Swift is not installed, one can install the Apple Command Line Tools for Xcode (about 3 GB), after which swift and swiftc will be in /usr/bin, which is already in your Terminal PATH.
Launch the Terminal application and at the Terminal prompt, enter the following (not the # lines) and then press the return key after each entry:
# make the swift script that you saved executable
chmod +x ./pdf2text.swift
# now install the Xcode command line tools
xcode-select --install
That will not install Xcode itself. It prompts with the installer dialogs shown in this article. When this is done, you can invoke pdf2text.swift from the Terminal and specify one or more PDFs from which you want to extract text, as shown in the script comments.