[SOLVED] How to concurrently & efficiently remove repeated instances of specific text in a PDF?

scherz0

Distinguished
Jun 18, 2013
This purchased PDF (which I didn't create) repeats the text underlined in green under each diagram. Deleting each instance one by one is far too inefficient. How can I delete every instance of this text in one go?

W2nOk_1_bviawc.jpg
 

Ralston18

Titan
Moderator
What is the requirement or reason for removing the underlined green text?

You probably cannot do so.

The "printable version" link may be part of the image(s) and not directly editable.

Also, even though you purchased the .pdf, it is likely copyright protected, and making changes is not permitted without the necessary permissions.
 
There's a reason these documents are delivered as PDFs. Your question is like asking "How can I remove the eggs from a cooked ham-and-eggs dish?": you don't. You don't have the freedom to do whatever you like with this publication; you have only purchased the right to read it.
 
If the objects are identical and searchable, there is only one method I can think of:
  • Install LibreOffice and open the PDF file in the Draw application.
  • Write a script (Python or Visual Basic) that searches for identical text snippets.
However, several possible problems could make this unworkable, at least as far as I know:
  • It must be possible to traverse document objects in LibreOffice Draw.
  • The objects themselves have to be identically formatted. In my experience with PDF files (at least with vector graphics on pages imported via Inkscape), even when the content looks very similar, it is a huge mess to tell the different parts/objects apart. This may be an issue with text as well.
  • And last: the content may actually be nothing more than a static image, a JPG file embedded in the PDF.
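The "search for identical text snippets" step could look roughly like the Python sketch below. It is only an illustration under assumptions: the text objects are taken to be already read out of the document as one list of strings per page, and the function names, the `min_pages` threshold, and the sample data are all hypothetical. It does not touch LibreOffice or the PDF itself.

```python
from collections import Counter


def find_repeated_snippets(pages, min_pages=2):
    """Return text snippets that occur on at least `min_pages` pages.

    `pages` is a list of pages; each page is a list of strings, one per
    text object. Whitespace is normalized so that copies differing only
    in spacing still count as identical.
    """
    counts = Counter()
    for snippets in pages:
        # Count each snippet at most once per page.
        normalized = {" ".join(s.split()) for s in snippets}
        normalized.discard("")
        counts.update(normalized)
    return {text for text, n in counts.items() if n >= min_pages}


def strip_repeated(pages, repeated):
    """Return a copy of `pages` without the snippets listed in `repeated`."""
    return [
        [s for s in snippets if " ".join(s.split()) not in repeated]
        for snippets in pages
    ]


# Hypothetical sample data standing in for text objects read from a document.
pages = [
    ["Figure 1: Pump layout", "printable version"],
    ["Figure 2: Valve detail", "printable  version"],
    ["Figure 3: Wiring", "printable version"],
]
repeated = find_repeated_snippets(pages, min_pages=3)
cleaned = strip_repeated(pages, repeated)
print(repeated)
print(cleaned)
```

In LibreOffice Draw, the per-page snippets would have to come from traversing the document's pages and shapes in a macro, and the matching shapes would then be removed instead of list entries. Whether the imported PDF exposes the repeated text as separate, consistently formatted text objects is exactly the uncertainty listed above.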
 
Solution