this post was submitted on 20 Jun 2024
Science Memes
Unfortunately that wouldn't work, as this is information inside the PDF itself, so it has nothing to do with the file hash (although that is one way to track).
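To illustrate the point about hashes, here is a minimal sketch. The byte strings are hypothetical stand-ins for two downloads of the same paper, and the `/DownloadID` field is made up for the example, not something a real publisher is known to use. Changing any byte changes the hash, but the identifying string inside the file is still there.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents: same article, different embedded identifier.
body = b"%PDF-1.7 ... article text ..."
copy_a = body + b"/DownloadID (user-a)"
copy_b = body + b"/DownloadID (user-b)"

# Appending padding gives the file a brand-new hash...
resaved = copy_a + b"\n% padding"

print(sha256_hex(copy_a) != sha256_hex(copy_b))   # the two downloads hash differently
print(sha256_hex(copy_a) != sha256_hex(resaved))  # the padded copy hashes differently too
print(b"user-a" in resaved)                       # ...but the identifier is still inside
```

So defeating hash matching is easy, and it does nothing about what is embedded in the file.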
Now that this is known, it's not enough to remove metadata from the PDF itself. Each image inside a PDF, for example, can contain its own metadata. I say this because this is apparently the start of a game of whack-a-mole, and it won't stop here.
There are multiple ways of removing ALL metadata from a PDF; here are most of them.
It will be slow-ish and probably make the file larger, but if you're sharing a PDF that only you are supposed to have access to, it's worth it. MAT or exiftool should work.
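A rough sketch of the exiftool route, driven from Python. Assumptions: `exiftool` is on your PATH, and so is `qpdf`, which is my addition and not something mentioned above. As I understand it, exiftool's PDF edits can be reversed unless the file is rewritten afterwards, which is what the `qpdf --linearize` step is for. The function names are made up for the example.

```python
import shutil
import subprocess
from typing import List


def strip_commands(src: str, dst: str) -> List[List[str]]:
    """Build the two command lines; nothing is executed here."""
    return [
        # Blank every writable tag in place, without keeping a *_original backup.
        ["exiftool", "-all=", "-overwrite_original", src],
        # Rewrite the file so the old metadata objects are not carried along.
        ["qpdf", "--linearize", src, dst],
    ]


def strip_metadata(src: str, dst: str) -> None:
    """Run both steps, failing loudly if a tool is missing or errors out."""
    for cmd in strip_commands(src, dst):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed")
        subprocess.run(cmd, check=True)
```

Usage would be something like `strip_metadata("paper.pdf", "paper_clean.pdf")`, and it modifies `paper.pdf` in place, so work on a copy. I would not assume this touches metadata inside embedded images, and it does nothing against watermarks in the page content itself.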
Edit: as discussed in another comment thread here, PDF/image steganography is another technique they can use.
Wouldn't printing the PDF to a new PDF inherently strip the metadata put there by the publisher?
It's possible to use steganographic techniques to embed digital watermarks that would not be stripped by simply printing to PDF.
This is a great point. Image watermarking steganography is nearly impossible to defeat unless you can obtain multiple copies of the 'same' file from multiple users to look for differences. It could be a change in as few as 5-15 pixels, each shifted one RGB value away from the original, which would be imperceptible to the human eye. Depending on the number of users, it may need to change more or fewer pixels.
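A toy sketch of that kind of watermark, to make the idea concrete. Everything here is an assumption for illustration: the image is a flat list of RGB tuples, the user ID seeds which pixels get nudged, and only the red channel changes. Real schemes are far more robust than this.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def embed_id(pixels: List[Pixel], user_id: int, n_marks: int = 10) -> List[Pixel]:
    """Return a copy with n_marks pixels shifted by one step in the red channel.

    The positions are chosen pseudo-randomly from the user ID, so each user
    gets a different, reproducible set of marked pixels.
    """
    rng = random.Random(user_id)
    marked = list(pixels)
    for idx in rng.sample(range(len(pixels)), n_marks):
        r, g, b = marked[idx]
        r = r - 1 if r == 255 else r + 1  # stay inside 0..255
        marked[idx] = (r, g, b)
    return marked


image = [(200, 200, 200)] * 10_000  # hypothetical 100x100 grey image
marked = embed_id(image, user_id=42)
changed = sum(1 for a, b in zip(image, marked) if a != b)
print(changed)  # 10
```

Ten pixels out of ten thousand, each off by one step in one channel, is not something you would ever spot by eye.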
There is a ton of work in this field and it's very interesting, for anyone considering majoring in computer science / information security.
Another 'neat' technology everyone should know about is machine identification codes, or, the tiny ~~secret~~ tracking dots that color printers print on every page to identify the specific make, model, and serial number (I think?) of the printer the page was printed from. I don't believe B&W printers have tracking dots, which were originally used to track creators of counterfeit currency. EFF has a page of color printers which do not include tracking dots on printed pages. This includes color LaserJets along with InkJets, although I would not be surprised if there was a similar tracking feature in place now or in the future "for safety and privacy reasons," but none that I am aware of.
I wonder if it's common for those steganography techniques to have some mechanism for defeating the fairly simple strategy of getting 2 copies of the file from different sources, and looking at the differences between them to expose all the watermarks.
(I'd think you would need sections of watermark that are the same for any 2 or n combinations of copies of the data, which may be pretty easy to do in many cases, though the difference makes detecting the general watermarking strategy massively easier for the un-watermarkers)
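A small sketch of the two-copy comparison and of why shared watermark sections survive it. The data is a toy list of pixel values and the mark positions are invented for the example: position 3 is marked identically in both copies, 7 only in copy A, 12 only in copy B.

```python
from typing import List, Sequence


def diff_positions(copy_a: Sequence[int], copy_b: Sequence[int]) -> List[int]:
    """Return every index where the two copies disagree."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


original = [100] * 20

copy_a = list(original)
copy_a[3] += 1   # shared mark
copy_a[7] += 1   # mark unique to A

copy_b = list(original)
copy_b[3] += 1   # shared mark
copy_b[12] += 1  # mark unique to B

exposed = diff_positions(copy_a, copy_b)
print(exposed)  # [7, 12]
```

The diff exposes 7 and 12 but never sees position 3, because both copies carry the same change there. The catch is the one mentioned above: once the exposed positions show what the marks look like, hunting for the shared ones gets a lot easier.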