Wednesday, 28 November 2018

Automatic check of media files integrity (Project)

check-media-integrity is an open-source Python script that decodes media files to check their integrity. It can check a single file or all the media files in a folder, recursively.
I released the source code on GitHub.
Any feedback or suggestions are appreciated; I am looking for ways to force stricter checks in Pillow, Wand and FFmpeg.
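
As a rough illustration of the folder scan, here is a minimal sketch that collects candidate image files, optionally recursing into subfolders. The extension set is copied from the "Supported image formats/extensions" list in the help text further down; the function name `iter_media_files` and its parameters are hypothetical and are not taken from the check-mi sources.

```python
import os

# Copied from the "Supported image formats/extensions" list in the help text.
IMAGE_EXTENSIONS = {
    'jpg', 'jpeg', 'jpe', 'png', 'bmp', 'gif', 'pcd',
    'tif', 'tiff', 'j2k', 'j2p', 'j2x', 'webp',
}


def iter_media_files(root, recurse=False, extensions=IMAGE_EXTENSIONS):
    """Yield paths under `root` whose extension is in `extensions`.

    Hypothetical helper: without `recurse`, only the top folder is scanned.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if not recurse:
            # Emptying dirnames in place stops os.walk from descending.
            dirnames.clear()
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lstrip('.').lower()
            if ext in extensions:
                yield os.path.join(dirpath, name)
```

Each yielded path would then be handed to whatever decoder performs the actual check; the matching is case-insensitive, so `photo.JPG` is picked up as well.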

Figure: a typical damaged JPEG (public domain)

check-mi help:

Checks integrity of Media files (Images, Video, Audio).

positional arguments:
  P                     path to the file or folder

optional arguments:
  -h, --help            show this help message and exit
  -c X, --csv X         Save bad files details on csv file X
  -v, --version         show program's version number and exit
  -r, --recurse         Recurse subdirs
  -i, --disable-images  Ignore image files
  -m, --enable-media    Enable check for audio/video files
  -p, --disable-pdf     Ignore pdf files
  -e, --disable-extra   Ignore extra image extensions (psd, xcf,. and rare
                        ones)
  -x E, --err-detect E  Execute ffmpeg decoding with a specific err_detect
                        flag E, 'strict' is shortcut for
                        +crccheck+bitstream+buffer+explode

- Single file check ignores options -i,-m,-p,-e,-c

- With 'err_detect' option you can provide the 'strict' shortcut or the flags
supported by ffmpeg, e.g.: crccheck, bitstream, buffer, explode, or their
combination, e.g., +buffer+bitstream

- Supported image formats/extensions: ['jpg', 'jpeg', 'jpe', 'png', 'bmp',
'gif', 'pcd', 'tif', 'tiff', 'j2k', 'j2p', 'j2x', 'webp']

- Supported image EXTRA formats/extensions:['eps', 'ico', 'im', 'pcx', 'ppm',
'sgi', 'spider', 'xbm', 'tga', 'psd', 'xcf']

- Supported audio/video extensions: ['avi', 'mp4', 'mov', 'mpeg', 'mpg',
'm2p', 'mkv', '3gp', 'ogg', 'flv', 'f4v', 'f4p', 'f4a', 'f4b', 'mp3', 'mp2']

- Output CSV file has a header row, and one line for each bad file,
providing: file name, error message, file size
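
To make the err_detect option more concrete, the sketch below builds an ffmpeg command line that decodes a file and discards the output, expanding the 'strict' shortcut as the help text describes. The function name `build_ffmpeg_check_command` is hypothetical, and calling the ffmpeg executable directly is my assumption for illustration; check-mi may drive FFmpeg differently.

```python
# Expansion of the 'strict' shortcut, as stated in the help text.
STRICT_FLAGS = '+crccheck+bitstream+buffer+explode'


def build_ffmpeg_check_command(path, err_detect=None):
    """Return an argv list that decodes `path` and throws the result away.

    Hypothetical helper: only builds the command, it does not run ffmpeg.
    """
    cmd = ['ffmpeg', '-v', 'error']  # print errors only
    if err_detect:
        flags = STRICT_FLAGS if err_detect == 'strict' else err_detect
        # err_detect applies to decoding the input, so it goes before -i.
        cmd += ['-err_detect', flags]
    # '-f null -' decodes everything but writes no output file.
    cmd += ['-i', path, '-f', 'null', '-']
    return cmd
```

The resulting list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`; a non-zero return code or any text on stderr would then suggest that decoding hit a problem.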




Warning

check-mi is not a crystal ball. I used the tool to run a small experimental campaign, and these are the results.
For images, file truncation and damage to vital parts (headers) are always detected.
check-mi cannot always detect minor damage, e.g. a small portion of the file overwritten with different values, because media files (and codecs) are resilient to this type of damage.
From my (few) experiments, with "zero fill" the damage must be extensive before there is a chance of detecting it, while with random noise you get an 85% chance of detection.

A movie or audio file needs severe damage before check-mi detects the problem.
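
The two kinds of damage mentioned above, "zero fill" and random noise, can be simulated with a few lines of Python. This is only a sketch for reproducing similar experiments, not the procedure I used; the function name `damage` and its parameters are hypothetical.

```python
import random


def damage(data, offset, length, mode='zero', seed=None):
    """Return a copy of `data` with `length` bytes overwritten from `offset`.

    mode='zero' writes zero bytes; mode='random' writes pseudo-random bytes.
    The overwrite is clipped at the end of the data, so the size never changes.
    """
    if mode not in ('zero', 'random'):
        raise ValueError("mode must be 'zero' or 'random'")
    buf = bytearray(data)
    end = min(offset + length, len(buf))
    rng = random.Random(seed)  # a seed makes the noise reproducible
    for i in range(offset, end):
        buf[i] = 0 if mode == 'zero' else rng.randrange(256)
    return bytes(buf)
```

Writing the returned bytes to a copy of a media file and running check-mi on that copy shows whether a given amount of damage is detected.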
