Convert PDF Pages to Images from the Linux Command Line

I read a lot of preview comics and I help out with a popular online comic book website. From time to time we want to post preview pages of upcoming books, and using a PDF plugin can be resource intensive, particularly if you're hosting your site on a smaller cloud instance at Digital Ocean or the like.



So how do you avoid loading a PDF viewer on your website? The simple method is to convert the pages to images, which preserves the quality while lowering the overall size of the files served to end users.

You can find free websites that convert PDF pages to images, but I found the process tedious: you have to upload the file, wait for it to convert, then download the result.

So if you run Linux and have ImageMagick installed (it comes with most standard distributions), here's a nifty command-line way to convert specific pages of a PDF to images (JPG, PNG, etc.).

First make sure you have ImageMagick installed though.

On Fedora/CentOS:

$ sudo yum install ImageMagick

For Debian/Ubuntu:

$ sudo apt install imagemagick
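
To double check the install, here's a rough sketch that looks for the commands on your PATH. The have_cmd helper name is my own, not part of ImageMagick. I also check for gs, since ImageMagick hands PDF rendering off to Ghostscript.

```shell
# have_cmd is a hypothetical helper: it succeeds if the given
# command can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on ImageMagick's convert and on Ghostscript (gs).
for cmd in convert gs; do
    if have_cmd "$cmd"; then
        echo "$cmd: found"
    else
        echo "$cmd: not found"
    fi
done
```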

Now to convert a PDF to images, simply run the following. Note that this converts every page: for a multi-page PDF, ImageMagick writes one JPG per page (filename-0.jpg, filename-1.jpg and so on), so a long PDF will leave you with a lot of large files:

$ convert -density 200 filename.pdf filename.jpg

So let's break this down. convert is the ImageMagick command that performs the conversion. The -density option (note the single dash) sets the resolution, in dots per inch, at which the PDF is rendered. I've found 200 is a pretty reasonable density where you don't lose quality in the conversion. If you don't specify one, ImageMagick falls back to 72, and the image will likely look choppy and poor quality. If you go too high, you only end up with a very large image file. filename.pdf is the name of the PDF file and filename.jpg is the output; the extension tells ImageMagick to write a JPG. You can use PNG or any other format ImageMagick supports.
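
To get a feel for what a density means in pixels, multiply the page size in inches by the density. Here's a small sketch of that arithmetic. The pixels_at_density helper is a name I made up, and the US Letter page (8.5 x 11 inches) is just an assumed example; your comic pages will likely have different dimensions.

```shell
# pixels_at_density INCHES DPI
# Prints the pixel length of a side that is INCHES long when it is
# rendered at DPI dots per inch. Hypothetical helper name.
pixels_at_density() {
    awk -v inches="$1" -v dpi="$2" 'BEGIN { printf "%.0f\n", inches * dpi }'
}

# Assumed example page: US Letter, 8.5 x 11 inches.
echo "200 DPI: $(pixels_at_density 8.5 200) x $(pixels_at_density 11 200) pixels"
echo "72 DPI: $(pixels_at_density 8.5 72) x $(pixels_at_density 11 72) pixels"
```

For that assumed page, 200 works out to 1700 x 2200 pixels, while 72 gives only 612 x 792, which is why low densities look rough.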



Now, if you want to convert a specific page and not the whole PDF, you can specify which page to convert. Do note that ImageMagick numbers PDF pages starting at 0, not 1. So if you want to convert page 5, you specify it as 4. Most PDF readers start at 1, so subtract one from the page number your reader shows.

Here is the command to specify page 5 of a PDF (the quotes keep the shell from treating the square brackets as a filename pattern):

$ convert -density 200 'filename.pdf[4]' filename.jpg
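
If the off-by-one trips you up, you can let the shell do the subtraction. This is only a sketch; page_index is a hypothetical helper name, and the script prints the command rather than running it.

```shell
# page_index PAGE
# Turns a 1-based page number (as shown in a PDF reader) into the
# 0-based index that convert expects. Hypothetical helper name.
page_index() {
    echo $(( $1 - 1 ))
}

# Page 5 in the reader becomes index 4 for convert.
PAGE=5
echo "convert -density 200 'filename.pdf[$(page_index "$PAGE")]' filename.jpg"
```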

If you wanted to extract a few pages, you could throw these in a for loop:

$ for PG in 2 5 6 9; do convert -density 200 "filename.pdf[${PG}]" "filename_${PG}.jpg"; done

This would convert pages 3, 6, 7 and 10 of the PDF. Remember, pages start at 0.
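
If you'd rather type the page numbers your PDF reader shows, a small function can do the shifting for you. Here's a sketch under a few assumptions: convert_pages and the DRY_RUN switch are names I made up, the density is fixed at 200, and the input file is assumed to end in .pdf.

```shell
# convert_pages FILE PAGE...
# Converts the given 1-based pages of FILE to JPGs named after the
# page number. Hypothetical helper; assumes FILE ends in .pdf.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_pages() {
    pdf=$1
    shift
    base=${pdf%.pdf}
    for page in "$@"; do
        index=$(( page - 1 ))   # convert counts pages from 0
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "convert -density 200 ${pdf}[${index}] ${base}_${page}.jpg"
        else
            convert -density 200 "${pdf}[${index}]" "${base}_${page}.jpg"
        fi
    done
}
```

With the function loaded, convert_pages filename.pdf 3 6 7 10 would write filename_3.jpg, filename_6.jpg, filename_7.jpg and filename_10.jpg, so the file names match the page numbers you see in your reader.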

If you wanted to do all pages from 0 through 9 as separate images, you can do it like this:

$ for PG in $(seq 0 9); do convert -density 200 "filename.pdf[${PG}]" "filename_${PG}.jpg"; done
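
ImageMagick can also take a range inside the brackets, such as filename.pdf[0-9], which saves the loop. Here's a sketch that builds that selector from 1-based page numbers. The page_range_arg helper is a hypothetical name, and the script only prints the command.

```shell
# page_range_arg FILE FIRST LAST
# Builds ImageMagick's "file[first-last]" selector from 1-based
# page numbers. Hypothetical helper name.
page_range_arg() {
    echo "${1}[$(( $2 - 1 ))-$(( $3 - 1 ))]"
}

# Print the command for pages 1 through 10 rather than running it.
echo "convert -density 200 '$(page_range_arg filename.pdf 1 10)' filename.jpg"
```

Run against a real PDF, that command should write filename-0.jpg through filename-9.jpg, since ImageMagick numbers the output files itself when a JPG would have to hold more than one page. Check the names you get before scripting around them.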

Now, depending on your PDF, you may end up with a bunch of 1MB or bigger image files.

Usually my next step is a quick resize so that all the images share a set width in pixels, whether that's for display on a website or for any other reason.

To do a quick resize of all the images, if say they're all in their own directory, you can do the following:

$ for IMG in *.jpg; do convert "$IMG" -resize 800x1024^ "$IMG"; done

The command above resizes every JPG in the current directory, overwriting each file in place. The ^ flag tells ImageMagick to treat 800x1024 as minimum dimensions while keeping the aspect ratio, so each image ends up at least 800 pixels wide and at least 1024 pixels tall. Most of my pages come out taller than 1024 pixels at that width, so in practice they end up 800 pixels wide with the height following the ratio. If all you care about is the width, you can drop the height altogether and use -resize 800x. If you wanted them all the same height instead, with the width following the ratio, do the following:

$ for IMG in *.jpg; do convert "$IMG" -resize x1024 "$IMG"; done

That would resize all the images to a height of 1024px.
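
To predict what a width-only resize will give you, scale the height by the same factor as the width. Here's a rough sketch of that arithmetic. The scaled_height helper is a made-up name, and the 1700 x 2200 source size is only an assumed example (a US Letter page at a density of 200).

```shell
# scaled_height WIDTH HEIGHT NEW_WIDTH
# Prints the height an image of WIDTH x HEIGHT pixels ends up with
# when it is resized to NEW_WIDTH and the aspect ratio is kept.
# Hypothetical helper; rounds to the nearest pixel.
scaled_height() {
    awk -v w="$1" -v h="$2" -v nw="$3" 'BEGIN { printf "%.0f\n", h * nw / w }'
}

# Assumed example: a 1700 x 2200 image resized to 800 pixels wide.
echo "800 x $(scaled_height 1700 2200 800)"
```

For that assumed page the result is roughly 800 x 1035, which is taller than 1024. ImageMagick's own rounding may differ by a pixel.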



Lots of different options. Find the best for you. You can also use convert to lower the overall size of the image files while retaining the visible quality, which saves on space and load times. I've been able to get 2MB to 4MB PDF pages down to JPG images of 100k or less without any noticeable difference in viewing quality.
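
One way to shrink the files is to re-encode them with a lower JPEG quality and strip the embedded metadata. This is only a sketch of one approach, not necessarily the best settings for your pages. The shrink_jpg helper and DRY_RUN switch are names I made up, and the default of 85 is an assumed starting point to experiment from.

```shell
# shrink_jpg FILE [QUALITY]
# Re-encodes a JPG in place at the given quality (1-100) and strips
# profiles and comments. Hypothetical helper; 85 is an assumed default.
# Set DRY_RUN=1 to print the command instead of running it.
shrink_jpg() {
    file=$1
    quality=${2:-85}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "convert $file -strip -quality $quality $file"
    else
        convert "$file" -strip -quality "$quality" "$file"
    fi
}
```

With the function loaded, something like for IMG in *.jpg; do shrink_jpg "$IMG" 80; done would run it over a whole directory. Since it overwrites the files, try it on copies first and compare the results by eye.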