Take Off Labs | Ruby on Rails and iOS for web and mobile apps

Rails: Printing PDFs in the background with DocRaptor

over 7 years ago by Alex


DocRaptor is a great service that converts HTML to PDF and is also available as a Heroku add-on. The initial integration is straightforward in a Rails app using the docraptor gem.

However, things get complicated in the case of very large PDFs. Recently we encountered a case where the PDF took more than 10 seconds to generate. We naturally considered moving the printing process to a background job.

We split the printing process into the steps below and found that the DocRaptor conversion is the most time-intensive one:

  • User makes a request to download report
  • Our app generates the HTML code of the report
  • Our app sends the HTML code to DocRaptor
  • DocRaptor converts to PDF
  • We serve the PDF back to the user
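
A quick way to check where the time goes is to time each step on its own. Below is a minimal sketch using Ruby's standard Benchmark module. The three lambdas are hypothetical stand-ins (sleep simulates the work), not the code from our app:

```ruby
require "benchmark"

# Hypothetical stand-ins for the real steps; sleep simulates the work.
STEPS = {
  generate_html: -> { sleep 0.01 },
  convert_to_pdf: -> { sleep 0.05 }, # the DocRaptor call in the real app
  serve_pdf: -> { sleep 0.01 }
}.freeze

# Returns a hash of step name => elapsed seconds.
def time_steps(steps)
  steps.transform_values { |step| Benchmark.realtime { step.call } }
end

timings = time_steps(STEPS)
slowest = timings.max_by { |_name, seconds| seconds }.first

timings.each { |name, seconds| puts format("%-15s %.3fs", name, seconds) }
puts "Slowest step: #{slowest}"
```

In a real app, each lambda would wrap the actual call, and the numbers would show whether the conversion is worth moving out of the request cycle.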

Initially, we had a controller (reports) with an action (print) that handled the entire process synchronously. To move this to a delayed job:

Step 1

Create a new action on the reports controller called schedule_print. It sets up a file name and the path of the page that generates the HTML, then creates a delayed job instance. We use a ParametrizedJob model so that we can send progress notifications to the user while they wait for the delayed job to finish. We found that this alleviates most of the user's pain and is a better alternative to simply waiting.

name = "PDF Report"
path = reports_url

# Assumed: these are the options the job reads back in Step 3
options = { name: name, path: path }

pjob = ParametrizedJob.create(title: "Preparing PDF")
job = pjob.schedule(
  type: "docraptor",
  options: options
)
redirect_to job_path(job) + "?#{Time.now.to_i}" and return
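
The ParametrizedJob model itself is not shown in this post. As a rough idea of the shape it might have, here is a plain-Ruby stand-in with no ActiveRecord and no delayed_job. The class name SketchJob, its in-memory QUEUE and every method body are assumptions made for illustration:

```ruby
# Illustrative stand-in for ParametrizedJob; everything here is assumed.
class SketchJob
  QUEUE = [] # stands in for the delayed job table

  attr_reader :title, :parameters, :notifications

  def initialize(title: nil)
    @title = title
    @parameters = {}
    @notifications = []
  end

  # Stores the parameters and puts the job in the queue.
  def schedule(parameters)
    @parameters = parameters
    QUEUE << self
    self
  end

  # The real model pushes this to the browser; here we only record it.
  def generate_notification(message)
    @notifications << message
  end
end

job = SketchJob.new(title: "Preparing PDF")
job.schedule({ type: "docraptor", options: { name: "PDF Report", path: "/reports" } })

# Later, a worker picks the job up and reads the options back.
queued = SketchJob::QUEUE.shift
prs = queued.parameters[:options]
puts prs[:name]
```

The point of the sketch is the round trip: whatever is passed to schedule is what the worker later finds under parameters[:options].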

Step 2

The user is redirected to the show page of the newly created job instance. There, they subscribe to a Pusher channel and, once the subscription succeeds, the job is scheduled for processing in the background. For more information, see our blog post about Pusher. Meanwhile, the user is kept on a waiting page.

Step 3

When the job is first in line, we start processing it. This code is the most complicated, as it handles setting up the DocRaptor parameters and getting the HTML content of the reports#print page.

prs = parameters[:options]

options = { 
  name:             prs[:name],
  document_type:    :pdf,
  test:             !Rails.env.production?,
  strict:           "none",
  prince_options: {
    baseurl: Rails.application.config.action_controller.asset_host
  }
}

# This issues a Pusher notification to the user on the wait page
job.generate_notification("Preparing PDF Content")
response = get(prs[:path])
options[:document_content] ||= response.body

job.generate_notification("Content Ready. Converting to PDF")
response = DocRaptor.create(options)

# Save the PDF on S3
job.generate_notification("Saving PDF")
AWS::S3::Base.establish_connection!(
  :access_key_id     => AWS_KEY_ID,
  :secret_access_key => AWS_SECRET_KEY
)

bucket = "pdfs"
AWS::S3::S3Object.store(pdf_name, response.body, bucket)
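
Stripped of the service details, the job is fetch, convert, store, with a notification before each stage. The sketch below shows that shape with the external calls injected as lambdas, so it runs without DocRaptor, S3 or Pusher. The PdfPipeline name and its keyword arguments are hypothetical:

```ruby
# Hypothetical wrapper; fetch, convert and store replace the real HTTP,
# DocRaptor and S3 calls so the flow can be exercised in isolation.
class PdfPipeline
  def initialize(fetch:, convert:, store:, notify:)
    @fetch = fetch
    @convert = convert
    @store = store
    @notify = notify
  end

  def run(name:, path:)
    @notify.call("Preparing PDF Content")
    html = @fetch.call(path)

    @notify.call("Content Ready. Converting to PDF")
    pdf = @convert.call({ name: name, document_type: :pdf, document_content: html })

    @notify.call("Saving PDF")
    @store.call("#{name}.pdf", pdf)
  end
end

messages = []
stored = {}

pipeline = PdfPipeline.new(
  fetch: ->(path) { "<h1>Report for #{path}</h1>" },
  convert: ->(options) { "%PDF fake bytes for #{options[:name]}" },
  store: ->(key, body) { stored[key] = body },
  notify: ->(message) { messages << message }
)

pipeline.run(name: "report", path: "/reports")
puts messages
puts stored.keys.inspect
```

Injecting the collaborators this way also makes the job easy to test, since no network call is needed to check the order of the notifications.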

Step 4

The user is then notified, via Pusher, of the state of the document. The user's browser receives a message at each status update: either a simple text notification or a notice that the document is complete. JavaScript is responsible for handling these. This part should be straightforward.
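
The two kinds of message could be represented as small JSON payloads. The event names (status, completed) and the fields below are assumptions for illustration, not the format we actually send. The sketch only builds and parses the payloads and does not call Pusher:

```ruby
require "json"

# Assumed payload shapes; the real event names and fields may differ.
def status_message(text)
  JSON.generate({ event: "status", text: text })
end

def completed_message(url)
  JSON.generate({ event: "completed", url: url })
end

# What a client-side handler would decide, expressed in Ruby for brevity.
def handle(raw)
  message = JSON.parse(raw)
  case message["event"]
  when "status" then "show: #{message['text']}"
  when "completed" then "download: #{message['url']}"
  else "ignore"
  end
end

puts handle(status_message("Saving PDF"))
puts handle(completed_message("https://example.com/report.pdf"))
```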

Users are much happier watching a spinning gear and progress updates than waiting for a single request to load. It also takes some load off our Heroku dynos, since they no longer have to stay occupied while DocRaptor is working.
