Pandoc HTML to DOCX images load locally but not on server

I'm using Rails 4.2 and pandoc-ruby to convert HTML documents to DOCX for users to download. The DOCX generated on the server doesn't display images, but the one generated locally does.

Edit: I’m running pandoc 1.19 locally (on Mac). I got docker to install the new version of pandoc without issue. Running pandoc -v or pandoc --version gives me the following output from docker:

    root@3dd9b57878f1:~# pandoc -v
    pandoc 1.19.2.1
    Compiled with pandoc-types 1.17.0.4, texmath 0.9, skylighting 0.1.1.4
    

    Here’s what I get on my mac (installed by brew):

    ~ ❯❯❯ pandoc -v
    pandoc 1.19
    Compiled with pandoc-types 1.17.0.4, texmath 0.9, highlighting-kate 0.6.3
    

    My rails app creates a preview page that contains images pulled down from S3, so the img src is the standard s3 URL, e.g. “https://xyz.s3.amazonaws.com/xyz/xyz/image.jpg?1234567890”

    On my machine, when pandoc-ruby converts the page to docx, the images are all pulled down without issue, and the docx looks just like the preview page. The images aren't stored in the Rails directory, only on S3. On docker, that isn't the case: the output contains the ID of my image as plain text instead of the image itself. The generated document XML isn't even the same:
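Two things may be worth ruling out here, neither of which the question confirms: that the HTML handed to pandoc on the server actually contains the `<img>` tags, and that the container can reach the S3 URLs at all. The following is a minimal diagnostic sketch using only Ruby's standard library. The helper names `image_sources` and `fetch_status` are hypothetical and not part of pandoc-ruby.

```ruby
require "net/http"
require "uri"

# Collects the src attribute of every <img> tag in an HTML string.
# A regex is enough for a quick diagnostic; prefer a real HTML parser
# for anything beyond that.
def image_sources(html)
  html.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
end

# Tries to GET a URL. Returns the HTTP status code as a string, or the
# error class and message if the request could not be completed
# (DNS failure, TLS/certificate problem, timeout, and so on).
def fetch_status(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: 5, read_timeout: 10) do |http|
    http.get(uri.request_uri)
  end
  response.code
rescue StandardError => e
  "#{e.class}: #{e.message}"
end
```

Running `image_sources(html).each { |src| puts "#{src} -> #{fetch_status(src)}" }` both locally and inside the container, on the same HTML string that is passed to pandoc-ruby, would show whether the two environments see the same image URLs and whether each can download them.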

    locally:

    <w:pPr><w:pStyle w:val="Compact" /></w:pPr><w:r><w:drawing><wp:inline><wp:extent cx="3048000" cy="2286000" />
    <wp:effectExtent b="0" l="0" r="0" t="0" /><wp:docPr descr="4" title="" id="1" name="Picture" />
    <a:graphic><a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
    <pic:pic>
    <pic:nvPicPr>
    <pic:cNvPr descr="http://xyz.s3.amazonaws.com/development/assets/xyz/2345585/4.jpg?123456789" id="0 ...
    

    the docker version:

    <w:r><w:t xml:space="preserve">Image</w:t></w:r></w:p><w:p><w:pPr>
    <w:pStyle w:val="Compact" /></w:pPr><w:r><w:t xml:space="preserve">475</w:t></w:r></w:p><w:sectPr /></w:body></w:document>
    

    (the 475 is the id of the image)

    According to this SO question as well as this SO question, the cause could be that the URLs don't point to images on disk, which seemed to be an issue with older versions of pandoc. But that doesn't explain why it works locally, since the local XML also references the URL.

    Could it be that the docker container is still running the 1.16 version despite me manually installing 1.19.x?
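One way to test this would be to ask for the version from inside the application process, which may resolve a different `pandoc` on its `PATH` than an interactive shell does. The following is a minimal sketch using Ruby's standard library. The helpers `parse_pandoc_version` and `pandoc_version` are hypothetical names, and the parser assumes the first line of the output has the `pandoc <version>` shape shown above.

```ruby
require "open3"

# Extracts the version number from the first line of `pandoc --version`
# output. Returns nil when the output does not look like pandoc's.
def parse_pandoc_version(output)
  first_line = output.to_s.lines.first.to_s
  match = first_line.match(/\Apandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)/)
  match && match[1]
end

# Runs the given pandoc binary and returns its version string, or nil
# if the binary cannot be found or exits with an error.
def pandoc_version(binary = "pandoc")
  stdout, status = Open3.capture2(binary, "--version")
  status.success? ? parse_pandoc_version(stdout) : nil
rescue Errno::ENOENT
  nil
end
```

Logging `pandoc_version` from a controller action or a Rails console inside the container would show which version the app actually invokes. If it reports 1.16 while the shell reports 1.19.2.1, the app is picking up a different binary.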
