There are plenty of good paid libraries for .NET that convert Office documents to PDF, but in this example we will see how to do the conversion for free (bear with me, nothing is free in this world 😉, but at least you will pay only for hosting) using Azure Functions v2, Docker and LibreOffice.
LibreOffice is a free, open source office suite capable of reading and writing Microsoft Office document formats. Its command-line mode can convert office documents (Word, PowerPoint...) to PDF.
On Linux, the command line is:
/usr/bin/libreoffice --norestore --nofirststartwizard --headless --convert-to pdf source-file.docx
On Windows:
"C:\Program Files\LibreOffice\program\soffice.exe" -norestore -nofirststartwizard -headless -convert-to pdf source-file.docx
This creates a PDF document in the current directory with the same base name as the source file.
So, why not wrap the LibreOffice command line and expose it as a service?
As a side note, this is similar to using the unsupported server-side Automation of Microsoft Office on Windows, with the difference that this approach is more lightweight and works on Linux. In any case, keep in mind that each HTTP conversion request opens a new LibreOffice process, so the number of parallel conversions is limited by your CPU and RAM.
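Since every request spawns its own LibreOffice process, it can help to cap how many conversions run at once. Below is a minimal sketch of that idea using `SemaphoreSlim`. The `ConversionThrottle` class and the "one process per CPU core" limit are assumptions for illustration, not part of the original function, and the cap only applies inside a single function host instance.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original function): caps how many
// conversions run at the same time inside one function host instance.
public static class ConversionThrottle
{
    // Assumption: one LibreOffice process per CPU core is a reasonable ceiling.
    private static readonly SemaphoreSlim Slots =
        new SemaphoreSlim(Environment.ProcessorCount);

    public static async Task<T> RunAsync<T>(Func<Task<T>> convert)
    {
        await Slots.WaitAsync();    // wait until a slot is free
        try
        {
            return await convert(); // run the conversion while holding the slot
        }
        finally
        {
            Slots.Release();        // always give the slot back
        }
    }
}
```

A caller would wrap the conversion step in `ConversionThrottle.RunAsync(...)`, so extra requests wait for a slot instead of starting more LibreOffice processes than the machine can handle.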
The idea is to create an Azure function that does the following:
- Receives a URL and downloads the original document (Word, PowerPoint, etc.)
- Calls LibreOffice and executes the conversion
- Stores the converted PDF in Azure Blob Storage
- Returns a secure URL for downloading the converted PDF
You can check the full function source code here. However, the most important part is this:
// Convert file to PDF using LibreOffice
var pdfProcess = new Process();
pdfProcess.StartInfo.FileName = LIBRE_OFFICE_BIN;
pdfProcess.StartInfo.Arguments = $"-norestore -nofirststartwizard -headless -convert-to pdf \"{sourceFileName}\"";
// LibreOffice writes the PDF into the working directory, so this is really important
pdfProcess.StartInfo.WorkingDirectory = Path.GetDirectoryName(sourceFileName);
pdfProcess.Start();
// Wait while the document is converting (poll without blocking the thread)
while (!pdfProcess.HasExited)
{
    await Task.Delay(500);
}
// Check that the file was converted properly: the output has the source file name with a .pdf extension
var destinationFileName = Path.ChangeExtension(sourceFileName, ".pdf");
if (!File.Exists(destinationFileName))
{
    return new BadRequestObjectResult("Error converting file to PDF");
}
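One caveat of the polling loop above is that it waits forever if LibreOffice hangs on a broken document. A minimal sketch of a bounded wait could look like the following. The `ProcessRunner` class and its `TryRun` method are hypothetical names for illustration, and the call blocks the calling thread, so treat it as a starting point rather than a drop-in replacement.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the original function).
public static class ProcessRunner
{
    // Returns true only if the program exits with code 0 within the timeout.
    public static bool TryRun(string fileName, string arguments,
                              string workingDirectory, TimeSpan timeout)
    {
        using (var process = new Process())
        {
            process.StartInfo.FileName = fileName;
            process.StartInfo.Arguments = arguments;
            process.StartInfo.WorkingDirectory = workingDirectory;
            process.StartInfo.UseShellExecute = false;
            process.Start();

            // WaitForExit(int) returns false when the timeout elapses first.
            if (!process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                try
                {
                    process.Kill();        // stop the stuck process
                    process.WaitForExit(); // wait until it is really gone
                }
                catch (InvalidOperationException)
                {
                    // The process exited on its own just before Kill().
                }
                return false;
            }

            return process.ExitCode == 0;
        }
    }
}
```

With this helper, the conversion would be started as `ProcessRunner.TryRun(LIBRE_OFFICE_BIN, arguments, workingDirectory, TimeSpan.FromMinutes(2))`, where the two-minute limit is an arbitrary value chosen for the example.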
We can even use the same function on Windows and Linux (for debugging purposes) through conditional compilation, assuming that you debug on Windows and deploy on Linux:
#if DEBUG
const string LIBRE_OFFICE_BIN = @"C:\Program Files\LibreOffice\program\soffice.exe";
#else
const string LIBRE_OFFICE_BIN = "/usr/bin/libreoffice";
#endif
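If you would rather not tie the path to the build configuration, an alternative is to pick it at run time from the operating system. This is a minimal sketch using `RuntimeInformation.IsOSPlatform`. The `LibreOfficeLocator` class is a hypothetical name, and the paths are simply the two constants shown above.

```csharp
using System.Runtime.InteropServices;

// Hypothetical helper (not part of the original function).
public static class LibreOfficeLocator
{
    // Picks the LibreOffice binary for the OS the code is running on.
    // Assumes LibreOffice is installed in its default location.
    public static string GetBinaryPath()
    {
        return RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? @"C:\Program Files\LibreOffice\program\soffice.exe"
            : "/usr/bin/libreoffice";
    }
}
```

The trade-off is that a Release build then also works on a Windows machine, at the cost of one small check at run time.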
So far so good... but how do we install LibreOffice in Azure serverless?
Remember that in serverless mode you have no control over the programs installed on the server. Your function may execute "somewhere" in Azure, even on a different server each time.
Luckily, Azure Functions v2 can run on Linux, and we can create a Docker container with all the tools we need ❤️. The Azure Functions v2 Linux runtime image is based on Debian stretch, so we can install LibreOffice there without fuss:
FROM mcr.microsoft.com/azure-functions/dotnet:2.0 AS base
WORKDIR /app
EXPOSE 80
FROM mcr.microsoft.com/dotnet/core/sdk:2.1-stretch AS build
WORKDIR /src
COPY ["PdfConverterFunction.csproj", ""]
RUN dotnet restore "./PdfConverterFunction.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "PdfConverterFunction.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "PdfConverterFunction.csproj" -c Release -o /home/site/wwwroot
FROM base AS final
WORKDIR /home/site/wwwroot
COPY --from=publish ["/home/site/wwwroot", "/home/site/wwwroot"]
ENV AzureWebJobsScriptRoot=/home/site/wwwroot
RUN mkdir -p /usr/share/man/man1
RUN apt-get update && apt-get upgrade -y
RUN apt-get -y install default-jre-headless libreoffice
To test it locally, do the following:
docker build . -t pdfconverter
docker run -p 80:80 -e AzureWebJobsStorage="<your-blob-connection-string>" --name pdfconverter pdfconverter
This will host the function. You can test it by sending a POST request with the following JSON body (make sure that the URL is valid):
POST http://localhost/api/PdfConverterFunction
{
"url": "https://server/path/document.docx"
}
If all goes well, you should receive a response containing a URL for the converted PDF document.
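The same request can also be sent from code. Here is a minimal sketch of a client using `HttpClient`. The `PdfConverterClient` class is a hypothetical name, and the sketch returns the raw response body because the exact response format is an assumption here.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical client (not part of the original project).
public static class PdfConverterClient
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds the JSON body with the single "url" field shown above.
    public static string BuildRequestBody(string documentUrl)
    {
        // Minimal escaping for a sketch; prefer a JSON serializer for real input.
        var escaped = documentUrl.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "{ \"url\": \"" + escaped + "\" }";
    }

    // Posts the document URL to the function and returns the raw response body.
    public static async Task<string> ConvertAsync(string functionUrl, string documentUrl)
    {
        var content = new StringContent(
            BuildRequestBody(documentUrl), Encoding.UTF8, "application/json");

        using (var response = await Http.PostAsync(functionUrl, content))
        {
            response.EnsureSuccessStatusCode(); // throws on 4xx/5xx answers
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Against the local container, the call would be `await PdfConverterClient.ConvertAsync("http://localhost/api/PdfConverterFunction", "https://server/path/document.docx")`.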
Once you have the Docker image ready, you can publish it using the Web App for Containers service by following these high-level steps:
- Upload the customized image into Azure Container Registry or Docker Hub
- Create a Function App and follow the wizard (you will be able to select Linux and directly deploy the container)