fixEncoding

Jody Winter Ralf D. Müller Alexander Schwartz

1 minute to read

About This Task

Whenever Asciidoctor has to process a file that is not UTF-8 encoded, Ruby tries to read it, then throws an error similar to this one:

asciidoctor: FAILED: /home/demo/test.adoc: Failed to load AsciiDoc document - invalid byte sequence in UTF-8

Unfortunately, finding the incorrectly encoded file is difficult if a lot of includes:: are used, and Asciidoctor will only show the name of the main document. This is not Asciidoctor’s fault. The fault lies with the Ruby interpreter that sits underneath.

The fixEncoding task crawls through all *.ad and *.adoc files and checks their encoding. If it comes across a file which is not UTF-8 encoded, it will rewrite it with the UTF-8 encoding.

Source

scripts/fixEncoding.gradle
import groovy.util.*
import static groovy.io.FileType.*

task fixEncoding(
        description: 'finds and converts non UTF-8 adoc files to UTF-8',
        group: 'docToolchain helper',
) {
    doLast {
        File sourceFolder = new File("${docDir}/${inputPath}")
        println("sourceFolder: " + sourceFolder.canonicalPath)
        sourceFolder.traverse(type: FILES) { file ->
            if (file.name ==~ '^.*(ad|adoc|asciidoc)$') {
                CharsetToolkit toolkit = new CharsetToolkit(file);
                // guess the encoding
                def guessedCharset = toolkit.getCharset().toString().toUpperCase();
                if (guessedCharset!='UTF-8') {
                    def text = file.text
                    file.write(text, "utf-8")
                    println("  converted ${file.name} from '${guessedCharset}' to 'UFT-8'")
                }
            }
        }
    }
}