fixEncoding
About This Task
Whenever Asciidoctor has to process a file that is not UTF-8 encoded, Ruby tries to read it, then throws an error similar to this one:
asciidoctor: FAILED: /home/demo/test.adoc: Failed to load AsciiDoc document - invalid byte sequence in UTF-8
Unfortunately, finding the incorrectly encoded file is difficult when many include:: directives are used, because Asciidoctor only reports the name of the main document. This is not Asciidoctor’s fault. The fault lies with the Ruby interpreter that sits underneath.
The fixEncoding task crawls through all *.ad, *.adoc, and *.asciidoc files and checks their encoding.
If it comes across a file that is not UTF-8 encoded, it rewrites the file in UTF-8.
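The check itself can be tried outside of Gradle. The following Java sketch is not part of docToolchain: it swaps the CharsetToolkit guess used by the task for a strict UTF-8 decode, which rejects any malformed byte sequence, and it only lists the offending files instead of rewriting them. The class name Utf8Check, its methods, and the docs-folder argument are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not docToolchain code.
final class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without a malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lists the AsciiDoc files below root whose content is not valid UTF-8. */
    static List<Path> findNonUtf8(Path root) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths
                    .filter(Files::isRegularFile)
                    // same extensions as the task: *.ad, *.adoc, *.asciidoc
                    .filter(p -> p.getFileName().toString()
                            .matches(".*\\.(ad|adoc|asciidoc)"))
                    .collect(Collectors.toList());
        }
        List<Path> offenders = new ArrayList<>();
        for (Path p : candidates) {
            if (!isValidUtf8(Files.readAllBytes(p))) {
                offenders.add(p);
            }
        }
        return offenders;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Utf8Check.java <docs-folder>");
            return;
        }
        for (Path p : findNonUtf8(Paths.get(args[0]))) {
            System.out.println("not UTF-8: " + p);
        }
    }
}
```

Run against a documentation folder, the sketch prints one line per file that fails the strict decode. These are the files that would likely trigger the "invalid byte sequence in UTF-8" error, including files pulled in through include:: directives.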
Source
import groovy.util.*
import static groovy.io.FileType.*

task fixEncoding(
    description: 'finds and converts non UTF-8 adoc files to UTF-8',
    group: 'docToolchain helper',
) {
    doLast {
        File sourceFolder = new File("${docDir}/${inputPath}")
        println("sourceFolder: " + sourceFolder.canonicalPath)
        // visit every regular file below the source folder
        sourceFolder.traverse(type: FILES) { file ->
            // only consider AsciiDoc files; the escaped dot keeps names
            // that merely end in "ad" (such as "download") from matching
            if (file.name ==~ '^.*\\.(ad|adoc|asciidoc)$') {
                CharsetToolkit toolkit = new CharsetToolkit(file)
                // guess the encoding
                def guessedCharset = toolkit.getCharset().toString().toUpperCase()
                if (guessedCharset != 'UTF-8') {
                    // read the text and write it back as UTF-8
                    def text = file.text
                    file.write(text, "utf-8")
                    println("  converted ${file.name} from '${guessedCharset}' to 'UTF-8'")
                }
            }
        }
    }
}
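The task leaves the source encoding to the guess made by CharsetToolkit. When the legacy encoding of a file is already known, the conversion step can also be written with an explicit charset. The Java sketch below illustrates that idea only. It is not docToolchain code, and the class name Recode, its methods, and the command-line arguments are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not docToolchain code.
final class Recode {

    /** Decodes raw bytes with the given source charset and re-encodes them as UTF-8. */
    static byte[] toUtf8(byte[] raw, Charset source) {
        return new String(raw, source).getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Rewrites a file in place as UTF-8. The caller must name the real
     * source charset: running this on a file that is already UTF-8 while
     * claiming another charset garbles its non-ASCII characters.
     */
    static void convertFile(Path file, Charset source) throws IOException {
        Files.write(file, toUtf8(Files.readAllBytes(file), source));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java Recode.java <file> <source-charset>");
            return;
        }
        convertFile(Paths.get(args[0]), Charset.forName(args[1]));
    }
}
```

Because the source charset is passed explicitly, nothing is guessed here. The trade-off is that the caller has to know the original encoding, and a second run with the same arguments would corrupt a file that has already been converted.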