docx4j Usage Guide

I. Introduction

docx4j is an open-source (ASLv2) Java library for creating and manipulating Microsoft Open XML files (Word docx, PowerPoint pptx, and Excel xlsx).

docx4j website: link

docx4j sample code: link

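II. Understanding docx files

1. Getting to know Open XML

docx4j mainly operates on docx files, that is, on Microsoft Open XML files.

Office Open XML, also known as OpenXML or OOXML, is the XML-based format underlying Office documents. It is used for Word, Excel, PowerPoint, charts, and so on. The specification was developed by Microsoft and adopted by ISO and IEC, and it is now the default format for all Microsoft Office documents (.docx, .xlsx, and .pptx).

Microsoft Open XML website: link

2. Structure of a docx file

A docx can be thought of as a zipped front-end project: it contains style files and DOM-like XML files. Let's unzip a docx file and look at its directory structure: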

[Image: directory structure of the unzipped docx file]

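Start with [Content_Types].xml in the root directory. It is the content-type manifest for the whole docx package: every part used in the archive is declared there. A rough analogy is the head section of an HTML file, where the js and css references live. A closer one is a JSP master page that includes each of the other pages.

The _rels directory defines the relationships between the parts. It is enough to know it exists.

Now look at the core word directory. Its layout closely resembles the html and css structure of a front-end project.

The media directory holds multimedia items such as images. It is enough to know it exists.

The theme directory, as the name suggests, holds the Word theme. It is enough to know it exists.

The most important file in the word directory is document.xml.

Of the other files in the word directory, settings.xml and styles.xml are the settings and style files that every docx has, fontTable.xml is the font table, and footnotes.xml holds the footnotes. There are also files whose names start with footer (page footers) and with header (page headers).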
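Because a docx is an ordinary zip archive, the layout described above can be inspected with nothing but the JDK. The following is a minimal sketch and not part of docx4j: the class name DocxPartLister and the in-memory sample package are made up for illustration, and the sample holds placeholder content rather than a valid Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DocxPartLister {

    /** Returns the entry names of a zip stream in archive order. */
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny stand-in package in memory (hypothetical content, not a real document). */
    static byte[] samplePackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"[Content_Types].xml", "_rels/.rels", "word/document.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (String name : listParts(new ByteArrayInputStream(samplePackage()))) {
            System.out.println(name);
        }
    }
}
```

To inspect a real file, pass a FileInputStream for the docx to listParts instead of the sample bytes.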

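3. The core OpenXML file format of docx

We focus on document.xml. It is the skeleton of the whole docx, comparable to the html file of a web page, and it is the most basic and most important file.

The text structure of document.xml resembles HTML: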

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
        <w:p></w:p>
        <w:tbl></w:tbl>
        <w:sectPr></w:sectPr>
    </w:body>
</w:document>
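To make the skeleton concrete, the sketch below parses a document.xml string with the JDK's DOM parser and lists the element children of w:body (paragraphs, tables, and the section properties). It assumes the standard WordprocessingML namespace URI for the w prefix. The class and method names are illustrative only and have nothing to do with docx4j's own API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class BodyChildren {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** A minimal document.xml with one paragraph, one table and the section properties. */
    static final String SAMPLE =
            "<w:document xmlns:w=\"" + W_NS + "\">"
            + "<w:body><w:p/><w:tbl/><w:sectPr/></w:body>"
            + "</w:document>";

    /** Returns the local names of the element children of w:body, in document order. */
    static List<String> bodyChildNames(String documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so the w: prefix is resolved to its namespace
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));
            Element body = (Element) doc.getElementsByTagNameNS(W_NS, "body").item(0);
            List<String> names = new ArrayList<>();
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getLocalName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document.xml", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bodyChildNames(SAMPLE)); // prints [p, tbl, sectPr]
    }
}
```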

III. Using docx4j

1. Maven dependency

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>6.1.2</version>
</dependency>

2. Basic usage

2.1 Creating a document

/**
 * Basic creation: create a new docx document
 * and obtain the package object used to manipulate it.
 */
@GetMapping("/createDocx")
public String createDocx(HttpServletResponse response) {
    if (!StringUtils.isEmpty(template01Path)) {
        OutputStream outs = null;
        try {
            wordMLPackage = WordprocessingMLPackage.createPackage();
            // wordMLPackage.save(new File(template01Path));
            // Docx4J.save(wordMLPackage, new File(docxPath));
            // The file name is non-ASCII, hence the URL encoding
            String fileName = URLEncoder.encode("模板表", "UTF-8");
            response.setContentType("application/octet-stream;charset=UTF-8");
            response.setCharacterEncoding("utf-8");
            response.setHeader("Content-Disposition", "attachment; filename=" + fileName + ".docx");
            response.setHeader("Access-Control-Expose-Headers", "Content-Disposition");

            outs = response.getOutputStream();
            wordMLPackage.save(outs);
            outs.flush();
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
        }
    } else {
        return "Path does not exist";
    }
    return "Document created";
}
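The handler above URL-encodes the file name before putting it into Content-Disposition. One thing to watch for is that URLEncoder targets form encoding and writes a space as +, which some browsers show literally in the downloaded file name. A possible way to build the header is sketched below, under the assumption that the client understands the filename* parameter described in RFC 6266 and RFC 5987. DownloadHeaders is a hypothetical helper and not part of the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class DownloadHeaders {

    /** Builds a Content-Disposition value that carries the name in a UTF-8 filename* parameter. */
    static String contentDisposition(String fileName) {
        try {
            String encoded = URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")   // URLEncoder writes spaces as '+'; percent-encode them instead
                    .replace("*", "%2A");  // URLEncoder leaves '*' alone, but it is not allowed unescaped here
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints attachment; filename*=UTF-8''monthly%20report.docx
        System.out.println(contentDisposition("monthly report.docx"));
    }
}
```

In the controller this would replace the hand-built header value, for example response.setHeader("Content-Disposition", DownloadHeaders.contentDisposition("模板表.docx")).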

2.2 Appending content to a document

2.2.1 Adding paragraphs to a file

/**
 * Append content to a document (Chinese text is supported by default).
 * The result is saved to a separate output file, so repeated calls
 * do not pile up duplicate paragraphs in the template.
 */
@GetMapping("/addParagraph")
public String addParagraph() {
    try {
        // Load the Word document first
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));
        // wordMLPackage = Docx4J.load(new File(docxPath));

        // Add content
        wordMLPackage.getMainDocumentPart().addParagraphOfText("你好!");
        wordMLPackage.getMainDocumentPart().addStyledParagraphOfText("Title", "This is the title!");
        wordMLPackage.getMainDocumentPart().addStyledParagraphOfText("Subtitle", "This is the subtitle!");

        wordMLPackage.getMainDocumentPart().addStyledParagraphOfText("Subject", "Give it a try");
        // Save the document
        wordMLPackage.save(new File(template01OutPath));
    } catch (Docx4JException e) {
        logger.error("addParagraph to docx error: Docx4JException", e);
    }
    return "Content appended";
}

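2.2.2 Adding a paragraph with the factory class (the factory class is a general-purpose approach)

First create a factory. The classes must be imported from the org.docx4j.wml package; if you import from the wrong package, everything below fails.

R is a run: it groups content that shares the same properties so that it can be handled as a unit. Content is added through the run's content list, and RPr holds the run's properties (it is a member of class R) and is used to format the R object. An R is in turn added to the content of another object. So the usual steps for putting an element B inside an element A through an R are: (1) create the R; (2) add the content element B to the R; (3) add the R to element A; (4) add element A to the content of the mainDocumentPart.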

/**
 * Add a paragraph. Remember to save afterwards, otherwise the change has no effect.
 */
@GetMapping("/addParagraph2")
public String addParagraph2() {

    String simpleText = "addParagraph2";
    try {
        wordMLPackage = WordprocessingMLPackage
                .load(new File(template01Path));
        factory = Context.getWmlObjectFactory();
        P para = factory.createP();
        if (simpleText != null) {
            // Text -> R -> P: wrap the text in a run, then the run in a paragraph
            Text t = factory.createText();
            t.setValue(simpleText);
            R run = factory.createR();
            run.getContent().add(t);
            para.getContent().add(run);
        }
        wordMLPackage.getMainDocumentPart().getContent().add(para);
        wordMLPackage.save(new File(template01OutPath));
    } catch (Exception e) {
        logger.error("addParagraph2 to docx error", e);
    }
    return "Content appended";
}

2.3 Images

2.3.1 Image utility class

public class DocImageHandler {
    /**
     * Add an image to the package.
     *
     * @param wordMLPackage the package to add the image to
     * @param bytes         the image data
     * @throws Exception if the image part cannot be created
     */
    public static void addImageToPackage(WordprocessingMLPackage wordMLPackage, byte[] bytes) throws Exception {
        BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wordMLPackage, bytes);

        int docPrId = 1;
        int cNvPrId = 2;
        Inline inline = imagePart.createImageInline("Filename hint",
                "Alternative text", docPrId, cNvPrId, false);

        P paragraph = addInlineImageToParagraph(inline);

        wordMLPackage.getMainDocumentPart().addObject(paragraph);
    }

    /**
     * Wrap an inline image in a new paragraph.
     *
     * @param inline the inline image
     * @return a paragraph containing the image
     */
    public static P addInlineImageToParagraph(Inline inline) {
        // Add the inline object to a paragraph: P -> R -> Drawing -> Inline
        ObjectFactory factory = new ObjectFactory();
        P paragraph = factory.createP();
        R run = factory.createR();
        paragraph.getContent().add(run);
        Drawing drawing = factory.createDrawing();
        run.getContent().add(drawing);
        drawing.getAnchorOrInline().add(inline);
        return paragraph;
    }

    /**
     * Read an image file into a byte array.
     *
     * @param file the file to convert
     * @return a byte array holding the image data
     * @throws FileNotFoundException if the file does not exist
     * @throws IOException           if the file is too large or cannot be read completely
     */
    public static byte[] convertImageToByteArray(File file) throws FileNotFoundException, IOException {
        long length = file.length();
        // An array cannot be sized with a long; it needs an int.
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large: " + file.getName());
        }
        byte[] bytes = new byte[(int) length];
        // try-with-resources closes the stream even if reading fails
        try (InputStream is = new FileInputStream(file)) {
            int offset = 0;
            int numRead = 0;
            while (offset < bytes.length && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
                offset += numRead;
            }
            // Make sure every byte was read
            if (offset < bytes.length) {
                throw new IOException("Could not completely read file " + file.getName());
            }
        }
        return bytes;
    }
}
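The manual read loop in convertImageToByteArray can also be delegated to the JDK. The sketch below shows the idea with java.nio.file.Files.readAllBytes. ImageBytes and roundTrip are hypothetical names used only for this illustration, and checked I/O exceptions are wrapped in UncheckedIOException to keep the calls short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class ImageBytes {

    /** Reads the whole file into memory; the JDK handles the loop and closes the file. */
    static byte[] read(Path path) {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes data to a temporary file, reads it back and reports whether the bytes match. */
    static boolean roundTrip(byte[] data) {
        try {
            Path tmp = Files.createTempFile("image-bytes", ".bin");
            try {
                Files.write(tmp, data);
                return Arrays.equals(data, read(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new byte[] {1, 2, 3})); // prints true
    }
}
```

With this helper the call in the controller would become ImageBytes.read(Paths.get(picPath)), and the resulting array can be passed to addImageToPackage unchanged.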

2.3.2 Inserting an image

/**
 * Insert an image.
 */
@GetMapping("/wordInsertImage")
public String wordInsertImage() {
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(docxPath));
        byte[] bytes = DocImageHandler.convertImageToByteArray(new File(picPath));
        DocImageHandler.addImageToPackage(wordMLPackage, bytes);
        wordMLPackage.save(new File(docxOutPath));
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
    return "Image inserted";
}

2.4 Creating a table

/**
 * Create a table.
 */
@GetMapping("addTable")
public String addTable() {
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();

        factory = Context.getWmlObjectFactory();
        // 1. Create the table element
        Tbl table = factory.createTbl();
        // 2. Show the table borders
        addBorders(table);

        // 3. Add the table content (create rows and cells)
        for (int i = 0; i < 3; i++) {
            Tr tr = factory.createTr();
            for (int j = 0; j < 3; j++) {
                Tc tc = factory.createTc();

                // 3.1 What createParagraphOfText(str) does internally:
                //   3.1.1 create a Text and set its value,
                //   3.1.2 create an R and add the Text to it,
                //   3.1.3 create a P and add the R to it
                // P p = mainDocumentPart.createParagraphOfText("---row" + i + "---column" + j + "---");

                // 3.2 Second way: build the P by hand so that it can be styled
                P p = factory.createP();
                R r = factory.createR();
                Text text = factory.createText();
                text.setValue("---row" + i + "---column" + j + "---");
                r.getContent().add(text);
                p.getContent().add(r);
                // 3.2.1 Set bold, font size and similar properties through the R
                setRStyle(r);
                // 3.2.2 Set the column width
                if (j == 1) {
                    setCellWidth(tc, 1250);
                } else {
                    setCellWidth(tc, 2500);
                }

                tc.getContent().add(p);
                tr.getContent().add(tc);
            }
            table.getContent().add(tr);
        }

        // 4. Add the new table to the main content
        mainDocumentPart.addObject(table);
        wordMLPackage.save(new File(template01Path));
    } catch (Docx4JException e) {
        logger.error("addTable error: Docx4JException", e);
    }
    return "Table created";
}


/**
 * Set the border style.
 *
 * @param table the table whose borders are to be set
 */
private static void addBorders(Tbl table) {
    // A TblPr must be set, otherwise a NullPointerException is thrown further down
    table.setTblPr(new TblPr());

    // Create a border component: default color (black), size 4, spacing 0, single line
    CTBorder border = new CTBorder();
    border.setColor("auto");
    border.setSz(new BigInteger("4"));
    border.setSpace(new BigInteger("0"));
    border.setVal(STBorder.SINGLE);

    // Apply the border component to the four outer edges and to the inner horizontal and vertical borders
    TblBorders borders = new TblBorders();
    borders.setBottom(border);
    borders.setLeft(border);
    borders.setRight(border);
    borders.setTop(border);
    borders.setInsideH(border);
    borders.setInsideV(border);

    // Set the borders on the table's TblPr so that they apply to the table
    table.getTblPr().setTblBorders(borders);
}

/**
 * Style the text of a table cell through its R: bold, with a size value of 25
 * (HpsMeasure is measured in half-points, so this is 12.5 pt).
 *
 * @param r the run to style
 */
private static void setRStyle(R r) {
    // 1. Create an RPr
    RPr rpr = new RPr();

    // 2. Configure the RPr
    // 2.1 Font size
    HpsMeasure size = new HpsMeasure();
    size.setVal(new BigInteger("25"));
    rpr.setSz(size);
    // 2.2 Bold
    BooleanDefaultTrue bold = new BooleanDefaultTrue();
    bold.setVal(true);
    rpr.setB(bold);

    // 3. Set the RPr as the properties of the R
    r.setRPr(rpr);
}

/**
 * Set the column width.
 *
 * @param tc    the cell
 * @param width the cell width
 */
private static void setCellWidth(Tc tc, int width) {
    TcPr tableCellProperties = new TcPr();
    TblWidth tableWidth = new TblWidth();
    tableWidth.setW(BigInteger.valueOf(width));
    tableCellProperties.setTcW(tableWidth);

    tc.setTcPr(tableCellProperties);
}
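The bare numbers in the table code above (25 for the font size, 4 for the border, 1250 and 2500 for the cell widths) are easier to read once converted from points. The helper below is a sketch of those conversions as WordprocessingML defines them: font sizes in half-points, border sizes in eighths of a point, and widths in twentieths of a point, assuming the width is interpreted as type dxa, which is the default when no type is set. WmlUnits is a hypothetical class and not part of docx4j.

```java
import java.math.BigInteger;

final class WmlUnits {

    /** Font size for HpsMeasure: half-points. */
    static BigInteger halfPoints(double points) {
        return BigInteger.valueOf(Math.round(points * 2));
    }

    /** Width of type dxa: twentieths of a point (1440 per inch). */
    static BigInteger twips(double points) {
        return BigInteger.valueOf(Math.round(points * 20));
    }

    /** Border size: eighths of a point. */
    static BigInteger eighthPoints(double points) {
        return BigInteger.valueOf(Math.round(points * 8));
    }

    public static void main(String[] args) {
        System.out.println(halfPoints(12.5));  // prints 25, the size used in setRStyle
        System.out.println(twips(125));        // prints 2500, the wider cell in addTable
        System.out.println(eighthPoints(0.5)); // prints 4, the border size in addBorders
    }
}
```

Used this way, size.setVal(WmlUnits.halfPoints(12.5)) states the intended 12.5 pt directly instead of the raw value 25.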

Further reading:

  • Some general-purpose methods of the factory class

  • Formatting and style operations

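2.5 Reading table content (walking the docx4j tree to collect elements of a given type)

Sometimes the elements returned by getContent() are direct types such as Tr and can simply be cast. Sometimes they cannot be cast directly because their type is JAXBElement, and the wrapped value has to be extracted first. That is what the getAllElementFromObject method does.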

/**
 * Read the content of a table.
 */
@GetMapping("readTable")
public String readTable() {
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));
        MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

        // 1. Use a ClassFinder to collect the elements of the given type
        ClassFinder find = new ClassFinder(Tbl.class);
        new TraversalUtil(documentPart.getContent(), find);

        // Get the first table element
        Tbl table = (Tbl) find.results.get(0);
        List<Object> trs = table.getContent();
        logger.info("{}", trs);

        System.out.println("=====================");

        for (Object obj : trs) {
            Tr tr = (Tr) obj; // the row
            List<Object> content = tr.getContent();
            logger.info("{}", content);
            List<Object> objList = getAllElementFromObject(tr, Tc.class); // all Tc elements in the row
            for (Object obj1 : objList) {
                Tc tc = (Tc) obj1;
                logger.info("{}", tc.getContent());
            }
            System.out.println("===============");
        }
    } catch (Docx4JException e) {
        logger.error(e.getMessage(), e);
    }
    return "Table content read";
}

private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    // Unwrap JAXBElement so that the real value is examined
    if (obj instanceof JAXBElement) {
        obj = ((JAXBElement<?>) obj).getValue();
    }
    if (obj.getClass().equals(toSearch)) {
        result.add(obj);
    } else if (obj instanceof ContentAccessor) {
        // Recurse into anything that has child content
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}
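The unwrap-then-recurse pattern of getAllElementFromObject can be tried out without docx4j on the classpath. The sketch below uses made-up stand-ins: Wrapper plays the role of JAXBElement, Container the role of ContentAccessor, and Row and Cell the roles of Tr and Tc. Unlike the method above it matches with Class.isInstance and returns a typed list, which is a design choice of this sketch rather than something docx4j prescribes.

```java
import java.util.ArrayList;
import java.util.List;

final class TreeSearch {

    /** Stand-in for JAXBElement: a wrapper around the real value. */
    static final class Wrapper {
        private final Object value;
        Wrapper(Object value) { this.value = value; }
        Object getValue() { return value; }
    }

    /** Stand-in for ContentAccessor: anything with child content. */
    interface Container {
        List<Object> getContent();
    }

    static final class Cell implements Container {
        private final List<Object> content = new ArrayList<>();
        public List<Object> getContent() { return content; }
    }

    static final class Row implements Container {
        private final List<Object> content = new ArrayList<>();
        public List<Object> getContent() { return content; }
    }

    /** Collects every node of the requested type, unwrapping wrappers on the way down. */
    static <T> List<T> findAll(Object node, Class<T> type) {
        List<T> found = new ArrayList<>();
        Object current = (node instanceof Wrapper) ? ((Wrapper) node).getValue() : node;
        if (type.isInstance(current)) {
            found.add(type.cast(current));
        } else if (current instanceof Container) {
            for (Object child : ((Container) current).getContent()) {
                found.addAll(findAll(child, type));
            }
        }
        return found;
    }

    /** One wrapped cell, one plain cell and one unrelated object in a row. */
    static int demoCellCount() {
        Row row = new Row();
        row.getContent().add(new Wrapper(new Cell()));
        row.getContent().add(new Cell());
        row.getContent().add("not a cell");
        return findAll(row, Cell.class).size();
    }

    public static void main(String[] args) {
        System.out.println(demoCellCount()); // prints 2
    }
}
```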

2.6 Reading a docx file

/**
 * Read a docx file (doc files are not supported here).
 * The Word file is read without distinguishing the styles used in it.
 */
@GetMapping("readParagraph")
public String readParagraph() {
    List<Object> list = new ArrayList<>();
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));

        String contentType = wordMLPackage.getContentType();
        logger.info("contentType:" + contentType);
        MainDocumentPart part = wordMLPackage.getMainDocumentPart();
        logger.info("content -> body -> " + part.getContents().getBody().toString());
        list = part.getContent();
        for (Object o : list) {
            logger.info("info:" + o);
        }
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
    String jsonString = JSON.toJSONString(list);
    return jsonString;
}

2.7 Converting docx to HTML

/**
 * Convert a docx file to HTML.
 * The stylesheet can be customized.
 * The conversion goes through XSL (FLAG_EXPORT_PREFER_XSL) to produce the HTML.
 */
@GetMapping("/wordToHtml")
public String wordToHtml() {
    boolean nestLists = true;
    boolean save = true;
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));
        HTMLSettings html = Docx4J.createHTMLSettings();
        // Directory and URI for the images
        html.setImageDirPath(template01Path + "_files");
        html.setImageTargetUri(template01Path.substring(template01Path.lastIndexOf("/") + 1) + "_files");
        html.setWmlPackage(wordMLPackage);
        String userCSS = null;
        // userCSS is the style of the generated HTML; set it by hand to control margins, fonts and so on
        if (nestLists) {
            userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img, table, caption, tbody, tfoot, thead, tr, th, td "
                    + "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
        } else {
            userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img, ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td "
                    + "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
        }
        html.setUserCSS(userCSS);
        // Output settings
        Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

        // try-with-resources closes the output stream once the conversion is done
        try (OutputStream os = save
                ? new FileOutputStream(template01Path + ".html")
                : new ByteArrayOutputStream()) {
            Docx4J.toHTML(html, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

            if (save) {
                System.out.println("Saved: " + template01Path + ".html ");
            } else {
                System.out.println(((ByteArrayOutputStream) os).toString());
            }
        }
        if (wordMLPackage.getMainDocumentPart().getFontTablePart() != null) {
            wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
        }
        html = null;
        wordMLPackage = null;

    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }

    return "docx converted to HTML";
}

2.8 Replacing ${var} variables in a docx

/**
 * Replace ${var} variables in a docx with the given values.
 */
@GetMapping("replaceTableByVariable")
public String replaceTableByVariable() {
    boolean save = true;
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));
        VariablePrepare.prepare(wordMLPackage);
        MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
        Docx4jUtil.cleanDocumentPart(documentPart);

        // Values to substitute
        HashMap<String, String> mappings = new HashMap<String, String>();
        mappings.put("name", "Zhang San");
        mappings.put("age", "25");
        mappings.put("sex", "male");

        long start = System.currentTimeMillis();
        documentPart.variableReplace(mappings);
        // // unmarshallFromTemplate requires string input
        // String xml = XmlUtils.marshaltoString(documentPart.getJaxbElement(), true);
        // // Do it...
        // Object obj = XmlUtils.unmarshallFromTemplate(xml, mappings);
        // // Inject result into docx
        // documentPart.setJaxbElement((Document) obj);
        long end = System.currentTimeMillis();
        long total = end - start;
        logger.info("Time: " + total);

        // Save it
        if (save) {
            // Write the Word file; four equivalent options follow

            // 1
            // SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
            // saver.save("/home/person-project/helloworld_1.docx");

            // 2
            // OutputStream outputStream = new FileOutputStream(new File(docxOutPath));
            // wordMLPackage.save(outputStream);
            // outputStream.flush();

            // 3
            // wordMLPackage.save(new File(docxOutPath));

            // 4
            Docx4J.save(wordMLPackage, new File(template01OutPath));
        } else {
            logger.info(XmlUtils.marshaltoString(documentPart.getJaxbElement(), true, true));
        }

    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
    return "Variables replaced";
}

2.9 Replacing a table in the template (looped placeholder replacement)

/**
 * Replace a table in the template (looped placeholder replacement).
 */
@GetMapping("replaceTableByLoop")
public String replaceTableByLoop() {
    factory = Context.getWmlObjectFactory();
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template01Path));
        MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
        Docx4jUtil.cleanDocumentPart(documentPart);

        // Locate the table that holds the looped rows
        ClassFinder find = new ClassFinder(Tbl.class);
        new TraversalUtil(documentPart.getContent(), find);
        // Get the first table element
        Tbl table = (Tbl) find.results.get(0);
        // By convention the first row is the template row
        Tr dynamicTr = (Tr) table.getContent().get(0);
        // Get the XML of the template row
        String dynamicTrXml = XmlUtils.marshaltoString(dynamicTr);

        List<Map<String, Object>> dataList = getDataList();
        for (Map<String, Object> dataMap : dataList) {
            // Fill the template row with one record and append it
            Tr newTr = (Tr) XmlUtils.unmarshallFromTemplate(dynamicTrXml, dataMap);
            table.getContent().add(newTr);
        }

        // Remove the template (placeholder) row
        table.getContent().remove(0);

        Docx4J.save(wordMLPackage, new File(template01OutPath));
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
    return "Table replaced";
}

private static List<Map<String, Object>> getDataList() {
    List<Map<String, Object>> list = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
        Map<String, Object> map = new HashMap<>();
        map.put("name", "name" + i);
        map.put("sex", "sex" + i);
        map.put("age", "age" + i);
        list.add(map);
    }
    return list;
}
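The loop above serializes the template row once and then fills it once per record. The sketch below illustrates that idea on plain strings with a regular expression. It is a stand-in for illustration only and not how XmlUtils.unmarshallFromTemplate is implemented: RowTemplate is a hypothetical class, values are inserted without XML escaping, and placeholders without a matching key are left untouched (a choice made for this sketch).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowTemplate {

    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} in the template with the matching value, if there is one. */
    static String fill(String template, Map<String, ?> values) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Produces one filled row per record, in order. */
    static List<String> expand(String templateRow, List<Map<String, Object>> records) {
        List<String> rows = new ArrayList<>();
        for (Map<String, Object> record : records) {
            rows.add(fill(templateRow, record));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, Object> record = new HashMap<>();
            record.put("name", "name" + i);
            record.put("age", "age" + i);
            records.add(record);
        }
        for (String row : expand("<w:tr><w:t>${name}</w:t><w:t>${age}</w:t></w:tr>", records)) {
            System.out.println(row);
        }
    }
}
```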

2.10 Replacing content by placeholder (variables, tables, and other formatted data)

/**
 * Replace content by placeholder (variables, tables, and other formatted data).
 * Note: 1. Placeholders can get split apart when Word stores the document as XML.
 * 1.1 Cause:
 * 1.1.1 When you open the Word template, some words are underlined in red by Word's spell checker. Assembled identifiers usually fail the spell check, and the flagged text is stored separately in the XML (because of the underline markup), so the placeholder ends up split.
 * 1.2 Solutions:
 * 1.2.1 Leave the variables out of the docx at first and save the docx as XML. Open that XML in Word and add the variables then; ${variable} will not be split. Afterwards save it as docx again.
 * 1.2.2 Type ${variable} into a plain txt file first, then copy it into the docx (recommended).
 * 1.2.3 Call Docx4jUtil.cleanDocumentPart(MainDocumentPart documentPart) to clean up the docx4j template variable text, usually of the form ${variable}.
 */
@GetMapping("/placeholderTable")
public String placeholderTable() {
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(templatePath));
        Map<String, String> mappings = new HashMap<String, String>();
        // Data for the cells that are not part of the loop
        mappings.put("name", "Ma Canjun");
        mappings.put("sex", "male");
        mappings.put("skill", "Rumor-mongering: a tale told by three people is taken for the truth");

        // Locate the table that holds the looped rows
        ClassFinder find = new ClassFinder(Tbl.class);
        new TraversalUtil(wordMLPackage.getMainDocumentPart().getContent(), find);
        Tbl table = (Tbl) find.results.get(1);
        Tr dynamicTr = (Tr) table.getContent().get(1); // by convention the second row is the template row
        String dynamicTrXml = XmlUtils.marshaltoString(dynamicTr); // XML of the template row
        List<Map<String, Object>> dataList = dataList();
        for (Map<String, Object> dataMap : dataList) {
            Tr newTr = (Tr) XmlUtils.unmarshallFromTemplate(dynamicTrXml, dataMap); // fill the template row
            table.getContent().add(newTr);
        }
        // Remove the template (placeholder) row
        table.getContent().remove(1);
        wordMLPackage.getMainDocumentPart().variableReplace(mappings); // global variable replacement
        Docx4J.save(wordMLPackage, new File(outPath));
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
    return "Placeholders replaced";
}

// Build the data for the looped rows
private static List<Map<String, Object>> dataList() {
    List<Map<String, Object>> dataList = new ArrayList<Map<String, Object>>();
    Map<String, Object> m1 = new HashMap<String, Object>();
    m1.put("number", "1");
    m1.put("company", "Alibaba");
    m1.put("slogan", "To make it easy to do business anywhere");
    dataList.add(m1);
    Map<String, Object> m2 = new HashMap<String, Object>();
    m2.put("number", "2");
    m2.put("company", "Tencent");
    m2.put("slogan", "Connecting you and me, building the future together");
    dataList.add(m2);
    Map<String, Object> m3 = new HashMap<String, Object>();
    m3.put("number", "3");
    m3.put("company", "ByteDance");
    m3.put("slogan", "Inspire creativity, enrich life");
    dataList.add(m3);
    return dataList;
}
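The split-placeholder problem described in the Javadoc above is easiest to see on the raw XML: a ${name} typed in Word may be stored as several runs, for example a run holding only $ followed by a run holding {na and another holding me}. The sketch below shows one simplified way to rejoin such a placeholder by deleting the tags that ended up inside it. It is only an approximation of what a cleaner such as cleanDocumentPart has to do: PlaceholderMerger is a hypothetical class, the run formatting between the pieces is discarded, and the approach assumes the deleted tags close and reopen the same elements so that the result stays well-formed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderMerger {

    // '$', optional tags, '{', then name characters or tags, then '}'
    private static final Pattern SPLIT_VARIABLE =
            Pattern.compile("\\$(?:<[^>]*>)*\\{(?:[^{}$<]|<[^>]*>)*\\}");

    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Removes the XML tags that sit inside each ${...} placeholder. */
    static String merge(String xml) {
        Matcher m = SPLIT_VARIABLE.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String joined = TAG.matcher(m.group()).replaceAll("");
            // quoteReplacement keeps the '$' of the placeholder from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(joined));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String split = "<w:r><w:t>$</w:t></w:r><w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>";
        System.out.println(merge(split)); // prints <w:r><w:t>${name}</w:t></w:r>
    }
}
```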

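2.11 Replacing content by bookmark (variables, tables, images, and other formatted data)

This approach is more flexible than the variable-based one and simple to use: all we have to do is insert bookmarks in Word.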

/**
 * Replace content by bookmark (variables, tables, images, and other formatted data).
 */
@GetMapping("/booknameReplaceVar")
public String booknameReplaceVar() {
    try {
        wordMLPackage = WordprocessingMLPackage.load(new File(template02Path));
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
        factory = Context.getWmlObjectFactory();

        Document wmlDoc = (Document) mainDocumentPart.getJaxbElement();
        Body body = wmlDoc.getBody();
        // Get the top-level content of the body
        List<Object> paragraphs = body.getContent();
        // Find the bookmarks and build a cursor over them
        RangeFinder rt = new RangeFinder("CTBookmark", "CTMarkupRange");
        new TraversalUtil(paragraphs, rt);
        // Iterate over the bookmarks
        for (CTBookmark bm : rt.getStarts()) {
            logger.info("Bookmark name: " + bm.getName());
            // Bookmarks can be handled one by one here, or all of them can be driven from a map
            // List<Map<String, Object>> dataList = getDataList();
            // for (Map<String, Object> map : dataList) {
            // replaceText(bm, map);
            // }
            if ("name0".equals(bm.getName())) {
                replaceText(bm, "zhangsan");
            }
            if ("pic01".equals(bm.getName())) {
                addImage(wordMLPackage, bm, picPath);
            }
        }
        Docx4J.save(wordMLPackage, new File(template02outPath));

    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
    return "Bookmarks replaced";
}

/**
 * Insert content at a bookmark.
 *
 * @param bm     the bookmark start element
 * @param object the value to insert (its toString() is used)
 * @throws Exception
 */
public static void replaceText(CTBookmark bm, Object object) throws Exception {
    if (object == null) {
        return;
    }
    // do we have data for this one?
    if (bm.getName() == null) {
        return;
    }
    String value = object.toString();
    try {
        // Can't just remove the object from the parent,
        // since in the parent, it may be wrapped in a JAXBElement
        List<Object> theList = null;
        ParaRPr rpr = null;
        if (bm.getParent() instanceof P) {
            PPr pprTemp = ((P) (bm.getParent())).getPPr();
            if (pprTemp == null) {
                rpr = null;
            } else {
                rpr = ((P) (bm.getParent())).getPPr().getRPr();
            }
            theList = ((ContentAccessor) (bm.getParent())).getContent();
        } else {
            return;
        }
        int rangeStart = -1;
        int rangeEnd = -1;
        int i = 0;
        for (Object ox : theList) {
            Object listEntry = XmlUtils.unwrap(ox);
            if (listEntry.equals(bm)) {
                if (((CTBookmark) listEntry).getName() != null) {
                    rangeStart = i + 1;
                }
            } else if (listEntry instanceof CTMarkupRange) {
                if (((CTMarkupRange) listEntry).getId().equals(bm.getId())) {
                    rangeEnd = i - 1;
                    break;
                }
            }
            i++;
        }
        // Stop if the bookmark start or its matching end was not found in this paragraph;
        // otherwise the removal and insertion below would run with an invalid index
        if (rangeStart < 0 || rangeEnd < 0) {
            return;
        }
        // Delete the bookmark range
        for (int j = rangeEnd; j >= rangeStart; j--) {
            theList.remove(j);
        }
        // now add a run
        R run = factory.createR();
        Text t = factory.createText();
        // if (rpr != null)
        // run.setRPr(paraRPr2RPr(rpr));
        t.setValue(value);
        run.getContent().add(t);

        theList.add(rangeStart, run);
    } catch (ClassCastException e) {
        logger.error(e.getMessage(), e);
    }
}

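/**
 * Insert an image.
 *
 * @param wPackage the package to add the image to
 * @param bm       the bookmark that marks the target paragraph
 * @param file     path of the image file
 */
public static void addImage(WordprocessingMLPackage wPackage, CTBookmark bm, String file) {
    logger.info("addImage :->{},{},{}", wPackage, bm, file);
    try {
        // Bookmarks can be handled one by one here, or all of them can be driven from a map
        // Get the paragraph that contains the bookmark
        P p = (P) (bm.getParent());
        // R is an anonymous complex type
        R run = factory.createR();
        // Read the image into a byte array, which is what this createImagePart overload takes;
        // try-with-resources closes the file afterwards
        byte[] bytes;
        try (InputStream in = new FileInputStream(file)) {
            bytes = IOUtils.toByteArray(in);
        }

        // Create an inline image
        BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wPackage, bytes);
        // Create the inline object; the last argument of createImageInline limits the image width and drives the scaling
        Inline inline = imagePart.createImageInline(null, null, 0, 1, false, 0);
        // Wrap the inline in a Drawing and append it to the paragraph through the run
        Drawing drawing = factory.createDrawing();
        drawing.getAnchorOrInline().add(inline);
        run.getContent().add(drawing);
        p.getContent().add(run);
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    }
}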
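The core of replaceText is index bookkeeping on the paragraph's content list: find the bookmark start, find the end marker with the same id, drop whatever lies between them, and insert the new run right after the start. The sketch below replays that logic on a plain list so it can be run without docx4j. Start and End are hypothetical stand-ins for CTBookmark and CTMarkupRange, and strings stand in for runs.

```java
import java.util.ArrayList;
import java.util.List;

final class BookmarkRange {

    /** Stand-in for the bookmark start element (has an id and a name). */
    static final class Start {
        final String id;
        final String name;
        Start(String id, String name) { this.id = id; this.name = name; }
    }

    /** Stand-in for the bookmark end element (has only the id). */
    static final class End {
        final String id;
        End(String id) { this.id = id; }
    }

    /** Replaces everything between the named bookmark's start and end; returns false if either is missing. */
    static boolean replaceRange(List<Object> content, String bookmarkName, Object replacement) {
        int start = -1;
        int end = -1;
        String id = null;
        for (int i = 0; i < content.size(); i++) {
            Object item = content.get(i);
            if (start < 0 && item instanceof Start && ((Start) item).name.equals(bookmarkName)) {
                start = i;
                id = ((Start) item).id;
            } else if (start >= 0 && item instanceof End && ((End) item).id.equals(id)) {
                end = i;
                break;
            }
        }
        if (start < 0 || end < 0) {
            return false; // nothing is modified when the range is incomplete
        }
        content.subList(start + 1, end).clear(); // drop the old content of the bookmark
        content.add(start + 1, replacement);     // insert the new content right after the start
        return true;
    }

    /** Builds a sample paragraph, replaces bookmark "name0" and returns the item that follows the start. */
    static Object demo() {
        List<Object> content = new ArrayList<>();
        content.add("before");
        content.add(new Start("0", "name0"));
        content.add("old run 1");
        content.add("old run 2");
        content.add(new End("0"));
        content.add("after");
        replaceRange(content, "name0", "zhangsan");
        return content.get(2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints zhangsan
    }
}
```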

2.12 Putting it all together

2.12.1 Utility class

/**
 * @author zhangy
 * @Time 2021-07-12 16:08
 * @Description: utility class for document operations
 */
public class Docx4jUtil {

    private static final Logger logger = LoggerFactory.getLogger(Docx4jUtil.class);

    private static WordprocessingMLPackage wordMLPackage;
    private static ObjectFactory factory;

    /**
     * Replace variables and download the Word document.
     *
     * @param inputStream the template docx
     * @param map         values for the global ${variable} placeholders
     * @param dataList    data for the looped table rows
     * @param picList     bookmark name to image path mappings
     * @param fileName    download file name, without the .docx extension
     * @param response    the HTTP response to write to
     */
    public static void downloadDocxUseDoc4j(InputStream inputStream,
                                            Map<String, String> map,
                                            List<CompositeDocxReq.DataList> dataList,
                                            List<Map<String, Object>> picList,
                                            String fileName,
                                            HttpServletResponse response) {

        OutputStream outs = null;
        try {
            response.reset();
            fileName = URLEncoder.encode(fileName, "UTF-8");
            response.setContentType("application/octet-stream;charset=UTF-8");
            response.setCharacterEncoding("utf-8");
            response.setHeader("Content-Disposition", "attachment; filename=" + fileName + ".docx");
            response.setHeader("Access-Control-Expose-Headers", "Content-Disposition");

            outs = response.getOutputStream();
            Docx4jUtil.replaceDocUseDoc4j(inputStream, map, dataList, picList, outs);
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
        }
    }

    /**
     * Replace variables and write the Word document to a stream.
     *
     * @param inputStream  the template docx
     * @param map          values for the global ${variable} placeholders
     * @param dataList     data for the looped table rows
     * @param picList      bookmark name to image path mappings
     * @param outputStream where the resulting docx is written
     */
    public static void replaceDocUseDoc4j(InputStream inputStream,
                                          Map<String, String> map,
                                          List<CompositeDocxReq.DataList> dataList,
                                          List<Map<String, Object>> picList,
                                          OutputStream outputStream) {
        MainDocumentPart mainDocumentPart = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(inputStream);
            VariablePrepare.prepare(wordMLPackage);
            mainDocumentPart = wordMLPackage.getMainDocumentPart();
            factory = Context.getWmlObjectFactory();

            if (!CollectionUtils.isEmpty(map) || !CollectionUtils.isEmpty(dataList) || !CollectionUtils.isEmpty(picList)) {
                // Flatten the markup inside ${} to a single level, cleaning up the docx4j template variable text
                Docx4jUtil.cleanDocumentPart(mainDocumentPart);

                // Fill the looped table rows; skipped when no dataList was passed
                if (dataList != null) {
                    for (int i = 0; i < dataList.size(); i++) {
                        Integer num = dataList.get(i).getNum();
                        Integer trNum = dataList.get(i).getTrNum();
                        ClassFinder find = new ClassFinder(Tbl.class);
                        new TraversalUtil(mainDocumentPart.getContent(), find);
                        // Get the table at index num
                        Tbl table = (Tbl) find.results.get(num.intValue());
                        // The row at index trNum is the template row by convention
                        Tr dynamicTr = (Tr) table.getContent().get(trNum.intValue());
                        // Get the XML of the template row
                        String dynamicTrXml = XmlUtils.marshaltoString(dynamicTr);
                        for (Map<String, Object> dataMap : dataList.get(i).getDataInnerList()) {
                            // Fill the template row with one record
                            Tr newTr = (Tr) XmlUtils.unmarshallFromTemplate(dynamicTrXml, dataMap);
                            table.getContent().add(newTr);
                        }
                        // Remove the template (placeholder) row
                        table.getContent().remove(trNum.intValue());
                    }
                }

                // Insert images, located by bookmark; skipped when no picList was passed
                if (picList != null) {
                    Document wmlDoc = (Document) mainDocumentPart.getJaxbElement();
                    Body body = wmlDoc.getBody();
                    // Get the top-level content of the body
                    List<Object> paragraphs = body.getContent();
                    // Find the bookmarks and build a cursor over them
                    RangeFinder rt = new RangeFinder("CTBookmark", "CTMarkupRange");
                    new TraversalUtil(paragraphs, rt);
                    // Iterate over the bookmarks
                    for (CTBookmark bm : rt.getStarts()) {
                        logger.info("Bookmark name: " + bm.getName());
                        for (int i = 0; i < picList.size(); i++) {
                            Map<String, Object> stringObjectMap = picList.get(i);
                            Set<String> keys = stringObjectMap.keySet();
                            for (String key : keys) {
                                // key.equals(...) is safe even when the bookmark has no name
                                if (key.equals(bm.getName())) {
                                    addImage(wordMLPackage, bm, (String) stringObjectMap.get(key));
                                }
                            }
                        }
                    }
                }

                // Global variable replacement
                if (map != null) {
                    mainDocumentPart.variableReplace(map);
                }
            }

            // Write the Word file
            wordMLPackage.save(outputStream);
            outputStream.flush();
            // Alternatively, write the file to a location on disk
            // wordMLPackage.save(new File("/home/person-project/template02_out.docx"));
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
        } finally {
            if (null != outputStream) {
                try {
                    outputStream.close();
                } catch (IOException e) {
                    logger.error(e.getMessage(), e);
                }
            }
        }
    }


    /**
     * Clean up ${variable} placeholders that were split across several runs,
     * so that each placeholder is contiguous text again.
     *
     * @param documentPart the main document part to clean
     * @return false if documentPart is null, true otherwise
     */
    public static boolean cleanDocumentPart(MainDocumentPart documentPart) throws Exception {
        if (documentPart == null) {
            return false;
        }
        Document document = documentPart.getContents();
        String wmlTemplate =
                XmlUtils.marshaltoString(document, true, false, Context.jc);
        document = (Document) XmlUtils.unwrap(DocxVariableClearUtils.doCleanDocumentPart(wmlTemplate, Context.jc));
        documentPart.setContents(document);
        return true;
    }

    /**
     * Cleans up docx4j template variable text, usually of the form ${variable}.
     * <p>
     * XXX: mainly meant to be run once, when a template is uploaded
     *
     * @author zhangy
     * @Time 2021-07-12 16:08
     */
    private static class DocxVariableClearUtils {

        /**
         * Matches any XML tag
         */
        private static final Pattern XML_PATTERN = Pattern.compile("<[^>]*>");

        private DocxVariableClearUtils() {
        }

        /**
         * Start symbol
         */
        private static final char PREFIX = '$';

        /**
         * Opening brace
         */
        private static final char LEFT_BRACE = '{';

        /**
         * Closing brace
         */
        private static final char RIGHT_BRACE = '}';

        /**
         * State: not inside a variable
         */
        private static final int NONE_START = -1;

        /**
         * Index value used while no variable has started
         */
        private static final int NONE_START_INDEX = -1;

        /**
         * State: prefix seen
         */
        private static final int PREFIX_STATUS = 1;

        /**
         * State: opening brace seen
         */
        private static final int LEFT_BRACE_STATUS = 2;

        /**
         * State: closing brace seen
         */
        private static final int RIGHT_BRACE_STATUS = 3;


        /**
         * Scans the document XML and rewrites every ${...} placeholder that has XML tags inside it.
         *
         * @param wmlTemplate the document XML
         * @param jc          the JAXB context used to unmarshal the result
         * @return the unmarshalled, cleaned document
         * @throws JAXBException if the cleaned XML cannot be unmarshalled
         */
        private static Object doCleanDocumentPart(String wmlTemplate, JAXBContext jc) throws JAXBException {
            // Current state within a variable block
            int curStatus = NONE_START;
            // Start index of the current variable
            int keyStartIndex = NONE_START_INDEX;
            // Current index
            int curIndex = 0;
            char[] textCharacters = wmlTemplate.toCharArray();
            StringBuilder documentBuilder = new StringBuilder(textCharacters.length);
            documentBuilder.append(textCharacters);
            // New document
            StringBuilder newDocumentBuilder = new StringBuilder(textCharacters.length);
            // Index up to which the input has already been written
            int lastWriteIndex = 0;
            for (char c : textCharacters) {
                switch (c) {
                    case PREFIX:
                        // TODO: the pointer is reset regardless of the current state, which also means a variable name cannot contain PREFIX
                        keyStartIndex = curIndex;
                        curStatus = PREFIX_STATUS;
                        break;
                    case LEFT_BRACE:
                        if (curStatus == PREFIX_STATUS) {
                            curStatus = LEFT_BRACE_STATUS;
                        }
                        break;
                    case RIGHT_BRACE:
                        if (curStatus == LEFT_BRACE_STATUS) {
                            // Append the text that precedes the variable
                            newDocumentBuilder.append(documentBuilder.substring(lastWriteIndex, keyStartIndex));
                            // End index
                            int keyEndIndex = curIndex + 1;
                            // Raw placeholder text, possibly with XML tags inside
                            String rawKey = documentBuilder.substring(keyStartIndex, keyEndIndex);
                            // Strip the stray tags
                            String mappingKey = XML_PATTERN.matcher(rawKey).replaceAll("");
                            if (!mappingKey.equals(rawKey)) {
                                char[] rawKeyChars = rawKey.toCharArray();
                                // Keep the original markup
                                StringBuilder rawStringBuilder = new StringBuilder(rawKey.length());
                                // Drop the placeholder characters ($, {, })
                                for (char rawChar : rawKeyChars) {
                                    if (rawChar == PREFIX || rawChar == LEFT_BRACE || rawChar == RIGHT_BRACE) {
                                        continue;
                                    }
                                    rawStringBuilder.append(rawChar);
                                }
                                // FIXME: requires the variable name itself to be contiguous
                                String variable = mappingKey.substring(2, mappingKey.length() - 1);
                                int variableStart = rawStringBuilder.indexOf(variable);
                                // indexOf returns -1 when not found; 0 is a valid position
                                if (variableStart >= 0) {
                                    rawStringBuilder = rawStringBuilder.replace(variableStart, variableStart + variable.length(), mappingKey);
                                }
                                newDocumentBuilder.append(rawStringBuilder.toString());
                            } else {
                                newDocumentBuilder.append(mappingKey);
                            }
                            lastWriteIndex = keyEndIndex;

                            curStatus = NONE_START;
                            keyStartIndex = NONE_START_INDEX;
                        }
                        break;
                    default:
                        break;
                }
                curIndex++;
            }
            // Remainder
            if (lastWriteIndex < documentBuilder.length()) {
                newDocumentBuilder.append(documentBuilder.substring(lastWriteIndex));
            }
            return XmlUtils.unmarshalString(newDocumentBuilder.toString(), jc);
        }
    }

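/**
* Appends an image to the paragraph that contains the given bookmark.
*
* @param wPackage the document package that receives the image part
* @param bm the bookmark that marks the target paragraph
* @param file path of the image file to insert
*/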
public static void addImage(WordprocessingMLPackage wPackage, CTBookmark bm, String file) {
logger.info("addImage :->{},{},{}", wPackage, bm,file);
try {
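// This handles a single bookmark; a map could be used to process all bookmarks instead
// Get the paragraph that is the parent of this bookmark
P p = (P) (bm.getParent());
// Create a new run (R) to hold the image
R run = factory.createR();

// Read the image into a byte array, which is what the createImagePart overload used here expects
byte[] bytes;
try (FileInputStream in = new FileInputStream(file)) {
bytes = IOUtils.toByteArray(in);
}
// Create the image part inside the package
BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wPackage, bytes);
// Create the inline object; the last argument of createImageInline limits the image width and is the basis for scaling
Inline inline = imagePart.createImageInline(null, null, 0, 1, false, 0);
// Wrap the inline image in a drawing, put it in the run, and append the run to the paragraph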
Drawing drawing = factory.createDrawing();
drawing.getAnchorOrInline().add(inline);
run.getContent().add(drawing);
p.getContent().add(run);
} catch (Exception e) {
logger.error("addImage failed", e);
}
}

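/**
* Converts a docx file to an HTML file written next to it (inputPath + ".html").
* @param inputPath path of the docx file to convert
*/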
public static void docx2Html(String inputPath){
boolean nestLists = true;
try {
wordMLPackage = WordprocessingMLPackage
.load(new File(inputPath));
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath(inputPath + "_files");
htmlSettings.setImageTargetUri(inputPath.substring(inputPath.lastIndexOf("/") + 1) + "_files");
htmlSettings.setWmlPackage(wordMLPackage);

String userCSS = null;
// userCSS is the stylesheet of the generated HTML; set it by hand to control margins, fonts and similar details
if (nestLists) {
userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img, table, caption, tbody, tfoot, thead, tr, th, td "
+ "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
} else {
userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img, ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td "
+ "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";

}
htmlSettings.setUserCSS(userCSS);

Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
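// try-with-resources closes the output file even if the conversion fails
try (FileOutputStream htmlOut = new FileOutputStream(new File(inputPath + ".html"))) {
Docx4J.toHTML(htmlSettings, htmlOut, Docx4J.FLAG_EXPORT_PREFER_XSL);
}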

if (wordMLPackage.getMainDocumentPart().getFontTablePart() != null) {
wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
}
htmlSettings = null;
wordMLPackage = null;
} catch (Exception e) {
logger.error("docx2Html failed", e);
}
}
}
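
The placeholder clean-up loop in the utility class above exists because Word may split a variable such as ${name} across several runs, so the raw XML contains tags in the middle of the placeholder. The following is a minimal, regex-based sketch of the same idea, not the state machine used above: it assumes placeholders have the form ${...}, and it moves the merged key in front of the tags that were inside it so the XML stays well-formed. The class name PlaceholderCleaner and both patterns are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderCleaner {
    // "$", optional tags, "{", then anything except "$" or "}", then "}" (assumed placeholder shape)
    private static final Pattern SPLIT_KEY =
            Pattern.compile("\\$(?:<[^>]*>)*\\{[^$}]*\\}");
    // Any XML tag
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Rejoins placeholders that are interrupted by XML tags. */
    public static String clean(String xml) {
        Matcher m = SPLIT_KEY.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String raw = m.group();
            // The placeholder text with all tags removed, e.g. ${name}
            String key = TAG.matcher(raw).replaceAll("");
            // The removed tags, kept in their original order
            StringBuilder tags = new StringBuilder();
            Matcher t = TAG.matcher(raw);
            while (t.find()) {
                tags.append(t.group());
            }
            // Emit the merged key first, then the tags, so nesting is unchanged
            m.appendReplacement(out, Matcher.quoteReplacement(key + tags));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String split = "<w:r><w:t>${na</w:t></w:r><w:r><w:t>me}</w:t></w:r>";
        System.out.println(clean(split));
    }
}
```

Unlike the loop above, this sketch always places the whole key in the first run, so the key takes on the formatting of that run; it also handles a variable name that is itself split by tags, which the FIXME above notes as a limitation.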

2.12.2 Request parameters for downloading the docx

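import java.util.List;
import java.util.Map;

/**
* @author zhangy
* @Time 2021-07-13 16:10
* @Description: Request parameters for downloading a docx
*/
public class CompositeDocxReq {
// Variable data for placeholders outside the repeating lists
private Map<String, String> mappings;
// One entry per repeating list
private List<DataList> dataList;
// Image data
private List<Map<String, Object>> picList;
// Name of the exported file
private String fileName;

public static class DataList{
// Variable data for the rows of the repeating list
private List<Map<String , Object>> dataInnerList;
// Index of the table within the docx, starting at 0
private Integer num;
// Index of the template row that holds the placeholder variables, starting at 0
private Integer trNum;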

public List<Map<String, Object>> getDataInnerList() {
return dataInnerList;
}

public void setDataInnerList(List<Map<String, Object>> dataInnerList) {
this.dataInnerList = dataInnerList;
}

public Integer getNum() {
return num;
}

public void setNum(Integer num) {
this.num = num;
}

public Integer getTrNum() {
return trNum;
}

public void setTrNum(Integer trNum) {
this.trNum = trNum;
}
}

public Map<String, String> getMappings() {
return mappings;
}

public void setMappings(Map<String, String> mappings) {
this.mappings = mappings;
}

public List<DataList> getDataList() {
return dataList;
}

public void setDataList(List<DataList> dataList) {
this.dataList = dataList;
}

public List<Map<String, Object>> getPicList() {
return picList;
}

public void setPicList(List<Map<String, Object>> picList) {
this.picList = picList;
}

public String getFileName() {
return fileName;
}

public void setFileName(String fileName) {
this.fileName = fileName;
}
}
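
To make the shape of this request concrete, the sketch below builds an equivalent payload from plain maps and lists. The keys mirror the field names of CompositeDocxReq and DataList above; the variable names (title, name), the values, and the class name CompositeRequestExample are made up for illustration, and picList is left empty because the keys of its maps are not shown in this section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompositeRequestExample {
    /** Builds a payload whose keys mirror the fields of CompositeDocxReq. */
    public static Map<String, Object> buildPayload() {
        // Placeholders outside any repeating table
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("title", "Monthly report"); // hypothetical variable name

        // Row data for one repeating table
        Map<String, Object> row1 = new LinkedHashMap<>();
        row1.put("name", "Alice"); // hypothetical column variable
        Map<String, Object> row2 = new LinkedHashMap<>();
        row2.put("name", "Bob");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row1);
        rows.add(row2);

        Map<String, Object> table = new LinkedHashMap<>();
        table.put("dataInnerList", rows);
        table.put("num", 0);   // first table in the document
        table.put("trNum", 1); // second row holds the placeholders
        List<Map<String, Object>> dataList = new ArrayList<>();
        dataList.add(table);

        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("mappings", mappings);
        payload.put("dataList", dataList);
        payload.put("picList", new ArrayList<Map<String, Object>>()); // left empty, see note above
        payload.put("fileName", "report");
        return payload;
    }

    public static void main(String[] args) {
        System.out.println(buildPayload());
    }
}
```

Serialized as JSON, a structure like this is what a client would presumably post as the request body.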

2.12.3 Usage

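/**
* Combined example: fills the template with the request data and sends the docx to the client.
* @return a status message
*/
@PostMapping("/composite")
public String composite(@RequestBody CompositeDocxReq compositeDocxReq, HttpServletResponse response){
// try-with-resources closes the template stream even if the download fails
try (FileInputStream templateStream = new FileInputStream(new File(template02Path))) {
// Delegate the download to the utility class
Docx4jUtil.downloadDocxUseDoc4j(templateStream,
compositeDocxReq.getMappings(),
compositeDocxReq.getDataList(),
compositeDocxReq.getPicList(),
compositeDocxReq.getFileName(),
response);
} catch (Exception e) {
logger.error("composite download failed", e);
}
return "Composite example call succeeded";
}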
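
The endpoint above hands mappings and dataList to the utility class. As a rough mental model of what such a fill step does, the sketch below works on plain strings: it substitutes ${key} placeholders from a map and produces one filled copy of a template row per data entry. This is an assumption-laden illustration, not the actual Docx4jUtil code, which operates on docx4j objects; the class name TemplateFillSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TemplateFillSketch {
    /** Replaces every ${key} that has a value in the map; unknown placeholders stay untouched. */
    public static String fill(String text, Map<String, ?> values) {
        String result = text;
        for (Map.Entry<String, ?> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", String.valueOf(e.getValue()));
        }
        return result;
    }

    /** Produces one filled copy of the template row per data entry, in order. */
    public static List<String> expandRow(String rowTemplate, List<Map<String, Object>> rows) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            out.add(fill(rowTemplate, row));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Alice");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(expandRow("<w:tr>${name}</w:tr>", rows));
    }
}
```

In this model, num would select the table, trNum the row used as rowTemplate, and dataInnerList the rows argument.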
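  • Copyright notice: copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, credit the source.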
