[Archived] IDL combined case study: turning the images in a docx document into a GIF

15195775117 · posted 2020-6-9 14:59:00


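Last edited by 15195775117 on 2021-2-13 13:56

1. Background

Today my manager sent me two docx documents. Both contain a lot of screenshots, and I was asked to string the images in each document together into a GIF.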

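2. Reading the images

The first question: how do I get the images out?

One document has 39 images and the other has 72. I was not willing to pull them out by hand; that would be both tiring and silly.

I thought of using the python-docx package to read the images, and found this post: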
https://www.cnblogs.com/zhanghongfeng/p/7043412.html

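That post shows how to read text paragraphs and tables, but getting the images takes a different approach: unzip the docx!
It turns out that a docx file is really a ZIP file!

The post explains where docx comes from:

docx is the format used by Microsoft Office 2007 and later. It replaced the earlier proprietary default file formats with a new XML-based compressed format, and appended the letter "x" to the traditional file extensions: ".docx" replaces ".doc", ".xlsx" replaces ".xls", and ".pptx" replaces ".ppt". A docx file is essentially a ZIP file: change its extension to .zip and it can be opened or extracted with an archive tool.

So I renamed XXX.docx to XXX.zip and unzipped it.

The images are in the .\XXX\word\media folder, and .\XXX\word\document.xml should be the body text.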
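
Since a docx is an ordinary ZIP archive, the rename step is optional: Python's standard zipfile module can open the .docx directly. Here is a minimal sketch, assuming the images sit under word/media/ as described above. The function name extract_media and the output folder are hypothetical, chosen only for illustration.

```python
import os
import zipfile


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside word/media/.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(name))
            with archive.open(name) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

Calling extract_media on a docx path with an empty output folder leaves the original document untouched and puts image1.png, image2.png, and so on directly into that folder.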

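15195775117 · posted 2020-6-9 15:09:50

3. Resizing

Because the images in the documents are screenshots, their sizes differ. The frames of a GIF must all be the same size, so I need to resize all of them to one common size.

IDL code: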

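pro same_size
  ; folder containing the extracted images:
  path='C:\Users\Administrator\Desktop\ReadFinInDoc\XXX\word\media\'
  ; find the images; this pattern matches both jpg and png:
  file=file_search(path+'*.*g')

  ; read the size of the first image; all later images are resized to match it.
  ; read_image is used instead of read_png so that jpg files can be read too;
  ; a true-colour image comes back as [channels, width, height].
  ima=read_image(file[0])
  sz=size(ima)
  m=sz[2]; width
  n=sz[3]; height

  ; resize each image:
  foreach i,file do begin
    layer=bytarr(3,m,n); image data to be written out
    ima=read_image(i); read the image array
    sz=size(ima); size of this image
    ; a 2-D array to hold one channel of this image at a time
    ; (calling congrid directly on the 3-D array is a bit awkward):
    y=bytarr(sz[2],sz[3])

    y[*,*]=ima[0,*,*]; copy the red channel into y
    layer[0,*,*]=congrid(y,m,n); resize the red channel to m*n

    y[*,*]=ima[1,*,*]; copy the green channel into y
    layer[1,*,*]=congrid(y,m,n); resize the green channel to m*n

    y[*,*]=ima[2,*,*]; copy the blue channel into y
    layer[2,*,*]=congrid(y,m,n); resize the blue channel to m*n

    ; a png may have a 4th (alpha) layer; ignore it, since only RGB is needed

    ; draw and save, overwriting the original file:
    fig=image(layer,position=[0,0,1,1],$
      dimensions=[100,60],/buffer)
    fig.save,i
    fig.close
  endforeach

end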
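
The per-channel resize above can be sketched in plain Python. This assumes nearest-neighbour sampling, which as far as I know is what CONGRID does by default for 2-D arrays; resize_nearest is a hypothetical helper for illustration, not a drop-in equivalent of CONGRID, and it stores a channel as a list of rows rather than IDL's [column, row] layout.

```python
def resize_nearest(channel, new_w, new_h):
    """Resample a 2-D channel (list of rows) to new_h rows of new_w pixels.

    Each output pixel copies the source pixel at the proportionally
    scaled position (nearest-neighbour sampling, no interpolation).
    """
    old_h = len(channel)
    old_w = len(channel[0])
    return [
        [channel[(r * old_h) // new_h][(c * old_w) // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


red = [[1, 2],
       [3, 4]]
# Enlarge the 2x2 channel to 4x4: every source pixel becomes a 2x2 block.
print(resize_nearest(red, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Applying the same function to the red, green and blue channels separately mirrors what the IDL loop does with the temporary array y.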

15195775117 · posted 2020-6-9 15:19:14

4. Writing the GIF

IDL code:

PRO create_gif_animation
  COMPILE_OPT IDL2
  ; frame delay in hundredths of a second; 50 is about 0.5 s:
  delay_time=50
  ; output gif path:
  outfname='C:\Users\Administrator\Desktop\aaa.gif'
  ; folder of images to build the gif from:
  figpath='C:\Users\Administrator\Desktop\XXX\word\media\*.png'
  ; collect the image paths:
  in_filenamelist=file_search(figpath)

  ; the paths come back in string order, so image10 sorts before image9
  ; (string order: 1,10,11,...,19,2,20,21,...); re-sort them numerically.
  ; get the base file names (works on the whole array at once):
  names=file_basename(in_filenamelist)
  book=-99; vector holding the image ordinals (dummy first element)
  foreach i,names do begin
    cut=strsplit(i,'.',/extract)
    cut=strsplit(cut[0],'image',/extract)
    book=[book,long(cut[0])]; append the ordinal to the array
  endforeach
  book=book[1:-1]; drop the dummy first element
  p=sort(book); sort the ordinals
  in_filenamelist=in_filenamelist[p]; reorder the file names

  ; No need to change anything below!

  file_nums = N_ELEMENTS(in_filenamelist)
  IF (file_nums GT 0) AND ~STRCMP(in_filenamelist[0], '') THEN BEGIN
    FOR i = 0, file_nums - 1 DO BEGIN
      ; read one file per iteration (READ_IMAGE needs a scalar filename):
      img = READ_IMAGE(in_filenamelist[i], red, green, blue)
      img_s = SIZE(img)
      ;If the dimension of the img is 3-D, then convert it to a index image first.
      IF (img_s[0] EQ 3) THEN BEGIN
        img_idx = COLOR_QUAN(img[0, *, *], img[1, *, *], img[2, *, *], tbl_r, tbl_g, tbl_b)
        ;Drop the degenerate leading dimension so the index image is 2-D.
        img_idx = REFORM(img_idx)
        WRITE_GIF, outfname, img_idx, tbl_r, tbl_g, tbl_b, $
          DELAY_TIME = delay_time, /MULTIPLE, REPEAT_COUNT = 0
      ENDIF
      ;If the dimension of the img is 2-D, then write it to the gif file directly.
      IF (img_s[0] EQ 2) THEN BEGIN
        img = REFORM(img)
        IF (N_ELEMENTS(red) GT 0) AND (N_ELEMENTS(green) GT 0) AND (N_ELEMENTS(blue) GT 0) THEN BEGIN
          WRITE_GIF, outfname, img, red, green, blue, DELAY_TIME = delay_time, /MULTIPLE, REPEAT_COUNT = 0
        ENDIF
      ENDIF
    ENDFOR
    WRITE_GIF, outfname, /CLOSE
  ENDIF
END
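
The reordering step in this procedure (pull the number out of each file name, then sort by that number) can be sketched in Python as well. This is only an illustration of the idea, assuming names like image1.png; numeric_key is a hypothetical helper.

```python
import re


def numeric_key(filename):
    """Return the first run of digits in filename as an int (-1 if none)."""
    match = re.search(r"\d+", filename)
    return int(match.group()) if match else -1


names = ["image1.png", "image10.png", "image2.png", "image9.png"]
print(sorted(names))                   # string order: image10 before image2
print(sorted(names, key=numeric_key))  # numeric order: image1, image2, image9, image10
```

A file name with no digits gets the key -1 here and sorts first, whereas the IDL version would fail when converting such a name to a number.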

15195775117 · posted 2020-6-9 16:18:03

Last edited by 15195775117 on 2020-6-9 16:24

Python version

Original post: https://www.cnblogs.com/51python/p/11033002.html

Besides covering the steps from the posts above, that post also ends with code for converting doc to docx.

import docx
import zipfile
import os
import shutil

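'''Read the text in a Word document'''
def gettxt():
    file = docx.Document("gao.docx")
    # 13 paragraphs in this file; each line break starts a new paragraph
    print("Number of paragraphs: " + str(len(file.paragraphs)))

    # Print the content of every paragraph
    # for para in file.paragraphs:
    #     print(para.text)

    # Print each paragraph's index and content
    for i in range(len(file.paragraphs)):
        if len(file.paragraphs[i].text.replace(' ', '')) > 4:
            print("Paragraph " + str(i) + ": " + file.paragraphs[i].text)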


'''Read the tables in a Word document'''
def gettable():
    doc = docx.Document('word.docx')
    for table in doc.tables:  # iterate over all tables
        print('----table------')
        for row in table.rows:  # iterate over all rows of the table
            # row_str = '\t'.join([cell.text for cell in row.cells])  # one row of data
            # print row_str
            for cell in row.cells:
                print(cell.text, '\t')

'''List the files inside the archive while extracting them'''
def getinfo(wordfile):
    f = zipfile.ZipFile(wordfile, 'r')
    for filename in f.namelist():
        f.extract(filename)
        print(filename)

'''
Output after extraction:
_rels/
_rels/.rels
customXml/
customXml/_rels/
customXml/_rels/item1.xml.rels
customXml/_rels/item2.xml.rels
customXml/item1.xml
customXml/item2.xml
customXml/itemProps1.xml
customXml/itemProps2.xml
docProps/
docProps/app.xml
docProps/core.xml
docProps/custom.xml
docProps/thumbnail.wmf
word/
word/_rels/
word/_rels/document.xml.rels
word/document.xml
word/fontTable.xml
word/media/
word/media/image1.jpeg
word/numbering.xml
word/settings.xml
word/styles.xml
word/theme/
word/theme/theme1.xml
'''

'''
------Get the images:
path of the Word document
path of the zip file
temporary extraction path tmp_path
final output path store_path
'''
def getpic(path, zip_path, tmp_path, store_path):
    '''
    :param path: source file
    :param zip_path: the docx renamed to zip
    :param tmp_path: intermediate image folder
    :param store_path: folder for the final results (must be created by hand)
    :return:
    '''
    # ============= rename the docx file to a zip file =============
    os.rename(path, zip_path)
    # open the archive
    f = zipfile.ZipFile(zip_path, 'r')
    # extract everything (images included) into tmp_path
    for file in f.namelist():
        f.extract(file, tmp_path)
    # release the zip file
    f.close()
    # ============= rename the zip back to docx =============
    os.rename(zip_path, path)
    # list the images in the temporary folder
    pic = os.listdir(os.path.join(tmp_path, 'word/media'))
    # ============= copy the images to the final folder =============
    for i in pic:
        # build the image name from the path of the Word file
        new_name = path.replace('\\', '_')
        new_name = new_name.replace(':', '') + '_' + i
        shutil.copy(os.path.join(tmp_path, 'word', 'media', i), os.path.join(store_path, new_name))
    # ============= empty the temporary folder, ready for the next document =============
    for i in os.listdir(tmp_path):
        # delete it if it is a folder
        if os.path.isdir(os.path.join(tmp_path, i)):
            shutil.rmtree(os.path.join(tmp_path, i))



if __name__ == '__main__':
    # source file
    path = r'E:\dogcat\提取图片\log.docx'
    # the docx renamed to zip
    zip_path = r'E:\dogcat\提取图片\log.zip'
    # intermediate image folder
    tmp_path = r'E:\dogcat\提取图片\tmp'
    # folder for the final results
    store_path = r'E:\dogcat\提取图片\测试'
    m = getpic(path, zip_path, tmp_path, store_path)

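# To handle doc files, simply re-save them as docx

from win32com import client

def docTTTTTdocx(doc_name, docx_name):
    try:
        # first convert the doc to docx
        word = client.Dispatch("Word.Application")
        doc = word.Documents.Open(doc_name)
        # file format 16 saves in Word's default document format (docx)
        doc.SaveAs(docx_name, 16)
        doc.Close()
        word.Quit()
    except Exception as err:
        # report the failure instead of silently ignoring it
        print(err)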

# If the conversion fails here, the path may be the problem; replace doc_name with a full path, as follows:
import sys
from win32com.client import Dispatch

def docToDocxR(docPath, docxPath):
    '''Re-save a doc as docx'''
    word = Dispatch('Word.Application')
    # folder of the running script, used to build absolute paths
    pathPrefix = sys.path[0] + '\\'
    print(pathPrefix)
    doc = word.Documents.Open(pathPrefix + docPath)
    doc.SaveAs(pathPrefix + docxPath, FileFormat=12)
    doc.Close()
    word.Quit()

SonGoku · posted 2020-7-31 21:57:48

I only changed the paths, and it threw an error.

Compiled module: CREATE_GIF_ANIMATION.
% Expression must be a scalar or 1 element array in this context: <BYTE Array[17]>.
% Execution halted at: CREATE_GIF_ANIMATION 39 E:\idl\IDL_gif\create_gif_animation.pro
% $MAIN$

(attachment: TIM截图20200731215700.png)

Could you please help me see what is going on?

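15195775117 · posted 2020-8-1 12:47:50

SonGoku posted 2020-7-31 21:57
I only changed the paths, and it threw an error.

Compiled module: CREATE_GIF_ANIMATION.

Since the code was barely changed, the problem may be in the data. Can you show me how the images are named and what size they are?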

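SonGoku · posted 2020-8-2 15:31:46

The images were picked at random from my QQ cache. Their dimensions have already been made identical, and the file sizes are roughly 15-70 KB. (attachment: TIM截图20200802152916.png)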

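15195775117 · posted 2020-8-3 08:46:58

SonGoku posted 2020-8-2 15:31
The images were picked at random from my QQ cache. Their dimensions have already been made identical, and the file sizes are roughly 15-70 KB.

The images I processed were named 01.jpg, 02.jpg, 03.jpg, ..., 11.jpg.
Just name yours the same way. The code converts each image name to a number, which is why you got the error.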

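SonGoku · posted 2020-8-3 19:06:15

Last edited by SonGoku on 2020-8-3 19:48

15195775117 posted 2020-8-3 08:46
The images I processed were named 01.jpg, 02.jpg, 03.jpg, ..., 11.jpg.
Just name yours the same way. The code converts each image name ...

The image names in your code end in '.png'. I tried renaming my files and still got the same error.
-----------------------------------------------------------------------------------------------
Found the error! I changed this one line and added 'left bracket, i, right bracket', that is, an index i in square brackets (for some reason I can't type the square brackets here, so I have to describe them this way).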

15195775117 · posted 2020-8-3 21:32:38

SonGoku posted 2020-8-3 19:06
The image names in your code end in '.png'. I tried renaming my files and still got the same error. ...

Solved? Good to hear.
