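When you build and write XML files with Python's xml.etree.ElementTree library, it adds no line breaks or indentation by default, so the output file is hard to read. Take the RF calibration configuration file of a small cell on the Qualcomm platform as an example. I want the file formatted as shown below, which makes it easy to find the calibration settings for the Tx, Rx, NL and other RF chains.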
```xml
<?xml version='1.0' encoding='utf-8'?>
<CalConfigDefinitions>
    <BoardConfigs>
        <BoardDef name="F02001-2" configs="F02001-2_LTE20" />
    </BoardConfigs>
    <CalConfigs>
        <CalConfig name="F02001-2_LTE5" topology="407">
            <TxConfig txPathIdList="TX1,TX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" enableFb="false" txRefFreq="2590" listOfTxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" firstGainState="40" firstGainStatePowerLimitHigh="10" firstGainStatePowerLimitLow="-45" highestAllowedGainState="0" targetMaxPower="17" txDcLeakageLimit="-40" txIqImageLimit="-40" maxFreqSweepPeakToPeakDelta="10" rxFbDcLeakageLimitDbfs="-34" rxFbIqImageLimitDbc="-40" rxfbSignalDbfsMin="-20" rxfbSignalDbfsMax="0" txFreqSweepTargetReferencePowerDbm="17" txDigitalGainMaxDb="-3" txHighDcLeakageLimit="-43" txLowCutoffGainState="30" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" txMonotonicCheckThresholdDb="-3" />
            <RxFb1AclrNoiseCal rxfbPathIdList="FB1,FB2" bwConst="LTE5" band="B41" fg="2" freq="2590" rxfbMaxGainState="16" />
            <RxConfig rxPathIdList="RX1,RX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" rxRefFreqMhz="2590" listOfRxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" rxGainStateList="0,1,2,3,4,5,6,7" rxSigGenPowersForGainState="-66,-59,-56,-51,-46,-41,-31,-21" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" />
            <RxConfig rxPathIdList="NL1" antennaNum="3" bwConst="LTE20" band="B25" fg="3" rxRefFreqMhz="1960" listOfRxSweepFreqMhz="1930,1940,1945,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995" rxGainStateList="0,1,2,3,4,5" rxSigGenPowersForGainState="-65,-57,-51,-40,-29,-18" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" />
        </CalConfig>
    </CalConfigs>
</CalConfigDefinitions>
```
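In practice, however, the file that xml.etree.ElementTree writes by default looks like this, with every node piled onto a single line. The rest of this article prettifies it by adding line breaks and indentation. There is no need to dwell on what this XML actually means; it only serves as an example.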
```xml
<?xml version='1.0' encoding='utf-8'?>
<CalConfigDefinitions><BoardConfigs><BoardDef name="F02001-2" configs="F02001-2_LTE20" /></BoardConfigs><CalConfigs><CalConfig name="F02001-2_LTE5" topology="407"><TxConfig txPathIdList="TX1,TX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" enableFb="false" txRefFreq="2590" listOfTxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" firstGainState="40" firstGainStatePowerLimitHigh="10" firstGainStatePowerLimitLow="-45" highestAllowedGainState="0" targetMaxPower="17" txDcLeakageLimit="-40" txIqImageLimit="-40" maxFreqSweepPeakToPeakDelta="10" rxFbDcLeakageLimitDbfs="-34" rxFbIqImageLimitDbc="-40" rxfbSignalDbfsMin="-20" rxfbSignalDbfsMax="0" txFreqSweepTargetReferencePowerDbm="17" txDigitalGainMaxDb="-3" txHighDcLeakageLimit="-43" txLowCutoffGainState="30" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" txMonotonicCheckThresholdDb="-3" /><RxFb1AclrNoiseCal rxfbPathIdList="FB1,FB2" bwConst="LTE5" band="B41" fg="2" freq="2590" rxfbMaxGainState="16" /><RxConfig rxPathIdList="RX1,RX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" rxRefFreqMhz="2590" listOfRxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" rxGainStateList="0,1,2,3,4,5,6,7" rxSigGenPowersForGainState="-66,-59,-56,-51,-46,-41,-31,-21" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" /><RxConfig rxPathIdList="NL1" antennaNum="3" bwConst="LTE20" band="B25" fg="3" rxRefFreqMhz="1960" listOfRxSweepFreqMhz="1930,1940,1945,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995" rxGainStateList="0,1,2,3,4,5" rxSigGenPowersForGainState="-65,-57,-51,-40,-29,-18" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" /></CalConfig></CalConfigs></CalConfigDefinitions>
```
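A minimal sketch that reproduces the problem, assuming a small hypothetical subset of the calibration config (the element and attribute names are taken from the example above). ElementTree serializes only the `text` and `tail` strings each element already holds, and none are set here, so nothing separates the tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of the calibration config, built without any whitespace
root = ET.Element("CalConfigDefinitions")
board_configs = ET.SubElement(root, "BoardConfigs")
ET.SubElement(board_configs, "BoardDef", name="F02001-2", configs="F02001-2_LTE20")
cal_configs = ET.SubElement(root, "CalConfigs")
ET.SubElement(cal_configs, "CalConfig", name="F02001-2_LTE5", topology="407")

# No .text/.tail was set on any element, so the whole document is one line
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```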
Python version: Python 3.7
XML library: xml.etree.ElementTree
IDE: PyCharm
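My first step was to search for how more experienced developers had handled this. The solutions online are largely the same: after all nodes have been created and before the ElementTree object is constructed, insert line breaks (\n) and indentation (\t), as in the partial code below.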
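```python
import xml.etree.ElementTree as ET


def prettyXml(tree, indent, newline, level=0):
    """
    Pretty-print XML before writing by setting each child's tail.
    :param tree: the fully built parent node, not yet wrapped in an ElementTree
    :param indent: indentation string, "\t"
    :param newline: line-break string, "\n"
    :param level: recursion depth, so nodes at different levels get different amounts of "\t"
    :return: None (the element is modified in place)
    """
    treeList = list(tree)
    for i, subElement in enumerate(treeList):
        if i < len(treeList) - 1:
            # Not the last child: the next sibling is indented at this depth
            subElement.tail = newline + indent * (level + 1)
        else:
            # Last child: dedent so the parent's closing tag lines up with its opening tag
            subElement.tail = newline + indent * level
        prettyXml(subElement, indent, newline, level + 1)


# Creation of parent_node is omitted
prettyXml(parent_node, '\t', '\n')  # add line breaks and indentation
tree = ET.ElementTree(parent_node)  # wrap the root in an ElementTree
# Generate the XML file; the write_xml helper is omitted
write_xml(tree, r"E:2_PersonalStudy1_PythonStudyMyPyCharmProjectTestDemovenvOutputTest.xml")
```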
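After this processing, we get the XML file below. It looks much better, but still falls short of what I want. Looking back at prettyXml, it can only add line breaks and indentation in each node's tail, which is written after the node's closing tag. It cannot add any after the opening tags ("<…>") of the parent nodes CalConfigDefinitions, BoardConfigs and CalConfigs, because that position is occupied by the parent's text, which prettyXml never sets. This is the method's shortcoming and calls for further work.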
```xml
<?xml version='1.0' encoding='utf-8'?>
<CalConfigDefinitions><BoardConfigs><BoardDef name="F02001-2" configs="F02001-2_LTE20" />
    </BoardConfigs>
    <CalConfigs><CalConfig name="F02001-2_LTE5" topology="407"><TxConfig txPathIdList="TX1,TX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" enableFb="false" txRefFreq="2590" listOfTxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" firstGainState="40" firstGainStatePowerLimitHigh="10" firstGainStatePowerLimitLow="-45" highestAllowedGainState="0" targetMaxPower="17" txDcLeakageLimit="-40" txIqImageLimit="-40" maxFreqSweepPeakToPeakDelta="10" rxFbDcLeakageLimitDbfs="-34" rxFbIqImageLimitDbc="-40" rxfbSignalDbfsMin="-20" rxfbSignalDbfsMax="0" txFreqSweepTargetReferencePowerDbm="17" txDigitalGainMaxDb="-3" txHighDcLeakageLimit="-43" txLowCutoffGainState="30" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" txMonotonicCheckThresholdDb="-3" />
            <RxFb1AclrNoiseCal rxfbPathIdList="FB1,FB2" bwConst="LTE5" band="B41" fg="2" freq="2590" rxfbMaxGainState="16" />
            <RxConfig rxPathIdList="RX1,RX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" rxRefFreqMhz="2590" listOfRxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" rxGainStateList="0,1,2,3,4,5,6,7" rxSigGenPowersForGainState="-66,-59,-56,-51,-46,-41,-31,-21" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" />
            <RxConfig rxPathIdList="NL1" antennaNum="3" bwConst="LTE20" band="B25" fg="3" rxRefFreqMhz="1960" listOfRxSweepFreqMhz="1930,1940,1945,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995" rxGainStateList="0,1,2,3,4,5" rxSigGenPowersForGainState="-65,-57,-51,-40,-29,-18" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" />
        </CalConfig>
    </CalConfigs>
</CalConfigDefinitions>
```
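To see where the missing whitespace belongs: ElementTree writes an element's `text` right after its opening tag and its `tail` right after its closing tag. The sketch below uses a tiny made-up subset of the config. `pretty_tails` is a plain-function copy of the tail-only logic, and `pretty_text_and_tails` is a hypothetical variant (not the author's approach) that also fills the parent's `text` when it holds only whitespace.

```python
import xml.etree.ElementTree as ET


def build_sample():
    """Small hypothetical subset of the calibration config, no whitespace set."""
    root = ET.Element("CalConfigDefinitions")
    boards = ET.SubElement(root, "BoardConfigs")
    ET.SubElement(boards, "BoardDef", name="F02001-2")
    cals = ET.SubElement(root, "CalConfigs")
    ET.SubElement(cals, "CalConfig", name="F02001-2_LTE5")
    return root


def pretty_tails(elem, indent="\t", newline="\n", level=0):
    """Tail-only recursion: same logic as prettyXml."""
    children = list(elem)
    for i, child in enumerate(children):
        is_last = i == len(children) - 1
        # The tail is written after the child's closing tag
        child.tail = newline + indent * (level if is_last else level + 1)
        pretty_tails(child, indent, newline, level + 1)


def pretty_text_and_tails(elem, indent="\t", newline="\n", level=0):
    """Hypothetical variant: also fill the parent's .text, which is written
    right after its opening tag (only if .text is empty or whitespace)."""
    children = list(elem)
    if children and not (elem.text and elem.text.strip()):
        elem.text = newline + indent * (level + 1)
    for i, child in enumerate(children):
        is_last = i == len(children) - 1
        child.tail = newline + indent * (level if is_last else level + 1)
        pretty_text_and_tails(child, indent, newline, level + 1)


tail_only = build_sample()
pretty_tails(tail_only)
tail_only_xml = ET.tostring(tail_only, encoding="unicode")
print(tail_only_xml)

full = build_sample()
pretty_text_and_tails(full)
full_xml = ET.tostring(full, encoding="unicode")
print(full_xml)
```

With tails only, `<BoardConfigs>` still follows `<CalConfigDefinitions>` on the same line, matching the output above; filling `text` as well yields one element per line. Like prettyXml, both functions overwrite any existing tails.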
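Let's build on what others have done. Since prettyXml can only work on tails, could we get line breaks and indentation after the opening "<…>" tags by modifying the ElementTree.py source?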
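Before editing anything, it helps to confirm which ElementTree.py the interpreter actually loads. A small sketch; note that `_serialize_xml` is a private, undocumented helper, so its name and signature may differ between Python versions, and edits to an installed standard-library file affect every program run with that interpreter and are lost when Python is reinstalled or upgraded.

```python
import inspect
import xml.etree.ElementTree as ET

# Location of the ElementTree.py this interpreter imports
print(ET.__file__)

# _serialize_xml is private; inspect can show its source without opening the file
source = inspect.getsource(ET._serialize_xml)
print(source.splitlines()[0])
```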
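Reading through ElementTree.py, I found that line breaks and indentation can be added in the _serialize_xml function. The modified code is below. It also removes the default sorting of node attributes; readers can ignore that part, and anyone interested can read my other blog post on XML attribute ordering. (Since Python 3.8, ElementTree preserves the attribute order specified by the user, so that change only matters on 3.7 and earlier.)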
```python
# Inside ElementTree.py: Comment, ProcessingInstruction, QName,
# _escape_cdata and _escape_attrib are module globals there.
def _serialize_xml(write, elem, qnames, namespaces,
                   short_empty_elements, level=1, **kwargs):
    # New parameter level: tracks recursion depth so deeper nodes get more "\t"
    tag = elem.tag
    text = elem.text
    if tag is Comment:
        write("<!--%s-->" % text)
    elif tag is ProcessingInstruction:
        write("<?%s?>" % text)
    else:
        tag = qnames[tag]
        if tag is None:
            if text:
                write(_escape_cdata(text))
            for e in elem:
                _serialize_xml(write, e, qnames, None,
                               short_empty_elements=short_empty_elements)
        else:
            write("<" + tag)
            items = list(elem.items())
            if items or namespaces:
                if namespaces:
                    for v, k in sorted(namespaces.items(),
                                       key=lambda x: x[1]):  # sort on prefix
                        if k:
                            k = ":" + k
                        write(" xmlns%s=\"%s\"" % (
                            k,
                            _escape_attrib(v)
                            ))
                # [Can be ignored] Unlike the stock source, sorted() is removed
                # so that attributes keep a user-defined order
                for k, v in items:
                    if isinstance(k, QName):
                        k = k.text
                    if isinstance(v, QName):
                        v = qnames[v.text]
                    else:
                        v = _escape_attrib(v)
                    write(" %s=\"%s\"" % (qnames[k], v))
            if text or len(elem) or not short_empty_elements:
                write(">\n" + level * "\t")  # add "\n" and "\t" after ">"
                if text:
                    write(_escape_cdata(text))
                for e in elem:
                    # Pass level + 1 so child nodes get one more "\t"
                    _serialize_xml(write, e, qnames, None,
                                   short_empty_elements=short_empty_elements,
                                   level=level + 1)
                write("</" + tag + ">")
            else:
                write(" />")
    if elem.tail:
        write(_escape_cdata(elem.tail))
```

Combining this modified _serialize_xml with prettyXml produces exactly the XML file shown at the beginning of this article.
The XML file is now prettified, but it took two pieces: the modified _serialize_xml and prettyXml. The two can be merged by moving the tail handling into _serialize_xml as well. The code is below, and the output XML is the same.
```python
# Inside ElementTree.py: Comment, ProcessingInstruction, QName,
# _escape_cdata and _escape_attrib are module globals there.
def _serialize_xml(write, elem, qnames, namespaces,
                   short_empty_elements, level=1, **kwargs):
    # New parameter level: tracks recursion depth so deeper nodes get more "\t"
    tag = elem.tag
    text = elem.text
    if tag is Comment:
        write("<!--%s-->" % text)
    elif tag is ProcessingInstruction:
        write("<?%s?>" % text)
    else:
        tag = qnames[tag]
        if tag is None:
            if text:
                write(_escape_cdata(text))
            for e in elem:
                _serialize_xml(write, e, qnames, None,
                               short_empty_elements=short_empty_elements)
        else:
            write("<" + tag)
            items = list(elem.items())
            if items or namespaces:
                if namespaces:
                    for v, k in sorted(namespaces.items(),
                                       key=lambda x: x[1]):  # sort on prefix
                        if k:
                            k = ":" + k
                        write(" xmlns%s=\"%s\"" % (
                            k,
                            _escape_attrib(v)
                            ))
                # [Can be ignored] Unlike the stock source, sorted() is removed
                # so that attributes keep a user-defined order
                for k, v in items:
                    if isinstance(k, QName):
                        k = k.text
                    if isinstance(v, QName):
                        v = qnames[v.text]
                    else:
                        v = _escape_attrib(v)
                    write(" %s=\"%s\"" % (qnames[k], v))
            if text or len(elem) or not short_empty_elements:
                write(">\n" + level * "\t")  # add "\n" and "\t" after ">"
                if text:
                    write(_escape_cdata(text))
                elemList = list(elem)
                for i, e in enumerate(elemList):  # process each child node
                    if i < len(elemList) - 1:
                        # Every child but the last: newline, then indent for the next sibling
                        e.tail = "\n" + "\t" * level
                    else:
                        # Last child: newline, then dedent for the parent's closing tag
                        e.tail = "\n" + "\t" * (level - 1)
                    # Pass level + 1 so child nodes get one more "\t"
                    _serialize_xml(write, e, qnames, None,
                                   short_empty_elements=short_empty_elements,
                                   level=level + 1)
                write("</" + tag + ">")
            else:
                write(" />")
    if elem.tail:
        write(_escape_cdata(elem.tail))
```
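For readers on Python 3.9 or later (this article uses 3.7), the standard library provides `xml.etree.ElementTree.indent(tree, space="  ", level=0)`, which fills `text` and `tail` in place and produces the same layout without editing ElementTree.py. Unlike the modified serializer above, it only replaces `text`/`tail` values that are empty or whitespace. A minimal sketch on a small hypothetical subset of the config:

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of the calibration config
root = ET.Element("CalConfigDefinitions")
boards = ET.SubElement(root, "BoardConfigs")
ET.SubElement(boards, "BoardDef", name="F02001-2")
cals = ET.SubElement(root, "CalConfigs")
ET.SubElement(cals, "CalConfig", name="F02001-2_LTE5")

# Python 3.9+: set whitespace-only .text/.tail to "\n" plus one tab per level
ET.indent(root, space="\t")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

To write a file instead, wrap the root with `ET.ElementTree(root)` and call its `write(path, encoding="utf-8", xml_declaration=True)` method, which also emits the `<?xml ...?>` declaration line.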