XML 解析和发出行为变更

Attention

This behavior change is in the 2025_01 bundle.

For the current status of the bundle, refer to Bundle History.

When this behavior change bundle is enabled, parsing and emitting XML content changes when using the COPY INTO <table> command with the XML file format and when calling the following functions:

Before the change:

XML 解析和发出行为:

  • 某些调用 CHECK_XML 函数的查询返回带错误消息的字符串。
  • 某些调用 PARSE_XML 函数的查询失败。
After the change:

XML 解析和发出行为:

  • Some queries that returned a string with an error message when calling the CHECK_XML function before the change now return NULL.
  • Some queries that failed when calling the PARSE_XML function before the change now succeed, and the function returns the parsed XML.
  • 具有包含尖括号或撇号的 XML 字符串的查询,在更改后返回不同的结果。
  • Queries with XML strings containing white space or XML attributes relating to preserving white space return different results after the change.

以下部分提供有关更改的更多详细信息。

解析包含处理指令的 XML 内容

以下示例使用 PARSE_XML 函数解析处理指令中带问号的 XML 内容:

SELECT PARSE_XML('<?PITarget PIContent ??><mytag />') AS mytag;
Returned before the change::
100100 (22P02): Error parsing XML: prematurely terminated XML document in processing instructions, pos 33
Returned after the change::
+-----------------+
| MYTAG           |
%-----------------%
| <mytag></mytag> |
+-----------------+

解析包含尖括号或撇号的 XML 内容

以下示例使用 PARSE_XML 函数解析在 XML 属性值中包含尖括号和撇号的 XML 内容。更改后,XML 属性值中的撇号和尖括号在返回值和发出的 XML 中会正确转义:

SELECT PARSE_XML('<mytag myattr="&lt;&gt;\'"/>') AS mytag;
Returned before the change::
+------------------------------+
| MYTAG                        |
%------------------------------%
| <mytag myattr="<>'"></mytag> |
+------------------------------+
Returned after the change::
+-----------------------------------------+
| MYTAG                                   |
%-----------------------------------------%
| <mytag myattr="&lt;&gt;&apos;"></mytag> |
+-----------------------------------------+

解析包含用户定义实体的 XML 内容

以下示例使用 PARSE_XML 函数解析包含用户定义实体的 XML 内容:

SELECT PARSE_XML('<!DOCTYPE doc [<!ENTITY placeholder "some text">]><doc>&placeholder;</doc>')
  AS placeholder;
Returned before the change::
100100 (22P02): Error parsing XML: unknown entity &placeholder;, pos 68
Returned after the change::
+-------------------------------------------------------------+
| PLACEHOLDER                                                 |
%-------------------------------------------------------------%
| <!DOCTYPE doc [<!ENTITY placeholder "some                   |
| text">]><doc>some text</doc>                                |
+-------------------------------------------------------------+

解析保留空格的 XML 内容

This change was made so the behavior in Snowflake matches the XML specification (https://www.w3.org/TR/xml11/#sec-white-space) regarding preservation of whitespace:

  • Before the change, whitespace is preserved for the xsl:space="preserve" attribute. After the change, whitespace isn’t preserved for the xsl:space="preserve" attribute.
  • Before the change, whitespace isn’t preserved for the xml:space="preserve" attribute. After the change, whitespace is preserved for the xml:space="preserve" attribute.

The following example uses the PARSE_XML function to parse XML content and specifies the xsl:space="preserve" attribute:

SELECT PARSE_XML('<mytag xsl:space="preserve"> my content </mytag>')
  AS space_preserve;
Returned before the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
%--------------------------------------------------%
| <mytag xsl:space="preserve"> my content </mytag> |
+--------------------------------------------------+
Returned after the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
%--------------------------------------------------%
| <mytag xsl:space="preserve">my content</mytag>   |
+--------------------------------------------------+

The following example uses the PARSE_XML function to parse XML content and specifies the xml:space="preserve" attribute:

SELECT PARSE_XML('<mytag xml:space="preserve"> my content </mytag>')
  AS space_preserve;
Returned before the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
%--------------------------------------------------%
| <mytag xml:space="preserve">my content</mytag>   |
+--------------------------------------------------+
Returned after the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
%--------------------------------------------------%
| <mytag xml:space="preserve"> my content </mytag> |
+--------------------------------------------------+

加载保留空格的 XML 内容

The following example loads data into a table using the COPY INTO <table> command. The PRESERVE_SPACE parameter is set to TRUE to preserve white space:

COPY INTO mytable
  FROM @my_xml_stage
  FILE_FORMAT = (TYPE = 'XML' PRESERVE_SPACE = TRUE);
Loaded content before the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
%--------------------------------------------------%
| <mytag xsl:space="preserve"> my content </mytag> |
+--------------------------------------------------+
Loaded content after the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
%--------------------------------------------------%
| <mytag xml:space="preserve"> my content </mytag> |
+--------------------------------------------------+

Before and after the change, the content preserves the white space, but the attribute changes from xsl:space="preserve" to xml:space="preserve".

参考:1862