Module YAMLx

YAMLx — pure-OCaml YAML 1.2 parser.

Typical usage:

  (* Read a single-document config file — most common pattern *)
  match YAMLx.Value.of_yaml_file "config.yaml" with
  | Ok value  -> (* process value *)
  | Error msg -> (* handle error *)

  (* Parse a YAML string into a single typed value — raising variant *)
  let value = YAMLx.Value.of_yaml_exn "answer: 42\nflag: true"

  (* Multi-document YAML stream *)
  match YAMLx.Values.of_yaml "doc1\n---\ndoc2" with
  | Ok docs  -> (* process docs *)
  | Error msg -> (* handle error *)

  (* Serialize typed values back to YAML *)
  let yaml = YAMLx.Values.to_yaml docs

  (* Parse preserving the full AST (tags, anchors, positions) *)
  let nodes = YAMLx.Nodes.of_yaml_exn "- foo\n- bar"

  (* Serialize nodes back to YAML *)
  let yaml = YAMLx.Nodes.to_yaml nodes

Errors are reported by raising Error. To get a result instead, use a non-raising function such as Value.of_yaml or Value.of_yaml_file, or wrap a raising call in catch_errors.

Source positions

type pos = {
  line : int;
  column : int;  (* 0-based Unicode codepoint column *)
  column_bytes : int;  (* 0-based UTF-8 byte column *)
  offset : int;  (* codepoint index from the start of the input *)
  offset_bytes : int;  (* UTF-8 byte offset from the start of the input *)
}

A location in the YAML source text. line is 1-based. column and column_bytes are 0-based distances from the start of the line, in codepoints and UTF-8 bytes respectively. offset and offset_bytes are absolute distances from the start of the input, measured the same way. The byte fields make it easy to slice the original string without re-encoding.
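
To make the difference between the codepoint and byte fields concrete, here is a minimal sketch that recomputes a position from a byte offset. The pos type is a local mirror of the record above so the example compiles on its own, and pos_of_byte_offset is a hypothetical helper, not part of YAMLx. It assumes valid UTF-8 input and '\n' as the only line terminator.

```ocaml
(* Local mirror of the pos record documented above. *)
type pos = {
  line : int; column : int; column_bytes : int;
  offset : int; offset_bytes : int;
}

(* A byte starts a codepoint unless it is a UTF-8 continuation byte
   (bit pattern 10xxxxxx). *)
let starts_codepoint c = Char.code c land 0xC0 <> 0x80

(* Hypothetical helper: the position of the byte at [byte_offset] in [src].
   Assumes valid UTF-8 and '\n' line endings. *)
let pos_of_byte_offset (src : string) (byte_offset : int) : pos =
  let line = ref 1 and column = ref 0
  and column_bytes = ref 0 and offset = ref 0 in
  for i = 0 to byte_offset - 1 do
    let c = src.[i] in
    if starts_codepoint c then incr offset;
    if c = '\n' then begin
      (* A newline starts a new line and resets both column counters. *)
      incr line;
      column := 0;
      column_bytes := 0
    end else begin
      if starts_codepoint c then incr column;
      incr column_bytes
    end
  done;
  { line = !line; column = !column; column_bytes = !column_bytes;
    offset = !offset; offset_bytes = byte_offset }
```

For the input "a: \xC3\xA9!\nb" (the two escaped bytes encode one codepoint), the position of '!' has column 4 but column_bytes 5, which is exactly the gap the two fields are meant to expose.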

val pp_pos : Ppx_deriving_runtime.Format.formatter -> pos -> Ppx_deriving_runtime.unit
val show_pos : pos -> Ppx_deriving_runtime.string
type loc = {
  start_pos : pos;
  end_pos : pos;
}

A source range. start_pos is the first character of the node; end_pos is the position immediately after the last character.
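
Because end_pos is exclusive and the byte offsets are absolute, the text covered by a range can be recovered with a single String.sub. The following is a small sketch under that reading; the types are local mirrors of pos and loc, and slice is a hypothetical helper rather than a YAMLx function.

```ocaml
(* Local mirrors of the documented records. *)
type pos = {
  line : int; column : int; column_bytes : int;
  offset : int; offset_bytes : int;
}
type loc = { start_pos : pos; end_pos : pos }

(* Hypothetical helper: the source text covered by [loc].
   end_pos points just past the last character, so the length is a
   plain difference of byte offsets. *)
let slice (src : string) (loc : loc) : string =
  let start = loc.start_pos.offset_bytes in
  String.sub src start (loc.end_pos.offset_bytes - start)
```

Using the byte offsets (not the codepoint offsets) keeps the slice correct for inputs that contain multi-byte characters.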

val pp_loc : Ppx_deriving_runtime.Format.formatter -> loc -> Ppx_deriving_runtime.unit
val show_loc : loc -> Ppx_deriving_runtime.string
val zero_pos : pos

The position at the very start of an empty input: line=1, column=0, offset=0, all byte fields 0. Useful for constructing nodes programmatically when source positions are not meaningful.

val zero_loc : loc

A zero-length location at the start of an empty input: both start_pos and end_pos equal zero_pos. Useful for constructing nodes programmatically when source locations are not meaningful.
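
Spelled out as values, the two constants described above would look roughly like this. The definitions are a local sketch built from the field values stated in the text, not a copy of the library source; is_zero_width is a hypothetical helper.

```ocaml
(* Local mirrors of the documented records. *)
type pos = {
  line : int; column : int; column_bytes : int;
  offset : int; offset_bytes : int;
}
type loc = { start_pos : pos; end_pos : pos }

(* Start of an empty input: line is 1-based, everything else is 0. *)
let zero_pos =
  { line = 1; column = 0; column_bytes = 0; offset = 0; offset_bytes = 0 }

(* Zero-length range sitting at zero_pos. *)
let zero_loc = { start_pos = zero_pos; end_pos = zero_pos }

(* Hypothetical helper: a range is zero-width when both ends coincide. *)
let is_zero_width (loc : loc) : bool = loc.start_pos = loc.end_pos
```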

YAML schema

type schema =
  | Yaml_1_2
  | Yaml_1_1
    (*

    YAML schema used to resolve untagged plain scalars to typed values.

    • Yaml_1_2 (the default): YAML 1.2 JSON schema. Booleans are only true/false; octal uses 0o… prefix; sexagesimal not recognised.
    • Yaml_1_1: YAML 1.1 schema. Extended booleans (yes/no/on/off etc.), 0…-prefixed octal, sexagesimal integers and floats, and merge-key (<<) expansion. Use this to read legacy YAML files.

    New projects should use YAML 1.2. Pass ~schema:Yaml_1_1 to Values.of_yaml_exn (and friends) when reading older files. A %YAML 1.1 or %YAML 1.2 directive in the stream selects the schema automatically for that document (use ~strict_schema:true to make a mismatch an error instead).

    *)
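
The practical effect of the schema choice is easiest to see on booleans. The sketch below is a simplified illustration, not the library's resolver: resolve_bool is a hypothetical function, it handles booleans only, and for YAML 1.1 it covers just the yes/no/on/off spellings (the single-letter y/n forms are left out). The capitalised true/false variants follow the pattern list given for plain scalars under "Typed values".

```ocaml
(* Local mirror of the documented schema type. *)
type schema = Yaml_1_2 | Yaml_1_1

(* Hypothetical, simplified boolean resolution for a plain scalar. *)
let resolve_bool (schema : schema) (s : string) : bool option =
  match s with
  (* Accepted under both schemas. *)
  | "true" | "True" | "TRUE" -> Some true
  | "false" | "False" | "FALSE" -> Some false
  | _ ->
    (match schema with
     | Yaml_1_2 -> None
     | Yaml_1_1 ->
       (* Extended YAML 1.1 spellings (subset). *)
       (match s with
        | "yes" | "Yes" | "YES" | "on" | "On" | "ON" -> Some true
        | "no" | "No" | "NO" | "off" | "Off" | "OFF" -> Some false
        | _ -> None))
```

Under this sketch the scalar yes stays a string in YAML 1.2 mode and becomes a boolean in YAML 1.1 mode, which is the usual reason legacy files need ~schema:Yaml_1_1.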

Errors

type yaml_error = {
  msg : string;
  loc : loc;
}
type error =
  | Scan_error of yaml_error
    (* Invalid character sequence or encoding error detected by the scanner. Carries a position. *)
  | Parse_error of yaml_error
    (* Well-formed tokens in an invalid order detected by the parser. Carries a position. *)
  | Expansion_limit_exceeded of int
    (* Alias expansion visited more nodes than the configured limit. The payload is the limit that was exceeded. See default_expansion_limit. *)
  | Depth_limit_exceeded of int
    (* YAML nesting depth exceeded the configured maximum during composition. The payload is the limit that was exceeded. See default_max_depth. *)
  | Printer_error of string
    (* A feature unsupported by the plain-YAML printer was encountered (e.g. a tag, a complex mapping key). *)
  | Document_count_error of string
    (* The input contained the wrong number of documents for a single-document operation. *)
  | Schema_error of yaml_error
    (* A schema conflict: the document's %YAML directive disagrees with the requested schema (when ~strict_schema:true), or a plain scalar is ambiguous between YAML 1.1 and 1.2 (when ~reject_ambiguous:true). *)
  | Simplicity_error of yaml_error
    (* A YAML feature not allowed in plain mode was encountered: an anchor, alias, explicit tag, or (in YAML 1.1 mode) a merge key (<<). Raised when ~plain:true is passed to Values functions. *)
  | Duplicate_key_error of yaml_error
    (* A mapping contains a duplicate key. Raised when ~strict_keys:true is passed to Values functions. The location points to the second (duplicate) occurrence. *)
  | Cycle_error of yaml_error
    (* A cyclic alias was encountered during value resolution. The YAML structure is valid (e.g. &doc {a: *doc}) but cannot be represented as a finite value tree. The location points to the alias that closes the cycle. *)
exception Error of error

The single exception raised by this library. Match on the payload to distinguish error kinds:

  match YAMLx.Nodes.of_yaml_exn input with
  | nodes -> ...
  | exception YAMLx.Error (YAMLx.Scan_error e) -> ...
  | exception YAMLx.Error (YAMLx.Parse_error e) -> ...
  | exception YAMLx.Error (YAMLx.Depth_limit_exceeded n) -> ...
  | exception YAMLx.Error _ -> ...
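
When every error kind should be handled uniformly, it can help to split the payload into an optional location and a message first. The following is a sketch under stated assumptions: the types are local mirrors of the ones documented above, error_parts is a hypothetical helper, and the wording of the two limit messages is invented for illustration.

```ocaml
(* Local mirrors of the documented types. *)
type pos = {
  line : int; column : int; column_bytes : int;
  offset : int; offset_bytes : int;
}
type loc = { start_pos : pos; end_pos : pos }
type yaml_error = { msg : string; loc : loc }
type error =
  | Scan_error of yaml_error
  | Parse_error of yaml_error
  | Expansion_limit_exceeded of int
  | Depth_limit_exceeded of int
  | Printer_error of string
  | Document_count_error of string
  | Schema_error of yaml_error
  | Simplicity_error of yaml_error
  | Duplicate_key_error of yaml_error
  | Cycle_error of yaml_error

(* Hypothetical helper: optional source location plus a message. *)
let error_parts : error -> loc option * string = function
  (* Positional errors carry a yaml_error record. *)
  | Scan_error e | Parse_error e | Schema_error e
  | Simplicity_error e | Duplicate_key_error e | Cycle_error e ->
    (Some e.loc, e.msg)
  (* Limit errors carry only the limit; the message text is made up here. *)
  | Expansion_limit_exceeded n ->
    (None, Printf.sprintf "alias expansion limit (%d) exceeded" n)
  | Depth_limit_exceeded n ->
    (None, Printf.sprintf "nesting depth limit (%d) exceeded" n)
  (* The remaining errors carry a ready-made message. *)
  | Printer_error msg | Document_count_error msg -> (None, msg)
```

Listing every constructor instead of ending with a wildcard means the compiler flags this function if a new error kind is added later.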
val format_loc : ?file:string -> loc -> string

Default location formatter used by catch_errors and register_exception_printers.

The output format depends on the extent of loc:

  • Zero-width (start = end): "line 3, column 8"
  • Single-line range: "line 3, columns 8-11"
  • Multi-line range: "lines 3-12, columns 8-4"

When ~file is given, a "file <name>, " prefix is prepended, e.g. "file foo.yaml, line 3, columns 8-11".

Columns are 0-based Unicode codepoint offsets from the start of the line, matching the pos fields column (not column_bytes).
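
The three output shapes can be reproduced with a few lines of code. This is a sketch of the documented behaviour, not the library's implementation: format_loc_sketch is a hypothetical name, the types are local mirrors, and it assumes the second column printed is end_pos.column as-is (the text above does not say whether the end column is inclusive or exclusive).

```ocaml
(* Local mirrors of the documented records. *)
type pos = {
  line : int; column : int; column_bytes : int;
  offset : int; offset_bytes : int;
}
type loc = { start_pos : pos; end_pos : pos }

(* Hypothetical re-implementation of the documented formats. *)
let format_loc_sketch ?file (loc : loc) : string =
  let s = loc.start_pos and e = loc.end_pos in
  let where =
    if s.line = e.line && s.column = e.column then
      (* Zero-width location. *)
      Printf.sprintf "line %d, column %d" s.line s.column
    else if s.line = e.line then
      (* Range within one line. *)
      Printf.sprintf "line %d, columns %d-%d" s.line s.column e.column
    else
      (* Range spanning several lines. *)
      Printf.sprintf "lines %d-%d, columns %d-%d"
        s.line e.line s.column e.column
  in
  match file with
  | None -> where
  | Some name -> Printf.sprintf "file %s, %s" name where
```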

val default_format_loc : ?file:string -> loc -> string
  • deprecated

    Renamed to format_loc after version 0.1.0.

val catch_errors : ?file:string -> ?format_loc:(?file:string -> loc -> string) -> (unit -> 'a) -> ('a, string) Stdlib.result

Catch Error and return Ok _ or Error msg.

When ~file is given it is prepended to every error message: positional errors (scan/parse/schema) become "file foo.yaml, line L, columns C1-C2: msg" and non-positional errors become "file foo.yaml: msg".

~format_loc overrides how source locations are formatted (default: format_loc). Provide a custom implementation to adapt the output for editors, LSP servers, or structured logging.
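
As an illustration of a custom formatter, the sketch below produces the compact file:line:column form that many editors and compilers use. Everything here is an assumption made for the example: editor_format_loc is a hypothetical function, the "<input>" fallback name is invented, and the column is shifted to 1-based because that is what such tools commonly expect. The types are local mirrors of pos and loc.

```ocaml
(* Local mirrors of the documented records. *)
type pos = {
  line : int; column : int; column_bytes : int;
  offset : int; offset_bytes : int;
}
type loc = { start_pos : pos; end_pos : pos }

(* Hypothetical formatter with the shape ?file:string -> loc -> string.
   Only the start of the range is reported; the column becomes 1-based. *)
let editor_format_loc ?(file = "<input>") (loc : loc) : string =
  Printf.sprintf "%s:%d:%d"
    file loc.start_pos.line (loc.start_pos.column + 1)
```

With the real YAMLx types in place of the local mirrors, a function of this shape is what the ~format_loc argument of catch_errors expects, for example catch_errors ~file:"foo.yaml" ~format_loc:editor_format_loc (fun () -> ...).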

val register_exception_printers : ?format_loc:(?file:string -> loc -> string) -> unit -> unit

Register a printer for Error so it displays legibly in uncaught-exception output. ~format_loc overrides location formatting (default: format_loc).

val default_expansion_limit : int

Default node-visit budget for alias expansion (1,000,000).
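
A node-visit budget guards against inputs where a few aliases expand into an enormous tree. The sketch below shows the general technique on a toy tree type; it is not the library's implementation. The tree type, Limit_exceeded, and count_expanded are all hypothetical, and whether YAMLx counts the alias node itself as a visit is not stated above (this sketch does).

```ocaml
(* Toy tree with lazily resolved aliases. *)
type tree =
  | Leaf
  | Branch of tree list
  | Alias of tree Lazy.t

(* Carries the limit that was exceeded, like Expansion_limit_exceeded. *)
exception Limit_exceeded of int

(* Walk the fully expanded tree, counting every node visited and
   aborting as soon as the count goes past [limit]. *)
let count_expanded ~(limit : int) (t : tree) : int =
  let visited = ref 0 in
  let rec go t =
    incr visited;
    if !visited > limit then raise (Limit_exceeded limit);
    match t with
    | Leaf -> ()
    | Branch children -> List.iter go children
    | Alias target -> go (Lazy.force target)
  in
  go t;
  !visited
```

The check runs before descending, so the walk stops after at most limit + 1 visits no matter how large the expanded structure would be.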

val default_max_depth : int

Default maximum nesting depth (512).

val show_yaml_error : ?format_loc:(?file:string -> loc -> string) -> yaml_error -> string

Format a yaml_error as "location: message". Uses format_loc by default; pass ~format_loc to customise location formatting.

Scalar styles

type scalar_style =
  | Plain  (* unquoted, e.g. foo *)
  | Single_quoted  (* e.g. 'foo' *)
  | Double_quoted  (* e.g. "foo" *)
  | Literal  (* block scalar |: newlines preserved *)
  | Folded  (* block scalar >: newlines folded to spaces *)

How a scalar value was written in the source. Preserved in AST nodes so callers can distinguish, for example, a quoted empty string from an unquoted null.
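
The empty-string case mentioned above can be sketched as a small predicate. looks_null is a hypothetical helper, and the list of null spellings is taken from the plain-scalar patterns documented under "Typed values"; the type is a local mirror of scalar_style.

```ocaml
(* Local mirror of the documented type. *)
type scalar_style = Plain | Single_quoted | Double_quoted | Literal | Folded

(* Hypothetical helper: only a plain scalar can stand for null.
   The same text in any quoted or block style is an ordinary string. *)
let looks_null (style : scalar_style) (value : string) : bool =
  match style, value with
  | Plain, ("" | "null" | "Null" | "NULL" | "~") -> true
  | _ -> false
```

Both a bare key with no value and an explicit "" carry the empty string as their text; the style is the only thing that tells them apart.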

val pp_scalar_style : Ppx_deriving_runtime.Format.formatter -> scalar_style -> Ppx_deriving_runtime.unit
val show_scalar_style : scalar_style -> Ppx_deriving_runtime.string

AST nodes

type node =
  | Scalar_node of {
      anchor : string option;  (* &name if present *)
      tag : string option;  (* resolved tag URI if present *)
      value : string;
      style : scalar_style;
      loc : loc;
      height : int;  (* always 1 for scalars *)
      head_comments : string list;
      line_comment : string option;
      foot_comments : string list;
    }
  | Sequence_node of {
      anchor : string option;
      tag : string option;
      items : node list;
      flow : bool;  (* true for [a, b] style, false for block *)
      loc : loc;
      height : int;  (* 1 + max height of items, or 1 if empty *)
      head_comments : string list;
      line_comment : string option;
      foot_comments : string list;
    }
  | Mapping_node of {
      anchor : string option;
      tag : string option;
      pairs : (node * node) list;
      flow : bool;  (* true for {a: b} style, false for block *)
      loc : loc;
      height : int;  (* 1 + max height of keys and values, or 1 if empty *)
      head_comments : string list;
      line_comment : string option;
      foot_comments : string list;
    }
  | Alias_node of {
      name : string;  (* the anchor name, without the * *)
      resolved : node Stdlib.Lazy.t;
        (* The node this alias refers to. Lazy to allow cycles (e.g. a node that contains an alias to itself). Force with Lazy.force when traversing. *)
      loc : loc;
      height : int;  (* always 1 for alias nodes *)
      head_comments : string list;
      line_comment : string option;
      foot_comments : string list;
    }

An in-memory representation of a parsed YAML document that preserves all source-level detail: tags, anchors, scalar styles, source positions, and — on a best-effort basis — comments.

Comment preservation is best-effort. Comments inside flow collections are dropped. The attachment rules may change in future versions.

  • head_comments: standalone comment lines immediately before the node. Each string is one comment line's text, without the leading '#'.
  • line_comment: a comment on the same source line as the node, after its content. Text does not include the leading '#'.
  • foot_comments (collections only): standalone comment lines after the last child of the collection, before the next sibling.
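
The height rules and the lazy alias target can be illustrated on a cut-down version of the type. This is a sketch only: the node type below keeps just the fields the example needs (the real records carry more), and height and target are hypothetical helpers that recompute what the library stores in its height field and resolved field.

```ocaml
(* Simplified stand-in for the node type: most fields omitted. *)
type node =
  | Scalar_node of { value : string }
  | Sequence_node of { items : node list }
  | Mapping_node of { pairs : (node * node) list }
  | Alias_node of { name : string; resolved : node Lazy.t }

(* Height as documented: 1 for scalars and aliases, otherwise
   1 + the maximum child height (so an empty collection has height 1). *)
let rec height : node -> int = function
  | Scalar_node _ | Alias_node _ -> 1
  | Sequence_node { items } ->
    1 + List.fold_left (fun acc n -> max acc (height n)) 0 items
  | Mapping_node { pairs } ->
    1 + List.fold_left
          (fun acc (k, v) -> max acc (max (height k) (height v)))
          0 pairs

(* Follow one alias step; non-alias nodes are returned unchanged. *)
let target : node -> node = function
  | Alias_node { resolved; _ } -> Lazy.force resolved
  | n -> n
```

Because an alias counts as height 1 and is not followed, height terminates even on cyclic documents; code that does follow aliases with Lazy.force needs its own cycle or budget check.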
val pp_node : Ppx_deriving_runtime.Format.formatter -> node -> Ppx_deriving_runtime.unit
val show_node : node -> Ppx_deriving_runtime.string

Typed values

type value =
  | Null of loc
  | Bool of loc * bool
  | Int of loc * int64
  | Float of loc * float
  | String of loc * string
  | Seq of loc * value list
  | Map of loc * (loc * value * value) list

A YAML value resolved according to the YAML 1.2 JSON schema.

Plain (unquoted) scalars are matched against the following patterns:

  • null, Null, NULL, ~, or empty string → Null
  • true/True/TRUE/false/False/FALSE → Bool
  • Decimal, 0x… hex, or 0o… octal integers → Int
  • Decimal or scientific floats; .inf, .nan variants → Float
  • Everything else, and all quoted or block scalars → String

Each constructor carries a loc giving the source range of the corresponding YAML node. Use Value.equal for location-independent structural equality.
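
A typical consumer pattern-matches on these constructors and ignores the locations. The sketch below looks up a string key in a mapping. It makes several assumptions: find_key is a hypothetical helper, loc is replaced by a stand-in type to keep the example short, and each Map entry is read as (location, key, value), which the type suggests but the text does not spell out.

```ocaml
(* Stand-in for the real loc record, to keep the sketch short. *)
type loc = int * int

(* Local mirror of the documented value type. *)
type value =
  | Null of loc
  | Bool of loc * bool
  | Int of loc * int64
  | Float of loc * float
  | String of loc * string
  | Seq of loc * value list
  | Map of loc * (loc * value * value) list

(* Hypothetical helper: the value stored under the first entry whose
   key is the string [key]; None for missing keys and non-mappings. *)
let find_key (key : string) (v : value) : value option =
  match v with
  | Map (_, entries) ->
    List.find_map
      (fun (_, k, x) ->
         match k with
         | String (_, s) when String.equal s key -> Some x
         | _ -> None)
      entries
  | _ -> None
```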

pp_value and show_value are derived by [@@deriving show] and are primarily useful when another type embeds value and also uses [@@deriving show]. For direct use prefer Value.pp and Value.show.

val pp_value : Ppx_deriving_runtime.Format.formatter -> value -> Ppx_deriving_runtime.unit
val show_value : value -> Ppx_deriving_runtime.string
val value_loc : value -> loc

Return the source location carried by a value.

Node operations

module Nodes : sig ... end

Operations on the lossless AST node representation.

module Node : sig ... end

Operations on a single lossless AST node.

Value operations

val equal_value : value -> value -> bool

Structural equality that ignores source locations.
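
What "ignores source locations" means can be sketched as a recursive comparison that skips the loc component of every constructor. This is an illustration, not the library's code: equal_ignoring_loc is a hypothetical name, loc is a stand-in type, and two choices here are assumptions (map entries are compared in order, and Float.equal treats nan as equal to itself).

```ocaml
(* Stand-in for the real loc record. *)
type loc = int * int

(* Local mirror of the documented value type. *)
type value =
  | Null of loc
  | Bool of loc * bool
  | Int of loc * int64
  | Float of loc * float
  | String of loc * string
  | Seq of loc * value list
  | Map of loc * (loc * value * value) list

(* Compare two values structurally, never looking at any loc. *)
let rec equal_ignoring_loc (a : value) (b : value) : bool =
  match a, b with
  | Null _, Null _ -> true
  | Bool (_, x), Bool (_, y) -> Bool.equal x y
  | Int (_, x), Int (_, y) -> Int64.equal x y
  | Float (_, x), Float (_, y) -> Float.equal x y
  | String (_, x), String (_, y) -> String.equal x y
  | Seq (_, xs), Seq (_, ys) -> List.equal equal_ignoring_loc xs ys
  | Map (_, xs), Map (_, ys) ->
    (* Assumption: entries must match pairwise, in the same order. *)
    List.equal
      (fun (_, k1, v1) (_, k2, v2) ->
         equal_ignoring_loc k1 k2 && equal_ignoring_loc v1 v2)
      xs ys
  | _ -> false
```

Polymorphic equality (=) would compare the locations as well, so two parses of the same data from different places in a file would not be equal under it.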

module Values : sig ... end

Operations on typed YAML values for multi-document streams.

module Value : sig ... end

Single-document interface for typed YAML values.