Module Cduce_types.CharSet

Sets of characters (Unicode code points) represented as disjoint intervals. The range of valid code points is [0-0x10ffff]. Surrogate code points, that is, code points in the range [0xd800-0xdfff], are allowed, but their use is not recommended.

module V : sig ... end

A module for manipulating atom Values.

include Tset.S with type elem = V.t
include Tset.Tset_base
type elem

The type of the values in the set

The type of the set, with mandatory custom operations.

include Custom.T
type t
val dump : Stdlib.Format.formatter -> t -> unit
val check : t -> unit
val equal : t -> t -> bool
val hash : t -> int
val compare : t -> t -> int
val empty : t

The empty set

val any : t

The full set, containing all possible values for this kind.

val atom : elem -> t

atom e creates a singleton set containing element e.

Set operations:

val cup : t -> t -> t

cup t1 t2 returns the union of t1 and t2.

val cap : t -> t -> t

cap t1 t2 returns the intersection of t1 and t2.

val diff : t -> t -> t

diff t1 t2 returns the set of elements of t1 not in t2.

val neg : t -> t

neg t returns the set diff any t.
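
The relationships between these four operations can be sketched with a small standalone model. This is not the library's implementation: code points are plain ints instead of V.t, a set is assumed to be a sorted list of disjoint inclusive intervals, and merging adjacent intervals is a choice made for the sketch.

```ocaml
(* Standalone model, not the library's code: a set is a sorted list of
   disjoint inclusive intervals over plain int code points. *)
let max_cp = 0x10ffff
let any = [ (0, max_cp) ]

(* Union: concatenate, sort, and merge overlapping or touching intervals. *)
let cup a b =
  let merge acc (lo, hi) =
    match acc with
    | (plo, phi) :: rest when lo <= phi + 1 -> (plo, max phi hi) :: rest
    | _ -> (lo, hi) :: acc
  in
  List.rev (List.fold_left merge [] (List.sort compare (a @ b)))

(* Complement: collect the gaps between consecutive intervals. *)
let neg t =
  let rec gaps start = function
    | [] -> if start <= max_cp then [ (start, max_cp) ] else []
    | (lo, hi) :: rest ->
      let tail = gaps (hi + 1) rest in
      if start < lo then (start, lo - 1) :: tail else tail
  in
  gaps 0 t

(* Intersection and difference follow from De Morgan's laws. *)
let cap a b = neg (cup (neg a) (neg b))
let diff a b = cap a (neg b)
```

In this model, neg t and diff any t produce the same interval list for any normalized t, which mirrors the definition of neg given above.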

module Infix : sig ... end

Type-specific operations:

val char_class : V.t -> V.t -> t

char_class i j returns the set of codepoints between i and j inclusive. Returns empty if i > j.

val mk_classes : (V.t * V.t) list -> t

mk_classes l returns the union of the ranges of code points in l. Overlapping ranges are supported (and simplified into disjoint intervals).
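
One way to picture the simplification of overlapping ranges is the sketch below. It is a standalone model using plain ints in place of V.t; the name normalize is hypothetical, and both the dropping of empty ranges and the merging of adjacent ranges are assumptions, not documented behaviour of mk_classes.

```ocaml
(* Standalone model: code points are plain ints and the result is a sorted
   list of disjoint inclusive intervals. Not the library's implementation. *)
let normalize (ranges : (int * int) list) : (int * int) list =
  (* Drop empty ranges (lo > hi), then sort by lower bound. *)
  let sorted =
    ranges
    |> List.filter (fun (lo, hi) -> lo <= hi)
    |> List.sort compare
  in
  (* Merge each range into the previous one when they overlap or touch. *)
  let merge acc (lo, hi) =
    match acc with
    | (plo, phi) :: rest when lo <= phi + 1 -> (plo, max phi hi) :: rest
    | _ -> (lo, hi) :: acc
  in
  List.rev (List.fold_left merge [] sorted)

let () =
  (* 'a'-'f' and 'd'-'k' overlap; '0'-'9' stands alone. *)
  let r = normalize [ (0x61, 0x66); (0x30, 0x39); (0x64, 0x6b) ] in
  List.iter (fun (lo, hi) -> Printf.printf "0x%x-0x%x\n" lo hi) r
```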

val is_char : t -> V.t option

is_char t returns Some c if t is the singleton containing c, otherwise returns None.

val extract : t -> (V.t * V.t) list

extract t returns the list of intervals of code points in t. The returned intervals are disjoint and in increasing order of their lower bound.

Membership:

val is_empty : t -> bool

is_empty t checks whether t is the empty set.

val contains : V.t -> t -> bool

contains i t checks whether the code point i belongs to t.

val single : t -> V.t

single t assumes t is a singleton and returns its unique element.

Raises Not_found if t is the empty set.

Raises Exit if t contains more than one element.
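
The two failure modes of single, and how an option-returning variant in the spirit of is_char can absorb them, can be sketched as follows. This is a standalone model over plain int code points with an assumed interval-list representation; it does not call the library.

```ocaml
(* Standalone model over plain int code points; the interval-list
   representation is an assumption made for illustration. *)
let single (t : (int * int) list) : int =
  match t with
  | [] -> raise Not_found                (* empty set *)
  | [ (lo, hi) ] when lo = hi -> lo      (* exactly one code point *)
  | _ -> raise Exit                      (* more than one element *)

(* Option-returning variant: both exceptions collapse to None. *)
let is_char (t : (int * int) list) : int option =
  try Some (single t) with Not_found | Exit -> None
```

Callers that cannot guarantee a singleton may prefer the option form, since it avoids handling two distinct exceptions.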

val disjoint : t -> t -> bool

disjoint t1 t2 checks whether t1 and t2 have an empty intersection.

val sample : t -> V.t

sample t returns an element of t.

Raises Not_found if t is empty.
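
The membership queries above can be modelled in a few lines. The sketch is standalone: it uses plain ints for code points and assumes a sorted list of disjoint inclusive intervals. Returning the lowest code point from sample is an arbitrary choice made here; the documentation only promises some element of t.

```ocaml
(* Standalone model, not the library's code. *)
let is_empty (t : (int * int) list) : bool = t = []

(* A code point belongs to the set if some interval covers it. *)
let contains (c : int) (t : (int * int) list) : bool =
  List.exists (fun (lo, hi) -> lo <= c && c <= hi) t

(* Two sets are disjoint if no pair of their intervals overlaps. *)
let disjoint (a : (int * int) list) (b : (int * int) list) : bool =
  not
    (List.exists
       (fun (lo1, hi1) ->
         List.exists (fun (lo2, hi2) -> lo1 <= hi2 && lo2 <= hi1) b)
       a)

(* Pick the lower bound of the first interval (an assumption). *)
let sample : (int * int) list -> int = function
  | [] -> raise Not_found
  | (lo, _) :: _ -> lo
```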

Formatting functions:

val print : t -> (Stdlib.Format.formatter -> unit) list

print t returns, for each interval in the set t, a function that prints the interval. Singleton intervals are just printed as c (using V.print). The intervals are always disjoint and printed in increasing order of their lower bound, separated by "|". As a special case, any, that is, the interval 0-0x10ffff, is printed as Char.
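
Because print returns a list of printing functions rather than printing directly, a caller typically runs them one after another. The sketch below is a standalone model of that pattern: code points are plain ints, the U+XXXX rendering stands in for V.print (whose actual output format is not given here), and to_string is a hypothetical helper that joins the pieces with " | ".

```ocaml
(* Standalone model, not the library's code: one printer per interval. *)
let print (t : (int * int) list) : (Format.formatter -> unit) list =
  if t = [ (0, 0x10ffff) ] then [ (fun ppf -> Format.fprintf ppf "Char") ]
  else
    List.map
      (fun (lo, hi) ppf ->
        if lo = hi then Format.fprintf ppf "U+%04X" lo
        else Format.fprintf ppf "U+%04X-U+%04X" lo hi)
      t

(* Hypothetical helper: run every printer, separated by " | ". *)
let to_string (t : (int * int) list) : string =
  let pp_sep ppf () = Format.pp_print_string ppf " | " in
  Format.asprintf "%a"
    (Format.pp_print_list ~pp_sep (fun ppf f -> f ppf))
    (print t)
```

Returning closures leaves the choice of separator and surrounding boxes to the caller, which is convenient when the pieces are embedded in a larger pretty-printed type.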