HT Deconstruction

HT Deconstruction and Casual-to-Strict Resolution Doctrine

Status: Canon-aligned operational doctrine
Language: Hadokai Tubatonona / HT
Scope: Word-form validation, strict/casual romanization, deconstruction, right-building morphology, and canon-status classification

Purpose

This doctrine defines how an HT word-form is evaluated, decomposed, reconstructed, confirmed, or rejected.

The goal is not merely to translate a form. The goal is to classify its canonical standing accurately.

A parser, reader, or AI assistant must distinguish between:

  1. a word directly defined by the lexicon;
  2. a word uniquely confirmed by existing canon;
  3. a form that is structurally possible but unresolved;
  4. a form that is invalid under HT structure.

The parser must never infer meaning from structure alone.

Governing principle alignment

This doctrine is governed by the locked Canonical Principle Set.

The strict CVC requirement follows from Principle 7: Consonants Occupy Boundary Positions and from the foundational commitment to bounded resolution at the syllable level.

The additive treatment of roots, class roots, rightward specification, and suffixal construction follows from Principle 8: Morphology Is Additive and Non-Mutating.

The refusal to guess meaning from structure follows from Principle 11: Meaning Is Built, Not Guessed.

The doctrine below does not replace those principles. It operationalizes them for word-form validation and deconstruction.

1. Foundational Premise

HT has two relevant romanized modes: strict romanization and casual romanization.

Strict romanization records the full syllabic structure of the word. Every strict syllable is:

C V C

That is:

opening consonant + vowel nucleus + closing consonant

The silent consonant ungU is written as = and occupies a real consonant slot.

Casual romanization is the readable surface form derived from strict form by removing ungU markers.

Because casual romanization removes structural information, strict-to-casual conversion is lossy. A casual form does not always contain enough information to recover the canonical strict form.

Therefore:

Strict-to-casual is deterministic. Remove all ungU markers.

Casual-to-strict is not always deterministic. Recovery requires lexicon attestation, unique decomposition, or coiner-supplied strict form.
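The asymmetry between the two directions can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name strict_to_casual is a hypothetical choice, and the two strict strings at the end are used only as structural shapes, not as claimed lexicon entries.

```python
UNGU = "="  # the silent consonant ungU, written "=" in strict romanization


def strict_to_casual(strict_form: str) -> str:
    """Derive the casual surface by deleting every ungU marker."""
    return strict_form.replace(UNGU, "")


print(strict_to_casual("tu=na="))  # tuna
print(strict_to_casual("ma==in"))  # main

# Two different strict shapes collapse to the same casual surface,
# which is why the reverse direction cannot be a plain function.
print(strict_to_casual("=an=a=") == strict_to_casual("=a=na="))  # True
```

Because two distinct strict shapes map to one casual surface, the reverse direction has to return a set of candidates or consult the lexicon.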

2. Canonical Component Inventory

Before any decomposition happens, the parser must validate the component inventory.

Canonical consonant set

{b, c, d, f, g, h, j, k, l, m, n, p, r, s, S, t, v, w, y, z, =}

Canonical vowel set

{a, e, i, I, o, u, U}

Only these symbols are canonical HT syllable components.

If any claimed HT word-form contains a component outside these sets, the form is immediately canon invalid.

This is not ambiguity. It is invalidity.

Examples of invalid components include:

q, x, A, E, O, R

Digraphs such as sh, th, or ch are not independent HT consonants. If they appear as two separate canonical consonants, they may be evaluated as two separate consonants. But if the word depends on treating them as a single consonant sound, that claimed component is noncanonical and invalid.
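The inventory gate amounts to a set-membership check. The sketch below assumes a word-form arrives as a plain string with one character per component; the names invalid_components and inventory_valid are hypothetical.

```python
CONSONANTS = frozenset("bcdfghjklmnprsStvwyz=")  # 21 symbols, including ungU
VOWELS = frozenset("aeiIouU")
INVENTORY = CONSONANTS | VOWELS


def invalid_components(form: str) -> list[str]:
    """Return the noncanonical symbols in a form, in order of first appearance."""
    found = []
    for ch in form:
        if ch not in INVENTORY and ch not in found:
            found.append(ch)
    return found


def inventory_valid(form: str) -> bool:
    """A form passes the gate only if it is non-empty and fully canonical."""
    return bool(form) and not invalid_components(form)


print(invalid_components("tunaqiza"))  # ['q']
print(inventory_valid("tu=na="))       # True
```

A character-level check cannot detect digraph intent. A sequence such as sh passes as s followed by h, and the later consonant-run and CVC rules decide whether that two-consonant reading is structurally possible.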

3. Input-Mode Gate

After inventory validation, the parser determines whether the form is strict or casual.

Core rule

If a word-form contains =, it is evaluated as strict.

The parser must not treat it as casual with some ungU markers.

The presence of = means the writer has claimed structural knowledge. Once that claim is made, the form must already be fully strict.

Therefore:

Any word containing = must divide cleanly into complete CVC syllables.

If it does not, it is canon invalid.

Examples:

tu=na=
Valid strict form: tu= + na=

ma==in
Valid strict form: ma= + =in

tun=a
Invalid. It contains =, so it must be strict, but it does not divide cleanly into CVC blocks.

tu=na
Invalid. It contains =, so it must be strict, but the final na is incomplete.

tubr=azna
Invalid. It contains =, so it must be strict, but r=a does not parse as a CVC syllable. The = is a consonant, not a vowel.

tuna
No =, so evaluate as casual.

This rule prevents mixed notation. A strict claim must be complete.

4. Strict-Mode Validation

A strict word must parse as one or more complete CVC syllables.

Each syllable must be exactly:

opening consonant + vowel nucleus + closing consonant

Each consonant slot must be filled by one canonical consonant, including =. The vowel nucleus must be filled by one canonical vowel.

A strict form is valid only if every syllable block follows that pattern.

Valid examples:

tu= = CVC
tun = CVC
=a= = CVC
=al = CVC
ma==in = ma= + =in

Invalid examples:

tu = incomplete CV
=a = incomplete CV
tuna= = not cleanly divisible into CVC syllables
tu=na = second syllable incomplete

If strict validation succeeds, the form is structurally valid. The parser may then check the lexicon.

If strict validation fails, the form is canon invalid.
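The mode gate and strict validation together can be sketched as follows. The function names detect_mode and split_strict are hypothetical, and the sketch covers structure only; it says nothing about lexicon status.

```python
CONSONANTS = frozenset("bcdfghjklmnprsStvwyz=")
VOWELS = frozenset("aeiIouU")


def detect_mode(form: str) -> str:
    """Any ungU marker forces strict evaluation; otherwise the form is casual."""
    return "strict" if "=" in form else "casual"


def split_strict(form: str):
    """Return the list of CVC syllables, or None if the form is not valid strict."""
    if not form or len(form) % 3 != 0:
        return None  # cannot be cut into complete three-slot blocks
    blocks = [form[i:i + 3] for i in range(0, len(form), 3)]
    for onset, nucleus, coda in blocks:
        if onset not in CONSONANTS or nucleus not in VOWELS or coda not in CONSONANTS:
            return None
    return blocks


for form in ["tu=na=", "ma==in", "tun=a", "tu=na", "tubr=azna", "tuna"]:
    print(form, detect_mode(form), split_strict(form))
```

Because every strict syllable is exactly three symbols long, a fixed-width cut is sufficient. No backtracking is needed in strict mode.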

5. Casual-Mode Evaluation

If the form contains no =, it is evaluated as casual.

Casual mode allows underspecification. A casual form may be defined, confirmable, structured-but-unresolved, or invalid.

Casual evaluation proceeds in this order:

  1. inventory validation;
  2. consonant-run validation;
  3. whole-word lexicon lookup;
  4. forced structural analysis;
  5. CVC completion;
  6. decomposition into attested morphemes through reverse-stripping;
  7. ambiguity assessment;
  8. resolution-state assignment.

A casual form must contain at least one canonical vowel. A form with no vowel nucleus cannot produce a valid HT syllable and is canon invalid.

6. Whole-Word Lexicon Lookup

Whole-word lookup is performed only after the relevant structural gates have passed.

After inventory and mode-validity gates are passed, whole-word lexicon lookup is the highest canonical authority for canon defined status.

The lexicon supplies meaning and canonical strict form for entries that pass structural gates.

A registered entry that violates inventory rules, strict-mode CVC validation, or casual consonant-run limits indicates a lexicon error, not a canonical form. The structural gates are not optional. They state structural truths about the language that the lexicon serves, and a lexicon entry cannot override them.

If the whole casual or strict form is registered in the lexicon, the lexicon supplies the canonical strict form, meaning, grammatical category, and notes.

This produces the strongest status:

canon defined

No decomposition algorithm overrides a direct lexicon entry that has passed structural gates.

Casual aliases

The canonical record of an HT word is its strict form.

A casual form may function as a searchable alias to a strict-form lexicon entry, but the strict form remains the canonical record.

When a casual form is found in the lexicon, the parser should return the registered strict entry rather than inventing a strict form from the surface.

Multiple defined entries

If whole-word lookup returns multiple structurally valid lexicon entries for the same casual form, the form is canon defined but lexically ambiguous.

The parser may report all valid entries, but it must not select one unless context or canon supplies the intended reading.

This is not the same as canon structured. A canon structured form is structurally possible but unresolved by the lexicon. A lexically ambiguous form has valid lexicon entries, but more than one defined reading is available.
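One way to model casual aliases is an index from each casual surface to the strict entries that produce it. The sketch below is an assumption about data layout, not a description of the actual lexicon; build_alias_index and lookup are hypothetical names, the strict strings are placeholders rather than claimed lexicon entries, and the returned codes are borrowed from the diagnostic-code appendix.

```python
from collections import defaultdict


def build_alias_index(strict_entries):
    """Map each casual surface to every strict entry that reduces to it."""
    index = defaultdict(list)
    for strict in strict_entries:
        index[strict.replace("=", "")].append(strict)
    return index


def lookup(form, index):
    """Return (diagnostic code, matching strict entries) for a whole-word lookup."""
    candidates = index.get(form.replace("=", ""), [])
    # A strict query must match an entry exactly; a casual query matches by alias.
    hits = [s for s in candidates if s == form] if "=" in form else list(candidates)
    if not hits:
        return ("lex.not_defined", [])
    if len(hits) == 1:
        return ("lex.defined", hits)
    return ("lex.multiple_defined_entries", hits)


# Placeholder entries chosen only for their shapes.
index = build_alias_index(["tu=na=", "=an=a=", "=a=na="])
print(lookup("tuna", index))    # ('lex.defined', ['tu=na='])
print(lookup("ana", index))     # ('lex.multiple_defined_entries', ['=an=a=', '=a=na='])
print(lookup("=a=na=", index))  # ('lex.defined', ['=a=na='])
```

The lookup always returns registered strict entries and never constructs a strict form from the casual surface.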

7. Structural Rules for Casual Decomposition

If the word is not found as a whole lexicon entry, the parser may begin structural analysis.

Structural rules determine possible strict completions. They do not automatically determine meaning.

7.1 CC Split Rule

Wherever two consonants are adjacent in casual romanization, a syllable boundary falls between them.

The first consonant becomes the coda of the syllable on the left.

The second consonant becomes the onset of the syllable on the right.

CC does not create ungU because both consonant positions are already filled.

Example:

boldar

Forced split:

bol | dar

Strict:

bol + dar

No ungU is required.

7.2 Consonant Run Limits

Consonant runs in casual romanization are bounded by canonical CVC structure.

The maximum allowed lengths are:

Internal consonant run: 2
This represents coda + onset across a syllable boundary.

Beginning consonant run: 1
This represents a single onset for the word-initial syllable.

Ending consonant run: 1
This represents a single coda for the word-final syllable.

Any casual form that exceeds these limits cannot be derived from valid strict form and is canon invalid.

This rule is absolute. There is no canonical mechanism by which longer consonant runs are permitted.

This rule is derived directly from CVC structure. A CVC syllable contributes at most one coda consonant to a boundary; the next syllable contributes at most one onset consonant. That gives a maximum of two consonants across an internal boundary, and at most one consonant at each word edge.

The casual form derived from a valid strict form can contain internal CC runs, but it can never contain an initial CC run, a final CC run, or any CCC run.

Worked examples:

tubrazna
Internal CC runs br and zn, each of length 2. The word starts with a single consonant and ends with a vowel. All limits are respected. Structurally valid.

strolan
Initial consonant run str of length 3. Canon invalid.

brata
Initial consonant run br of length 2. Canon invalid.

tabr
Final consonant run br of length 2. Canon invalid.
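The run limits can be checked by splitting the casual form on its vowels. This sketch assumes the input has already passed inventory validation and contains no = marker; the function name is hypothetical, and the codes it returns are taken from the diagnostic-code appendix.

```python
import re

VOWELS = "aeiIouU"


def consonant_run_violations(casual: str):
    """Return (diagnostic code, offending run) pairs; an empty list means the limits hold."""
    runs = re.split(f"[{VOWELS}]", casual)  # consonant runs around each vowel
    if len(runs) < 2:
        return [("casual.no_vowel_nucleus", casual)]
    violations = []
    if len(runs[0]) > 1:  # word-initial run: at most 1
        violations.append(("casual.invalid_initial_consonant_run", runs[0]))
    for run in runs[1:-1]:  # internal runs: at most 2
        if len(run) > 2:
            violations.append(("casual.invalid_internal_consonant_run", run))
    if len(runs[-1]) > 1:  # word-final run: at most 1
        violations.append(("casual.invalid_final_consonant_run", runs[-1]))
    return violations


for word in ["tubrazna", "strolan", "brata", "tabr"]:
    print(word, consonant_run_violations(word))
```

Splitting on single vowels means adjacent vowels yield an empty internal run, which is within the limits and is handled later by the VV split rule.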

7.3 VV Split Rule

HT does not allow diphthongs.

Wherever two vowels are adjacent in casual romanization, a syllable boundary falls between them. Each vowel belongs to its own syllable nucleus.

A VV boundary requires two ungU:

  1. one as the coda of the syllable on the left;
  2. one as the onset of the syllable on the right.

The two ungU sit adjacent in the strict form, rendering as ==.

The doubled == is not one mark. It is two distinct ungU performing two distinct functions: coda of the first syllable and onset of the second.

Examples:

main -> ma | in -> ma= + =in -> ma==in

vual -> vu | al -> vu= + =al -> vu==al

For longer vowel sequences:

VVV contains two VV boundaries and produces four internal ungU.

VVVV contains three VV boundaries and produces six internal ungU.

These counts refer only to the internal ungU produced by VV boundaries. Additional initial or final ungU may still be required depending on the full syllable structure of the word.
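The VV rule can be illustrated with a zero-width regular-expression match that inserts == at every vowel-vowel boundary. This is a sketch of the internal ungU only; the name mark_vv_boundaries is hypothetical, and edge completion is deliberately left out.

```python
import re

VOWELS = "aeiIouU"

# Zero-width position with a vowel on both sides: a VV boundary.
VV_BOUNDARY = re.compile(f"(?<=[{VOWELS}])(?=[{VOWELS}])")


def mark_vv_boundaries(casual: str) -> str:
    """Insert the coda ungU and onset ungU (==) at every VV boundary."""
    return VV_BOUNDARY.sub("==", casual)


print(mark_vv_boundaries("main"))  # ma==in
print(mark_vv_boundaries("vual"))  # vu==al

# A run of n vowels has n - 1 boundaries and therefore 2 * (n - 1) internal ungU.
print(mark_vv_boundaries("aia").count("="))   # 4
print(mark_vv_boundaries("aiau").count("="))  # 6
```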

7.4 CVC Completion Rule

After forced boundaries are resolved, every syllable must be completed to CVC.

If a syllable has no consonant in its onset slot, add initial ungU.

If a syllable has no consonant in its coda slot, add final ungU.

Examples:

CV -> add final ungU: tu -> tu=

VC -> add initial ungU: al -> =al

V -> add both ungU: a -> =a=

CVC -> no ungU required: tun -> tun

ungU is not decoration. It appears only where a required consonant slot is unfilled.
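The completion rule for a single syllable fragment can be sketched as below. It assumes the fragment has already been isolated by the forced splits and contains exactly one vowel; complete_syllable is a hypothetical name.

```python
VOWELS = frozenset("aeiIouU")


def complete_syllable(fragment: str) -> str:
    """Pad a V, CV, VC, or CVC fragment to full CVC using ungU (=)."""
    nuclei = [i for i, ch in enumerate(fragment) if ch in VOWELS]
    if len(nuclei) != 1:
        raise ValueError(f"expected exactly one vowel nucleus in {fragment!r}")
    i = nuclei[0]
    onset, coda = fragment[:i], fragment[i + 1:]
    if len(onset) > 1 or len(coda) > 1:
        raise ValueError(f"more than one consonant in a single slot of {fragment!r}")
    # ungU appears only where a consonant slot is otherwise unfilled.
    return (onset or "=") + fragment[i] + (coda or "=")


for fragment in ["tu", "al", "a", "tun"]:
    print(fragment, "->", complete_syllable(fragment))
```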

8. Ambiguous Casual Structures

Some casual forms do not contain enough information to recover one strict form.

The major ambiguity occurs when a single consonant appears between two vowels.

Pattern:

V C V

The medial consonant may belong left as a coda or right as an onset. Both may be structurally valid.

Example:

a n a

Possible readings:

an | a

or:

a | na

Both can be completed into strict CVC structure, but they produce different strict forms: an | a completes to =an=a=, while a | na completes to =a=na=.

This problem becomes severe in alternating CV chains.

Example:

tunafizavutogarocace

Pattern:

C V C V C V C V C V C V C V C V C V C V

There are no CC anchors. There are no VV anchors. Every interior consonant is negotiable.

With nine interior consonants, there are 2^9 = 512 possible strict completions.

Every completion has ten vowels and therefore ten syllables. Every completion has exactly ten ungU. The completions differ in where those ungU markers are placed.

Therefore, if the lexicon is ignored, this form is not canon defined and not canon confirmed. It is structurally possible but unresolved.

That status is:

canon structured

Meaning:

This form can be HT structurally, but no canonical strict form or meaning can be asserted from the casual surface alone.
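The count of 512 can be checked by enumerating every strict completion of a casual form. The sketch below is one possible enumeration, assuming the input has passed inventory validation and contains no = marker; strict_completions is a hypothetical name. It applies the forced CC and VV splits and branches only on single consonants between vowels.

```python
import re
from itertools import product

VOWELS = "aeiIouU"


def strict_completions(casual: str) -> list[str]:
    """Enumerate every strict CVC completion of a casual form (structure only)."""
    vowels = re.findall(f"[{VOWELS}]", casual)
    runs = re.split(f"[{VOWELS}]", casual)  # consonant runs around the vowels
    if (not vowels or len(runs[0]) > 1 or len(runs[-1]) > 1
            or any(len(r) > 2 for r in runs[1:-1])):
        return []  # no vowel nucleus, or a consonant run exceeds the limits
    options = []  # per internal boundary: possible (coda, onset) pairs
    for run in runs[1:-1]:
        if len(run) == 0:    # VV: two ungU
            options.append([("=", "=")])
        elif len(run) == 2:  # CC: forced split, no ungU
            options.append([(run[0], run[1])])
        else:                # VCV: consonant may go left or right
            options.append([(run, "="), ("=", run)])
    results = []
    for choice in product(*options):
        onsets = [runs[0] or "="] + [onset for _, onset in choice]
        codas = [coda for coda, _ in choice] + [runs[-1] or "="]
        results.append("".join(o + v + c for o, v, c in zip(onsets, vowels, codas)))
    return results


chain = strict_completions("tunafizavutogarocace")
print(len(chain))                             # 512
print({form.count("=") for form in chain})    # {10}
print(strict_completions("ana"))              # ['=an=a=', '=a=na=']
print(strict_completions("tubrazna"))         # ['tubrazna=']
```

A result list of length one corresponds to a structurally unique strict completion. Any longer list means the strict form cannot be asserted from the casual surface alone.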

9. HT Morphology Is Exclusively Right-Building

HT morphology is strictly additive and non-mutating.

A word begins with its primary conceptual root on the left and develops rightward through additive specification. Each added element further specifies, narrows, modifies, or derives the material to its left.

There is no productive prefix class in HT.

Forms such as zu and ya are not prefixes. They are class-forming roots.

zu anchors temporal constructions.

ya anchors interrogative and seeking constructions.

Their position at the left edge reflects their role as the primary concept of the form, not prefix attachment.

Examples:

zubava = time + past/backward + domain/objecthood -> “the past”

zufova = time + future/forward + domain/objecthood -> “the future”

zufoti = time + future/forward + immediacy -> “immediate future / about to”

yatuna = question/seeking + person -> “who”

yapensa = question/seeking + reason/thought -> “why”

yazu = question/seeking + time -> “when”

Any future left-edge class root must be introduced by explicit canonical action, not derived by analogy.

Time marking

HT time marking is not productive verbal suffixation.

Time is stated independently, not inflected on the verb. Time markers are independent words placed between the subject phrase and the verb phrase.

Temporal words such as zuba, zufo, zufoti, and zufoto are right-built lexical units anchored by zu, the temporal class root. They are not prefixes and do not attach to verbs.

Older material implying productive tense affixes does not reflect current canon and should be audited.

10. Reverse-Stripping as Canonical Search Method

Reverse-stripping is the canonical search method for unattested right-built forms in HT.

It mirrors the language’s construction direction in reverse: the most recently added rightward material comes off first, then the next, until the root or base is exposed.

Reverse-stripping is not a canonical authority that chooses meaning. It is the proper search procedure for right-built forms.

If a form belongs to the zu or ya class, the parser treats zu or ya as the leftmost primary conceptual root and analyzes the remaining material as rightward specification. This is still right-building decomposition, not prefix stripping.

Canonical confirmation requires:

  1. canonical licensing of any class-root structure;
  2. uniqueness across all complete decomposition paths;
  3. attestation of all components;
  4. a lexically licensed compositional meaning.

The method finds candidates. Licensed class-root recognition, uniqueness, attestation, and lexical meaning confirm the result.

10.1 The reverse-stripping procedure

Reverse-stripping operates on a working string. Initially, the working string is the full casual form being decomposed.

If the form is recognized as a zu-class or ya-class construction, the leftmost class root remains the conceptual anchor, while the rightward material is analyzed as additive specification.

The procedure:

  1. Whole-string lookup. Check the lexicon for the entire working string. If attested, record it as a complete match and terminate this branch successfully.
  2. Reverse character removal. If not attested as a whole, remove the rightmost character from the working string to produce a reduced left-anchor candidate.
  3. Reduced candidate lookup. Check the lexicon for the reduced candidate.
  4. On match. If the reduced candidate is attested:
    • record the matched portion as a base anchor with its lexicon entry;
    • treat the removed characters as a new rightward candidate string;
    • recursively apply reverse-stripping to that candidate string;
    • record the result as one possible decomposition path.
  5. On no match. If the reduced candidate is not attested, return to step 2 and remove another character.
  6. On exhaustion. If the working string is reduced to zero characters without finding any match, this branch fails. No decomposition exists with this anchor.
  7. Branching. When step 4 finds a match, the procedure must also continue from step 2 to check whether shorter matches exist at the same anchor position. Each successful match opens a separate decomposition branch.

The procedure must run to exhaustion across all branches.

Stopping at the first match found, including the longest match, is incorrect. Multiple complete decomposition paths may exist; the parser must enumerate them, not commit to the first one found.
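The procedure can be sketched as a short recursive search. This is one reading of the steps above, not a definitive implementation: it takes step 1 literally, so a working string that is attested as a whole is recorded and not split further. The name reverse_strip and the toy lexicon are hypothetical; the toy set models attestation only and carries no meanings.

```python
def reverse_strip(working: str, lexicon: set[str]) -> list[list[str]]:
    """Return every complete decomposition path of `working` into attested units."""
    if working in lexicon:  # step 1: whole-string lookup
        return [[working]]
    paths = []
    # Steps 2, 5, 7: strip one character at a time from the right and keep
    # going after a match, so shorter anchors are also tried.
    for cut in range(len(working) - 1, 0, -1):
        anchor, remainder = working[:cut], working[cut:]
        if anchor in lexicon:  # steps 3 and 4: reduced candidate is attested
            for tail in reverse_strip(remainder, lexicon):
                paths.append([anchor] + tail)
    return paths  # step 6: an empty list means this branch failed


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"zu", "fo", "ti", "zufo"}
print(reverse_strip("zufoti", toy_lexicon))
# [['zufo', 'ti'], ['zu', 'fo', 'ti']]
```

With this toy lexicon the search returns two complete paths, which is exactly the situation in which the parser must report candidates rather than choose. The search is exhaustive and can be exponential in the worst case, so a production parser would likely memoize on the working string.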

10.2 Remainders and rightward candidate strings

The removed material in reverse-stripping is a raw rightward candidate string.

It does not need to be an independent whole word.

However, it must ultimately resolve into attested morphemes or lexical units for the full decomposition to be confirmed.

If the remainder cannot be decomposed into attested material, the branch is incomplete.

A structurally valid but unattested remainder does not authorize meaning. It may support canon structured status, but not canon confirmed status.

10.3 Suffixal accretion as construction history

Many HT lexical families were authored outward through suffixal accretion: a base form receives additional rightward material, and each addition further specifies the previous form.

Within such a family, the longest known form is often the most informative registered form.

Because of this construction history, reverse-stripping naturally finds useful candidates when a form was built through rightward accretion. The longer matches found by reverse-stripping often correspond to more-specialized lexical entries that capture semantic specificity.

This is a search-efficiency observation, not a canonical authority.

Reverse-stripping may find longer matches faster in right-built lexical families, but the canonical result still requires uniqueness across all complete decomposition paths.

10.4 Method output and canonization

Reverse-stripping produces a set of complete decomposition paths.

A complete path is one where every part of the original casual form has been assigned to attested morphemic material or attested lexical units.

One complete path:
The form may be canon confirmed if the meaning is lexically licensed by the attested components.

Multiple complete paths:
The form is canon structured unless context or lexicon registration selects one path. The parser reports the candidate paths but does not select among them.

No complete path with all components attested:
The form is canon structured if at least one structurally valid CVC completion exists for the unattested portions. It is canon invalid if the surface form fails a structural gate: inventory, vowel nucleus, consonant-run limits, or, for a form containing =, strict CVC division.

The method does not canonize. Reverse-stripping produces candidates; uniqueness and lexical attestation produce confirmation; direct lexicon entry produces definition.

11. Canon Confirmed

A form is canon confirmed when it is not directly registered as a whole word, but existing canon uniquely confirms it.

Requirements:

  1. the form decomposes into attested canonical morphemes or lexical units;
  2. the decomposition is complete;
  3. the decomposition path is unique;
  4. the resulting strict form is unique;
  5. the resulting meaning is lexically licensed by the attested components.

11.1 Lexically licensed compositional meaning

Meaning is not inferred freely by the parser.

A meaning is lexically licensed only when the registered entries for the root and rightward additions provide enough semantic and functional information to support the combined reading.

The parser may assemble what the lexicon licenses. It may not invent semantic relationships merely because the structure is valid.

If a decomposition is structurally unique but the semantic relationship is unclear, surprising, idiomatic, or not licensed by the component entries, the form is not canon confirmed. It remains canon structured or requires direct lexicon registration.

If a whole-word lexicon entry exists and its meaning differs from the expected component-by-component reading, the whole-word entry governs, provided it passes structural gates.

11.2 Unique path, not merely unique surface

Canon confirmed requires a unique decomposition path, not merely a unique visible result.

If two different decomposition paths produce the same casual surface or even the same strict form, the parser must still treat the form as ambiguous unless the lexicon or context resolves the intended path.

The parser reports the competing paths and does not choose among them.

12. The Four Resolution States

Every evaluated HT form should land in one of four public-facing states.

12.1 Canon Defined

The form is directly registered in the lexicon and passes all structural gates.

The lexicon provides its strict form, casual form, meaning, grammatical category, and notes.

This is the highest authority.

Parser statement:

Defined by lexicon.

If more than one valid entry is registered for the same casual surface, report:

Defined by lexicon, but lexically ambiguous.

12.2 Canon Confirmed

The form is not directly registered as a whole entry, but it decomposes uniquely into attested canon elements.

The parser can rebuild the strict form and lexically licensed meaning from the lexicon.

Parser statement:

Confirmed by unique decomposition from attested morphemes.

12.3 Canon Structured

The form uses only valid HT components and can be completed into legal CVC structure, but the strict form or meaning cannot be uniquely recovered.

This includes unattested but structurally possible casual forms, ambiguous alternating CV chains, multiple valid decomposition paths, and structurally valid forms with no lexically licensed meaning.

Parser statement:

Structurally possible, but canonically unresolved. No definition may be asserted.

12.4 Canon Invalid

The form fails one of the mandatory gates.

Invalid causes include:

  • noncanonical components;
  • illegal consonants or vowels;
  • no vowel nucleus;
  • presence of = in a form that does not parse as complete strict CVC;
  • strict form not divisible into full CVC syllables;
  • consonant runs that exceed the limits in Section 7.2;
  • structural patterns impossible under HT rules;
  • malformed romanization.

Parser statement:

Invalid as HT under canon structure.

13. Parser Decision Order

The parser should proceed as follows.

  1. Tokenize input into candidate word-forms.
  2. Validate every component against the canonical consonant and vowel sets.
  3. If any component is noncanonical, mark canon invalid.
  4. Determine mode. If the form contains =, set mode to strict. Otherwise, set mode to casual.
  5. In strict mode, require complete CVC syllable blocks.
  6. If strict validation fails, mark canon invalid.
  7. If strict validation succeeds, check whole-form lexicon lookup.
  8. If whole-form lookup succeeds and the entry passes structural gates, mark canon defined.
  9. If strict mode is valid but not directly defined, attempt decomposition through reverse-stripping where appropriate. If the form belongs to the zu or ya class, treat zu or ya as the leftmost primary conceptual root and analyze the remaining material as rightward specification.
  10. If exactly one complete valid decomposition exists, any class-root structure is canonically licensed, and the meaning is lexically licensed, mark canon confirmed.
  11. If multiple complete decompositions exist, if no complete decomposition exists, or if the meaning is not lexically licensed, mark canon structured.
  12. In casual mode, check that the form contains at least one vowel nucleus. If it contains none, mark canon invalid.
  13. In casual mode, apply consonant-run limits.
  14. If any consonant run exceeds the allowed maximum, mark canon invalid.
  15. In casual mode, check whole-word lexicon lookup.
  16. If whole-word lookup succeeds and the entry passes structural gates, mark canon defined.
  17. If not directly defined, identify whether the form belongs to a canonically recognized class-root family such as the zu temporal class or the ya interrogative/seeking class. This is class-root recognition, not prefix recognition.
  18. Apply forced structural rules to the form: CC split, VV split, and CVC completion.
  19. If the casual form cannot be completed into legal strict CVC structure, mark canon invalid.
  20. If structural completion is possible, attempt decomposition through reverse-stripping.
  21. The reverse-stripping procedure must enumerate all complete decomposition paths, not stop at the first match.
  22. If exactly one complete valid decomposition exists, any class-root structure is canonically licensed, and the meaning is lexically licensed, mark canon confirmed.
  23. If multiple valid decompositions exist, or if the form is structurally possible but not lexically resolved, mark canon structured.
  24. Never infer meaning from structure alone.
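The decision order can be condensed into a single function for illustration. The sketch below is a simplification under stated assumptions: the lexicon is modeled as a plain set of registered strict forms and casual aliases, and decomposition is delegated to a hypothetical callback decompose(form) that returns a pair (number of complete paths, whether the meaning is lexically licensed and any class root is canonically licensed). The name classify is also hypothetical.

```python
import re

CONSONANTS = frozenset("bcdfghjklmnprsStvwyz=")
VOWELS = "aeiIouU"
INVALID = "canon invalid"


def classify(form, lexicon, decompose=lambda form: (0, False)):
    """Assign one of the four resolution states to a single word-form."""
    # Steps 2-3: inventory gate.
    if not form or any(ch not in CONSONANTS and ch not in VOWELS for ch in form):
        return INVALID
    if "=" in form:
        # Steps 4-6: strict mode requires complete CVC blocks.
        blocks = [form[i:i + 3] for i in range(0, len(form), 3)]
        if len(form) % 3 or any(
            b[0] not in CONSONANTS or b[1] not in VOWELS or b[2] not in CONSONANTS
            for b in blocks
        ):
            return INVALID
    else:
        # Steps 12-14: casual mode needs a vowel nucleus and legal consonant runs.
        runs = re.split(f"[{VOWELS}]", form)
        if len(runs) < 2:
            return INVALID
        if len(runs[0]) > 1 or len(runs[-1]) > 1 or any(len(r) > 2 for r in runs[1:-1]):
            return INVALID
    # Steps 7-8 and 15-16: whole-word lookup, reached only after the gates pass.
    if form in lexicon:
        return "canon defined"
    # Steps 9-11 and 17-23: confirmation needs exactly one licensed complete path.
    path_count, licensed = decompose(form)
    if path_count == 1 and licensed:
        return "canon confirmed"
    return "canon structured"  # step 24: no meaning is asserted


for word in ["tunaqiza", "tu=na", "strolan", "tunafizavutogarocace"]:
    print(word, "->", classify(word, lexicon=set()))
```

A casual form that passes the inventory, vowel-nucleus, and run-limit checks always has at least one legal strict completion, so this sketch has no separate branch for step 19.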

14. Worked Examples

Example 1: invalid component

Input:

tunaqiza

Result:

canon invalid

Reason:

q is not in the canonical HT inventory.

Example 2: valid strict form

Input:

tu=na=

Structural result:

tu= + na=

The form is structurally valid strict HT.

Final canon status depends on the lexicon.

If the lexicon defines it, it is canon defined.

If it decomposes uniquely into attested material with lexically licensed meaning, it is canon confirmed.

If it is structurally valid but unresolved, it is canon structured.

Example 3: incomplete strict form

Input:

tu=na

Result:

canon invalid

Reason:

The = forces strict evaluation, but the form does not divide into complete CVC syllables.

Example 4: malformed strict claim

Input:

tubr=azna

Result:

canon invalid

Reason:

The = forces strict evaluation. The segment r=a does not parse as a CVC syllable because = is a consonant, not a vowel.

Example 5: ambiguous alternating CV chain

Input:

tunafizavutogarocace

Lexicon ignored.

Result:

canon structured

Reason:

The form uses valid HT components and can be completed into strict CVC structure, but the casual surface alone allows 512 possible strict completions. No meaning may be asserted.

Example 6: invalid initial consonant run

Input:

strolan

Result:

canon invalid

Reason:

Initial consonant run str of length 3 exceeds the beginning consonant run maximum of 1.

Example 7: structurally valid casual form

Input:

tubrazna

Forced CC splits:

tub | raz | na

CVC completion:

tub + raz + na=

Strict completion:

tubrazna=

Result:

Structurally valid casual form.

Final canon status depends on the lexicon.

If the form is directly listed, it is canon defined.

If it decomposes uniquely into attested morphemes with lexically licensed meaning, it is canon confirmed.

If not, it remains canon structured.

Example 8: temporal class-root construction

Input:

zufoti

Relevant class root:

zu – temporal field / time reference

Rightward specification:

fo – future/forward projection

ti – immediacy / smallness / near focus

Result:

A right-built temporal marker meaning “immediate future” or “about to,” if attested in the lexicon.

Disposition:

canon defined if directly registered in the lexicon.

Reason:

zufoti is not a prefix attached to a verb. It is a temporal lexical unit anchored by the class root zu.

Example 9: interrogative class-root construction

Input:

yatuna

Relevant class root:

ya – inquiry / seeking / question

Rightward specification:

tuna – person

Result:

A right-built interrogative form meaning “who,” if attested in the lexicon.

Disposition:

canon defined if directly registered in the lexicon.

Reason:

yatuna is not formed by a prefix attaching to tuna. It is an interrogative lexical unit anchored by the class root ya.

15. Diagnostic-Code Appendix

The four resolution states are the public doctrine.

A parser may also use diagnostic codes to explain how it reached a status.

These codes are not additional canon categories. They are reasons, methods, or failure points.

Inventory diagnostics

inventory.valid – All components belong to the canonical HT inventory.

inventory.invalid_component – The form contains a noncanonical component.

Mode diagnostics

mode.strict_by_ungU – The form contains =, so it is evaluated as strict.

mode.casual_no_ungU – The form contains no =, so it is evaluated as casual.

Strict diagnostics

strict.valid_cvc_blocks – The strict form divides cleanly into complete CVC syllables.

strict.invalid_cvc_blocks – The strict form does not divide into complete CVC syllables.

strict.invalid_mixed_notation – The form contains =, but also contains incomplete casual material.

Casual structural diagnostics

casual.valid_completion – The casual form can be completed into at least one legal strict form.

casual.no_valid_completion – The casual form cannot be completed into legal strict form.

casual.unique_strict_completion – The casual form has one structurally recoverable strict completion, though meaning may still be unresolved.

casual.forced_cc_split – A CC boundary was found and applied.

casual.forced_vv_split – A VV boundary was found and applied.

casual.ambiguous_vcv – A single consonant between vowels creates unresolved syllable-boundary ambiguity.

casual.ambiguous_cv_chain – An alternating CV chain creates many possible strict completions.

casual.invalid_consonant_run – Umbrella diagnostic for any consonant run exceeding the limits in Section 7.2.

casual.invalid_initial_consonant_run – Word starts with two or more consonants before the first vowel.

casual.invalid_internal_consonant_run – Three or more consonants appear between vowels somewhere in the word.

casual.invalid_final_consonant_run – Word ends with two or more consonants after the last vowel.

casual.no_vowel_nucleus – The form contains no vowel and cannot form an HT syllable.

Lexicon diagnostics

lex.defined – The whole form is directly defined in the lexicon and passes structural gates.

lex.not_defined – The whole form is not directly defined in the lexicon.

lex.multiple_defined_entries – More than one valid lexicon entry exists for the same casual form.

lex.components_attested – All morphemic components in a proposed decomposition are attested.

lex.components_unattested – One or more proposed morphemic components are not attested.

lex.entry_violates_gates – A whole-word lexicon entry exists but fails structural validation. This indicates a lexicon error.

lex.meaning_licensed – The registered component meanings and functions support the combined reading.

lex.meaning_not_licensed – The structure is possible, but the proposed meaning is not licensed by registered component entries.

Decomposition diagnostics

decomp.unique_path – Exactly one complete valid decomposition path exists.

decomp.multiple_paths – More than one complete valid decomposition path exists.

decomp.no_complete_path – No full decomposition into attested material exists.

decomp.partial_matches – Some attested components were found, but not enough to confirm the whole form.

decomp.unresolved_remainder – Reverse-stripping produced a rightward candidate string that could not be fully resolved into attested material.

Class-root diagnostics

classroot.none – No class-forming root is identified.

classroot.zu_temporal – The form is anchored by the temporal root zu.

classroot.ya_interrogative – The form is anchored by the interrogative/seeking root ya.

classroot.unlicensed_left_edge – A suspected left-edge class root is not canonically licensed.

Method diagnostics

method.reverse_strip_used – Reverse-stripping was used as the search method.

method.right_built_family_analysis_used – The parser used known right-built family structure to analyze the form.

Canon-result diagnostics

canon.meaning_asserted – The parser may assert meaning because the form is canon defined or canon confirmed.

canon.meaning_not_asserted – The parser may not assert meaning because the form is canon structured or canon invalid.
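One way to keep the four states separate from the diagnostic codes is to carry them in different fields of a result record. The class and method names below (CanonStatus, Evaluation, finalize) are hypothetical; the status labels and code strings are the ones listed in this doctrine.

```python
from dataclasses import dataclass, field
from enum import Enum


class CanonStatus(Enum):
    DEFINED = "canon defined"
    CONFIRMED = "canon confirmed"
    STRUCTURED = "canon structured"
    INVALID = "canon invalid"


@dataclass
class Evaluation:
    form: str
    status: CanonStatus
    diagnostics: list[str] = field(default_factory=list)  # reasons, not categories

    def may_assert_meaning(self) -> bool:
        """Meaning may be asserted only for defined or confirmed forms."""
        return self.status in (CanonStatus.DEFINED, CanonStatus.CONFIRMED)

    def finalize(self) -> "Evaluation":
        """Append the canon-result diagnostic that matches the status."""
        code = "canon.meaning_asserted" if self.may_assert_meaning() else "canon.meaning_not_asserted"
        self.diagnostics.append(code)
        return self


result = Evaluation(
    "strolan",
    CanonStatus.INVALID,
    ["inventory.valid", "mode.casual_no_ungU", "casual.invalid_initial_consonant_run"],
).finalize()
print(result.status.value)  # canon invalid
print(result.diagnostics)
```

Keeping the status in an enumeration with exactly four members makes it harder for a diagnostic code to be mistaken for a fifth canon category.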

16. Core Doctrine Statement

HT words are canonically grounded in strict form.

Casual romanization is a readable surface derived from strict form by removing ungU.

Because casual form is lossy, casual-to-strict recovery is only canonical when:

  1. the lexicon directly defines the form;
  2. existing canon uniquely confirms the form through decomposition; or
  3. the coiner supplies and registers the strict form.

Structural validity is not meaning.

A structurally possible form may be HT-shaped without being canonically defined.

HT morphology is exclusively right-building. A word begins with its primary conceptual root on the left and develops rightward through additive specification.

There is no productive prefix class in HT.

zu and ya are class-forming roots, not prefixes.

zu anchors temporal constructions.

ya anchors interrogative and seeking constructions.

Their left-edge position reflects their role as the primary concept of the form, not prefix attachment.

Reverse-stripping is the canonical search method for unattested right-built forms, mirroring the construction direction in reverse. Class-root recognition, uniqueness, attestation, and lexically licensed meaning are the canonizing authorities.

The parser must distinguish:

canon defined

canon confirmed

canon structured

canon invalid

Diagnostic codes may explain why a form landed in one of those states, but the codes do not create new canon categories.

The parser may search, strip, rebuild, and compare candidates.

It must not guess meaning where canon has not supplied it.