LanguageTags

C# .NET library for ISO 639-2, ISO 639-3, and RFC 5646 / BCP 47 language tags.

Build and Distribution

Source Code: GitHub - Source code, issues, discussions, and CI/CD pipelines.
Versioned Releases: GitHub Releases - Version tagged source code and build artifacts.
NuGet Packages: NuGet Packages - .NET libraries published to NuGet.org.

Build Status

Releases

Release Notes

Version: 1.2

Summary:

Refactored the project to follow standard patterns across other projects.
IO APIs are now async-only (LoadDataAsync, LoadJsonAsync, SaveJsonAsync, GenCodeAsync).
Added logging support for ILogger or ILoggerFactory per class instance or statically.

See Release History for complete release notes and older versions.

Getting Started

Get started with LanguageTags in two easy steps:

Add LanguageTags to your project:

# Add the package to your project
dotnet add package ptr727.LanguageTags

Write some code:

LanguageLookup languageLookup = new();
string iso = languageLookup.GetIsoFromIetf("af"); // "afr"
iso = languageLookup.GetIsoFromIetf("zh-cmn-Hant"); // "chi"
iso = languageLookup.GetIsoFromIetf("cmn-Hant"); // "chi"

LanguageTag languageTag = LanguageTag.CreateBuilder()
    .Language("en")
    .Script("latn")
    .Region("gb")
    .VariantAdd("boont")
    .ExtensionAdd('r', ["extended", "sequence"])
    .PrivateUseAdd("private")
    .Build();
string tag = languageTag.ToString(); // "en-latn-gb-boont-r-extended-sequence-x-private"

See Usage for detailed usage instructions.

Use Cases

ℹ️ TL;DR:

Catalog of ISO 639-2, ISO 639-3, and RFC 5646 language tags in JSON and C# record format.

Code for IETF BCP 47 language tag construction and parsing per the RFC 5646 semantic rules.

⚠️ Note: The implemented language tag parsing and normalization logic may be incomplete or inaccurate.

Verify the results for your specific usage.

Refer to Libraries for other known implementations.

Refer to References for specification details.

Usage

ℹ️ Note: Refer to the Tag Theory section for an overview of terms and theory of operation.

Tag Lookup

Tag records can be constructed by calling Create(), loaded from data files with LoadDataAsync(), or loaded from JSON with LoadJsonAsync().
The records and record collections are immutable and can safely be reused and shared across threads.

Each class implements a Find(string languageTag, bool includeDescription) method that will search all tags in all records for a matching tag.
This is mostly a convenience function; specific use cases should look up the specific tag field they need.

Iso6392Data iso6392 = Iso6392Data.Create();
Iso6392Record? record = iso6392.Find("afr", false);
// record.Part2B = "afr"
// record.RefName = "Afrikaans"
record = iso6392.Find("zulu", true);
// record.Part2B = "zul"
// record.RefName = "Zulu"

Iso6393Data iso6393 = await Iso6393Data.LoadDataAsync("iso6393");
Iso6393Record? record = iso6393.Find("zh", false);
// record.Id = "zho"
// record.Part1 = "zh"
// record.RefName = "Chinese"
record = iso6393.Find("yue chinese", true);
// record.Id = "yue"
// record.RefName = "Yue Chinese"

Rfc5646Data rfc5646 = await Rfc5646Data.LoadJsonAsync("rfc5646.json");
Rfc5646Record? record = rfc5646.Find("de", false);
// record.SubTag = "de"
// record.Description[0] = "German"
record = rfc5646.Find("zh-cmn-Hant", false);
// record.Tag = "zh-cmn-Hant"
// record.Description[0] = "Mandarin Chinese (Traditional)"
record = rfc5646.Find("Inuktitut in Canadian", true);
// record.Tag = "iu-Cans"
// record.Description[0] = "Inuktitut in Canadian Aboriginal Syllabic script"
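
The search behavior described above can be approximated with a small self-contained sketch. It does not use the library: the records array, the FindCode name, and the substring match on the description are assumptions made for illustration, and the actual Find() implementation may match differently.

```csharp
using System;

// Hypothetical in-memory records; the library loads its own data.
var records = new (string Part2B, string Part1, string RefName)[]
{
    ("afr", "af", "Afrikaans"),
    ("zul", "zu", "Zulu"),
};

// Match on any code field first, then optionally on the description.
// Returns the three-letter code, or null when nothing matches.
string? FindCode(string text, bool includeDescription)
{
    foreach (var entry in records)
    {
        if (string.Equals(entry.Part2B, text, StringComparison.OrdinalIgnoreCase)
            || string.Equals(entry.Part1, text, StringComparison.OrdinalIgnoreCase))
        {
            return entry.Part2B;
        }
    }
    if (includeDescription)
    {
        foreach (var entry in records)
        {
            // Assumption: case-insensitive substring match on the name.
            if (entry.RefName.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return entry.Part2B;
            }
        }
    }
    return null;
}

Console.WriteLine(FindCode("AF", false));                // afr
Console.WriteLine(FindCode("zulu", true));               // zul
Console.WriteLine(FindCode("zulu", false) ?? "(none)");  // (none)
```

Searching code fields before descriptions keeps an exact code match from being shadowed by a name that happens to contain the same letters.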

Tag Conversion

Tags can be converted between ISO 639 and IETF forms using GetIetfFromIso() and GetIsoFromIetf().
Tag lookup will use the user-defined Overrides map, the tag record lists, or the local system CultureInfo.
If no match is found, the undetermined tag und is returned.

LanguageLookup languageLookup = new();
string ietf = languageLookup.GetIetfFromIso("afr"); // "af"
ietf = languageLookup.GetIetfFromIso("zho"); // "zh"

LanguageLookup languageLookup = new();
string iso = languageLookup.GetIsoFromIetf("af"); // "afr"
iso = languageLookup.GetIsoFromIetf("zh-cmn-Hant"); // "chi"
iso = languageLookup.GetIsoFromIetf("cmn-Hant"); // "chi"
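
A lookup chain with an und fallback, as described above, could look roughly like the following sketch. It is not the library's code: the dictionaries, the qaa override entry, and the assumed order (overrides, then records, then CultureInfo) are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical user override map; qaa is an ISO 639 local-use code.
var overrides = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["qaa"] = "en",
};

// Hypothetical stand-in for the tag record lists.
var isoToIetf = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["afr"] = "af",
    ["zho"] = "zh",
};

string GetIetfFromIso(string iso)
{
    // 1. User overrides take precedence (assumed order).
    if (overrides.TryGetValue(iso, out var overridden)) return overridden;

    // 2. Known records.
    if (isoToIetf.TryGetValue(iso, out var ietf)) return ietf;

    // 3. Cultures known to the local system.
    CultureInfo? culture = CultureInfo
        .GetCultures(CultureTypes.NeutralCultures)
        .FirstOrDefault(c => string.Equals(
            c.ThreeLetterISOLanguageName, iso, StringComparison.OrdinalIgnoreCase));
    if (culture != null && culture.Name.Length > 0) return culture.Name;

    // 4. Nothing matched: return the undetermined tag.
    return "und";
}

Console.WriteLine(GetIetfFromIso("afr")); // af
Console.WriteLine(GetIetfFromIso("qaa")); // en
Console.WriteLine(GetIetfFromIso("zzz")); // und
```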

Tag Matching

Tag matching can be used to select content based on preferred vs. available languages.

ℹ️ Examples:

HTTP Accept-Language and Content-Language.

Matroska media stream LanguageIETF Element.

IETF language tags are in the form of:

[Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]

Subtag matching happens left to right until a match is found.

Examples:

pt will match pt Portuguese, or pt-BR Brazilian Portuguese, or pt-PT European Portuguese.
pt-BR will only match pt-BR Brazilian Portuguese.
zh will match zh Chinese, or zh-Hans simplified Chinese, or zh-Hant traditional Chinese, and other variants.
zh-Hans will only match zh-Hans simplified Chinese.

LanguageLookup languageLookup = new();
bool match = languageLookup.IsMatch("en", "en-US"); // true
match = languageLookup.IsMatch("zh", "zh-cmn-Hant"); // true
match = languageLookup.IsMatch("sr-Latn", "sr-Latn-RS"); // true
match = languageLookup.IsMatch("zha", "zh-Hans"); // false
match = languageLookup.IsMatch("zh-Hant", "zh-Hans"); // false
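
Left-to-right prefix matching can be sketched without the library as a comparison on whole subtags. The IsPrefixMatch helper below is a hypothetical name and a simplified approximation; the library's IsMatch() may apply additional rules.

```csharp
using System;

// True when tag equals prefix, or starts with prefix followed by a subtag
// separator. Requiring the separator stops "zha" from matching "zh-Hans".
bool IsPrefixMatch(string prefix, string tag)
{
    if (string.Equals(prefix, tag, StringComparison.OrdinalIgnoreCase))
    {
        return true;
    }
    return tag.StartsWith(prefix + "-", StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(IsPrefixMatch("en", "en-US"));           // True
Console.WriteLine(IsPrefixMatch("sr-Latn", "sr-Latn-RS")); // True
Console.WriteLine(IsPrefixMatch("zha", "zh-Hans"));        // False
Console.WriteLine(IsPrefixMatch("zh-Hant", "zh-Hans"));    // False
```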

Tag Builder

The LanguageTagBuilder class supports fluent builder style tag construction, and will return a constructed LanguageTag class through the final Build() or Normalize() methods.

The Build() method will construct the tag, but will not perform any correctness validation or normalization.
Use the Validate() method to test for shape correctness. See Tag Validation for details.

The Normalize() method will build the tag and perform validation and normalization.
See Tag Normalization for details.

LanguageTag languageTag = LanguageTag.CreateBuilder()
    .Language("en")
    .Script("latn")
    .Region("gb")
    .VariantAdd("boont")
    .ExtensionAdd('r', ["extended", "sequence"])
    .PrivateUseAdd("private")
    .Build();
string tag = languageTag.ToString(); // "en-latn-gb-boont-r-extended-sequence-x-private"

LanguageTag languageTag = LanguageTag.CreateBuilder()
    .PrivateUseAddRange(["private", "use"])
    .Build();
string tag = languageTag.ToString(); // "x-private-use"

LanguageTag? languageTag = LanguageTag.CreateBuilder()
    .Language("ar")
    .ExtendedLanguage("arb")
    .Script("latn")
    .Region("de")
    .VariantAdd("nedis")
    .VariantAdd("foobar")
    .Normalize();
string tag = languageTag?.ToString(); // "arb-Latn-DE-foobar-nedis"

Tag Parser

The static LanguageTag.Parse() method parses a language tag in text form and returns a constructed LanguageTag object, or null if parsing fails.

Parsing will validate all subtags for correctness in type, length, and position, but not value, and case will not be modified.

Grandfathered tags will be converted to their current preferred form and parsed as such.
E.g. en-gb-oed -> en-GB-oxendict, i-klingon -> tlh.

The ParseAndNormalize() method will parse the text tag, and perform validation and normalization.
See Tag Normalization for details.

LanguageTag? languageTag = LanguageTag.Parse("en-latn-gb-boont-r-extended-sequence-x-private");
// languageTag.Language = "en"
// languageTag.Script = "latn"
// languageTag.Region = "gb"
// languageTag.Variants[0] = "boont"
// languageTag.Extensions[0].Prefix = 'r'
// languageTag.Extensions[0].Tags[0] = "extended"
// languageTag.Extensions[0].Tags[1] = "sequence"
// languageTag.PrivateUse.Tags[0] = "private"
string tag = languageTag?.ToString(); // "en-latn-gb-boont-r-extended-sequence-x-private"

LanguageTag? languageTag = LanguageTag.Parse("en-gb-oed"); // Grandfathered
// languageTag.Language = "en"
// languageTag.Region = "GB"
// languageTag.Variants[0] = "oxendict"
string tag = languageTag?.ToString(); // "en-GB-oxendict"
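
The grandfathered tag replacement can be pictured as a lookup that runs before regular parsing. The sketch below is an illustration only: the ReplaceGrandfathered helper is hypothetical, and the map holds just the two pairs mentioned above rather than the full registry list.

```csharp
using System;
using System.Collections.Generic;

// Only the two mappings named in this section; the registry has more.
var grandfathered = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["en-gb-oed"] = "en-GB-oxendict",
    ["i-klingon"] = "tlh",
};

// Return the preferred form for a grandfathered tag, else the input unchanged.
string ReplaceGrandfathered(string tag) =>
    grandfathered.TryGetValue(tag, out var preferred) ? preferred : tag;

Console.WriteLine(ReplaceGrandfathered("en-GB-oed")); // en-GB-oxendict
Console.WriteLine(ReplaceGrandfathered("i-klingon")); // tlh
Console.WriteLine(ReplaceGrandfathered("en-US"));     // en-US
```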

Tag Normalization

The Normalize() method will convert tags to their canonical form.
See RFC 5646 Section 4.5 for details.

Normalization includes the following:

Replace language subtags with their preferred values.
- E.g. iw -> he, in -> id
Replace extended language subtags with their preferred language subtag values.
- E.g. ar-afb -> afb, zh-yue -> yue
Remove or replace redundant subtags with their preferred values.
- E.g. zh-cmn-Hant -> cmn-Hant, zh-gan -> gan, sgn-CO -> csn
Remove redundant script subtags.
- E.g. af-Latn -> af, en-Latn -> en
Normalize case.
- All subtags lowercase, except script and region.
- Script title case, e.g. Latn.
- Region uppercase, e.g. GB.
Sort subtags.
- Sort variant subtags by value.
- Sort extension subtags by prefix and subtag values.
- Sort private use subtags by value.
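
The case rules in the list above can be approximated by position and length alone, following the RFC 5646 convention that two-letter and four-letter subtags are special unless they start the tag or follow a singleton. The NormalizeCase helper is a hypothetical, library-independent sketch that only changes case; it does not replace, remove, or sort subtags.

```csharp
using System;

string NormalizeCase(string tag)
{
    string[] subtags = tag.Split('-');
    bool afterSingleton = false;
    for (int i = 0; i < subtags.Length; i++)
    {
        string lower = subtags[i].ToLowerInvariant();
        if (i > 0 && !afterSingleton && lower.Length == 2)
        {
            // Two letters after the language: region, uppercase.
            subtags[i] = lower.ToUpperInvariant();
        }
        else if (i > 0 && !afterSingleton && lower.Length == 4)
        {
            // Four characters after the language: script, title case.
            subtags[i] = char.ToUpperInvariant(lower[0]) + lower.Substring(1);
        }
        else
        {
            subtags[i] = lower;
        }
        // Everything after an extension or private use singleton stays lowercase.
        if (lower.Length == 1) afterSingleton = true;
    }
    return string.Join("-", subtags);
}

Console.WriteLine(NormalizeCase("EN-LATN-gb-BOONT-x-Private")); // en-Latn-GB-boont-x-private
```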

LanguageTag? languageTag = LanguageTag.CreateBuilder()
    .Language("en")
    .ExtensionAdd('b', ["ccc"]) // Add b before a to force a sort
    .ExtensionAdd('a', ["bbb", "aaa"]) // Add bbb before aaa to force a sort
    .PrivateUseAddRange(["ccc", "a"]) // Add ccc before a to force a sort
    .Normalize();
string tag = languageTag?.ToString(); // "en-a-aaa-bbb-b-ccc-x-a-ccc"

LanguageTag? languageTag = LanguageTag.ParseAndNormalize("en-latn-gb-boont-r-sequence-extended-x-private");
string tag = languageTag?.ToString(); // "en-GB-boont-r-extended-sequence-x-private"

LanguageTag? languageTag = LanguageTag.Parse("ar-arb-latn-de-nedis-foobar");
string tag = languageTag?.ToString(); // "ar-arb-latn-de-nedis-foobar"

LanguageTag? normalizedTag = languageTag?.Normalize();
string normalizedString = normalizedTag?.ToString(); // "arb-Latn-DE-foobar-nedis"
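
The sorting step of normalization can be illustrated on its own, independent of the library. In this sketch the BuildSorted helper and its parameters are hypothetical, and ordinal string ordering is an assumption; it reproduces the subtag order seen in the builder examples but skips every other normalization rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string BuildSorted(
    string language,
    IEnumerable<string> variants,
    IDictionary<char, string[]> extensions,
    IEnumerable<string> privateUse)
{
    var parts = new List<string> { language };

    // Variants sorted by value.
    parts.AddRange(variants.OrderBy(v => v, StringComparer.Ordinal));

    // Extensions sorted by prefix, then each extension's subtags by value.
    foreach (var extension in extensions.OrderBy(e => e.Key))
    {
        parts.Add(extension.Key.ToString());
        parts.AddRange(extension.Value.OrderBy(t => t, StringComparer.Ordinal));
    }

    // Private use subtags sorted by value, behind the x singleton.
    var sortedPrivate = privateUse.OrderBy(p => p, StringComparer.Ordinal).ToList();
    if (sortedPrivate.Count > 0)
    {
        parts.Add("x");
        parts.AddRange(sortedPrivate);
    }
    return string.Join("-", parts);
}

var extensions = new Dictionary<char, string[]>
{
    ['b'] = new[] { "ccc" },
    ['a'] = new[] { "bbb", "aaa" },
};
Console.WriteLine(BuildSorted("en", Array.Empty<string>(), extensions, new[] { "ccc", "a" }));
// en-a-aaa-bbb-b-ccc-x-a-ccc
```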

Tag Validation

The Validate() method will verify subtags for correctness.
See RFC 5646 Section 2.1 and RFC 5646 Section 2.2.9 for details.

Note that LanguageTag objects created by Parse() or Normalize() are already verified for form correctness during parsing, and Validate() is primarily of use when using the LanguageTagBuilder.Build() method directly.

Validation includes the following:

Subtag shape correctness, see Format for a summary.
No duplicate variants, extension prefixes, extension tags, or private tags.
No missing subtags.

LanguageTag languageTag = LanguageTag.CreateBuilder()
    .Language("en")
    .Region("US")
    .Build();
bool isValid = languageTag.Validate(); // true
// Or use the IsValid property
isValid = languageTag.IsValid; // true
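
The duplicate check from the validation list can be sketched with a set, independent of the library. The HasDuplicates helper is a hypothetical name, and the case-insensitive comparison is an assumption based on language tags being case-insensitive.

```csharp
using System;
using System.Collections.Generic;

// True when the same subtag appears more than once, ignoring case.
bool HasDuplicates(IEnumerable<string> subtags)
{
    var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    foreach (string subtag in subtags)
    {
        // HashSet.Add returns false when the value is already present.
        if (!seen.Add(subtag)) return true;
    }
    return false;
}

Console.WriteLine(HasDuplicates(new[] { "nedis", "foobar" })); // False
Console.WriteLine(HasDuplicates(new[] { "boont", "BOONT" }));  // True
```

The same helper would apply to variants, extension prefixes, the subtags within one extension, and private use subtags.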

Installation

Project integration:

# Add the package to your project
dotnet add package ptr727.LanguageTags

// Include the namespace
using ptr727.LanguageTags;

Debug log configuration:

// Configure global logging (static fallback)
using Microsoft.Extensions.Logging;
using ptr727.LanguageTags;
using Serilog;
using Serilog.Extensions.Logging;

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Debug()
    .WriteTo.Debug()
    .CreateLogger();

ILoggerFactory loggerFactory = new SerilogLoggerFactory(Log.Logger, dispose: true);
LogOptions.SetFactory(loggerFactory);

// Configure per-call logging (instance logger or factory)
using Microsoft.Extensions.Logging;
using ptr727.LanguageTags;
using Serilog;
using Serilog.Extensions.Logging;

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Debug()
    .WriteTo.Debug()
    .CreateLogger();

ILoggerFactory loggerFactory = new SerilogLoggerFactory(Log.Logger, dispose: true);
Options options = new() { LoggerFactory = loggerFactory };

LanguageTag? tag = LanguageTag.Parse("en-US", options);
LanguageLookup lookup = new(options);

Questions or Issues

Tag testing:

The BCP47 language subtag lookup site offers convenient tag parsing and validation capabilities.
Refer to the unit tests for examples, but note that passing tests may still be incomplete or inaccurate per the RFC specification.

General questions:

Use the Discussions forum for general questions.

Bug reports:

Ask in the Discussions forum if you are not sure if it is a bug.
Check the existing Issues tracker for known problems.
If the issue is unique and a bug, file it in Issues, and include all pertinent steps to reproduce the issue.

Build Artifacts

Build process and artifacts:

LanguageTagsCreate project:
- Downloads language tag data files.
- Converts the tag data into JSON files.
- Generates C# records of the tags.
LanguageData directory:
- ISO 639-2: Source, Data, JSON, Code
- ISO 639-3: Source, Data, JSON, Code
- RFC 5646: Source, Data, JSON, Code
A weekly GitHub Actions job keeps the data files up to date and automatically publishes new releases.

Tag Theory

ℹ️ Note: Refer to References for complete specification details.

Terminology

Brief overview of tag terms:

An IETF BCP 47 language tag is a standardized code that is used to identify human languages on the Internet.
The tag structure is standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47.
RFC 5646 defines the BCP 47 language tag syntax and semantic rules.
The subtags are maintained in the Internet Assigned Numbers Authority (IANA) Language Subtag Registry.
ISO 639 is a standard for classifying languages and language groups, and is maintained by the International Organization for Standardization (ISO).
RFC 5646 incorporates ISO 639, ISO 15924, ISO 3166, and UN M.49 codes as the foundation for its language tags.

Format

ℹ️ TL;DR: IETF language tags are constructed from subtags with specific rules.

ℹ️ Note: Refer to RFC 5646 Section 2.1 for complete language tag syntax and rules.

Normal tags:

[Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]

Language:
- 2 - 3 alpha: Shortest ISO 639 code
- 4 alpha: Future use
- 5 - 8 alpha: Registered tag
- See RFC 5646 Section 2.2.1
Extended language:
- 3 alpha: Reserved ISO 639 code
- See RFC 5646 Section 2.2.2
Script:
- 4 alpha: ISO 15924 code
- See RFC 5646 Section 2.2.3
Region:
- 2 alpha: ISO 3166-1 code
- 3 digit: UN M.49 code
- See RFC 5646 Section 2.2.4
Variant:
- 5 - 8 alphanumeric starting with letter: Registered tag
- 4 - 8 alphanumeric starting with digit: Registered tag
- See RFC 5646 Section 2.2.5
Extension: ([singleton]-[extension])
- 1 alphanumeric: Singleton
- 2 - 8 alphanumeric: Extension
- See RFC 5646 Section 2.2.6

Private use tags:

x-[private]

x: Singleton
1 - 8 alphanumeric: Private use
See RFC 5646 Section 2.2.7

Grandfathered tags:

[grandfathered]

Grandfathered tags are converted to current form tags.
E.g. en-gb-oed -> en-GB-oxendict
E.g. i-klingon -> tlh
See RFC 5646 Section 2.2.8

Examples:

zh: [Language]
zh-yue: [Language]-[Extended language]
zh-yue-hk: [Language]-[Extended language]-[Region]
hy-latn-it-arevela: [Language]-[Script]-[Region]-[Variant]
en-a-bbb-x-a-ccc: [Language]-[Extension]-[Private Use]
en-latn-gb-boont-r-extended-sequence-x-private: [Language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]
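
The shape rules summarized in this section are enough to label the sections of a tag by position and length. The following is a minimal sketch under stated assumptions, not the library's parser: the Describe helper and its labels are hypothetical, it accepts at most one extended language subtag, it does not handle grandfathered tags, and it checks shape only, not registry values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

bool IsLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
bool IsDigit(char c) => c >= '0' && c <= '9';
bool Alpha(string s, int min, int max) =>
    s.Length >= min && s.Length <= max && s.All(IsLetter);
bool AlphaNum(string s, int min, int max) =>
    s.Length >= min && s.Length <= max && s.All(c => IsLetter(c) || IsDigit(c));
bool IsVariant(string s) => AlphaNum(s, 5, 8) || (AlphaNum(s, 4, 4) && IsDigit(s[0]));
bool IsPrivate(string s) => s.Equals("x", StringComparison.OrdinalIgnoreCase);

// Returns the section pattern of a tag, e.g. "[Language]-[Region]".
// Throws FormatException when a subtag does not fit the expected shape.
string Describe(string tag)
{
    string[] subtags = tag.Split('-');
    var labels = new List<string>();
    int i = 0;

    // Consume one subtag, collapsing repeated labels into one section.
    void Take(string label)
    {
        if (labels.LastOrDefault() != label) labels.Add(label);
        i++;
    }

    if (i < subtags.Length && Alpha(subtags[i], 2, 8)) Take("Language");
    if (i > 0)
    {
        if (i < subtags.Length && Alpha(subtags[i], 3, 3)) Take("Extended language");
        if (i < subtags.Length && Alpha(subtags[i], 4, 4)) Take("Script");
        if (i < subtags.Length
            && (Alpha(subtags[i], 2, 2) || (subtags[i].Length == 3 && subtags[i].All(IsDigit))))
        {
            Take("Region");
        }
        while (i < subtags.Length && IsVariant(subtags[i])) Take("Variant");
        while (i < subtags.Length && AlphaNum(subtags[i], 1, 1) && !IsPrivate(subtags[i]))
        {
            Take("Extension"); // the singleton
            if (i >= subtags.Length || !AlphaNum(subtags[i], 2, 8))
            {
                throw new FormatException("Extension without subtags: " + tag);
            }
            while (i < subtags.Length && AlphaNum(subtags[i], 2, 8)) Take("Extension");
        }
    }
    if (i < subtags.Length && IsPrivate(subtags[i]))
    {
        Take("Private Use"); // the x singleton
        if (i >= subtags.Length) throw new FormatException("Private use without subtags: " + tag);
        while (i < subtags.Length && AlphaNum(subtags[i], 1, 8)) Take("Private Use");
    }
    if (i != subtags.Length) throw new FormatException("Unexpected subtag in: " + tag);
    return string.Join("-", labels.Select(label => "[" + label + "]"));
}

Console.WriteLine(Describe("hy-latn-it-arevela")); // [Language]-[Script]-[Region]-[Variant]
Console.WriteLine(Describe("en-a-bbb-x-a-ccc"));   // [Language]-[Extension]-[Private Use]
```

Because each section is tried once and in order, a subtag that appears out of position is left unconsumed and reported as a format error rather than being silently reinterpreted.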

References

References and documentation:

Libraries

Other known language tag libraries:

3rd Party Tools

3rd party tools used in this project:

License

Licensed under the MIT License
