QUICK REVIEW

[Paper Review] xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning

Linzheng Chai, Jian Yang | arXiv (Cornell University) | Jan 13, 2024

Topic Modeling | 7 citations

TL;DR

xCoT introduces cross-lingual instruction tuning to transfer reasoning from high-resource to low-resource languages, using xICL, Random-CoT, and cross-lingual distillation to improve multilingual chain-of-thought reasoning.

ABSTRACT

Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCoT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, the multilingual instruction training data (xCoT-Instruct) is created to encourage the semantic alignment of multiple languages. We introduce cross-lingual in-context few-shot learning (xICL) to accelerate multilingual agreement in instruction tuning, where some fragments of source languages in examples are randomly substituted by their counterpart translations of target languages. During multilingual instruction tuning, we adopt a random online CoT strategy to enhance the multilingual reasoning ability of the large language model by first translating the query to another language and then answering in English. To further facilitate the language transfer, we leverage the high-resource CoT to supervise the training of low-resource languages with cross-lingual distillation. Experimental results on previous benchmarks demonstrate the superior performance of xCoT in reducing the gap among different languages.

Motivation & Objective

Bridge the cross-lingual gap in chain-of-thought reasoning for low-resource languages.
Create multilingual instruction data that aligns reasoning across languages.
Develop training strategies (xICL, Random-CoT, xDistill) to enhance cross-lingual transfer.
Demonstrate improvements on multilingual benchmarks MGSM and MSVAMP.

Proposed method

Construct xCoT-Instruct, a multilingual instruction dataset by translating English data into 10 languages while keeping English outputs.
Introduce cross-lingual in-context few-shot learning (xICL) by code-switching demo queries across languages to align representations.
Apply Random-CoT during multilingual instruction tuning: translate the query to a random intermediate language, then answer in English.
Use cross-lingual distillation (xDistill) to supervise low-resource outputs with high-resource CoT distributions at the token level.
Train using multilingual fine-tuning (D = {D^Lk}) with a joint objective to align outputs across languages (P(a^Lj|c^Li,q^Li;M)).
Optionally augment data with D' generated by the fine-tuned model to reinforce correct reasoning paths.
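
The code-switching and Random-CoT steps listed above can be illustrated with a small sketch. This is not the authors' implementation: the toy English-to-Spanish lexicon, the function names code_switch and random_cot_example, the ratio parameter, and the prompt wording are all assumptions made for illustration. A real pipeline would use a translation system rather than a word-level dictionary.

```python
import random

# Hypothetical toy lexicon (English -> Spanish), for illustration only.
TOY_LEXICON = {"has": "tiene", "apples": "manzanas", "many": "muchas"}


def code_switch(query, lexicon, ratio, rng):
    """Swap a random subset of translatable tokens for target-language words.

    Each token found in the lexicon is replaced with probability `ratio`.
    """
    switched = []
    for token in query.split():
        translation = lexicon.get(token.lower())
        if translation is not None and rng.random() < ratio:
            switched.append(translation)
        else:
            switched.append(token)
    return " ".join(switched)


def random_cot_example(query, translations, english_cot, rng):
    """Build one Random-CoT style training pair (assumed layout).

    A pivot language is drawn at random; the target first restates the
    query in that language and then gives the English reasoning.
    """
    pivot_lang = rng.choice(sorted(translations))
    target = (
        f"Query in {pivot_lang}: {translations[pivot_lang]}\n"
        f"Answer in English: {english_cot}"
    )
    return {"input": query, "pivot_lang": pivot_lang, "target": target}


if __name__ == "__main__":
    rng = random.Random(0)
    print(code_switch("Tom has 3 apples", TOY_LEXICON, 0.5, rng))
    example = random_cot_example(
        "Tom a 3 pommes. Combien en a-t-il ?",
        {"es": "Tom tiene 3 manzanas.", "de": "Tom hat 3 Aepfel."},
        "Tom has 3 apples, so the answer is 3.",
        rng,
    )
    print(example["target"])
```

Passing a seeded random.Random instance keeps the sketch reproducible; setting ratio to 0.0 leaves the query unchanged and 1.0 swaps every token the lexicon covers.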

Experimental results

Research questions

RQ1: How can cross-lingual instruction tuning improve chain-of-thought reasoning in low-resource languages?
RQ2: Does integrating multilingual in-context learning and code-switching enhance cross-language alignment of reasoning processes?
RQ3: Can high-resource language CoT supervision effectively transfer to low-resource languages via distillation?
RQ4: What is the impact of Random-CoT and multilingual data augmentation on multilingual reasoning accuracy?
RQ5: How do xCoT components perform on MGSM and MSVAMP multilingual benchmarks?

Key findings

xCoT achieves state-of-the-art performance on MGSM and MSVAMP benchmarks across 11 and 10 languages, respectively.
Cross-lingual in-context learning (xICL) with code-switching significantly improves multilingual alignment.
Random-CoT, which translates queries into intermediate languages before answering in English, boosts multilingual reasoning.
Cross-lingual distillation (xDistill) leverages high-resource CoT to supervise low-resource languages at the token level.
Multilingual representations become more aligned in shared space after xCoT-Instruct training compared to baselines.
Ablation studies show cumulative gains from xICL, mSampling, Random-CoT, and xDistill, with xCoT achieving the best overall performance.
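
The token-level supervision described for xDistill can be pictured with a minimal sketch. The review does not state the exact loss, so the code below assumes a per-token KL divergence between the distribution produced for the high-resource query (teacher) and the one produced for the low-resource query (student), both over the same English answer positions. The function names and the toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized row."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def token_level_kl(teacher_logits, student_logits):
    """Mean over answer positions of KL(teacher || student).

    Both arguments are lists of rows, one row of logits per token position.
    Using KL here is an assumption; the paper's loss may differ.
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("need the same non-zero number of token positions")
    total = 0.0
    for teacher_row, student_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(teacher_row)
        log_q = log_softmax(student_row)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)


if __name__ == "__main__":
    # Toy logits: 2 answer positions, vocabulary of 3 tokens.
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    student = [[1.0, 1.0, -0.5], [0.0, 0.5, 0.5]]
    print(token_level_kl(teacher, student))
```

In a training loop this term would typically be added to the usual cross-entropy objective with a weighting coefficient, which is a further assumption rather than a detail taken from the review.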

This review was created by AI and reviewed by human editors.