Class: SentenceSplitter
SentenceSplitter is our default text splitter. It supports splitting text into sentences, paragraphs, or fixed-length chunks with overlap.
One advantage of SentenceSplitter is that, even when producing fixed-length chunks, it tries to keep sentences together.
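The idea of keeping sentences together inside size-limited, overlapping chunks can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `packSentences` is a hypothetical helper, sentence boundaries are found with a naive regular expression, and size is counted in whitespace-separated words as a stand-in for tokens.

```typescript
// Hypothetical helper for illustration only; not part of SentenceSplitter.
// Greedily packs whole sentences into chunks of at most `chunkSize` words,
// carrying up to `chunkOverlap` words of trailing sentences into the next chunk.
function packSentences(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  // Naive sentence detection: a run of text ending in ., ! or ? (or end of input).
  const rawSentences: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  const sentences = rawSentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Assumption: word count approximates token count.
  const size = (s: string): number =>
    s.split(/\s+/).filter((w) => w.length > 0).length;

  const chunks: string[] = [];
  let current: string[] = [];
  let currentSize = 0;

  for (const sentence of sentences) {
    const n = size(sentence);
    if (current.length > 0 && currentSize + n > chunkSize) {
      // Close the current chunk without cutting any sentence.
      chunks.push(current.join(" "));

      // Walk backwards and carry whole sentences as overlap, as long as they
      // fit in the overlap budget and leave room for the incoming sentence.
      const carried: string[] = [];
      let carriedSize = 0;
      for (const s of current.slice().reverse()) {
        const m = size(s);
        if (carriedSize + m > chunkOverlap || carriedSize + m + n > chunkSize) {
          break;
        }
        carried.unshift(s);
        carriedSize += m;
      }
      current = carried;
      currentSize = carriedSize;
    }
    // A sentence longer than chunkSize is kept whole rather than split.
    current.push(sentence);
    currentSize += n;
  }
  if (current.length > 0) {
    chunks.push(current.join(" "));
  }
  return chunks;
}
```

With four three-word sentences, `chunkSize` 6 and `chunkOverlap` 3, this sketch yields three chunks of two sentences each, where each chunk starts with the last sentence of the previous one.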
Constructors
new SentenceSplitter()
new SentenceSplitter(options?): SentenceSplitter
Parameters
• options?
• options.chunkOverlap?: number
• options.chunkSize?: number
• options.chunkingTokenizerFn?
• options.paragraphSeparator?: string
• options.splitLongSentences?: boolean
• options.tokenizer?: any
• options.tokenizerDecoder?: any
Returns
SentenceSplitter
Source
packages/core/src/TextSplitter.ts:78
Properties
chunkOverlap
chunkOverlap: number
Source
packages/core/src/TextSplitter.ts:70
chunkSize
chunkSize: number
Source
packages/core/src/TextSplitter.ts:69
chunkingTokenizerFn()
private
chunkingTokenizerFn: (text) => string[]
Parameters
• text: string
Returns
string[]
Source
packages/core/src/TextSplitter.ts:75
paragraphSeparator
private
paragraphSeparator: string
Source
packages/core/src/TextSplitter.ts:74
splitLongSentences
private
splitLongSentences: boolean
Source
packages/core/src/TextSplitter.ts:76
tokenizer
private
tokenizer: any
Source
packages/core/src/TextSplitter.ts:72
tokenizerDecoder
private
tokenizerDecoder: any
Source
packages/core/src/TextSplitter.ts:73
Methods
combineTextSplits()
combineTextSplits(newSentenceSplits, effectiveChunkSize): TextSplit[]
Parameters
• newSentenceSplits: SplitRep[]
• effectiveChunkSize: number
Returns
TextSplit[]
Source
packages/core/src/TextSplitter.ts:215
getEffectiveChunkSize()
private
getEffectiveChunkSize(extraInfoStr?): number
Parameters
• extraInfoStr?: string
Returns
number
Source
packages/core/src/TextSplitter.ts:114
getParagraphSplits()
getParagraphSplits(text, effectiveChunkSize?): string[]
Parameters
• text: string
• effectiveChunkSize?: number
Returns
string[]
Source
packages/core/src/TextSplitter.ts:131
getSentenceSplits()
getSentenceSplits(text, effectiveChunkSize?): string[]
Parameters
• text: string
• effectiveChunkSize?: number
Returns
string[]
Source
packages/core/src/TextSplitter.ts:157
processSentenceSplits()
private
processSentenceSplits(sentenceSplits, effectiveChunkSize): SplitRep[]
Splits sentences into chunks if necessary.
This behavior is not ideal: it can split in the middle of a word or, in non-English text, in the middle of a Unicode code point, so the splitting is turned off by default. If you need it, set the splitLongSentences option to true.
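To make the risk concrete, here is a small sketch of an analogous failure in plain TypeScript. It is not the library's code path: both helpers are hypothetical and cut a string at fixed offsets. The first counts UTF-16 code units, so it can cut a character such as an emoji in half. The second counts code points and keeps such characters intact.

```typescript
// Hypothetical helpers for illustration only; neither is part of SentenceSplitter.

// Cuts every `size` UTF-16 code units. A character stored as a surrogate pair
// can be torn apart, leaving an unpaired surrogate at a piece boundary.
function splitByCodeUnits(text: string, size: number): string[] {
  const parts: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    parts.push(text.slice(i, i + size));
  }
  return parts;
}

// Cuts every `size` Unicode code points. Array.from iterates a string by
// code point, so surrogate pairs are never separated.
function splitByCodePoints(text: string, size: number): string[] {
  const points: string[] = Array.from(text);
  const parts: string[] = [];
  for (let i = 0; i < points.length; i += size) {
    parts.push(points.slice(i, i + size).join(""));
  }
  return parts;
}
```

For the input "ab😀cd" with a size of 3, the first helper returns a first piece that ends in a lone high surrogate, while the second returns "ab😀" and "cd". Splitting by code points still does not protect word boundaries or multi-code-point graphemes, which is consistent with the option being off by default.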