StandardTokenizer discards some character categories, such as Symbol and Punctuation, …
SOLUTION: haruyama/StandardPlusTokenizer (on GitHub)
This tokenizer keeps all characters except whitespace.
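A sketch of how such a tokenizer could be wired into a Solr analyzer, assuming the gist is compiled and registered as a tokenizer factory (the class name below is hypothetical; use whatever factory name the build actually produces):

```xml
<fieldType name="text_all_chars" class="solr.TextField">
  <analyzer>
    <!-- hypothetical factory class for StandardPlusTokenizer -->
    <tokenizer class="org.example.StandardPlusTokenizerFactory"/>
  </analyzer>
</fieldType>
```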
kuromoji splits a run of FULLWIDTH DIGITs (e.g. １２３) into separate single-character tokens.
SOLUTION: MappingCharFilterFactory (normalize FULLWIDTH DIGITs to ASCII before tokenizing)
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-fullwidth-digit.txt" />
<tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
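A minimal mapping-fullwidth-digit.txt for the char filter above might look like this, using MappingCharFilter's "source" => "target" rule syntax to fold each FULLWIDTH DIGIT to its ASCII equivalent:

```text
"０" => "0"
"１" => "1"
"２" => "2"
"３" => "3"
"４" => "4"
"５" => "5"
"６" => "6"
"７" => "7"
"８" => "8"
"９" => "9"
```

With this mapping applied, a run like １２３ reaches the tokenizer as 123 and is no longer broken into per-character tokens.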
lucene-gosen tokenizes a run of HALFWIDTH characters into a single unknown (unk) token, even when the run contains punctuation.
SOLUTION: normalize with MappingCharFilterFactory, or switch to kuromoji.
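If staying on lucene-gosen, one option is to fold HALFWIDTH forms to their fullwidth equivalents before tokenizing. A sketch, assuming lucene-gosen exposes solr.GosenTokenizerFactory (check the factory name for the version in use; the mapping file name here is hypothetical):

```xml
<!-- mapping-halfwidth.txt (hypothetical name) folds halfwidth forms
     to fullwidth, e.g. "ｱ" => "ア" and "｡" => "。" -->
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-halfwidth.txt"/>
<tokenizer class="solr.GosenTokenizerFactory"/>
```

Otherwise, the kuromoji analyzer shown above (solr.JapaneseTokenizerFactory) avoids the single-unk behavior entirely.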