Reading font file information (fast parsing of TTF font names)

Author: 天下第九九八十一 | Published 2020-02-17 16:17


I. Replacing the original method

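Someone has already written about this, but that implementation uses random-access reads and contains a small error. Here I try reading with nothing but a FileInputStream and compare the performance of the two.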

Each `Name Record` is `2*6=12` bytes long. The information we want to read has `uNameID` in the range 0-7 ([more](https://docs.microsoft.com/en-us/typography/opentype/spec/name#name-ids)), mapped as follows:

{
    0: 'copyright',
    1: 'fontFamily',
    2: 'fontSubFamily',
    3: 'fontIdentifier',
    4: 'fontName',
    5: 'fontVersion',
    6: 'postscriptName',
    7: 'trademark',
}


We also restrict ourselves to records whose encoding is Unicode, which requires:

uPlatformID === 0 && (uEncodingID === 0 || uEncodingID === 1 || uEncodingID === 3)


or

uPlatformID === 3 && uEncodingID === 1

Once these conditions are met, we use `uOffset+uStorageOffset+uStringOffset` to locate the string value of each `Name Record`, which gives the values of the eight properties listed above.

At this point we have the information we need from a single ttf file.
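As a rough illustration of the record layout described above, here is a minimal Java sketch that reads one 12-byte Name Record from a `ByteBuffer` and computes where its string lives. The class name `NameRecordSketch` and its field names are made up for this example, and `nameTableOffset` is assumed to be what the quoted formula calls `uOffset` (the start of the `name` table in the file).

```java
import java.nio.ByteBuffer;

/** Illustrative holder for one 12-byte Name Record (six unsigned 16-bit fields). */
final class NameRecordSketch {
    final int platformID, encodingID, languageID, nameID, length, stringOffset;

    NameRecordSketch(ByteBuffer buf) {
        // ByteBuffer is big-endian by default, which matches the TTF byte order.
        platformID   = buf.getShort() & 0xFFFF;
        encodingID   = buf.getShort() & 0xFFFF;
        languageID   = buf.getShort() & 0xFFFF;
        nameID       = buf.getShort() & 0xFFFF;
        length       = buf.getShort() & 0xFFFF;
        stringOffset = buf.getShort() & 0xFFFF;
    }

    /** True for the Unicode cases quoted above. */
    boolean isUnicode() {
        return (platformID == 0 && (encodingID == 0 || encodingID == 1 || encodingID == 3))
            || (platformID == 3 && encodingID == 1);
    }

    /** Absolute file position of the string: table offset + storage offset + string offset. */
    long absoluteOffset(long nameTableOffset, int storageOffset) {
        return nameTableOffset + storageOffset + stringOffset;
    }
}
```

Masking with `0xFFFF` keeps the fields unsigned, so offsets above 32767 do not turn negative.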

What exactly does "the encoding is Unicode" mean? A Bing search shows that the cases are actually divided like this:

const isWindowsPlatform =
  platformID === 3 &&
  (encodingID === 0 || encodingID === 1 || encodingID === 10)

const isUnicodePlatform =
  platformID === 0 &&
  (encodingID === 0 ||
    encodingID === 1 ||
    encodingID === 2 ||
    encodingID === 3 ||
    encodingID === 4)

That article explains nothing further either, though.

Another Bing search turned up nothing at all.

Never mind. Following the article above, when isWindowsPlatform or isUnicodePlatform holds, the font name (fontFamily) is treated as UTF-16 encoded; otherwise it is treated as UTF-8 encoded.

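This is something the first article did not consider: it decodes everything as UTF-16. Fortunately, a TTF font can store the fontFamily name in several encodings, so the method from the first article will most likely not go wrong.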
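The decoding choice can be written as a small helper. This is only a sketch that mirrors the check used in `parseFontName` in this article; the class and method names are invented here. It uses `UTF_16BE` explicitly, on the assumption that name strings carry no byte-order mark, and the UTF-8 fallback is the article's guess rather than a documented rule.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameCharsetSketch {
    /** Picks a charset for a name string from its platform and encoding IDs. */
    static Charset charsetFor(int platformID, int encodingID) {
        boolean windows = platformID == 3
                && (encodingID == 0 || encodingID == 1 || encodingID == 10);
        boolean unicode = platformID == 0 && encodingID >= 0 && encodingID <= 4;
        // Everything else falls back to UTF-8: a guess, not a documented rule.
        return (windows || unicode) ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_8;
    }

    /** Decodes the raw bytes of a name string with the charset chosen above. */
    static String decode(byte[] raw, int platformID, int encodingID) {
        return new String(raw, charsetFor(platformID, encodingID));
    }
}
```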

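Performance test:
1. My method first, the first article's method second, over 20 font files:

Method time (ms): 2
Method time (ms): 52

Method time (ms): 3
Method time (ms): 64

Method time (ms): 3
Method time (ms): 64

Method time (ms): 4
Method time (ms): 55

Method time (ms): 3
Method time (ms): 64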

2. His method first, over 20 font files:

Method time (ms): 65
Method time (ms): 2

Method time (ms): 61
Method time (ms): 2

Method time (ms): 54
Method time (ms): 1

No need to test any further. His method takes 2-3 ms per file on average, and in that same time mine gets through all 20 files, with identical results:

ARIALUNI.TTF
Arial Unicode MS

ARIALUNI_Longman.TTF
Arial Unicode MS

ARIALUNI_O.ttf
Arial Unicode MS

ARIALUNI_S_Longman.ttf
Arial Unicode MS

ARIALUNI_S_Oxford.ttf
Arial Unicode MS

ARIALUNI_V.ttf
Arial Unicode MS

BASEMIC_.TTF
Basemic

BASES___.TTF
Basemic Symbol

BASET___.TTF
Basemic Times

DoulosSILR.ttf
Doulos SIL

KK.TTF
KK

Ksphonet.ttf
Kingsoft Phonetic Plain

ksphonetic.ttf
Kingsoft Phonetic Plain

LINGOES.TTF
Lingoes Unicode

L_10646.TTF
Lucida Sans Unicode

ProggyClean.fon

SPPY.TTF
sppy

TRANSLIT.TTF
Transliteration

_ARIALUNI_O.ttf
Arial Unicode MS

_ARIALUNI_V.ttf
Arial Unicode MS
Method time (ms): 2

Admittedly, fifty or sixty milliseconds is hardly worth mentioning, but for an app's startup time, 100 ms more or less does make a difference.
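The article does not show how the timings were taken. A minimal harness along these lines would produce comparable numbers; `TimingSketch` and `timeMillis` are names made up for this sketch, and the parser is passed in as a function so the block compiles on its own.

```java
import java.util.List;
import java.util.function.Function;

final class TimingSketch {
    /** Runs the parser over every item and returns the elapsed wall-clock time in ms. */
    static <T> long timeMillis(List<T> items, Function<T, String> parser) {
        long start = System.nanoTime();
        for (T item : items) {
            parser.apply(item);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Running the two methods in both orders, as done above, helps rule out that the second method only looks fast because the files were already cached by the first.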

Here is the code:

    private static String parseFontName(ReusableBufferedInputStream bin) throws IOException {
        bin.skip(4); // sfnt version
        int numOfTables = readShort(bin);
        bin.skip(6); // searchRange, entrySelector, rangeShift
        byte[] buff = new byte[4];
        for (int i = 0; i < numOfTables; i++) {
            // Table record: 4-byte tag, then checksum, offset and length (4 bytes each).
            bin.read(buff,0,4);
            readInt(bin); // checksum, unused
            int offset = readInt(bin);
            readInt(bin); // table length, unused
            String tname = new String(buff, StandardCharsets.UTF_8);
            if ("name".equalsIgnoreCase(tname)) {
                // Bytes consumed so far: 12-byte header plus (i+1) 16-byte table records.
                int now = 12+16*(i+1);
                int toSkip=offset-now;
                //CMN.Log("name table found!!!", offset, now, toSkip);
                if(toSkip>=0){
                    while(toSkip>0){
                        toSkip-=bin.skip(toSkip);
                    }
                    readShort(bin); // format selector, unused
                    int nRCount = readShort(bin);
                    int storageOffset = readShort(bin);
                    for (int j = 0; j < nRCount; j++) {
                        int platformID = readShort(bin);
                        int encodingID = readShort(bin);
                        readShort(bin); // languageID, unused
                        int nameID = readShort(bin);
                        int stringLength = readShort(bin);
                        int stringOffset = readShort(bin);
                        // nameID 1 is the font family name; 0, for example, is the copyright notice
                        if(nameID==1){
                            byte[] bf = new byte[stringLength];
                            // Bytes consumed inside the name table: 6-byte header plus (j+1) 12-byte records.
                            toSkip = storageOffset + stringOffset - (3*2 + 6*2*(j+1));
                            //CMN.Log("font name found!!!", stringLength, toSkip);
                            if(toSkip>=0){
                                while(toSkip>0){
                                    toSkip-=bin.skip(toSkip);
                                }
                                bin.read(bf, 0, stringLength);
                                // Windows (platform 3) and Unicode (platform 0) records are decoded as UTF-16.
                                boolean utf16 = platformID==3 && (encodingID==0||encodingID==1||encodingID==10) || platformID==0 && encodingID>=0 && encodingID<=4;
                                //CMN.Log(platformID, encodingID, utf16);
                                return new String(bf, 0, stringLength, utf16?StandardCharsets.UTF_16:StandardCharsets.UTF_8);
                            }
                            break;
                        }
                    }
                }
                break;
            }else if (tname.length() == 0) {
                break;
            }
        }
        return null;
    }

    private static int readInt(InputStream bin) throws IOException {
        int ch1 = bin.read();
        int ch2 = bin.read();
        int ch3 = bin.read();
        int ch4 = bin.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0)
            throw new EOFException();
        return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
    }

    private static int readShort(InputStream bin) throws IOException {
        int ch1 = bin.read();
        int ch2 = bin.read();
        if ((ch1 | ch2) < 0)
            throw new EOFException();
        // Counts, lengths and offsets are unsigned 16-bit values, so do not narrow to short.
        return (ch1 << 8) + (ch2 << 0);
    }

That is all: a single method (not counting readInt and readShort). ReusableBufferedInputStream is a class extended so that the buffer can be reused:

public class ReusableBufferedInputStream extends BufferedInputStream {
    public ReusableBufferedInputStream(@NotNull InputStream in, int size) {
        super(in, size);
    }

    public void reset(InputStream in) throws IOException {
        this.in = in;
        pos = 0;
        markpos = -1;
        count = 0;
    }

    public byte[] getBytes(){
        return buf;
    }
}

Usage:

        ReusableBufferedInputStream bin = null;
        File[] arr = new File("F:\\assets\\fonts\\").listFiles();
        if (arr != null) {
            for (File fI : arr) {
                String path = fI.getPath();
                // Case-insensitive check for a ".ttf" suffix
                if (path.regionMatches(true, path.length()-4, ".ttf", 0, 4)) {
                    FileInputStream fin = new FileInputStream(fI);
                    if (bin == null) bin = new ReusableBufferedInputStream(fin, 4096);
                    else bin.reset(fin);
                    System.out.println(parseFontName(bin));
                }
            }
        }
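The loop above never closes the FileInputStream it opens. One way to handle that, sketched here under the assumption that giving up buffer reuse is acceptable, is try-with-resources. `CloseStreamsSketch` and `FontNameParser` are hypothetical names; the interface only stands in for `parseFontName` so the block compiles on its own.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CloseStreamsSketch {
    /** Hypothetical stand-in for parseFontName. */
    interface FontNameParser {
        String parse(InputStream in) throws IOException;
    }

    static void printFontNames(File dir, FontNameParser parser) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) return; // not a directory, or an I/O error
        for (File f : files) {
            String path = f.getPath();
            if (path.length() < 4
                    || !path.regionMatches(true, path.length() - 4, ".ttf", 0, 4)) {
                continue;
            }
            // try-with-resources closes the stream even if parsing throws.
            try (InputStream in = new BufferedInputStream(new FileInputStream(f), 4096)) {
                System.out.println(parser.parse(in));
            }
        }
    }
}
```

This allocates a fresh 4096-byte buffer per file, which is exactly what ReusableBufferedInputStream avoids, so it trades a little speed for simpler cleanup.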

Optimizations / potential failure modes:

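The UTF-8 versus UTF-16 question discussed earlier: I found no reliable documentation and went with a guess. No problems so far.
A regular stream such as FileInputStream only moves forward and cannot go back. So when toSkip<0 occurs in parseFontName, the method simply returns null and parsing fails. Fixing this would mean closing the original file stream and reopening it, which complicates the logic considerably; since no TTF in my tests showed this case, I ignore it for now.
The FileInputStream has to be closed manually. I did not close it in my tests. This could be handled inside ReusableBufferedInputStream.
The Stream.skip loop should be kept from turning into an infinite loop.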
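For the last point, a helper along these lines is one common way to keep a skip loop from spinning when `skip()` returns 0. `SkipSketch.skipFully` is a name chosen for this sketch and is not part of the article's code.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class SkipSketch {
    /** Skips exactly n bytes or throws EOFException; never loops forever on skip() == 0. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() >= 0) {
                n--; // skip() made no progress, but one byte was still readable
            } else {
                throw new EOFException("stream ended with " + n + " bytes left to skip");
            }
        }
    }
}
```

Falling back to a single `read()` separates "skip is temporarily unable to make progress" from "the stream has ended", which the return value of `skip()` alone cannot tell apart.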

II. Controlling variables to compare efficiency

Conclusion: even after removing the original method's various inefficiencies, RandomAccessFile still takes more time than an ordinary stream.

Test code:

        for (File fI:arr) {
            CMN.Log();
            CMN.Log(fI.getName());
            String path=fI.getPath();
            if(path.regionMatches(true, path.length()-4, ".ttf", 0, 4)){
                RandomInputStream fin = new RandomInputStream(fI);
                CMN.Log(parseFontName(fin));
                fin.close();
            }
        }

The RandomInputStream class written for this test:

public class RandomInputStream extends InputStream {
    private final RandomAccessFile raf;
    private final File f;

    public RandomInputStream(File _f) throws FileNotFoundException {
        f = _f;
        raf = new RandomAccessFile(f, "r");
    }

    @Override
    public int read() throws IOException {
        return raf.read();
    }

    @Override
    public long skip(long n) throws IOException {
        raf.seek(raf.getFilePointer()+n);
        return n;
    }

    @Override
    public int read(@NotNull byte[] b) throws IOException {
        return raf.read(b);
    }

    @Override
    public int read(@NotNull byte[] b, int off, int len) throws IOException {
        return raf.read(b, off, len);
    }

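    @Override
    public int available() throws IOException {
        return (int) (f.length()-raf.getFilePointer());
    }

    @Override
    public void close() throws IOException {
        // Without this override, close() is a no-op and the file handle leaks.
        raf.close();
    }
}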

Test time:

Method time (ms): 16
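One plausible contributor, which the article does not verify: RandomAccessFile does no buffering, so every single-byte `read()` issued by readShort and readInt goes to the file, while BufferedInputStream serves them from its 4096-byte buffer. The sketch below uses in-memory streams and a hypothetical `CountingStream` wrapper to show the difference in the number of underlying read calls. It counts calls only and does not measure time.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCountSketch {
    /** Counts how many read calls reach the wrapped stream. */
    static final class CountingStream extends FilterInputStream {
        int calls;
        CountingStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            calls++;
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads one byte at a time, as readShort/readInt do, and returns the byte count. */
    static int drain(InputStream in) throws IOException {
        int n = 0;
        while (in.read() >= 0) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        CountingStream direct = new CountingStream(new ByteArrayInputStream(data));
        drain(direct);
        CountingStream underBuffer = new CountingStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(underBuffer, 4096));
        System.out.println("unbuffered calls: " + direct.calls);
        System.out.println("buffered calls:   " + underBuffer.calls);
    }
}
```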
