Using the Boost JSON Parsing Library in C++

2018-06-22

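While working on a C++ project recently, I had a large amount of configuration data, so I consolidated it into a single text file and chose JSON as the format. There are many C++ libraries for handling JSON, such as Jsoncpp and Boost. Since this project already depended on Boost, I used Boost to parse the JSON as well.

Boost parses JSON through its property_tree library, which makes it easy to parse both XML and JSON.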

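I. Caveats of the Boost JSON parser

Before going into details, one thing must be stressed: by default this library is not thread-safe! Not thread-safe! Not thread-safe! If you parse JSON with Boost in a multithreaded program without any precautions, it may segfault at seemingly random moments. The reason is that Boost's JSON parsing is built on the Spirit parser framework (at least in the Boost releases that ship the Spirit-based JSON parser), and Spirit itself is not thread-safe. To make it thread-safe, add the macro #define BOOST_SPIRIT_THREADSAFE before the first Boost header is included. In theory, passing the macro on the compiler command line works too.

Another caveat: in most tutorials you find online, property_tree is used in a way that does not support Unicode, and supporting it takes some extra work. Instructions can be found online; I tried them and eventually gave up. My workaround was to replace the paths and similar strings that contain Chinese characters with symlinks named using only English letters and digits. After that, life was good.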

II. The boost::property_tree::ptree type

Whether the input is JSON or XML, Boost parses it into a ptree data structure, which looks roughly like this:

struct ptree
{
    data_type data;                         // data associated with the node
    list< pair<key_type, ptree> > children; // ordered list of named children
};

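As you can see, this is a standard tree structure. Every node has its own value and a list of child nodes, and every child has a name. Names are not required to be unique among siblings (the array example later relies on this). data_type and key_type are usually std::string or std::wstring; to handle Unicode strings you need std::wstring. All the examples below use std::string.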

III. Parsing JSON

First, a small example of how Boost reads JSON data.

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

#include <sstream>
#include <iostream>

int main() {

    std::string s = "{ \"a\": 100, \"b\": [1, 2, 3, 4] }";
    std::stringstream ss(s);
    boost::property_tree::ptree ptree;
    // read the JSON data
    boost::property_tree::read_json(ss, ptree);
    std::cout << ptree.get_child("a").data() << std::endl;
    // write the JSON data back out
    boost::property_tree::write_json("./tmp.json", ptree);
}

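Here we first define an object of type boost::property_tree::ptree, then read the data with boost::property_tree::read_json, after which the data can be manipulated through the various ptree interfaces. The interfaces for reading and writing JSON can be found in boost/property_tree/json_parser.hpp.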

namespace boost {
    namespace property_tree {
        namespace json_parser {
            template<typename Ptree> 
            void read_json(std::basic_istream< typename Ptree::key_type::value_type > &, 
                           Ptree &);
            template<typename Ptree> 
            void read_json(const std::string &, Ptree &, 
                            const std::locale & = std::locale());
            template<typename Ptree> 
            void write_json(std::basic_ostream< typename Ptree::key_type::value_type > &, 
                            const Ptree &, bool = true);
            template<typename Ptree> 
            void write_json(const std::string &, const Ptree &, 
                            const std::locale & = std::locale(), bool = true);
        }
    }
}

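Both reading and writing are supported. Reading can load JSON either from a file name or from an input stream, and writing likewise targets either a file or an output stream. There is no overload that takes the JSON text itself, which is why the demo above wraps the string s in a string stream ss before loading it. The final bool parameter of write_json controls whether the output is pretty-printed.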

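IV. Reading JSON objects

JSON has two kinds of structure: key-value pairs (objects) and arrays. What makes JSON flexible is that the value of a key-value pair can itself be another set of key-value pairs or an array, and the same goes for each array element. Let's look at how to get data out of each in turn.

1) Parsing key-value pairs

ptree provides an operation called get_child, which fetches a child node by its key name. The names can also be chained into a path. What does chaining mean? Look at the following code: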

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

#include <sstream>
#include <iostream>

int main() {
    std::string s = "{ \"a\": { \"b\":1, \"c\":2 }, \"d\":3 }";
    std::stringstream ss(s);
    boost::property_tree::ptree ptree;
    boost::property_tree::read_json(ss, ptree);
    std::cout << "input text:" << std::endl;
    boost::property_tree::write_json(std::cout, ptree);
    std::cout << "-------------------------" << std::endl;
    std::cout << "parse result: " << std::endl;
    std::cout << "a->b: " << ptree.get_child("a").get_child("b").get_value<int>() << std::endl;
    std::cout << "a->b: " << ptree.get_child("a.b").get_value<int>() << std::endl;
    std::cout << "a->c: " << ptree.get_child("a.c").get_value<int>() << std::endl;
    std::cout << "d: " << ptree.get_child("d").get_value<int>() << std::endl;
}

The output is:

input text:
{
    "a": {
        "b": "1",
        "c": "2"
    },
    "d": "3"
}
-------------------------
parse result:
a->b: 1
a->b: 1
a->c: 2
d: 3

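get_child returns the ptree object of a child node given its name, and the name can be a path that joins the names of each level with a dot. get_value<Type> returns the node's value converted to the requested type. If you just want the node's value without any conversion, use data().

get_child requires the path to exist; otherwise it throws an exception. If you don't know whether a path exists, use get_child_optional, which returns an empty boost::optional (equal to boost::none) when the path is missing. The returned object behaves like a pointer, so once you have checked that it is not empty you can read the value like this: pt.get_child_optional("some_key")->get_value<int>().

As shown so far, we can navigate to the node we need with tree operations and then call get_value<Type> to get its value. That can get tedious, though, and Boost offers some shortcuts. Below is an example with the same functionality: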

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

#include <sstream>
#include <iostream>

int main() {
    std::string s = "{ \"a\": { \"b\":1, \"c\":2 }, \"d\":3 }";
    std::stringstream ss(s);
    boost::property_tree::ptree ptree;
    boost::property_tree::read_json(ss, ptree);
    std::cout << "input text:" << std::endl;
    boost::property_tree::write_json(std::cout, ptree);
    std::cout << "-------------------------" << std::endl;
    std::cout << "parse result: " << std::endl;
    std::cout << "a->b: " << ptree.get<int>("a.b") << std::endl;
    std::cout << "a->c: " << ptree.get<int>("a.c") << std::endl;
    std::cout << "d: " << ptree.get<int>("d") << std::endl;
}

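get is equivalent to calling get_child to locate the node and then calling get_value<Type> on it, which returns the node's value converted to Type. In other words, ptree.get<int>("a.b") is equivalent to ptree.get_child("a.b").get_value<int>(). With get we can fetch a node's data and convert its type in one step, which is about as convenient as it gets!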

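2) Parsing arrays

Why do arrays get their own section? Because array elements have no keys, we can no longer look nodes up by name, so the reading approach is slightly different. For arrays, Boost gives us an iterator interface over child nodes, which makes it easy to walk all the children of a node (it works for key-value pairs too, of course).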

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

#include <sstream>
#include <iostream>

int main() {

    std::string s = "[1, 2, 3, 4]";
    std::stringstream ss(s);
    boost::property_tree::ptree ptree;
    boost::property_tree::read_json(ss, ptree);
    boost::property_tree::write_json(std::cout, ptree);

    // visit array data
    for (boost::property_tree::ptree::iterator it = ptree.begin(); it != ptree.end(); ++ it) {
        std::cout << it->second.get_value<int>() << " ";
    }
    std::cout << std::endl;
    
    // simpler in C++11: range-based for, by reference to avoid copying each child
    for (const auto &item : ptree) {
        std::cout << item.second.get_value<int>() << " ";
    }
}

The printed result:

{
    "": "1",
    "": "2",
    "": "3",
    "": "4"
}
1 2 3 4
1 2 3 4

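As you can see, Boost stores JSON arrays as key-value pairs as well, except that every key is an empty string. The iterator's first member is the key, which is the empty string for array elements, and second is the value.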

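3) Other useful interfaces

bool empty(): returns true if the node has no children. It can be used, for example, to tell whether a node is a leaf.
assoc_iterator find(const key_type &key): given a key (the name of a direct child, not a dotted path), returns an iterator to a child with that key, or the iterator returned by not_found() if there is none.
size_type count(const key_type &key): returns the number of direct children with the given key.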

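V. Editing JSON objects

Boost supports many write operations on JSON objects, but I didn't use them in my project, so I haven't found the motivation to write them up yet. Here is the Boost ptree documentation for reference: https://www.boost.org/doc/libs/1_65_1/boost/property_tree/ptree.hpp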

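VI. Troubleshooting

1. How do I check whether a key exists?

Use get_child_optional and check whether the result is empty (boost::none). The returned object evaluates to false in a condition when the key is missing.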

auto node = ptree.get_child_optional("somekey");
if (!node) {
    // node does not exist
}

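2. How do I conveniently iterate over an array?

I actually looked this one up specifically. Once you understand the iterators described above, iterating is easy. Here is the code I use.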

template <typename T>
std::vector<T> as_vector(boost::property_tree::ptree const &pt, boost::property_tree::ptree::key_type const& key) {
    std::vector<T> r;
    for (auto &item: pt.get_child(key)) {
        r.push_back(item.second.get_value<T>());
    }
    return r;
}

It is used like this:

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

#include <sstream>
#include <iostream>
#include <vector>

template <typename T>
std::vector<T> as_vector(boost::property_tree::ptree const &pt, boost::property_tree::ptree::key_type const& key) {
    std::vector<T> r;
    for (auto &item: pt.get_child(key)) {
        r.push_back(item.second.get_value<T>());
    }
    return r;
}

int main() {

    std::string s = "{\"arr\": [1, 2, 3, 4]}";
    std::stringstream ss(s);
    boost::property_tree::ptree ptree;
    boost::property_tree::read_json(ss, ptree);
    boost::property_tree::write_json(std::cout, ptree);

    auto result = as_vector<int>(ptree, "arr");
    for (auto &&d: result) {
        std::cout << d << " ";
    }
}

This solution has one problem, though: if the root node itself is an array, it doesn't seem to work well.

3. How do I parse Chinese text?

/(ㄒoㄒ)/~~

Please credit the source when reposting. Thanks!