[Article Search]: search (Elasticsearch) + search history (MongoDB) + search suggestions
Requirement
When a user types a keyword, matching articles should be retrieved.



Elasticsearch (Search)
Preparation
- Install Elasticsearch with Docker and configure the ik analyzer
- Create a new search module to hold the search microservice's business code
- Import the Elasticsearch dependencies
- Configure a RestHighLevelClient
```java
@Getter
@Setter
@Configuration
@ConfigurationProperties(prefix = "elasticsearch")
public class ElasticSearchConfig {

    private String host;
    private int port;

    @Bean
    public RestHighLevelClient client() {
        return new RestHighLevelClient(RestClient.builder(
                new HttpHost(host, port, "http")
        ));
    }
}
```
```yaml
spring:
  autoconfigure:
    exclude: org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration
elasticsearch:
  host: 192.168.140.102
  port: 9200
```
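The preparation step mentions the ik analyzer, but the index mapping itself is not shown in these notes. A plausible mapping for the app_info_article index is sketched below; the field names are taken from the code in this document, but the exact types and analyzer choice (ik_smart here) are assumptions, not the course's verified mapping:

```json
PUT app_info_article
{
  "mappings": {
    "properties": {
      "id":          { "type": "keyword" },
      "title":       { "type": "text", "analyzer": "ik_smart" },
      "content":     { "type": "text", "analyzer": "ik_smart" },
      "publishTime": { "type": "date" },
      "staticUrl":   { "type": "keyword", "index": false }
    }
  }
}
```

With a mapping like this, title and content are tokenized by the ik analyzer for Chinese full-text search, while staticUrl is stored but not searchable.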
- Initialize the index data (a bulk import is needed before the project goes live):
```java
@Autowired
private ApArticleMapper apArticleMapper;

@Autowired
private RestHighLevelClient restHighLevelClient;

@Test
public void init() throws Exception {
    // Load all published articles from the database
    List<SearchArticleVo> searchArticleVos = apArticleMapper.loadArticleList();
    // One bulk request against the app_info_article index
    BulkRequest bulkRequest = new BulkRequest("app_info_article");
    for (SearchArticleVo searchArticleVo : searchArticleVos) {
        IndexRequest indexRequest = new IndexRequest()
                .id(searchArticleVo.getId().toString())
                .source(JSON.toJSONString(searchArticleVo), XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
}
```
Article Search
- Single condition: put it straight into SearchSourceBuilder
If the query logic is simple and there is only one standalone condition, it can be passed directly to SearchSourceBuilder's query method:
```java
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("status", "active"));
```
- Combining multiple conditions: when several conditions need to be combined (AND/OR/NOT logic), a BoolQueryBuilder must be used explicitly.
| Clause | Purpose | Affects scoring | Cacheable |
| --- | --- | --- | --- |
| must | Sub-condition that must match, like logical AND | ✅ yes | ❌ no |
| filter | Sub-condition that must match, but does not contribute to the relevance score | ❌ no | ✅ yes (cacheable) |
```java
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("status", "active"))
        .must(QueryBuilders.rangeQuery("age").gte(18))
        .should(QueryBuilders.termQuery("tag", "urgent"))
        .mustNot(QueryBuilders.termQuery("deleted", true));

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(boolQuery);
```
Although technically every query could be wrapped in a BoolQuery, using the single condition directly is more concise.
```java
private final RestHighLevelClient restHighLevelClient;

@Override
public ResponseResult search(UserSearchDto dto) throws IOException {
    // 1. Validate parameters
    if (dto == null || StringUtils.isBlank(dto.getSearchWords())) {
        return ResponseResult.errorResult(AppHttpCodeEnum.PARAM_INVALID);
    }
    SearchRequest searchRequest = new SearchRequest("app_info_article");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();

    // 2. Keyword query over title and content; contributes to scoring
    QueryStringQueryBuilder queryStringQueryBuilder = QueryBuilders
            .queryStringQuery(dto.getSearchWords())
            .field("title")
            .field("content")
            .defaultOperator(Operator.OR);
    boolQuery.must(queryStringQueryBuilder);

    // 3. Time filter for load-more paging; no scoring needed, so use filter
    RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("publishTime")
            .lt(dto.getMinBehotTime().getTime());
    boolQuery.filter(rangeQueryBuilder);

    // 4. Paging and sorting
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(dto.getPageSize());
    searchSourceBuilder.sort("publishTime", SortOrder.DESC);

    // 5. Highlight matched terms in the title
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    highlightBuilder.preTags("<font style='color: red; font-size: inherit;'>");
    highlightBuilder.postTags("</font>");
    searchSourceBuilder.highlighter(highlightBuilder);

    searchSourceBuilder.query(boolQuery);
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

    // 6. Build the result list, preferring the highlighted title when present
    SearchHit[] hits = searchResponse.getHits().getHits();
    List<Map> list = new ArrayList<>();
    for (SearchHit hit : hits) {
        String json = hit.getSourceAsString();
        Map map = JSON.parseObject(json, Map.class);
        if (hit.getHighlightFields() != null
                && hit.getHighlightFields().get("title") != null) {
            Text[] titles = hit.getHighlightFields().get("title").getFragments();
            map.put("h_title", StringUtils.join(titles));
        } else {
            map.put("h_title", map.get("title"));
        }
        list.add(map);
    }
    return ResponseResult.okResult(list);
}
```
Indexing Newly Published Articles

Idea: after an article passes review, a message is sent via Kafka. The article microservice is the producer; when the search microservice receives the message, it adds the data to the index, so the search microservice is the consumer.
- Article microservice (producer)
Configure the producer in the yml:
```yaml
spring:
  kafka:
    bootstrap-servers: 192.168.140.102:9092
    producer:
      retries: 10
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
```
Send the message to the queue:
```java
SearchArticleVo searchArticleVo = new SearchArticleVo();
BeanUtils.copyProperties(article, searchArticleVo);
searchArticleVo.setContent(dto.getContent());
searchArticleVo.setStaticUrl(path);
kafkaTemplate.send(ArticleConstants.ARTICLE_ES_SYNC_TOPIC, JSON.toJSONString(searchArticleVo));
```
- Search microservice (consumer)
Configure the consumer in the yml:
```yaml
spring:
  kafka:
    bootstrap-servers: 192.168.140.102:9092
    consumer:
      group-id: ${spring.application.name}
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
```
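The listener on the consumer side is not shown in these notes. A minimal sketch might look like the following; the class name SyncIndexListener is hypothetical, and it assumes the ArticleConstants.ARTICLE_ES_SYNC_TOPIC topic and SearchArticleVo payload from the producer snippet above:

```java
@Component
@RequiredArgsConstructor
public class SyncIndexListener { // hypothetical name, not from the course code

    private final RestHighLevelClient restHighLevelClient;

    // Listen on the same topic the article microservice produces to
    @KafkaListener(topics = ArticleConstants.ARTICLE_ES_SYNC_TOPIC)
    public void onMessage(String message) throws IOException {
        if (StringUtils.isBlank(message)) {
            return;
        }
        SearchArticleVo vo = JSON.parseObject(message, SearchArticleVo.class);
        // Index under the article id so a redelivered message simply overwrites
        IndexRequest request = new IndexRequest("app_info_article")
                .id(vo.getId().toString())
                .source(message, XContentType.JSON);
        restHighLevelClient.index(request, RequestOptions.DEFAULT);
    }
}
```

Using the article id as the document id makes the consumer idempotent, which matters because Kafka may deliver a message more than once.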
MongoDB (Search History)
A search history has to be kept for every user. The data volume is large and loading has to be fast; data like this is usually better stored in MongoDB, and storing it in MySQL is not recommended.
- MongoDB:
  - supports sharding, which suits a continuously written workload like user search logs
  - disk-based storage, so the cost is low
- MySQL:
  - weak support for high-frequency writes (e.g. thousands of inserts per second)
  - search records are usually semi-structured or unstructured, which would require frequent schema changes to accommodate new fields
- Redis:
  - memory-based, so the memory cost is high; suited to hot data such as caches
  - RDB snapshots and AOF logs are asynchronous persistence mechanisms, so some data can be lost on a crash
  - when the data volume is large, restoring a backup from disk into memory takes a long time
In short:
- MongoDB: suitable as the primary store; it meets the core needs of massive data, flexible queries, and low-cost persistence.
- Redis: suitable as a cache layer to speed up access to recent data, but it cannot replace MongoDB for long-term storage.
- MySQL: not suitable for high-frequency writes and unstructured log data.
Preparation
1. Set up the environment
Install MongoDB with Docker:
```shell
docker run -di \
  --name mongo-service \
  --restart=always \
  -p 27017:27017 \
  -v ~/data/mongodata:/data \
  mongo
```
2. Integrate MongoDB with Spring Boot
- Add the MongoDB dependency:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>
```
- Configure MongoDB:
```yaml
spring:
  data:
    mongodb:
      host: 192.168.140.102
      port: 27017
      database: leadnews-history
```
- Entity mapping
```java
@Data
@Document("ap_associate_words")
public class ApAssociateWords implements Serializable {

    private static final long serialVersionUID = 1L;

    private String id;

    private String associateWords;

    private Date createdTime;
}
```
- Core methods
```java
@Autowired
private MongoTemplate mongoTemplate;

// Insert a document
@Test
public void saveTest() {
    ApAssociateWords apAssociateWords = new ApAssociateWords();
    apAssociateWords.setAssociateWords("黑马头条");
    apAssociateWords.setCreatedTime(new Date());
    mongoTemplate.save(apAssociateWords);
}

// Look up a single document by id
@Test
public void saveFindOne() {
    ApAssociateWords apAssociateWords =
            mongoTemplate.findById("67a330c35faec30826dcbe8e", ApAssociateWords.class);
    System.out.println(apAssociateWords);
}

// Conditional query, sorted by creation time descending
@Test
public void testQuery() {
    Query query = Query.query(Criteria.where("associateWords").is("黑马头条"))
            .with(Sort.by(Sort.Direction.DESC, "createdTime"));
    List<ApAssociateWords> apAssociateWordsList = mongoTemplate.find(query, ApAssociateWords.class);
    System.out.println(apAssociateWordsList);
}

// Conditional delete
@Test
public void testDel() {
    mongoTemplate.remove(
            Query.query(Criteria.where("associateWords").is("黑马头条")),
            ApAssociateWords.class);
}
```
Saving Search Records

After a user searches, the keyword is recorded via an asynchronous request, so that returning the search results to the user is not slowed down.

```java
private final MongoTemplate mongoTemplate;

@Override
@Async
public void save(String keyword, Integer userId) {
    // 1. If this user already searched this keyword, just refresh its timestamp
    Query query = Query.query(Criteria.where("userId").is(userId)
            .and("keyword").is(keyword));
    ApUserSearch apUserSearch = mongoTemplate.findOne(query, ApUserSearch.class);
    if (apUserSearch != null) {
        apUserSearch.setCreatedTime(new Date());
        mongoTemplate.save(apUserSearch);
        return;
    }

    // 2. Otherwise build a new record
    apUserSearch = new ApUserSearch();
    apUserSearch.setUserId(userId);
    apUserSearch.setKeyword(keyword);
    apUserSearch.setCreatedTime(new Date());

    // 3. Keep at most 10 records per user: insert directly while under the cap,
    //    otherwise replace the oldest record
    Query query1 = Query.query(Criteria.where("userId").is(userId));
    query1.with(Sort.by(Sort.Direction.DESC, "createdTime"));
    List<ApUserSearch> apUserSearches = mongoTemplate.find(query1, ApUserSearch.class);
    if (apUserSearches == null || apUserSearches.size() < 10) {
        mongoTemplate.save(apUserSearch);
    } else {
        ApUserSearch lastUserSearch = apUserSearches.get(apUserSearches.size() - 1);
        mongoTemplate.findAndReplace(
                Query.query(Criteria.where("id").is(lastUserSearch.getId())),
                apUserSearch);
    }
}
```
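The capping policy in that method (refresh on duplicate, insert while under the cap of 10, otherwise evict the oldest) can be illustrated independently of MongoDB. The sketch below models the history as an in-memory deque; the class name SearchHistory is purely illustrative, not part of the service:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustration of the search-history capping policy: newest first, at most `cap` entries. */
class SearchHistory {
    private final int cap;
    private final Deque<String> keywords = new ArrayDeque<>(); // newest at the front

    SearchHistory(int cap) {
        this.cap = cap;
    }

    void record(String keyword) {
        // Duplicate keyword: move it to the front (like refreshing its timestamp)
        keywords.remove(keyword);
        keywords.addFirst(keyword);
        // Over the cap: evict the oldest entry (like findAndReplace on the last record)
        if (keywords.size() > cap) {
            keywords.removeLast();
        }
    }

    List<String> list() {
        return new ArrayList<>(keywords); // newest-first snapshot
    }
}
```

Recording an 11th distinct keyword silently drops the oldest one, and re-recording an existing keyword only moves it to the front, exactly mirroring the two branches of the Mongo version.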
In the article-search business code written earlier, the "save search record" method is called asynchronously.
About userId: the app gateway's filter intercepts the userId sent by the front end and puts it into a request header forwarded to the search microservice; an interceptor in the search microservice then reads the userId from that header and stores it in a ThreadLocal.
Note: because save is called asynchronously, it runs on a separate thread, and that thread cannot read the userId from the ThreadLocal (which is bound to the main request thread). The userId therefore has to be passed in from the calling thread as a method argument.
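This pitfall can be reproduced with plain Java: a value bound to a ThreadLocal on one thread is invisible to a newly started thread, which is exactly why the @Async method must receive userId as a parameter. The class below is a standalone demo, not project code:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalDemo {
    static final ThreadLocal<Integer> USER_ID = new ThreadLocal<>();

    // Try to read the ThreadLocal from a brand-new thread
    static Integer readFromNewThread() {
        AtomicReference<Integer> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(USER_ID.get()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        USER_ID.set(42);                         // bound to the main thread only
        System.out.println(USER_ID.get());       // 42
        System.out.println(readFromNewThread()); // null: the worker has its own empty slot
    }
}
```

(Spring's @Async works the same way: the method body executes on a pool thread, so any request-scoped ThreadLocal state from the web thread is gone.)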
Querying Search History
```java
public ResponseResult findUserSearch() {
    ApUser user = AppThreadLocalUtil.getUser();
    if (user == null) {
        return ResponseResult.errorResult(AppHttpCodeEnum.NEED_LOGIN);
    }
    List<ApUserSearch> list = mongoTemplate.find(
            Query.query(Criteria.where("userId").is(user.getId()))
                    .with(Sort.by(Sort.Direction.DESC, "createdTime")),
            ApUserSearch.class);
    return ResponseResult.okResult(list);
}
```
Look up the records belonging to the current user's id, sorted by creation time in descending order.
Deleting a Single History Record
```java
public ResponseResult delUserSearch(HistorySearchDto dto) {
    if (dto.getId() == null) {
        return ResponseResult.errorResult(AppHttpCodeEnum.PARAM_INVALID);
    }
    ApUser user = AppThreadLocalUtil.getUser();
    if (user == null) {
        return ResponseResult.errorResult(AppHttpCodeEnum.NEED_LOGIN);
    }
    mongoTemplate.remove(
            Query.query(Criteria.where("userId").is(user.getId())
                    .and("id").is(dto.getId())),
            ApUserSearch.class);
    return ResponseResult.okResult(AppHttpCodeEnum.SUCCESS);
}
```
Delete by the current user's id together with the id of the chosen search record.
Search Suggestions
Suggestion words (data source)
Use words that are frequently searched for online:
- Maintain the suggestion words yourself: analyze the words users search for most often and rank them into a word list
- Obtain them from a third party: 5118…
Importing the suggestion words

Implementation
Regular-expression query:

```java
@Override
public ResponseResult search(UserSearchDto dto) {
    if (StringUtils.isBlank(dto.getSearchWords())) {
        return ResponseResult.errorResult(AppHttpCodeEnum.PARAM_INVALID);
    }
    // Cap the page size
    if (dto.getPageSize() > 20) {
        dto.setPageSize(20);
    }
    // Quote the user input (java.util.regex.Pattern) so regex metacharacters
    // are matched literally instead of breaking or changing the pattern
    String regexStr = ".*?" + Pattern.quote(dto.getSearchWords()) + ".*";
    Query query = Query.query(Criteria.where("associateWords").regex(regexStr))
            .limit(dto.getPageSize());
    List<ApAssociateWords> list = mongoTemplate.find(query, ApAssociateWords.class);
    return ResponseResult.okResult(list);
}
```
In essence, search suggestions work by importing the word list into a MongoDB collection in advance; as the user types, a fuzzy (regex) query runs against that collection and any matching words are returned immediately.
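Why quoting the user input matters can be shown with plain java.util.regex; the class name RegexQuoteDemo and the sample inputs below are illustrative only:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexQuoteDemo {

    // Same contains-style pattern the suggestion query builds, with the
    // user input quoted so regex metacharacters are matched literally
    static boolean contains(String candidate, String userInput) {
        return candidate.matches(".*?" + Pattern.quote(userInput) + ".*");
    }

    // Check whether a raw (unquoted) pattern would even compile
    static boolean isValidPattern(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(contains("learn web.dev now", "web.dev")); // true: "." matched literally
        System.out.println(contains("learn webXdev now", "web.dev")); // false: "." is not a wildcard
        // Unquoted input with an unbalanced "(" produces an invalid pattern
        System.out.println(isValidPattern(".*?" + "(java" + ".*"));   // false
    }
}
```

Without quoting, a keyword containing characters like `(`, `+`, or `.` would either throw a PatternSyntaxException or silently match the wrong documents.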