Basic concepts in elastic search
发布日期:2021-08-23 22:52:30 浏览次数:2 分类:技术文章

本文共 6428 字,大约阅读时间需要 21 分钟。

There are a few concepts that are core to Elasticsearch. Understanding these concepts from the outset will tremendously help ease the learning process.

Near Realtime (NRT)

Elasticsearch is a near-realtime search platform. What this means is there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.

Cluster

A cluster is a collection of one or more nodes (servers) that together holds your entire data and provides federated indexing and search capabilities across all nodes. A cluster is identified by a unique name which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.

Make sure that you don’t reuse the same cluster names in different environments, otherwise you might end up with nodes joining the wrong cluster. For instance you could use logging-devlogging-stage, and logging-prod for the development, staging, and production clusters.

Note that it is valid and perfectly fine to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters each with its own unique cluster name.

Node

A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Universally Unique IDentifier (UUID) that is assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.

A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch which means that if you start up a number of nodes on your network and—assuming they can discover each other—they will all automatically form and join a single cluster named elasticsearch.

In a single cluster, you can have as many nodes as you want. Furthermore, if there are no other Elasticsearch nodes currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.

In a single cluster, you can define as many indexes as you want.

 

Type(Deprecated in 6.0.0.)

A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index, e.g. one type for users, another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See  for more.

 

 

Document

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in  (JavaScript Object Notation) which is a ubiquitous internet data interchange format.

Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, a document actually must be indexed/assigned to a type inside an index.

 

Shards & Replicas

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.

Sharding is important for two primary reasons:

  • It allows you to horizontally split/scale your content volume
  • It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput

The mechanics of how a shard is distributed and also how its documents are aggregated back into search requests are completely managed by Elasticsearch and is transparent to you as the user.

In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Replication is important for two primary reasons:

  • It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
  • It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.

To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).

The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may also change the number of replicas dynamically anytime. You can change the number of shards for an existing index using the  and  APIs, however this is not a trivial task and pre-planning for the correct number of shards is the optimal approach.

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.

 

转载地址:https://blog.csdn.net/weixin_33804582/article/details/86030399 如侵犯您的版权,请留言回复原文章的地址,我们会给您删除此文章,给您带来不便请您谅解!

上一篇:SQL Server 2014内存优化表的使用场景(转载)
下一篇:Docker入门学习

发表评论

最新留言

路过按个爪印,很不错,赞一个!
[***.219.124.196]2024年03月22日 04时17分57秒

关于作者

    喝酒易醉,品茶养心,人生如梦,品茶悟道,何以解忧?唯有杜康!
-- 愿君每日到此一游!

推荐文章

电梯运行仿真c语言代码,电梯调度算法模拟(示例代码) 2019-04-21
android组件动态接收数据库,Android开发——fragment中数据传递与刷新UI(更改控件)... 2019-04-21
云麦小米华为体脂秤怎么样_云康宝和华为智能体脂秤对比评测,实际体验告诉你哪款更好... 2019-04-21
linux 条件判断 取非_Linux awk 系列文章之 awk 多重条件判断 2019-04-21
c语言中如何将字符串的元素一个一个取出_C语言中常用的字符串操作函数 2019-04-21
json mysql 字段 默认值_Newtonsoft.Json 六个超简单又实用的特性,值得一试 【上篇】... 2019-04-21
ocdma相干非相干_《Acconeer 60GHz脉冲相干雷达芯片:A111》 2019-04-21
修改表格字体颜色_Excel技巧:Excel如何修改字体颜色 2019-04-21
native react 变颜色 点击_React Native主动更改StackNavigator标头颜色 2019-04-21
prism项目搭建 wpf_WPF MVVM使用prism4.1搭建 2019-04-21
python发微信红包群_用Python实现微信自动化抢红包,再也不用担心抢不到红包了... 2019-04-21
python中func自定义函数_Python函数之自定义函数&作用域&闭包 2019-04-21
wget连接指定端口_端口通不通 telnet wget ssh 2019-04-21
eureka 调用服务_Spring Cloud微服务架构从入门到会用(二)—服务注册中心Eureka... 2019-04-21
easyexcel 工具类_问了个在阿里的同学,他们常用的15款开发者工具! 2019-04-21
mysql统计结果大于0时返回true_mysql表查询练习 2019-04-21
c语言对结构体排序中间变量,求助:c语言怎么实现结构体的排序? 总是弄不对啊... 2019-04-21
c语言宏定义只能在最前面吗,C语言宏定义注意事项 2019-04-21
android悬浮窗服务卡死,Android 悬浮窗兼容问题谈 2019-04-21
表格相关的html语言,HTML标记语言——表格标记 2019-04-21