When to use horizontal partitioning and when to use database sharding? -


i'm reading article on wikipedia: http://en.wikipedia.org/wiki/shard_(database_architecture) trying find major difference between these 2 techniques. here found:

horizontal partitioning splits 1 or more tables row, within single instance of schema , database server. may offer advantage reducing index size (and search effort) provided there obvious, robust, implicit way identify in table particular row found, without first needing search index, e.g., classic example of 'customerseast' , 'customerswest' tables, zip code indicates found.

sharding goes beyond this: partitions problematic table(s) in same way, across potentially multiple instances of schema. obvious advantage search load large partitioned table can split across multiple servers (logical or physical), not multiple indexes on same logical server.

as understood, horizontal partitioning more applicable single instance (single node environment) whereas sharding used in multi-node / multiple data-center environment. correct? or there different usage scenarios ?

extra question: huge table (with millions of rows) simple schema (about 4-5 columns) , best technique improve read/write performance on table ?

you correct, horizontal partition (supported example in mysql , postgresql) splits table within single server. can improve performance because data , indexes can split across many disk volumes, improving i/o. done key range.

with database sharding, dividing data across multiple servers, not within single server. in case use shard key partition data, typically sort of hashing algorithm. can white paper on subject here (offered our company, not specific product, explains technology): http://www.codefutures.com/database-sharding-white-paper/

the advantage of dbms single server partitioning is relatively simple set , manage. disadvantage limited single server can do. particularly case when comes heavy write contention, database locking , heavy queries.

database sharding takes more work, has advantage of being shared-nothing approach, scalable.

a clear indicator database sharding needed when single server cannot keep write volume. if have many heavy queries, can drive need type of solution.

having said of this, if talking "millions" of rows 4 - 5 columns, , reads can well-indexed fast access, doubtful if need implement either of these options. when talking 100s of millions or billions of rows, 1000s of users, database scalability critical.

there informational web site working on regarding database scalability: www.bigdatascalability.com. includes links various articles, , have new content added on time.


Comments

Popular posts from this blog

commonjs - How to write a typescript definition file for a node module that exports a function? -

openid - Okta: Failed to get authorization code through API call -

thorough guide for profiling racket code -