sql server - Azure SQL Performance/Best Practices: Partitioned Data vs. Lots of Rows
Scenario: multiple customers create "objects" that are stored in a "CustomerObject" table. It looks like this:
CustomerObject:
    Id                     bigint
    CustomerId             bigint
    Type                   int
    JsonDynamicProperties  nvarchar(max)
- Each customer creates somewhere around 50,000 objects.
- There are around 1,000 customers.
- The total number of objects the system needs to track is around 50-75 million.
- Read and write operations are split roughly 50/50.
Environment:
- ASP.NET Core
- Entity Framework Core
- Azure SQL
My questions, in reference to performance and best practices:
1. At what point (if ever) does it make sense to give each customer its own objects table vs. having all objects live in the same table?
2. Does having 1,000 or more tables have more of a performance impact than having 50-70 million rows in a single table being hit all the time?
3. When using Entity Framework Core, can I hydrate the CustomerObject data model from different tables, depending on which customer is running the query?
4. Are there any other immediate gotchas that come to mind that you can point out?
Thanks for any guidance you can provide!
SQL Azure/SQL Server will do better with a different schema. You will be reading and writing large blobs all day with the proposed design, and performance will bottleneck on IO compared to a more optimal logical and physical database design pattern. (In other words, using code-first techniques is fast to code but slower to execute in this case.)
I will try to answer your basic questions first, with the caveat that I would want to approach the problem differently than you are:
Re: 1 table vs. N tables: SQL Server and SQL Azure create query plans and cache them. Compilation of those plans can be expensive in some cases, so it makes sense to have fewer tables with the same schema to reduce compilation overhead in SQL. (You have not really done a schema other than the JSON blob, and the overhead of reading and writing those blobs will be far from optimal until you address that in your application.)
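To make the plan-cache point concrete (a sketch, assuming the CustomerObject table from the question): with one shared table, a single parameterized statement compiles once and its plan is reused for every customer, whereas a table per customer would mean one distinct statement text, and one cached plan, per table.

    -- One shared table: this parameterized statement compiles a single plan
    -- that is reused for any @CustomerId value passed in.
    EXEC sp_executesql
        N'SELECT Id, [Type], JsonDynamicProperties
          FROM dbo.CustomerObject
          WHERE CustomerId = @CustomerId;',
        N'@CustomerId BIGINT',
        @CustomerId = 42;

    -- With a hypothetical table per customer (CustomerObject_0001,
    -- CustomerObject_0002, ...) every table name produces its own statement
    -- text and therefore its own cached plan.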
Re: 1,000 tables vs. 1 big table: assuming you have the right indexing done, SQL can handle a seek-for-customer pattern on a table with billions of rows without breaking a sweat. You don't want 1,000s of tables because of the compilation overhead, but you do want to make sure your queries seek on something specific enough to avoid IO (logical or physical) so the app performs well/near optimally.
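A rough sketch of that indexed, seek-for-customer pattern; the clustered key choice and names here are my assumption, not something prescribed in the post:

    -- A single table clustered on (CustomerId, Id) keeps each customer's rows
    -- together and lets every per-customer query seek instead of scan.
    CREATE TABLE dbo.CustomerObject
    (
        Id                    BIGINT        NOT NULL,
        CustomerId            BIGINT        NOT NULL,
        [Type]                INT           NOT NULL,
        JsonDynamicProperties NVARCHAR(MAX) NULL,
        CONSTRAINT PK_CustomerObject PRIMARY KEY CLUSTERED (CustomerId, Id)
    );

    -- Filtering on the leading key column drives an index seek to one
    -- customer's range of rows, no matter how many rows the table holds.
    DECLARE @CustomerId BIGINT = 42;
    SELECT Id, [Type], JsonDynamicProperties
    FROM dbo.CustomerObject
    WHERE CustomerId = @CustomerId;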
Re: EF + loading from multiple tables: I am no EF expert (I am a SQL expert), but I believe this issue goes away given my answers to 1 and 2.
Now I will try to give you some guidance on how to solve the problem more efficiently. Since you pay for performance in SQL Azure, this should save you money by allowing you to run in a smaller database reservation size.
There are two main patterns you can try, and they depend a bit on whether you have an open schema or a fixed schema. If you allow customers to create arbitrary values in the JSON blob (open schema), you want to consider a property bag or entity-attribute-value (EAV) pattern. Something like: create table eav (customerid bigint, attribute nvarchar(100), value sql_variant). You want to create a clustered index on (customerid, attribute).
This pattern allows you to read specific values with index seeks, or to scan a whole customer's attributes if you need to read everything. The values are collocated on disk, and updates only do IO (data and log) for the subset of attributes you touch. (nvarchar(max) does support partial writes, but EF is most likely going to read and write the whole blob each time, if I had to guess how it is implemented.)
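A minimal sketch of that EAV shape and the two access patterns it serves; the attribute name and customer id are made-up values for illustration:

    CREATE TABLE dbo.eav
    (
        customerid BIGINT        NOT NULL,
        attribute  NVARCHAR(100) NOT NULL,
        value      SQL_VARIANT   NULL
    );

    CREATE CLUSTERED INDEX CIX_eav ON dbo.eav (customerid, attribute);

    -- Read one specific attribute: a seek on (customerid, attribute).
    SELECT value
    FROM dbo.eav
    WHERE customerid = 42 AND attribute = N'Color';

    -- Read everything for one customer: a range scan over rows that are
    -- collocated on disk thanks to the clustered index.
    SELECT attribute, value
    FROM dbo.eav
    WHERE customerid = 42;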
EAV patterns are not quite as easy if you want to update many attributes at once, since you need to write out N UPDATE statements, or one larger statement with the set of rows to update.
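A hedged sketch of the "one larger statement" option, using a table value constructor; the attribute names and values are illustrative:

    -- Update several attributes for one customer in a single set-based
    -- statement instead of issuing one UPDATE per attribute.
    UPDATE e
    SET    e.value = v.value
    FROM   dbo.eav AS e
    JOIN  (VALUES (N'Color',  CAST(N'Blue' AS SQL_VARIANT)),
                  (N'Weight', CAST(12.5    AS SQL_VARIANT))) AS v(attribute, value)
           ON v.attribute = e.attribute
    WHERE  e.customerid = 42;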
If you have a known set of columns, it is better to create a table with a column for each one. (SQL has a "sparse" column attribute you can use for largely-NULL columns.) This allows you to index each field if needed, and it can allow more complex applications (beyond get-put apps) to perform well. So, if you wanted to search for a customer's sales on Tuesdays that are greater than some amount of money, this schema pattern allows indexing models that make such queries perform well (assuming B-tree indexes are being used).
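One possible shape of the fixed-column alternative; the table, columns, and index names here are assumptions for illustration, not the poster's schema:

    CREATE TABLE dbo.CustomerSale
    (
        Id          BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CustomerId  BIGINT       NOT NULL,
        SaleDate    DATE         NOT NULL,
        Amount      MONEY        NOT NULL,
        CouponCode  NVARCHAR(50) SPARSE NULL  -- SPARSE suits largely-NULL columns
    );

    -- A conventional B-tree index over the columns the query filters on.
    CREATE INDEX IX_CustomerSale_Customer_Date_Amount
        ON dbo.CustomerSale (CustomerId, SaleDate, Amount);

    -- "A customer's sales on Tuesdays above a given amount" is now driven by
    -- an index seek on CustomerId rather than a scan of blob columns.
    SELECT Id, SaleDate, Amount
    FROM dbo.CustomerSale
    WHERE CustomerId = 42
      AND Amount > 100
      AND DATENAME(WEEKDAY, SaleDate) = 'Tuesday';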
Please note that SQL Server/SQL Azure do support table partitioning. For large tables, you can get management benefits from using partitioning. However, if you have indexed the application as described above, you do not need partitioning for main-line data access to perform well, since you can index seek.
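If you do want partitioning for its management benefits, a sketch might look like the following; the boundary values and names are illustrative, and Azure SQL Database only allows the PRIMARY filegroup:

    -- Partition the big table by CustomerId range purely for manageability;
    -- the seek performance described above does not depend on this.
    CREATE PARTITION FUNCTION pfCustomerRange (BIGINT)
        AS RANGE RIGHT FOR VALUES (250, 500, 750);

    CREATE PARTITION SCHEME psCustomerRange
        AS PARTITION pfCustomerRange ALL TO ([PRIMARY]);

    CREATE TABLE dbo.CustomerObjectPartitioned
    (
        Id                    BIGINT        NOT NULL,
        CustomerId            BIGINT        NOT NULL,
        [Type]                INT           NOT NULL,
        JsonDynamicProperties NVARCHAR(MAX) NULL,
        CONSTRAINT PK_CustomerObjectPartitioned
            PRIMARY KEY CLUSTERED (CustomerId, Id)
    ) ON psCustomerRange (CustomerId);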
Hope that helps!
Conor Cunningham, Architect, SQL Core Engine