ALTERNATE UNIVERSE DEV

CodingBlocks

Databases the SQL [see-kwuhl]

Welcome back for part 2 of the podcast about databases. In this half, we discuss several things we believe developers should know about databases. From joins to unions, GROUP BYs, and indexing, we try to touch on many of the items that most developers should at least be familiar with when working with database systems.

News

Database Basics, and Maybe a TOUCH of Advanced Stuff

  • CROSS JOIN - Cartesian product of two tables - every row in table 1 matched up with every row in table 2
    • Careful!  Doing this on large tables could crash your server!
  • INNER JOIN - returns only the rows where the data in table 1 matches the data in table 2 on the join conditions
  • Outer Joins - LEFT OUTER, RIGHT OUTER, FULL OUTER
    • LEFT OUTER returns all records from the table on the left side of the join plus any matching data from the right table; where there is no match, the right table's columns come back as NULL
    • RIGHT OUTER returns all records from the table on the right side of the join plus any matching data from the left table; where there is no match, the left table's columns come back as NULL
    • FULL OUTER returns all records from both tables, with matching rows fully filled in; whatever is missing on either side comes back as NULL
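
The join types above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the authors and books tables are made up for the example, and SQLite stands in for whatever database you actually use. RIGHT OUTER and FULL OUTER need SQLite 3.39 or newer, so the sketch gets the RIGHT OUTER result by swapping the tables in a LEFT OUTER join.

```python
import sqlite3

# Hypothetical tables, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES
        (1, 1, 'Joins 101'), (2, 1, 'Sets 101'), (3, 99, 'Orphan');
""")

# CROSS JOIN: every author paired with every book (3 x 3 rows).
cross_rows = conn.execute(
    "SELECT a.name, b.title FROM authors a CROSS JOIN books b"
).fetchall()

# INNER JOIN: only the rows that match on the join condition.
inner_rows = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    INNER JOIN books b ON b.author_id = a.id
    ORDER BY b.id
""").fetchall()

# LEFT OUTER JOIN: every author, with NULL (None) where no book matches.
left_rows = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    LEFT OUTER JOIN books b ON b.author_id = a.id
    ORDER BY a.id, b.id
""").fetchall()

# Same result as authors RIGHT OUTER JOIN books, written with the tables
# swapped so it also runs on SQLite versions older than 3.39.
right_rows = conn.execute("""
    SELECT a.name, b.title
    FROM books b
    LEFT OUTER JOIN authors a ON b.author_id = a.id
    ORDER BY b.id
""").fetchall()

for label, rows in [("cross", cross_rows), ("inner", inner_rows),
                    ("left", left_rows), ("right", right_rows)]:
    print(label, rows)
```

A FULL OUTER join would return the union of the left and right results: both matched rows, Bob and Cy with no book, and the orphan book with no author.
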
  • Database Normalization
    http://en.wikipedia.org/wiki/Database_normalization
  • Check out @SqlKris on Twitter - runs a database blog on learning SQL and is very helpful in responding to questions on Twitter
    https://twitter.com/sqlkris
  • Refactoring databases can be very difficult - it usually means refactoring a lot of application code, not to mention any stored procedures, views, etc. that may live in the database
  • Outlaw is still 21....
  • Do you put your data interactions in a stored procedure or do you put that code in an application?
    • Pros would be that you've centralized your database "logic"
    • Where this doesn't work - if you need data from other systems and using linked servers is not an option
  • You can join tables across databases (at least in SQL Server)
  • Cardinality - one-to-one or one-to-many
  • To subtype or not to subtype a table?
    • If you decide to do this, you could have hundreds of tables and managing this through your application could be a major pain...but, the performance would be outstanding
    • If you don't do subtypes and go the EAV route instead (Entity Attribute Value schema), it's easier to maintain, but query performance won't be as good as with subtyping
      http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
  • Set Operators
    • UNION - appends two recordsets together (and throws out duplicates)
    • UNION ALL - appends two recordsets together and keeps the duplicates
    • EXCEPT - returns the distinct rows from the first recordset that do not appear in the second recordset
    • INTERSECT - returns the distinct rows that are common to the first recordset and the second recordset (similar to doing an INNER JOIN on every column being returned from the two tables being used)
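
A minimal sketch of the four set operators, again using Python's sqlite3 module. The first_set and second_set tables and the run helper are invented for this example.

```python
import sqlite3

# Two hypothetical single-column tables with some overlap and a duplicate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_set (n INTEGER);
    CREATE TABLE second_set (n INTEGER);
    INSERT INTO first_set VALUES (1), (2), (2), (3);
    INSERT INTO second_set VALUES (2), (3), (4);
""")

def run(operator):
    """Combine the two tables with the given set operator, sorted by n."""
    sql = (f"SELECT n FROM first_set {operator} "
           "SELECT n FROM second_set ORDER BY n")
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # duplicates thrown out
union_all_rows = run("UNION ALL")  # duplicates kept
except_rows = run("EXCEPT")        # in the first set but not the second
intersect_rows = run("INTERSECT")  # common to both sets

print(union_rows, union_all_rows, except_rows, intersect_rows)
```

The operator is interpolated into the SQL string here only because it is a fixed keyword chosen by the example itself, never user input.
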
  • Check out SQL Authority
    http://www.sqlauthority.com
  • Aggregating Data
    • Difference between a HAVING and a WHERE clause? - Interview question asked in every developer interview known to man!  :-)
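    • GROUP BY - used to "group" or aggregate data based on the provided columns
      • Needed when you select non-aggregated columns alongside an AVG (average), SUM, MAX, MIN, etc. - every non-aggregated column has to appear in the GROUP BY (an aggregate over the whole table needs no GROUP BY)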
      • Why no GROUP BY *????
    • DISTINCT or GROUP BY - can do similar things if you're trying to remove duplicate values
    • COUNT(DISTINCT...)
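
Here is a small sketch of the aggregation points above, including the WHERE vs HAVING difference: WHERE filters rows before they are grouped, HAVING filters the groups afterwards. The orders table and its data are made up for the example.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, product TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 'book', 10), ('ann', 'book', 30), ('ann', 'pen', 5),
        ('bob', 'pen', 5),   ('bob', 'pen', 5),   ('cy', 'book', 100);
""")

# An aggregate over the whole table needs no GROUP BY.
grand_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# GROUP BY with COUNT(*) and COUNT(DISTINCT ...) per customer.
per_customer = conn.execute("""
    SELECT customer, COUNT(*), COUNT(DISTINCT product)
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()

# WHERE drops individual rows first (amounts under 10), then HAVING
# drops whole groups whose total is under 50.
big_spenders = conn.execute("""
    SELECT customer, SUM(amount)
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    HAVING SUM(amount) >= 50
    ORDER BY customer
""").fetchall()

# DISTINCT and GROUP BY both remove duplicate values here.
via_distinct = conn.execute(
    "SELECT DISTINCT product FROM orders ORDER BY product").fetchall()
via_group_by = conn.execute(
    "SELECT product FROM orders GROUP BY product ORDER BY product").fetchall()

print(grand_total, per_customer, big_spenders, via_distinct)
```
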
  • Row numbers - think paging - you want to get records between 100 and 120
    • Oracle - rownum
    • SQL Server 2005 and up - ROW_NUMBER()
    • MySQL - start drinking heavily (no ROW_NUMBER() before MySQL 8.0; LIMIT ... OFFSET is the usual way to page)
  • Windowed Functions in SQL Server - GLORIOUS
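
The paging idea (records between 100 and 120) can be sketched with ROW_NUMBER(), which is itself a windowed function. This example runs against SQLite through Python's sqlite3 module rather than SQL Server, and assumes the bundled SQLite is version 3.25 or newer, where window functions were added. The items table is invented for the example.

```python
import sqlite3

# Hypothetical table with 200 rows named item001 .. item200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(f"item{i:03d}",) for i in range(1, 201)],
)

# Number every row in name order, then keep only the requested window.
page = conn.execute("""
    SELECT name, rn
    FROM (
        SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
        FROM items
    ) AS numbered
    WHERE rn BETWEEN ? AND ?
    ORDER BY rn
""", (100, 120)).fetchall()

print(len(page), page[0], page[-1])
```

The row number has to be computed in an inner query because a WHERE clause cannot refer to a window function directly.
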
  • Is char...."char" as in you burnt our burgers, or is it "car" as in you drive it - PLEASE, leave your comment below!!!
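  • nvarchar vs varchar - if you will EVER need to store Unicode (international characters, etc.), then go nvarchar...if not, save the space and use varchar
  • To GUID or not to GUID?!  Why they suck as a primary key on your table (for performance) - random values land all over a clustered index, causing page splits and fragmentation, and at 16 bytes they are wider than an int key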
  • Parameterized queries - USE THEM!
    OWASP in Episode 4
    https://www.owasp.org/index.php/Query_Parameterization_Cheat_Sheet
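
To show why parameterized queries matter, here is a minimal sketch using Python's sqlite3 module. The users table and the hostile input string are made up for the example; the placeholder syntax (? here) varies by database driver.

```python
import sqlite3

# Hypothetical users table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", 1), ("bob", 0)])

# Input an attacker might type into a "name" field.
hostile = "nobody' OR '1'='1"

# DON'T: gluing the input into the SQL text lets it rewrite the query.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# DO: pass the input as a parameter so it is only ever treated as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print("unsafe:", unsafe_rows)
print("safe:", safe_rows)
```

The concatenated version matches every user because the injected OR '1'='1' is always true, while the parameterized version looks for a user literally named "nobody' OR '1'='1" and finds none.
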
  • What about SQL Developers who want to program?
    • PHP
    • Perl (similar to what database guys do with scrubbing data)
    • JavaScript - simple language to learn out of the box - extremely powerful with things like Node.js

Performance in Databases

  • Indexes
    • Clustered Indexes - store the table's data sorted by the index key (makes your table a clustered table)
    • Non-clustered indexes - stored outside the table, with pointers back to the records in the main table storage
    • Can index temp tables!  Sometimes necessary
    • SQL Server 2008 (and up) - Filtered Indexes
    • Creating a ton of indexes is not always the right solution!
    • Understanding fill factors - leaving space for wiggle room on an index
    • Indexes CAN be a performance bottleneck on inserts / updates - every index has to be maintained on each write
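
As a rough illustration of what an index changes, the sketch below asks SQLite (via Python's sqlite3 module) for its query plan before and after creating an index. SQLite is only a stand-in here: it has no SQL Server style clustered vs non-clustered distinction or fill factors, and the orders table, the idx_orders_customer index, and the plan helper are all invented for the example.

```python
import sqlite3

# Hypothetical table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"customer{i % 50}", i) for i in range(1000)],
)

def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the detail text

query = "SELECT * FROM orders WHERE customer = ?"

# Without an index, the only option is to scan the whole table.
before = plan(query, ("customer7",))

# With an index on the filtered column, the lookup can seek instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query, ("customer7",))

print("before:", before)
print("after: ", after)
```

The trade-off from the list above still applies: every insert or update to orders now has to maintain idx_orders_customer as well, which is why piling on indexes is not always the right solution.
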

Resources We Like

Tips of the Week
