SQL Server to Postgres – Gotchas

Paradise via Purgatory

Postgres is a great alternative to SQL Server.  It operates differently from SQL Server, however, and some of those differences tripped me up when I first switched.  This post serves as a map of common pitfalls to avoid.  It is not exhaustive, by any means.  It is simply a list of the most time-consuming mistakes I made as a newbie.


Identifier Case Issues

Often, the first issue with Postgres that developers run into when migrating from SQL Server is how Postgres handles case in identifiers.  By default, if you execute the following SQL statement:
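(The exact statement doesn’t matter; assume a simple table definition with mixed-case identifiers, something like this:)

    CREATE TABLE MyTable (
        ColOne text,
        ColTwo integer
    );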

Postgres quietly converts it to this:
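    -- unquoted identifiers are folded to lower case
    CREATE TABLE mytable (
        colone text,
        coltwo integer
    );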

Even more frustrating, if you use the PGAdmin UI designer for creating tables, it will create the table with quoted identifiers:
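    -- the designer quotes the identifiers, so whatever case you typed sticks
    CREATE TABLE "MyTable" (
        "ColOne" text,
        "ColTwo" integer
    );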

For maintenance purposes, quoted identifiers suck.  They make all SQL statements very hard to read, and embedded SQL almost impossible to read.  For that reason, I use lower case for all identifiers.  The most readable form of the above statement is:
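    CREATE TABLE my_table (
        col_one text,
        col_two integer
    );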

However, in my experience, choosing underscore-separated identifier names creates headaches when integrating ORMs or micro ORMs.  With Entity Framework, all the table and column names must be manually mapped to their respective class definitions.  With Dapper (my micro ORM of choice), properties must be named in the same style as the column, e.g. Col_One.  Um, no thanks.  Instead, just bite the bullet and remove the underscores from the table definition.  A little bit of readability is lost in the database, but that is offset by the readability gained in the application.

Identifier case management is straightforward as long as you know how it works before you start building your application.  Choose a strategy, test it, and make sure it meets your specific needs before moving too far forward.  I also suggest not using GUI tools to define your database objects, to avoid unintentionally quoted identifiers.


Query Case Issues

Case issues can also trip up the unwary developer when string comparisons are made in queries.  String comparisons in Postgres are case sensitive.  In SQL Server this query:
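(Sticking with the illustrative table from earlier:)

    SELECT * FROM mytable WHERE colone = 'Smith';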

will return the same results as this query:
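    SELECT * FROM mytable WHERE colone = 'SMITH';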

In Postgres, they will return two different result sets.  Over time, I have come to appreciate the greater control this gives me.  That said, it does require some planning.  Case-insensitive comparisons have to be requested explicitly, and there are a couple of ways to go about this.  I find the easiest is to use the UPPER (or LOWER) function on both sides of the equality test.  There is a performance penalty, however: wrapping a column in UPPER or LOWER forces a table scan, even on an indexed column, because a plain index on the column cannot be used for the transformed value.  It is therefore important to create an index that handles this scenario.  If you plan on performing case-insensitive queries on tables of any real size and/or with any real frequency, try this:
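(A sketch, again using the illustrative names from above:)

    -- case-insensitive comparison: apply UPPER to both sides
    SELECT * FROM mytable WHERE UPPER(colone) = UPPER('Smith');

    -- expression index so the comparison above can use an index instead of a table scan
    CREATE INDEX ix_mytable_colone_upper ON mytable (UPPER(colone));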

There are also options to handle pattern matching (queries that use the LIKE operator), so investigate index creation based on your specific needs; one option is sketched below.  It is also important to note that, while simple to implement, the UPPER/LOWER approach does not work correctly for every language and locale.  If you have more complex use cases, check out the documentation.
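(As one example, an index built with the text_pattern_ops operator class lets left-anchored LIKE patterns use an index; the index name and column are, again, illustrative.)

    CREATE INDEX ix_mytable_colone_pattern ON mytable (colone text_pattern_ops);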


Tools and Keepalive

PGAdmin has the very annoying habit of crashing every time it is left in the background for more than a few minutes.  This happens because idle connections get closed after a short interval, usually by an intervening firewall or the OS rather than by Postgres itself (the crash part is because PGAdmin sucks).  I lived with this for an embarrassingly long time before I found the solution here.  Regardless of the scenario, if you find your connections getting reset, the keepalive settings are a good starting point for your investigation.
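(A hedged sketch of the relevant server-side knobs in postgresql.conf; the values shown are arbitrary examples and should be tuned to your environment.)

    # send TCP keepalives so idle connections are not silently dropped
    tcp_keepalives_idle = 60        # seconds of inactivity before the first keepalive
    tcp_keepalives_interval = 10    # seconds between unacknowledged keepalives
    tcp_keepalives_count = 5        # lost keepalives before the connection is considered dead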


Connection Pool Issues

Once your server is placed under load, errors related to connection timeouts may start occurring.  Usually, the reason is that the connection pool is too small.  The default max connection pool size in Npgsql is 20, which I find a little low for most scenarios.  If your application has a lot of users, or long-running queries, you will hit that limit quickly.  Start bumping up the max until your resource contention issues subside.
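(A hypothetical Npgsql connection string; the keyword that matters here is Maximum Pool Size, and the other values are placeholders.)

    Host=dbserver;Database=appdb;Username=appuser;Password=secret;Maximum Pool Size=50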


Is that it?

No.  There are plenty of other missteps I made.  Most notably, I had to pay more attention to server configuration settings with Postgres, and to performance-related settings in both the client and server configurations.  I may go over those in detail in the future, but this post has been a good start.

SQL Server to Postgres – Tradeoffs

Nothing is free

Moving from SQL Server to Postgres does not come without some tradeoffs.  Some of these tradeoffs are real, others not so much.  We will look at three specifically: tooling, performance, and institutional knowledge.



Tooling

Let’s start with tooling because, for me, it has been the most painful tradeoff.  SQL Server has first-class support up and down the .Net stack.  From tools to libraries, SQL Server just works.  Postgres… not so much.

SQL Server Management Studio is a great interface to your databases that works right out of the box when SQL Server is installed.  It has decent auto-complete support, a great execution plan analyzer, and a tabbed interface that is immediately recognizable to any Visual Studio user.  Postgres comes with PGAdmin out of the box.  It is… well, it’s not good.  PGAdmin III was a mean joke played on people with eyeballs, rather than a useful tool.  PGAdmin IV is a ground-up rewrite, and makes considerable strides, but it inexplicably retains a lot of its predecessor’s UX sins.  I am still deciding whether or not I like PGAdmin IV, but I found even PowerShell to be a better choice than PGAdmin III.  There are other, more capable management tools out there for people with a few bucks to spend, but this discussion is limited to direct tradeoffs.

My biggest personal loss when moving to Postgres was Redgate’s SQL tools.  I had come to rely on these to a disconcerting degree, and have yet to find a direct replacement in the Postgres world.  A partial solution can be pieced together with products from multiple vendors and/or free sources, but it is just not the same bang for your buck (or your time).

Within Visual Studio, the story is quite a bit better.  Npgsql is a great, full-featured library for communicating with Postgres.  It works almost identically to its SQL Server counterparts, and includes support for both basic ADO.Net functionality and Entity Framework.  It requires a bit more attention to detail, but the intellectual overhead is minimal.  There are also some extra steps to get EF designers working, but again, it is relatively pain-free.  Npgsql will not be your roadblock.



Performance

When researching Postgres, you immediately run into the widely held opinion that it is slow.  Let’s put this one to bed right now.  It is not.  The Postgres team has made improved performance a major goal since around version 8, and it shows.  Is it slower than MySQL?  Maybe.  It does have to pay attention to pesky little things like data integrity, so yeah, maybe it is a smidge slower.  I’ll happily take that trade-off.

What about SQL Server?  My goal when evaluating the performance of Postgres in relation to SQL Server was not to decide if Postgres was more performant than SQL Server.  My goal was to determine if its performance was comparable under similar, basic loads.  I was interested in the type of load a basic line of business application would place on the server.  Once servers are scaled to handle heavy loads for popular web apps or ETL functions, all databases start falling over if they aren’t tuned correctly. There are case studies online where both Postgres and SQL Server handle avalanches of data, if you are interested.  I was worried about the 80% cases.

Does Postgres perform in an acceptable manner, without any additional configuration, when executing basic reads and writes?  I ran a little test to find out.  I will state clearly from the beginning: this test is brain-dead and unscientific.  I wrote code that would never pass production muster and gave no thought to performance tuning.  I created the same eight-column table with a mix of data types in both databases, hosted on a local VM.  In a console app, I used the built-in ADO.Net (and Npgsql) classes (commands and data adapters) to insert 1,000 records into each database and read back all the rows in the table.  I ran the test five times and wrote out the results to the console (all times are in milliseconds):


The results were not what I expected, to be honest.  It just goes to show that SQL Server’s implicit transactions are not your friend.  It also illustrates my point that the wailing Greek chorus of performance woes regarding Postgres is overblown and out of date.  In short, if you are deciding whether or not to use Postgres, don’t let performance doubts hold you back.


Institutional Knowledge

When deciding to replace any tool, it is important to recognize all the knowledge that your staff has built up over the years, and examine how much of that knowledge will transfer to the new tool.  Databases are especially interesting in this regard; they span and surround your tech stack.  There is the database itself with its schema, its functions, the data, etc. There is your code with its interfaces to the database and its related tooling.  Then there are all the tools and processes that are tangential, but important, to the storing of data.  The admin tools, the backup, recovery and replication processes.  Any ETL integrations.  The list can go on and on.  It is not something to take lightly, but it is often overlooked.

In terms of basic CRUD, SQL Server and Postgres translate pretty well.  Basic SQL statements are the same, or very similar.  There are different data types, but they usually have a direct mapping.  I personally found the transition from stored procedures to Postgres functions a little perplexing, but that leveled out in short order.  In your code, if standard data access methods (ADO.Net, EF, etc.) are used, the transition is almost seamless.  There are a few gotchas I will cover in another post, but there is a ton of knowledge overlap there.
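(As a rough illustration only, with invented names: a simple SQL Server stored procedure becomes something like the following Postgres function.)

    CREATE FUNCTION active_customer_count() RETURNS bigint AS $$
        SELECT count(*) FROM customer WHERE active = true;
    $$ LANGUAGE sql;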

The processes outside of development are where the differences really start to add up.  Backup, recovery, replication, and so on are handled very differently in Postgres than they are in SQL Server.  That isn’t to say that Postgres doesn’t have methods to handle these processes.  The knowledge of one just does not translate to the other.


Should I?

Switching from one technology to another always has tradeoffs.  Deciding whether or not the switch makes sense is unique to your situation, and that decision can only be made after identifying and weighing the trade-offs.  For me, the cost-savings of Postgres outweighed what I lost when I switched from SQL Server.

SQL Server to Postgres – Introduction

Where are the .Net products?

While .Net is an integral part of most enterprise development departments, it lacks highly visible public products and web projects (Stack Overflow being the most notable exception).  Why is this?  A serious barrier to adoption is database technology.  Most tooling and support for .Net data access revolves around SQL Server.  SQL Server is a great database, but its price makes it a non-starter for most up-and-coming product companies.  However, there is a viable low-cost/no-cost alternative to SQL Server that allows companies to scale up with minimal pain.

I have used PostgreSQL on my last two .Net projects, and despite a few bumps in the road, I have been pleasantly surprised by the results.  In the next few posts, I plan to examine how Postgres can be integrated into a successful .Net project.


What’s wrong with SQL Server?

From a technical standpoint, nothing is wrong with SQL Server.  It is a fine database, and if money were not a concern I would not be writing this post.  However, money makes the world go around, especially in startups and small software product shops, and SQL Server is very expensive.

Let’s use a real-world SQL Server install as an illustration.  Using Nick Craver’s excellent post on Stack Overflow’s infrastructure as a guide, it potentially costs those guys almost $600,000 (retail prices found here) to run SQL Server Enterprise across 40 total cores.  Granted, Stack Overflow is a huge website.  We also don’t know their particular costs (yay for opaque enterprise pricing!), their upgrade path, or many other important variables.  So, what about a startup scenario instead?  Using those same retail prices, installing SQL Server Enterprise on a single eight-core server will cost almost $115,000.  That is almost the cost of another full-time developer, and it is a huge chunk out of a startup’s budget that doesn’t take recovery, replication, or failover installations into account.  Could a company get by with the Standard or even Express editions when starting out?  Probably, but if you have any success at all, you will eventually have to scale up to Enterprise, and it is going to hurt when you do.  Also, because each new DB server brought online is a massive hit to the bottom line, you are forced into a scale-up strategy from the start whether it makes technical sense or not.

Another concern, minor to me, is portability.  All things being equal, Linux installations run in a smaller footprint than Windows and allow for more flexibility when it comes to scaling.  The SQL Server team has released a preview of SQL Server running on Linux, but it will be quite a while before anybody trusts it with a real, mission-critical workload.  Postgres has been running on multiple platforms for a long time.


Why Postgres (and why not MySQL)?

Postgres is the best SQL Server alternative for .Net developers.  It contains most of the major features of SQL Server, and its dialect of SQL is very easy for .Net developers to learn.  Most of the application-level data access tooling for Postgres already exists, and is actively maintained.  It is OSS, free, and has a very liberal license.  Postgres has a reputation for being a performance laggard, but I find that particular complaint to be overblown and outdated.  We will explore the performance characteristics in more depth later, but the bottom line is that I have no complaints so far.

Tangentially, I should mention why I chose not to use the most popular OSS database, MySQL.  First, the dual licensing model is burdensome.  Can you use it for free or not?  For most websites, I think it can be used for free without getting entangled in GPL issues.  But, if I wanted to waste time on license issues, I could waste that time reading SQL Server’s 31-page(!) licensing guide.  Second, the reliability is problematic.  To achieve its impressive performance, MySQL installations may cut a few corners where data integrity is concerned, and I prefer to keep those corners intact.  That is not to say that MySQL is not a good database for a lot of scenarios; it just doesn’t suit me.


Why bother with .Net at all?

I happen to really like programming on the .Net stack.  I really like C#, and love F#.  Visual Studio, while it certainly has issues, is the nicest IDE I have used (and I have used a lot of them).  It has been steadily enhanced every couple of years since I started using it in 2000, and it has rapidly improved its third-party integration story in the last five years or so.  Since its inception, Microsoft has had a really strong language and dev tools team, and I haven’t found another toolset that suits me so well.  The addition of VS Code on OS X has more or less completed the circle for me.  It has become my preferred environment for Node, .Net Core, and Python development.


tldr; (Really? It isn’t even a thousand words!)

While SQL Server is a great database server, its cost and lack of portability are impediments to greater acceptance of the .Net environment.  PostgreSQL is an attractive alternative, once you understand the tradeoffs and avoid a few gotchas.  We will explore those tradeoffs and pitfalls in the next two posts.

The Single Developer’s Guide to Scrum

Over the years I have been exposed to, sometimes against my will, most development methodologies. I have been on Waterfall teams, Agile teams, and have been a part of the obligatory “We don’t need no stinking process” team. I find, though, that the one process that has had a demonstrable impact on development efficiency is Scrum. I haven’t put a lot of thought into why this is. I just know that it feels right for modern development, and it doesn’t work against you, as is the case with some other approaches. While I have both worked with and led many teams of all different sizes, I also sometimes work alone. I wanted to investigate whether I could take most of the Scrum process and apply it to my solitary coding projects.

What doesn’t fit

My first step was to think through the Scrum process and see which roles don’t apply to a lone coder. I quickly zeroed in on the Scrum Master role. I have to be the Product Owner, since we need a product in order to need a process. I have to be the Development Team because, well, duh. The Scrum Master, according to the Scrum Guide, “… is responsible for ensuring Scrum is understood and enacted. Scrum Masters do this by ensuring that the Scrum Team adheres to Scrum theory, practices, and rules.” This is both useful and necessary, but in my case, if the Product Owner and Development Team don’t want to adhere to Scrum, then this post is sort of pointless.

The second part of Scrum I looked at, scalpel in hand, was the set of events that make the process what it is: the Sprint, Sprint Planning, the Sprint Review, the Sprint Retrospective, and the Daily Scrum. At first blush, the Daily Scrum looked ripe for excision. This turned out not to be the case. Even for a team of one, taking fifteen minutes to review, adjust, and plan for the next day is beneficial to reaching the Sprint Goal. What I ended up dropping was the Sprint Review. My stakeholders are my customers; my busy, distracted customers. Asking them to sit through one or more demos a month is both unrealistic and counterproductive. When the Sprint is done, I release the product, send an e-mail, and hope for useful feedback. Long experience has taught me that this is the absolute best-case scenario.  The exception to this rule is consulting gigs.  Sprint Reviews are still useful for consulting projects, even if I am the only developer.

Everything else

I have built a massive Product Backlog over time from both user requests and my own evolving ideas of how the products should function. Luckily, the nature of my products makes these backlogs easy to turn into Sprint Backlogs. Sprint Planning generally takes me only a couple of hours of focused work. I have been in the problem domain so long that I spend most of my time breaking down the backlog into tasks that can be easily estimated. The creation of the Sprint Backlog and its ordering has become second nature. By the end of planning, the Sprint Goal is self-evident. The Sprint Retrospective is generally constrained to a review of missed estimates and other in-house hiccups (unless I have received timely feedback, which is unusual). The Definition of Done might be the most useful part of the process for the solitary coder. It functions as a checklist of completion and a guard against missed steps. This is of paramount importance when there is nobody to check behind you.

An aside regarding tools

I use TFS to manage the process, store the artifacts, and track performance. Most of what it provides for regular teams applies equally to a team of one.  I use Balsamiq for mockups. I automate everything I can, from builds to tests to releases. This takes a lot of time to set up, but pays enormous dividends over time.

Is it worth it?

I find that Scrum, in a slightly altered format, facilitates productivity gains for the lone developer.   It has improved the quality of my products, and has helped me reduce the complexity of managing them. They say form is freeing, and in this case at least, it’s been true.

Adventures Offshore

A while back, I did some consulting at a large software and services firm with several offices across the globe.  The job entailed reviewing a suite of products that had been developed and maintained out of an office in…let’s say Mandalay. The firm was closing the development wing of this office and wanted to be sure they had all the bits and pieces needed to bring the products back to the local office.  I took the gig because the products in question were based on an array of technologies I wanted to become more familiar with: the LAMP stack, an Android product, and a Java product.  There was also some Flash in there, but I took the job anyway.  While I anticipated a learning curve in relation to the unfamiliar technology (and the usual tears shed over Flash), the project quickly morphed into a lesson on how not to manage an offshore project.

When I headed to the office to get a clone of the various repos, I received my first bit of bad news.  There was no repo, I was told.

Didn’t the Mandalay office have access to the VCS?


Why didn’t they place the code under source control?


Instead of a nicely ordered repo chock full of history, diffs, and comments, I got a zip file.  A zip file of zip files, actually.  No matter.  I took a deep breath, and started decompressing the beast.

Wow.  The sheer mass of…stuff.  There were directory structures fifteen layers deep that went nowhere. There were multiple copies of the same project, all with minor differences.  There were build instructions saved in Notepad that didn’t seem to refer to the projects they resided with.  I should note that there were (or had been) working copies of code for all the products contained in the zip file.  These were products that had been sold and were being run in the wild by big name customers.  Keeping this in mind, I sorted the folders and files by date, picked a product, steeled myself, and began to build.

The first product I built had instructions that seemed to match some of the files I saw.  It was a LAMP installation for the back-end of a kiosk product.  I began following the instructions, and made some headway. Alas, after creating the 62nd directory and 307th symbolic link, I began to wonder, why wasn’t the build automated?  This was a monster product.  Were customers required to install it this way?  Do they also drown baby kittens for sport here? I started asking questions, sniffing around for missing files.

How is this product usually installed?


Where does it run?

AWS, I think.

Can I see the production server and do a sanity check on my build?

I don’t have access.

Who does?


Can you ask the Mandalay team if there is a build script?


Days pass.

Is there a build script?

Shrug (there has to be a repetitive stress injury in the neck and shoulders by now).

Did you ask the Mandalay office?

Yes, but they haven’t replied.

How do you contact them?

E-mail (Naturally)

Is there any other way?

Shrug (…)

This went on until the end of the contract.  I will spare you the rest of the absurdity.  I did the best I could until I gleefully handed things off to a couple of new hires (sorry guys).  I didn’t learn as much of the technology stack as I hoped, but I did pick up some valuable lessons about working with offshore teams.  These lessons are common sense, but nevertheless:

1. Offshore offices should use the same tools and processes as the home office.  This seems obvious to me, but it was not the case here.  The home office used Microsoft tools and a sorta Scrum process.  The offshore office did none of this.  They used mostly open source tools and created no artifacts that could enable the home office to follow the status of the projects.  Oh, and nobody in the home office had the skillset to maintain the products.  They were orphaned once the offshore office closed (and still haven’t been touched as far as I know).

2. Offshore offices should be relatively close, geographically speaking.  I repeatedly had questions for the offshore team that could never be asked or answered in real time, because their office was more than twelve time zones away.  The smartest and most economically efficient programmers may be on the other side of the world, but money isn’t everything.  Communication is key to any project and it completely broke down here.

3. Everybody should agree to use one language.  I am all for living in a polyglot world, but English is as close as we have to a universal language in programming and should be used for any shared project artifacts.  I ran across line after line of somebody’s native language in the code comments, and it served as a brutal lesson in how far Google Translate still has to go.

4. The offshore team should meet the local team in person.  There was a distinct lack of trust between the offshore office and the local office, and I think this was because they had never met.  If you can put a face with an e-mail and remember all the beers that (gender neutral) dude bought you, you will likely be more helpful (It also didn’t help that the offshore team would be out of work soon, but the point stands).

I think offshore projects and teams have their place.  I have never had the pleasure of managing such a project, but should I ever do so, I will go forward with the knowledge of how horribly wrong it can go if not handled properly.

SQL Server Security Checklist

Every so often, I like to review my SQL Server security settings and make sure they match current best practices.  I find that unless I build a checklist, I invariably forget something important.  It is worth mentioning that I will be skipping the obvious, like setting login passwords that a toddler can’t guess.

Before attempting to secure the database server, it makes sense to decide what it is you are trying to accomplish.  Here is what I came up with:

  1.  Prevent the theft of the data.
  2.  Prevent unauthorized viewing of data, both on the server and over the wire.
  3.  Prevent the unauthorized modification of data.
  4.  Secure the server from denial of service.

To accomplish these goals you need to:

  1. Prevent unauthorized access to the database, its files, and the OS.
  2. Secure the connection.
  3. Encrypt sensitive data.

Preventing Unauthorized Access Checklist

  1. Check for open firewall ports – 1433 is the only port I need open.  I don’t have any named instances, so I have SQL Browser disabled and no dynamic ports are being used.  I also only open the port to the client application IP addresses.  The DB server cannot be reached from the open Internet.  In my particular case this is the best method for mitigating DoS threats on the DB end of things.
  2. Check SQL Server’s OS Account – I am currently using the MSSQLSERVER account.  It is specific to the application and has the minimum privileges needed to run the DB server.  It is a better option than Network Service, and especially better than Local System.
  3. Make sure no unnecessary accounts have access to the DB data and log folders.  Only the SQL Server service account and DB administrators should have access.
  4. Make sure xp_cmdshell is disabled – I use the following query to check.  The result should be 0 (a snippet for disabling it follows this list):
    SELECT value_in_use FROM sys.configurations WHERE name = 'xp_cmdshell'
  5. Guard against SQL Injection – Make sure to parameterize all your queries.  This is painful and slow and absolutely necessary.
  6. Make sure the client login has the least amount of privilege necessary.  Use a different login for each database, and only grant permission to objects when necessary for the proper functioning of the application.
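To actually turn xp_cmdshell off (a sketch; both options below are standard sp_configure settings):

    -- xp_cmdshell is an advanced option, so expose it first, then disable it
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 0;
    RECONFIGURE;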

Secure the Connection Checklist

  1. Encrypt the client connections – Ensure Force Encryption is turned on in SQL Server Configuration Manager.  A self-signed certificate can be used, but you will need to explicitly trust the certificate in your client application (see the connection string sketch after this list).  You definitely want to use SSL/TLS in most environments, as TDS packets are otherwise sent more or less in the clear.
  2. Use Channel Binding – Channel binding helps prevent man-in-the-middle attacks by tying the client’s authentication to the secure channel it established (configured via the Extended Protection setting in SQL Server).  There is a performance penalty to be paid here, so it is a tradeoff between performance and security that has to be decided on a case-by-case basis.
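As referenced in item 1, a hedged client-side sketch: with ADO.Net and a self-signed certificate, the relevant connection string keywords are Encrypt and TrustServerCertificate (the server, database, and credentials below are placeholders).

    Server=dbserver;Database=appdb;User ID=appuser;Password=secret;Encrypt=True;TrustServerCertificate=True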

Encrypt Sensitive Data Checklist

  1. Hash passwords – This is just common sense, but passwords should be hashed and salted.
  2. Other data – Any piece of data for which the thought of its being stolen causes an immediate rise in blood pressure should be encrypted.  There are built-in tools in SQL Server that can handle this for you.  I use symmetric encryption and store the key on a different server.  It isn’t foolproof, but even if someone gains access to the database server, they hopefully won’t also have the key.

This is hardly an exhaustive list, but it is the best a developer masquerading as a DB admin can conjure.  Just building the checklist is a great learning (and relearning) experience and I will come back every so often to update it as I pick up new information on possible attacks and freak out accordingly.