
Is it considered an anti-pattern to write SQL in the source code?

Is it considered an anti-pattern to hard-code SQL into an application like this:

Normally I would have a repository layer etc., but I have excluded it from the code above for simplicity.

I recently had feedback from a colleague who complained that SQL was written in source code. I didn't get a chance to ask why and he's been gone for two weeks now (maybe more). I suppose he meant either:

  1. Use LINQ
  2. Use stored procedures for SQL

Am I right? Is it considered an anti-pattern to write SQL in the source code? We are a small team working on this project. In my opinion, the advantage of stored procedures is that SQL developers can be involved in the development process (writing of stored procedures, etc.).

Edit: The following link covers hard-coded SQL statements: Is there any benefit in preparing a SQL statement?


You've left out the crucial part for the sake of simplicity. The repository is the abstraction layer for persistence. We separate persistence into its own layer so that we can change the persistence technology more easily if we ever need to. So if SQL lives outside the persistence layer, the effort of having a separate persistence layer is completely wasted.

As a result: SQL is fine inside a persistence layer that is specific to a SQL technology (e.g. SQL is fine in a repository written for a SQL database, but not in one written for a different storage technology). Outside the persistence layer, SQL breaks your abstraction and is therefore, in my view, very bad practice.

Regarding tools like LINQ or JPQL: these merely abstract away the SQL dialects. When LINQ code or JPQL queries exist outside of a repository, the persistence abstraction is broken in just the same way as with raw SQL.
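The layering described above might look roughly like the following. This is a minimal sketch in Python (not the asker's C#), and every name in it (`CustomerRepository`, `SqliteCustomerRepository`, `InMemoryCustomerRepository`, `greeting`, the `customer` table) is hypothetical, chosen only for illustration:

```python
import sqlite3
from typing import Optional, Protocol


class CustomerRepository(Protocol):
    """Persistence abstraction that the rest of the application depends on."""

    def find_name(self, customer_id: int) -> Optional[str]: ...


class SqliteCustomerRepository:
    """SQL-specific implementation: the only class here that contains SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_name(self, customer_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryCustomerRepository:
    """Non-SQL implementation, showing that the technology can be swapped."""

    def __init__(self, names: dict) -> None:
        self._names = names

    def find_name(self, customer_id: int) -> Optional[str]:
        return self._names.get(customer_id)


def greeting(repo: CustomerRepository, customer_id: int) -> str:
    """Business logic: knows nothing about SQL or the storage technology."""
    name = repo.find_name(customer_id)
    return f"Hello, {name}!" if name else "Hello, guest!"


def make_sqlite_repo() -> SqliteCustomerRepository:
    """Build a throwaway in-memory database with one customer in it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
    return SqliteCustomerRepository(conn)
```

`greeting` behaves the same with either repository, which is the point of keeping the SQL string inside the SQL-specific class.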

Another great benefit of having a separate persistence layer is that you can unit test your business logic code without having to set up a DB server.

You get fast unit tests with a low memory profile and reproducible results on all platforms that your language supports.

In an MVC + Service architecture, this is a simple matter of mocking the repository instance, creating some mock data in memory, and defining that the repository should return that mock data when a particular getter is called. You can then define test data per unit test and no longer have to worry about cleaning up the database afterwards.

Testing writes to the database is just as easy: make sure that the relevant update methods have been called on the persistence layer and confirm that the entities were in the correct state at the time.
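The testing approach described above can be sketched with Python's standard `unittest.mock`. The `OrderService` class, its `apply_discount` method, and the repository methods `get_order` and `update_order` are hypothetical names invented for this illustration:

```python
from unittest.mock import Mock


class OrderService:
    """Business logic that talks to persistence only through a repository."""

    def __init__(self, repository) -> None:
        self._repository = repository

    def apply_discount(self, order_id: int, percent: int) -> None:
        order = self._repository.get_order(order_id)
        order["total"] = order["total"] * (100 - percent) // 100
        self._repository.update_order(order)


def run_discount_test() -> dict:
    # The mock stands in for the repository; no database server is involved.
    repository = Mock()
    repository.get_order.return_value = {"id": 7, "total": 200}

    OrderService(repository).apply_discount(order_id=7, percent=25)

    # Read path: the getter was asked for the right order.
    repository.get_order.assert_called_once_with(7)
    # Write path: the update method received the entity in the expected state.
    repository.update_order.assert_called_once_with({"id": 7, "total": 150})
    return repository.update_order.call_args.args[0]
```

Because the test data is created per test and lives only in memory, there is nothing to clean up afterwards.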

Most standard business applications these days use different layers with different responsibilities. However, which layers you use for your application, and which layer has which responsibility, is up to you and your team. Before you can decide whether putting SQL directly into the function you showed us is right or wrong, you need to know

  • which layer in your application has which responsibility

  • which layer the function above belongs to

There is no "one size fits all" solution for this. In some applications the designers prefer to use an ORM framework and let the framework generate all SQL statements. In some applications, the designers prefer to keep such SQL exclusively in stored procedures. For some applications there is a hand-written persistence (or repository) layer in which the SQL lives, and for other applications it is fine, under certain circumstances, to make exceptions to the rule of placing SQL strictly in that persistence layer.

So what you need to think about is this: which layers do you want or need in your specific application, and how do you want to distribute the responsibilities? You wrote "Normally I would have a repository layer", but which exact responsibilities do you want that layer to have, and which responsibilities do you want to put somewhere else? Answer that first, then you can answer your question yourself.

Marstato gives a good answer, but I would like to add one more comment.

SQL in source is NOT an anti-pattern, but it can cause problems. I can remember when you had to put SQL queries into the properties of components dropped on each form. That made things ugly very quickly, and you had to jump through hoops to locate queries. I became a strong advocate of centralizing database access as far as possible within the confines of the languages I was working with. Your colleague may be having flashbacks to those dark days.

Now, some of the comments talk about vendor lock-in as if it were automatically a bad thing. It isn't. If I am signing a six-figure check every year to use Oracle, you can bet that I want any application accessing that database to use the extra Oracle syntax appropriately, and to the fullest. I won't be happy if my shiny database is crippled by coders writing vanilla ANSI SQL badly when there is an "Oracle way" of writing the SQL that doesn't cripple the database. Yes, changing databases will be harder, but I have only seen it done a couple of times at a large client in over 20 years, and one of those cases was a move from DB2 to Oracle because the mainframe that hosted DB2 was obsolete and was being decommissioned. Yes, that is vendor lock-in, but for corporate customers it is actually desirable to pay for an expensive, capable RDBMS like Oracle or Microsoft SQL Server and then take full advantage of it. You have a support agreement as your comfort blanket. If I am paying for a database with a rich stored procedure implementation, I want it used.

This leads to the next point: if you are writing an application that accesses a SQL database, you need to learn SQL as well as the other language, and by learning I also mean query optimization. I will get annoyed with you if you write SQL generation code that floods the SQL cache with a barrage of nearly identical queries when you could have used one cleverly parameterized query.
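To make the difference concrete, here is a small sketch using Python's built-in `sqlite3`. The `orders` table and the function name are invented for the example, and how a given server caches statements by their text is an assumption that varies by database; the sketch only counts how many distinct statement texts each style produces:

```python
import sqlite3


def distinct_statement_texts(customer_ids):
    """Return (texts produced with inlined literals, texts produced with a bound parameter)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

    # Style 1: generate a new statement for every value (nearly identical queries).
    literal_texts = set()
    for customer_id in customer_ids:
        sql = f"SELECT COUNT(*) FROM orders WHERE customer_id = {int(customer_id)}"
        conn.execute(sql).fetchone()
        literal_texts.add(sql)

    # Style 2: one parameterized statement, with the value bound at execution time.
    bound_texts = set()
    for customer_id in customer_ids:
        sql = "SELECT COUNT(*) FROM orders WHERE customer_id = ?"
        conn.execute(sql, (customer_id,)).fetchone()
        bound_texts.add(sql)

    return len(literal_texts), len(bound_texts)
```

With a thousand different customer ids, the first style hands the database a thousand different statements, while the second hands it the same one a thousand times.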

No excuses, no hiding behind a wall of Hibernate. Badly used ORMs can really cripple application performance. I remember seeing a question on Stack Overflow a few years ago, along these lines:

In Hibernate I am iterating through 250,000 records, checking the values of a couple of properties and updating the objects that meet certain conditions. It's running a little slow, what can I do to speed it up?

How about "UPDATE table SET field1 = <value> WHERE field2 IS TRUE AND field3 > 100"? Creating and disposing of 250,000 objects could well be your problem ...
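The two approaches can be compared in a small sketch with Python's built-in `sqlite3`. The `item` table, its columns, and the function names are made up for illustration and only mirror the shape of the UPDATE quoted above:

```python
import sqlite3


def make_db(rows: int) -> sqlite3.Connection:
    """Create an in-memory table with `rows` rows of sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "id INTEGER PRIMARY KEY, flag INTEGER, amount INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO item (flag, amount, status) VALUES (?, ?, 'new')",
        [(i % 2, i) for i in range(rows)],
    )
    return conn


def update_row_by_row(conn: sqlite3.Connection) -> int:
    """ORM-style loop: pull every row into the application, test it, write it back."""
    changed = 0
    rows = conn.execute("SELECT id, flag, amount FROM item").fetchall()
    for item_id, flag, amount in rows:
        if flag and amount > 100:
            conn.execute("UPDATE item SET status = 'big' WHERE id = ?", (item_id,))
            changed += 1
    return changed


def update_set_based(conn: sqlite3.Connection) -> int:
    """One statement: the database applies the condition and the change itself."""
    cursor = conn.execute(
        "UPDATE item SET status = 'big' WHERE flag = 1 AND amount > 100"
    )
    return cursor.rowcount
```

Both functions change the same rows; the second does it in one round trip without materializing any objects in the application.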

I.e. ignore Hibernate when it is not appropriate to use it. Understand the database.

In conclusion, embedding SQL in code can be bad practice, but there are much worse things you can do if you are trying to avoid embedding SQL.

Yes, hard-coding SQL strings in application code is generally an anti-pattern.

Let's try to step away from the tolerance we have developed over years of working in production code. Mixing completely different languages with different syntax in the same file is generally not a desirable development technique. This differs from template languages like Razor, which are designed to give contextual meaning to multiple languages. As Sava B. mentioned in a comment below, SQL in C# or any other application language (Python, C++, etc.) is a string like any other and is semantically meaningless. The same applies to mixing more than one language in most cases, although there are obviously situations where doing so is acceptable, e.g. inline assembly in C, small and comprehensible snippets of CSS in HTML (noting that CSS is designed to be mixed with HTML), and others.

(Robert C. Martin on mixing languages, Clean Code, Chapter 17, "Code Smells and Heuristics", page 288)

For this answer, I'll focus on SQL (as asked in the question). The following problems can arise when storing SQL as an à la carte set of unrelated strings:

  • The database logic is difficult to find. What do you search for to find all of your SQL statements? Strings containing "SELECT", "UPDATE", "MERGE", etc.?
  • Refactoring uses of the same or similar SQL becomes difficult.
  • Adding support for other databases is difficult. How would that be achieved? By adding if..then statements for each database and storing all the queries as strings in the method?
  • Developers read a statement in another language and are distracted by the shift in focus from the purpose of the method to the implementation details of the method (how and where the data is retrieved).
  • While one-liners aren't too much of a problem, inline SQL strings start to fall apart as statements become more complex. What do you do with a 113-line statement? Put all 113 lines in your method?
  • How does the developer efficiently move queries back and forth between their SQL editor (SSMS, SQL Developer, etc.) and their source code? C#'s @ prefix makes this easier, but I've seen a lot of code that quotes each line of SQL and escapes the line breaks.
  • Indentation characters used to align the SQL with the surrounding application code are transmitted over the network on every execution. This is probably insignificant for small applications, but it can add up as usage of the software grows.

Full ORMs (object-relational mappers such as Entity Framework or Hibernate) can eliminate SQL peppered randomly through application code. My use of SQL and resource files is just an example. ORMs, helper classes, etc. can all help achieve the goal of cleaner code.

As Kevin said in an earlier answer, SQL in code can be acceptable in small projects, but large projects start out as small projects, and the likelihood that a team will go back and do it right is often inversely proportional to code size.

There are many easy ways to keep SQL in a project. One of the methods I use a lot is to write each SQL statement into a Visual Studio resource file called "sql". A text file, JSON document, or other data source can make sense depending on your tools. In some cases, having a separate class for creating SQL strings might be the best option, but it can have some of the problems described above.

SQL Example: What Looks More Elegant?:


The SQL code is always in an easy-to-locate file, or in a group of files, each with a descriptive name that says what it does rather than how it does it:

This simple method runs a single query. In my experience, the benefits increase when the use of the "foreign language" becomes more complex. My use of a resource file is just an example. Depending on the language (in this case SQL) and platform, different methods may be more suitable.
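As a rough illustration of the same idea outside Visual Studio, the sketch below keeps every statement under a descriptive name in one place. It is written in Python rather than C#, the `SQL` dictionary is only a stand-in for a resource file (it could just as well be loaded from `.sql` files), and the table and statement names are hypothetical:

```python
import sqlite3

# Hypothetical stand-in for a resource file: each statement has a descriptive name.
SQL = {
    "create_orders": """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            account_id INTEGER NOT NULL,
            total INTEGER NOT NULL
        )
    """,
    "add_order": "INSERT INTO orders (account_id, total) VALUES (:account_id, :total)",
    "get_order_totals_for_account": """
        SELECT total
        FROM orders
        WHERE account_id = :account_id
        ORDER BY id
    """,
}


def get_order_totals_for_account(conn, account_id):
    """The call site is one line, however long the statement itself is."""
    rows = conn.execute(SQL["get_order_totals_for_account"], {"account_id": account_id})
    return [total for (total,) in rows]


def demo():
    """Create a throwaway database, add three orders, and query one account."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SQL["create_orders"])
    for account_id, total in [(1, 10), (2, 99), (1, 25)]:
        conn.execute(SQL["add_order"], {"account_id": account_id, "total": total})
    return get_order_totals_for_account(conn, 1)
```

The statements can be copied into a SQL editor unchanged, because they are not wrapped in per-line quotes or concatenation.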

These and other methods resolve the above list as follows:

  1. The database code is easy to find because it is already centralized. In larger projects you group like SQL into separate files, possibly under a dedicated folder.
  2. Support for a second, third, etc. database is easier. Create an interface (or other language abstraction) that returns each database's unique statements. The implementation for each database consists of little more than statements that return the matching resource. These implementations could skip the resource and hold the SQL in a string, but some (not all) of the problems above would then resurface.
  3. Refactoring is easy: just reuse the same resource. In fact, with a few format statements you can often use the same resource entry for different DBMS systems. I do this a lot.
  4. Use of the secondary language goes through descriptive names rather than the more obtuse raw statement text.
  5. SQL statements are invoked on one line, regardless of their size and complexity.
  6. SQL can be copied and pasted between database tools like SSMS and SQL Developer without modification or careful copying. No quotation marks. No trailing backslashes. In the case of the Visual Studio resource editor, one click highlights the SQL statement. Press CTRL + C and then paste it into the SQL editor.
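Point 2 in the list above could be sketched as follows. This is a hypothetical Python illustration, not the answer's original C#: the class names, the `latest_orders` statement, and the `orders` table are invented, and the SQL is held in inline strings purely to keep the example short (the answer would return a resource entry instead):

```python
import sqlite3
from abc import ABC, abstractmethod


class OrderStatements(ABC):
    """Abstraction that returns the statements specific to one database."""

    @abstractmethod
    def latest_orders(self) -> str: ...


class SqliteOrderStatements(OrderStatements):
    def latest_orders(self) -> str:
        return "SELECT id, total FROM orders ORDER BY id DESC LIMIT :n"


class SqlServerOrderStatements(OrderStatements):
    def latest_orders(self) -> str:
        # T-SQL uses TOP instead of LIMIT; shown for contrast, not executed here.
        return "SELECT TOP (@n) id, total FROM orders ORDER BY id DESC"


STATEMENTS = {
    "sqlite": SqliteOrderStatements(),
    "sqlserver": SqlServerOrderStatements(),
}


def latest_order_ids(conn, dialect: str, n: int):
    """Calling code asks for a statement by name and never branches on the vendor."""
    sql = STATEMENTS[dialect].latest_orders()
    return [row[0] for row in conn.execute(sql, {"n": n})]


def demo_latest_order_ids():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(10,), (20,), (30,)])
    return latest_order_ids(conn, "sqlite", 2)
```

Only the SQLite variant is executed in this sketch; differences such as parameter syntax would also need handling in a real multi-database setup.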

Creating SQL in a resource is quick, so there is little incentive to mix resource usage with SQL in code.

Whichever method you choose, I've found that mixing languages usually degrades code quality. I hope that some of the problems and solutions outlined here will help developers get rid of this code smell where appropriate.

It depends. There are several approaches that can work:

  1. Modularize the SQL and isolate it into a separate set of classes, functions, or the unit of abstraction used by your paradigm. Then call them with the application logic.
  2. Move all complex SQL statements into views and then only execute very simple SQL statements in the application logic so that you don't have to modularize anything.
  3. Use an object-relational mapping library.
  4. YAGNI, just write SQL directly into the application logic.

As is often the case, if your project has already chosen one of these techniques, you should be consistent with the rest of the project.

(1) and (3) are both relatively good at maintaining independence between the application logic and the database, in the sense that the application will still compile and pass basic smoke tests if you replace the database with one from another vendor. However, most vendors do not fully conform to the SQL standard, so replacing one vendor with another is likely to require extensive testing and bug hunting no matter which technique you use. I am skeptical that this is as big a deal as people make it out to be. Changing databases is basically a last resort for when you cannot get the current database to meet your needs. If that happens, you probably chose the database poorly.

The choice between (1) and (3) mainly depends on how much you like ORMs. In my opinion, they are overused. They are a poor representation of the relational data model, because rows do not have identity in the way that objects do. You are likely to encounter pain points around unique constraints and joins, and you may have difficulty expressing some more complicated queries, depending on the ORM's capabilities. On the other hand, (1) will likely require a lot more code than an ORM would.

(2) is rarely seen in my experience. The problem is that many shops forbid SWEs from changing the database schema directly (because "that's the DBA's job"). This isn't necessarily a bad thing in itself; schema changes have significant potential to break things and may need to be rolled out carefully. However, for (2) to work, SWEs should at least be able to introduce new views and change the backing queries of existing views with minimal or no bureaucracy. If this isn't the case at your workplace, then (2) probably won't work for you.

On the other hand, if you can get (2) working, it is much better than most other solutions because it keeps the relational logic in SQL instead of in application code. Unlike general-purpose programming languages, SQL is designed specifically for the relational data model and can accordingly express complex data queries and transformations better. Views can also be ported along with the rest of your schema when you change databases, although they do make such a move more complicated.
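Approach (2) might look roughly like this in practice. The sketch uses Python's built-in `sqlite3`, and the schema, the `customer_spend` view, and the `big_spenders` function are all hypothetical:

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a small schema in which the complex query lives in a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            total INTEGER NOT NULL
        );
        INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders (customer_id, total) VALUES (1, 40), (1, 60), (2, 15);

        -- The join and aggregation are part of the schema, not the application.
        CREATE VIEW customer_spend AS
            SELECT c.name AS name, SUM(o.total) AS spend
            FROM customer AS c
            JOIN orders AS o ON o.customer_id = c.id
            GROUP BY c.id, c.name;
        """
    )
    return conn


def big_spenders(conn: sqlite3.Connection, minimum: int):
    """Application code only issues a trivial SELECT against the view."""
    rows = conn.execute(
        "SELECT name FROM customer_spend WHERE spend >= ? ORDER BY name", (minimum,)
    )
    return [name for (name,) in rows]
```

The statement left in the application is simple enough that there is little to modularize, which is the trade this approach makes.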

For read access, stored procedures are basically just a worse version of (2). I don't recommend them in that capacity, but you might still want them for writes if your database doesn't support updatable views, or if you need to do something more complex than inserting or updating a single row at a time (e.g. transactions, read-then-write, etc.). You can couple your stored procedure to a view so that writing to the view invokes the procedure, but there is significant disagreement as to whether this is actually a good idea. Proponents will tell you that it keeps the SQL statements your application executes as simple as possible. Critics will tell you that this is an unacceptable level of "magic" and that you should just call the procedure directly from your application. I'd say it is a better idea if your stored procedure looks or acts a lot like an ordinary write to the view, and a worse idea if it does something else. Ultimately, you have to decide for yourself which style makes more sense.

(4) is the non-solution. It can be worthwhile for small or large projects that only interact with the database sporadically. However, this is not a good idea for projects with a lot of SQL, as duplicates or variations of the same query may be randomly distributed in the application, making it difficult to read and refactor.

Is it considered an anti-pattern to write SQL in the source code?

Not necessarily. If you read all the comments here, you will find valid arguments for hard-coding SQL statements in the source code.

The problem is where you place the statements. If you put SQL statements all over the project, you are likely ignoring some of the SOLID principles that we usually strive to follow.

I suppose he meant either:

  1. Use LINQ
  2. Use stored procedures for SQL

We can't tell what he meant. However, we can guess. For example, the first thing that comes to mind is vendor lock-in. Hard-coding SQL statements can tightly couple your application to the DB engine, for example by using vendor-specific features that are not ANSI compliant.

That is not necessarily wrong or bad. I'm just pointing out the fact.

Ignoring the SOLID principles and vendor lock-in can have adverse consequences that you may be overlooking. So it's usually good to sit down with the team and raise your doubts.

The advantage of stored procedures, in my opinion, is that SQL developers can be involved in the development process (writing of stored procedures, etc.).

I don't think it has anything to do with the benefits of stored procedures. If your coworker doesn't like hard-coded SQL, chances are they won't like moving business logic into stored procedures either.

Edit: The following link covers hard-coded SQL statements: Is there any benefit in preparing a SQL statement?

Yes. The post lists the advantages of prepared statements. They are a kind of SQL template, and safer than concatenating strings. But the post does not encourage you to go that route, nor does it confirm that you are right. It just explains how hard-coded SQL can be used in a safe and efficient way.

To sum up, ask your colleague first. Send an email, give him a call ... Whether he answers or not, sit down with the team and raise your doubts. Find the solution that best suits your requirements. Don't make wrong assumptions based on what you read out there.

I think it's bad practice, yes. Others have pointed out the advantages of keeping all of the data access code in a separate layer. You don't have to hunt for it, it's easier to optimize and test ... But even within that layer you have a few options: use an ORM, use sprocs, or embed SQL queries as strings. I'd say SQL query strings are by far the worst option.

With an ORM, development becomes much easier and less error-prone. With EF, you define your schema simply by creating your model classes (you would have had to create those classes anyway). Querying with LINQ is child's play: you can often get away with 2 lines of C# where you would otherwise have to write and maintain a sproc. In my opinion, this has a huge productivity and maintainability benefit: less code, fewer problems. But there is a performance overhead, even if you know what you're doing.

Sprocs (or functions) are the other option. This is where you write your SQL queries manually. But at least you get a guarantee that they are correct. If you are working in .NET, Visual Studio will even throw compiler errors if the SQL is invalid. That's great. If you change or remove a column and some of your queries become invalid, you will likely find out during compile time. It's also a lot easier to manage sprocs in their own files - you will likely get syntax highlighting, autocomplete, etc.

If your queries are stored as sprocs, you can modify them without recompiling and redeploying the application. If you find that something is wrong in your SQL, a DBA can simply fix it without needing access to your app code. In general, if your queries live in string literals, your DBAs can't do their job nearly as easily.

SQL queries as string literals also make the C# data access code less readable.

Just as a rule of thumb, magic constants are bad. This includes string literals.

It's not an anti-pattern (this answer is dangerously close to opinion). Code formatting is important, however, and the SQL string should be formatted so that it is clearly different from the code that is using it. For example

I cannot tell you what your colleague meant. But I can answer that:

Is there any benefit in preparing a SQL statement?

YES. The use of prepared statements with bound variables is the recommended defense against SQL injection.
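A small sketch of why bound variables matter, using Python's built-in `sqlite3`. The `users` table and both function names are invented for the demonstration, and the concatenated version is deliberately vulnerable:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users (name, secret) VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def secrets_concatenated(conn, name):
    # VULNERABLE: the input becomes part of the SQL text itself.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [secret for (secret,) in conn.execute(sql)]


def secrets_bound(conn, name):
    # Safe: the input is bound as a value and is never parsed as SQL.
    sql = "SELECT secret FROM users WHERE name = ?"
    return [secret for (secret,) in conn.execute(sql, (name,))]


# Classic injection input: closes the quote and adds an always-true condition.
PAYLOAD = "x' OR '1'='1"
```

With `PAYLOAD`, the concatenated query returns every user's secret, while the bound query simply looks for a user with that odd name and finds none.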