Source: http://raisonde.tistory.com/9

Read this from start to finish and you will be able to use these functions. Take your time.


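Basic file I/O uses four functions.

CreateFile, WriteFile, ReadFile, CloseHandle

Starting with the basic idea: CreateFile looks as if it only creates files, but that is not the case.
Put simply, it is the function that gives you a file handle.

Whether you write a value to a file with WriteFile
or read a value from a file with ReadFile, you always go through a file handle. You cannot access a particular file such as "abc.txt" directly by name.
CreateFile is the function that hands you the handle through which a particular file is accessed.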

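    fileHandle = CreateFile("abc.txt", CREATE); // for illustration only; this is not the real prototype
    fileHandle = CreateFile("abc.txt", OPEN);

You create a new file or open an existing one this way and assign the result to fileHandle, then

    WriteFile(fileHandle, "abcde");
    ReadFile(fileHandle, &variable);

write a value to the file, or read the file's contents into some address. Finally,

    CloseHandle(fileHandle);

close the handle.

That is all there is to it; this is roughly how the functions are used. Of course, the real functions are not that simple.
Their arguments are complicated enough that they can be hard to take in at a glance, so the description above is simplified. Now let's look at the real prototypes.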

 


I will explain only as much as you need to start using them.

CreateFile

HANDLE CreateFile(
    LPCTSTR lpFileName,
    DWORD dwDesiredAccess,
    DWORD dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD dwCreationDisposition,
    DWORD dwFlagsAndAttributes,
    HANDLE hTemplateFile );

lpFileName is, as the name says, the file name. Pass a variable of type LPCTSTR, or write the name directly as in _T("Filename.file").

    hFile = CreateFile(_T("Filename.file"),

dwDesiredAccess is the requested access.
There are several access rights such as GENERIC_READ and GENERIC_WRITE, and more than one can be given by combining them with the bitwise OR operator "|".

    hFile = CreateFile(_T("Filename.file"), GENERIC_READ | GENERIC_WRITE,
 

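dwShareMode sets whether other processes may use the file while it is open. These values can also be combined with the bitwise OR operator.
If you plan to work with multiple processes, look this up further; by default, pass 0 (no sharing).

    hFile = CreateFile(_T("Filename.file"), GENERIC_READ | GENERIC_WRITE, 0,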
 

lpSecurityAttributes points to a security-related structure and is rarely dealt with at a beginner level.
Look this one up too if you are curious. By default, pass NULL.

    hFile = CreateFile(_T("Filename.file"), GENERIC_READ | GENERIC_WRITE, 0, NULL,

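dwCreationDisposition sets whether the file is opened or newly created. The options are:
CREATE_NEW or 1 : creates a new file only if no file with that name exists. If the file already exists, the call fails (returns INVALID_HANDLE_VALUE) with error ERROR_FILE_EXISTS (80).
CREATE_ALWAYS or 2 : always creates a new file. If one already exists, it is overwritten by the new file; in that case the last error is set to ERROR_ALREADY_EXISTS (183), but the call succeeds.
OPEN_EXISTING or 3 : opens the file only if it exists. If it does not exist, the call fails with ERROR_FILE_NOT_FOUND (2).
OPEN_ALWAYS or 4 : opens the file if it exists; otherwise creates it and opens it. If the file already existed, the call succeeds and the last error is set to ERROR_ALREADY_EXISTS (183).
TRUNCATE_EXISTING or 5 : if the file exists, resets it (size 0) and opens it; the file must be opened with GENERIC_WRITE.
If the file does not exist, the call fails with ERROR_FILE_NOT_FOUND (2).

    hFile = CreateFile(_T("Filename.file"), GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,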
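The dwCreationDisposition rules above can be summarized as a small decision table. The sketch below is a plain C model of them, not a call into the Windows API: every MODEL_ name, the ModelResult struct and model_create_file are made up for illustration, and only the numeric values are taken from the text.

```c
#include <stdbool.h>

/* Model values only: they mirror the numbers quoted in the text
   (CREATE_NEW = 1 ... TRUNCATE_EXISTING = 5) but are NOT the Windows headers. */
enum ModelDisposition {
    MODEL_CREATE_NEW = 1,
    MODEL_CREATE_ALWAYS = 2,
    MODEL_OPEN_EXISTING = 3,
    MODEL_OPEN_ALWAYS = 4,
    MODEL_TRUNCATE_EXISTING = 5
};

/* Error codes quoted in the text. */
enum {
    MODEL_NO_ERROR = 0,
    MODEL_ERROR_FILE_NOT_FOUND = 2,
    MODEL_ERROR_FILE_EXISTS = 80,
    MODEL_ERROR_ALREADY_EXISTS = 183
};

struct ModelResult {
    bool succeeded;  /* would CreateFile hand back a usable handle? */
    int last_error;  /* value GetLastError would report afterwards */
};

/* Applies the rule for disposition d, given whether the file already exists. */
struct ModelResult model_create_file(enum ModelDisposition d, bool file_exists)
{
    struct ModelResult r = { true, MODEL_NO_ERROR };
    switch (d) {
    case MODEL_CREATE_NEW:
        if (file_exists) {
            r.succeeded = false;
            r.last_error = MODEL_ERROR_FILE_EXISTS;
        }
        break;
    case MODEL_CREATE_ALWAYS:
    case MODEL_OPEN_ALWAYS:
        /* Both succeed either way; an existing file is only reported. */
        if (file_exists)
            r.last_error = MODEL_ERROR_ALREADY_EXISTS;
        break;
    case MODEL_OPEN_EXISTING:
    case MODEL_TRUNCATE_EXISTING:
        if (!file_exists) {
            r.succeeded = false;
            r.last_error = MODEL_ERROR_FILE_NOT_FOUND;
        }
        break;
    }
    return r;
}
```

Read this way, the five options fall into three groups: one that insists the file is absent, two that never fail because of the file's presence or absence, and two that insist the file is present.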
 

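dwFlagsAndAttributes specifies the attributes and flags of the file (for example FILE_ATTRIBUTE_NORMAL or FILE_FLAG_OVERLAPPED). The file attributes are used only when a file is created; when an existing file is opened, that file keeps its own attributes and the attribute values passed here are ignored.
In general you can just pass 0.

    hFile = CreateFile(_T("Filename.file"), GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0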


hTemplateFile is the handle of an existing file opened with GENERIC_READ. Its attributes are carried over, so it is used when creating a new file with the same attributes, but normally NULL is passed.

hFile = CreateFile(_T("Filename.file"), GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);

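So in the end, writing it like this completes the call. You may be taken aback by how often I say "look it up yourself", but the Windows API is, in a sense, advanced programming. It often offers a great many options that basic examples and practice programs never use, so you cannot study every one of them as you go. For now, learn only what you need, and look up more when you find you need it.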

http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx

WriteFile

BOOL WINAPI WriteFile(
   HANDLE hFile, 
   LPCVOID lpBuffer,
   DWORD nNumberOfBytesToWrite,
   LPDWORD lpNumberOfBytesWritten,
   LPOVERLAPPED lpOverlapped
);

ReadFile

BOOL WINAPI ReadFile(
   HANDLE hFile,
   LPVOID lpBuffer,
   DWORD nNumberOfBytesToRead,
   LPDWORD lpNumberOfBytesRead,
   LPOVERLAPPED lpOverlapped
);

As you can see, the two functions take very similar arguments. They are fairly simple compared with CreateFile, so I will go through them briefly.

hFile takes the handle obtained from CreateFile.

fHandle=CreateFile(_T("data.txt"),GENERIC_READ|GENERIC_WRITE,0,NULL,CREATE_ALWAYS, 0, NULL);
ReadFile(fHandle,
 

lpBuffer -     For WriteFile, pass the string or object you want to write to the file.
                  For ReadFile, pass the address of the string or object where the data read from the file should be stored.

TCHAR Strings[5];
WriteFile(fHandle, _T("abcde"),
ReadFile(fHandle, Strings,


nNumberOfBytesToRead and nNumberOfBytesToWrite are, as the names say, the sizes in bytes to read and to write.

WriteFile(fHandle, _T("abcde"), sizeof(TCHAR)*3,
ReadFile(fHandle, Strings, sizeof(TCHAR)*2,


lpNumberOfBytesRead and lpNumberOfBytesWritten take the address of a variable that receives how many bytes were actually read or written.

DWORD result;
WriteFile(fHandle, _T("abcde"), sizeof(TCHAR)*3, &result,
ReadFile(fHandle, Strings, sizeof(TCHAR)*2, &result,

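You do not need to know lpOverlapped yet. It is not a concept you must use at this stage, and it is hard to explain in the space available here. Just pass NULL. Look it up if you need it!

Here is a complete example.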

    #include <Windows.h>
    #include <tchar.h>
    #include <stdio.h>
    #include <locale.h>

    int _tmain(int argc, LPTSTR argv[]) {
        HANDLE fHandle;
        TCHAR Strings[5] = {0}; // zero-filled so the text read back is null-terminated
        DWORD result;
        LARGE_INTEGER curPtr;
        _tsetlocale(LC_ALL, _T("Korean")); // _tsetlocale follows the TCHAR build setting
        fHandle = CreateFile(_T("data.txt"), GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
        if (fHandle == INVALID_HANDLE_VALUE) return 1; // CreateFile failed
        WriteFile(fHandle, _T("abcde"), sizeof(TCHAR)*3, &result, NULL); // writes "abc"
        _tprintf(_T("Bytes written : %lu\n"), result);

        curPtr.QuadPart = 0; // move the file pointer back to the start
        SetFilePointerEx(fHandle, curPtr, NULL, FILE_BEGIN);

        ReadFile(fHandle, Strings, sizeof(TCHAR)*2, &result, NULL); // reads "ab"
        _tprintf(_T("Bytes read : %lu\n"), result);
        _tprintf(_T("%s\n"), Strings);

        CloseHandle(fHandle);
        return 0;
    }



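The only part that may be unclear here is SetFilePointerEx.
A file handle always carries a position (the file pointer) used to control the file, because reading does not always go from the beginning to the end; you must be able to read from any point to any other point. Writing is the same: you do not always write at the end; sometimes you need to write somewhere in the middle, and sometimes at the very beginning.
So you control a file by moving the file pointer around; it is easiest to think of it as the blinking cursor you see when typing text.
After you finish writing, the cursor is always at the very end. That way, if there is more to add later, you simply keep writing and the new text follows on.
But what we want here is to read the file from the beginning, so the pointer has to be moved back to the front. SetFilePointerEx is used to move the pointer by hand like this. (It is an extended version of SetFilePointer. SetFilePointer is an older function that takes the distance as a 32-bit value plus an optional separate high-order part, which makes files of 4 GB or more awkward to handle, so it is rarely used nowadays.)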

BOOL WINAPI SetFilePointerEx(
  HANDLE hFile,
  LARGE_INTEGER liDistanceToMove,
  PLARGE_INTEGER lpNewFilePointer,
  DWORD dwMoveMethod
);

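If you have read the text above carefully, these arguments should be easy to understand.

hFile is the handle of the file whose pointer is to be adjusted,
liDistanceToMove is how far to move (in bytes),
lpNewFilePointer receives the new pointer position; pass NULL if you do not need it,
dwMoveMethod is where the move starts from. The flags are as follows.

FILE_BEGIN : from the start of the file
FILE_END : from the end of the file
FILE_CURRENT : from the current position

I trust no further explanation is needed.
One important point, though, is that LARGE_INTEGER is used here; it is a data type for handling large files.
It is a union already defined through Windows.h, and in general you just use its QuadPart member, which behaves like an ordinary integer (a signed 64-bit one rather than a 32-bit DWORD). I will not go into the details here.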

LARGE_INTEGER curPtr; // already defined through Windows.h, so just use it
curPtr.QuadPart = 10; // can be assigned like an ordinary integer
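To make the union idea concrete without Windows.h, here is a rough stand-in. ModelLargeInteger and model_offset_gib are hypothetical names for this sketch; the member names follow LARGE_INTEGER, the fixed-width types from stdint.h replace DWORD, LONG and LONGLONG, and the low-then-high field order assumes a little-endian machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER, for illustration only.
   Field order assumes a little-endian machine, as on typical Windows targets. */
typedef union ModelLargeInteger {
    struct {
        uint32_t LowPart;   /* low 32 bits  */
        int32_t  HighPart;  /* high 32 bits */
    } u;
    int64_t QuadPart;       /* the whole value as one signed 64-bit integer */
} ModelLargeInteger;

/* Builds an offset of `gigabytes` GiB; from 4 GiB upward the value
   no longer fits in 32 bits, so HighPart becomes non-zero. */
ModelLargeInteger model_offset_gib(int64_t gigabytes)
{
    ModelLargeInteger li;
    li.QuadPart = gigabytes * 1024 * 1024 * 1024;
    return li;
}
```

Assigning QuadPart fills both halves at once, which is why the text says QuadPart alone is usually enough.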

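Run the following variation of the complete example, which seeks to sizeof(TCHAR) instead of 0, and compare its code and output with the earlier one; that should make things clear.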


    #include <Windows.h>
    #include <tchar.h>
    #include <stdio.h>
    #include <locale.h>

    int _tmain(int argc, LPTSTR argv[]) {
        HANDLE fHandle;
        TCHAR Strings[5] = {0}; // zero-filled so the text read back is null-terminated
        DWORD result;
        LARGE_INTEGER curPtr;
        LARGE_INTEGER thisPtr;

        _tsetlocale(LC_ALL, _T("Korean")); // _tsetlocale follows the TCHAR build setting

        fHandle = CreateFile(_T("data.txt"), GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
        if (fHandle == INVALID_HANDLE_VALUE) return 1; // CreateFile failed
        WriteFile(fHandle, _T("abcde"), sizeof(TCHAR)*3, &result, NULL); // writes "abc"
        _tprintf(_T("Bytes written : %lu\n"), result);

        curPtr.QuadPart = sizeof(TCHAR); // skip the first character
        SetFilePointerEx(fHandle, curPtr, &thisPtr, FILE_BEGIN);

        ReadFile(fHandle, Strings, sizeof(TCHAR)*2, &result, NULL); // reads "bc"
        _tprintf(_T("Bytes read : %lu\nCurrent pointer position : %lld\n"), result, thisPtr.QuadPart);
        _tprintf(_T("%s\n"), Strings);

        CloseHandle(fHandle);
        return 0;
    }
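For readers without a Windows machine at hand, the same write, seek, read sequence can be tried with the standard C library. This is a portable analog, not the Win32 API: tmpfile, fwrite, fseek and fread stand in for CreateFile, WriteFile, SetFilePointerEx and ReadFile, SEEK_SET plays the role of FILE_BEGIN, and write_seek_read is a name invented for this sketch.

```c
#include <stdio.h>

/* Writes the first 3 bytes of "abcde" to a temporary file, moves the
   cursor to `offset` bytes from the start, then reads up to 2 bytes into
   `out` (which must hold at least 3 bytes). Returns the number of bytes
   read, or -1 on failure. */
int write_seek_read(long offset, char *out)
{
    FILE *fp = tmpfile();  /* temporary file opened for reading and writing */
    if (fp == NULL)
        return -1;

    if (fwrite("abcde", 1, 3, fp) != 3) {    /* cursor now sits at offset 3 */
        fclose(fp);
        return -1;
    }

    if (fseek(fp, offset, SEEK_SET) != 0) {  /* SEEK_SET: measured from the start */
        fclose(fp);
        return -1;
    }

    size_t n = fread(out, 1, 2, fp);
    out[n] = '\0';  /* terminate before treating the bytes as a string */
    fclose(fp);
    return (int)n;
}
```

With offset 0 the buffer receives "ab", with offset 1 it receives "bc", and with offset 3 nothing is read because the cursor is already at the end, which is the situation the first example avoids by seeking back to the start.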
Posted by JinFluenza
Original article: http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc


Many Windows C++ programmers get confused by odd-looking data type identifiers such as TCHAR and LPCTSTR. In this article I will do my best to clear away the fog.

In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters can be represented in this encoding. And let's say a 2-byte character is Unicode, which can represent all the languages of the world.

The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters, respectively. There is a more precise definition of Unicode, but for the purpose of understanding, think of it as the two-byte character that Windows uses for multi-language support.

There is more to Unicode than the 2-byte character representation Windows uses. Microsoft Windows uses the UTF-16 character encoding.

What if you want your C/C++ code to be independent of character encoding/mode used?

Suggestion: Use generic data-types and names to represent characters and string.

For example, instead of replacing:

char cResponse; // 'Y' or 'N'
char sUsername[64];
// str* functions

with

wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions

in order to support multiple languages (i.e., Unicode) in your application, you can simply code it in a more generic manner:

#include<TCHAR.H> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions

The Character Set project setting on the General page (General -> Character Set) determines which character set is used for compilation.

This way, when your project is being compiled as Unicode, the TCHAR would translate to wchar_t. If it is being compiled as ANSI/MBCS, it would be translated to char. You are free to use char and wchar_t, and project settings will not affect any direct use of these keywords.

TCHAR is defined as:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR would mean wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR would mean char.

Likewise, to support multiple character sets from a single code base, and possibly multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.

As you know strlen is prototyped as:

size_t strlen(const char*);

And, wcslen is prototyped as:

size_t wcslen(const wchar_t* );

You would do better to use _tcslen, which is logically prototyped as:

size_t _tcslen(const TCHAR* );

WC is for Wide Character. Therefore, wcs stands for wide-character string. In the same way, _tcs would mean _T Character String. And you know _T may be char or wchar_t, logically.

But, in reality, _tcslen (and other _tcs functions) are actually not functions, but macros. They are defined simply as:

#ifdef _UNICODE
#define _tcslen wcslen 
#else
#define _tcslen strlen
#endif

You should refer to TCHAR.H to look up more macro definitions like this.
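The switching mechanism can be imitated in portable C. In the sketch below, DEMO_UNICODE, demo_tchar, DEMO_T, demo_tcslen and demo_name_length are all made-up names that stand in for _UNICODE, TCHAR, _T and _tcslen; the real definitions in TCHAR.H are more elaborate.

```c
#include <string.h>
#include <wchar.h>

/* DEMO_UNICODE is a made-up switch standing in for _UNICODE; define it
   (for example with -DDEMO_UNICODE) to get the wide-character variant. */
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(s)   L##s
#define demo_tcslen wcslen
#else
typedef char demo_tchar;
#define DEMO_T(s)   s
#define demo_tcslen strlen
#endif

/* The same source line works in both builds: the macros pick the
   matching literal prefix and the matching length function. */
size_t demo_name_length(void)
{
    const demo_tchar *name = DEMO_T("Saturn");
    return demo_tcslen(name);
}
```

Either way demo_name_length returns 6, because both strlen and wcslen count characters rather than bytes.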

You might ask why they are defined as macros, and not implemented as functions instead? The reason is simple: A library or DLL may export a single function, with same name and prototype (Ignore overloading concept of C++). For instance, when you export a function as:

void _TPrintChar(char);

How is the client supposed to call it as follows?

void _TPrintChar(wchar_t);

_TPrintChar cannot be magically converted into function taking 2-byte character. There has to be two separate functions:

void PrintCharA(char); // A = ANSI 
void PrintCharW(wchar_t); // W = Wide character

And a simple macro, as defined below, would hide the difference:

#ifdef _UNICODE
#define _TPrintChar PrintCharW
#else
#define _TPrintChar PrintCharA
#endif

The client would simply call it as:

TCHAR cChar;
_TPrintChar(cChar);

Note that both TCHAR and _TPrintChar would map to either Unicode or ANSI, and therefore cChar and the argument to function would be either char or wchar_t.

Macros avoid these complications and allow us to use either the ANSI or the Unicode function for characters and strings. Most Windows functions that take a string or a character are implemented this way, and for the programmer's convenience there is only one name (a macro!) to use. SetWindowText is one example:

// WinUser.H
#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif // !UNICODE

There are very few functions that do not have macros, and are available only with suffixed W or A. One example is ReadDirectoryChangesW, which doesn't have ANSI equivalent.


You all know that we use double quotation marks to represent strings. The string represented in this manner is ANSI-string, having 1-byte each character. Example:

"This is ANSI String. Each letter takes 1 byte."

The string text given above is not Unicode, and would not be suitable for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:

L"This is Unicode string. Each letter would take 2 bytes, including spaces."

Note the L at the beginning of string, which makes it a Unicode string. All characters (I repeat all characters) would take two bytes, including all English letters, spaces, digits, and the null character. Therefore, length of Unicode string would always be in multiple of 2-bytes. A Unicode string of length 7 characters would need 14 bytes, and so on. Unicode string taking 15 bytes, for example, would not be valid in any context.

In general, string would be in multiple of sizeof(TCHAR) bytes!

When you need to express hard-coded string, you can use:

"ANSI String"; // ANSI
L"Unicode String"; // Unicode

_T("Either string, depending on compilation"); // ANSI or Unicode
// or use TEXT macro, if you need more readability

The non-prefixed string is ANSI string, the L prefixed string is Unicode, and string specified in _T or TEXT would be either, depending on compilation. Again, _T and TEXT are nothing but macros, and are defined as:

// SIMPLIFIED
#ifdef _UNICODE 
 #define _T(c) L##c
 #define TEXT(c) L##c
#else 
 #define _T(c) c
 #define TEXT(c) c
#endif

The ## symbol is the token pasting operator, which would turn _T("Unicode") into L"Unicode" if _UNICODE is defined; the string passed is the argument to the macro. If _UNICODE is not defined, _T("Unicode") would simply mean "Unicode". The token pasting operator exists in the C language as well, and is not specific to VC++ or to character encoding.

Note that these macros can be used for strings as well as characters. _T('R') would turn into L'R' or simple 'R' - former is Unicode character, latter is ANSI character.
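The effect of the L prefix produced by token pasting can be measured directly. In this sketch WIDE_T and NARROW_T are hypothetical names: they put the two possible expansions of _T side by side so both can be compared in a single build, whereas the real headers define only one of them at a time.

```c
#include <wchar.h>

/* Hypothetical names: both variants are defined side by side here so they
   can be compared in one build. */
#define WIDE_T(x)   L##x   /* what _T(x) becomes in a Unicode build */
#define NARROW_T(x) x      /* what _T(x) becomes in an ANSI build   */

/* Bytes occupied by the literal "Saturn", terminator included, in each form. */
size_t narrow_literal_bytes(void) { return sizeof(NARROW_T("Saturn")); }
size_t wide_literal_bytes(void)   { return sizeof(WIDE_T("Saturn")); }

/* The macros work on character literals too: WIDE_T('R') is L'R'. */
wchar_t wide_r(void) { return WIDE_T('R'); }
```

The narrow literal occupies 7 bytes, and the wide one occupies 7 * sizeof(wchar_t) bytes; that is 14 with the 2-byte wchar_t of Visual C++, and more on compilers where wchar_t is 4 bytes.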

No, you cannot use these macros to convert variables (string or character) into Unicode/non-Unicode text. Following is not valid:

char c = 'C';
char str[16] = "CodeProject";

_T(c);
_T(str);

The last two lines would compile successfully in an ANSI (Multi-Byte) build, since _T(x) would simply be x, and therefore _T(c) and _T(str) would come out to be c and str, respectively. But when you build with the Unicode character set, they would fail to compile:

error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier

I would not like to insult your intelligence by describing why and what those errors are.

There exist set of conversion routine to convert MBCS to Unicode and vice versa, which I would explain soon.

It is important to note that almost all functions that take a string (or a character), primarily in the Windows API, have a generalized prototype in MSDN and elsewhere. The function SetWindowTextA/W, for instance, is documented as:

BOOL SetWindowText(HWND, const TCHAR*);

But, as you know, SetWindowText is just a macro, and depending on your build settings, it would mean either of following:

BOOL SetWindowTextA(HWND, const char*);
BOOL SetWindowTextW(HWND, const wchar_t*);

Therefore, don't be puzzled if following call fails to get address of this function!

HMODULE hDLLHandle;
FARPROC pFuncPtr;

hDLLHandle = LoadLibrary(L"user32.dll");

pFuncPtr = GetProcAddress(hDLLHandle, "SetWindowText");
//pFuncPtr will be null, since there doesn't exist any function with name SetWindowText !

From User32.DLL, the two functions SetWindowTextA and SetWindowTextW are exported, not the function with generalized name.

Interestingly, .NET Framework is smart enough to locate function from DLL with generalized name:

[DllImport("user32.dll")]
extern public static int SetWindowText(IntPtr hWnd, string lpString);

No rocket science, just bunch of ifs and else around GetProcAddress!

All of the functions that have ANSI and Unicode versions, would have actual implementation only in Unicode version. That means, when you call SetWindowTextA from your code, passing an ANSI string - it would convert the ANSI string to Unicode text and then would call SetWindowTextW. The actual work (setting the window text/title/caption) will be performed by Unicode version only!

Take another example, which would retrieve the window text, using GetWindowText. You call GetWindowTextA, passing ANSI buffer as target buffer. GetWindowTextA would first call GetWindowTextW, probably allocating a Unicode string (a wchar_t array) for it. Then it would convert that Unicode stuff, for you, into ANSI string.

This ANSI to Unicode and vice-versa conversion is not limited to GUI functions, but entire set of Windows API, which do take strings and have two variants. Few examples could be:

  • CreateProcess
  • GetUserName
  • OpenDesktop
  • DeleteFile
  • etc

It is therefore very much recommended to call the Unicode version directly. In turn, it means you should always target for Unicode builds, and not ANSI builds - just because you are accustomed to using ANSI string for years. Yes, you may save and retrieve ANSI strings, for example in file, or send as chat message in your messenger application. The conversion routines do exist for such needs.

Note: There exists another typedef: WCHAR, which is equivalent to wchar_t.


The TCHAR macro is for a single character. You can definitely declare an array of TCHAR. What if you would like to express a character-pointer, or a const-character-pointer - Which one of the following?

// ANSI characters 
foo_ansi(char*); 
foo_ansi(const char*); 
/*const*/ char* pString; 

// Unicode/wide-string 
foo_uni(WCHAR*); 
wchar_t* foo_uni(const WCHAR*); 
/*const*/ WCHAR* pString; 

// Independent 
foo_char(TCHAR*); 
foo_char(const TCHAR*); 
/*const*/ TCHAR* pString;

After reading about TCHAR, you would definitely select the last one as your choice. There are better alternatives available to represent strings. For that, you just need to include Windows.h. Note: Windows.h already provides TCHAR, TEXT and the LP...STR type names, so TCHAR.H is needed only for _T and the _tcs* macros.

First, revisit old string functions for better understanding. You know strlen:

size_t strlen(const char*);

Which may be represented as:

size_t strlen(LPCSTR);

Where symbol LPCSTR is typedef'ed as:

// Simplified
typedef const char* LPCSTR;  

The meaning goes like:

  • LP - Long Pointer
  • C - Constant
  • STR - String

Essentially, LPCSTR would mean (Long) Pointer to a Constant String.

Let's represent strcpy using new style type-names:

LPSTR strcpy(LPSTR szTarget, LPCSTR szSource);

The type of szTarget is LPSTR, without C in the type-name. It is defined as:

typedef char* LPSTR;

Note that the szSource is LPCSTR, since strcpy function will not modify the source buffer, hence the const attribute. The return type is non-constant-string: LPSTR.

Alright, these str-functions are for ANSI string manipulation. But we want routines for 2-byte Unicode strings. For the same, the equivalent wide-character str-functions are provided. For example, to calculate length of wide-character (Unicode string), you would use wcslen:

size_t nLength;
nLength = wcslen(L"Unicode");

The prototype of wcslen is:

size_t wcslen(const wchar_t* szString); // Or WCHAR*

And that can be represented as:

size_t wcslen(LPCWSTR szString);

Where the symbol LPCWSTR is defined as:

typedef const WCHAR* LPCWSTR;
// const wchar_t*

Which can be broken down as:

  • LP - Pointer
  • C - Constant
  • WSTR - Wide character String

Similarly, strcpy equivalent is wcscpy, for Unicode strings:

wchar_t* wcscpy(wchar_t* szTarget, const wchar_t* szSource)

Which can be represented as:

LPWSTR wcscpy(LPWSTR szTarget, LPCWSTR szSource);

Where the target is non-constant wide-string (LPWSTR), and source is constant-wide-string.

There exist set of equivalent wcs-functions for str-functions. The str-functions would be used for plain ANSI strings, and wcs-functions would be used for Unicode strings.

I have already advised using the native Unicode functions instead of ANSI-only or TCHAR-synthesized functions. The reason was simple: your application should be Unicode only, and you should not even care about code portability for ANSI builds. But for the sake of completeness, I am mentioning these generic mappings.

To calculate length of string, you may use _tcslen function (a macro). In general, it is prototyped as:

size_t _tcslen(const TCHAR* szString); 

Or, as:

size_t _tcslen(LPCTSTR szString);

Where the type-name LPCTSTR can be classified as:

  • LP - Pointer
  • C - Constant
  • T - TCHAR
  • STR - String

Depending on the project settings, LPCTSTR would be mapped to either LPCSTR (ANSI) or LPCWSTR (Unicode).

Note: strlen, wcslen or _tcslen will return number of characters in string, not the number of bytes.

The generalized string-copy routine _tcscpy is defined as:

TCHAR* _tcscpy(TCHAR* pTarget, const TCHAR* pSource);

Or, in more generalized form, as:

LPTSTR _tcscpy(LPTSTR pTarget, LPCTSTR pSource);

You can deduce the meaning of LPTSTR!

Usage Examples

First, a broken code:

int main()
{
    TCHAR name[] = "Saturn";
    int nLen; // Or size_t

    nLen = strlen(name);
}

On ANSI build, this code will successfully compile since TCHAR would be char, and hence name would be an array of char. Calling strlen against name variable would also work flawlessly.

Alright. Let's compile the same code with UNICODE/_UNICODE defined (i.e. "Use Unicode Character Set" in project settings). Now, the compiler would report a set of errors:

  • error C2440: 'initializing' : cannot convert from 'const char [7]' to 'TCHAR []'
  • error C2664: 'strlen' : cannot convert parameter 1 from 'TCHAR []' to 'const char *'

And the programmers would start committing mistakes by correcting it this way (first error):

TCHAR name[] = (TCHAR*)"Saturn";

Which will not pacify the compiler, since the conversion is not possible from TCHAR* to TCHAR[7]. The same error would also come when native ANSI string is passed to a Unicode function:

nLen = wcslen("Saturn");
// ERROR: cannot convert parameter 1 from 'const char [7]' to 'const wchar_t *'

Unfortunately (or fortunately), this error can be incorrectly corrected by simple C-style typecast:

nLen = wcslen((const wchar_t*)"Saturn");

And you'd think you've attained one more experience level in pointers! You are wrong - the code would give incorrect result, and in most cases would simply cause Access Violation. Typecasting this way is like passing a float variable where a structure of 80 bytes is expected (logically).

The string "Saturn" is sequence of 7 bytes:

'S' (83) 'a' (97) 't' (116) 'u' (117) 'r' (114) 'n' (110) '\0' (0)

But when you pass the same set of bytes to wcslen, it treats every two bytes as a single character. On a little-endian machine the first two bytes [83, 97] are therefore read as one character with the value 24915 (97<<8 | 83), which is the code point U+6153, a CJK ideograph. The next character is formed from [116, 117], and so on.

For sure, you didn't pass those set of Chinese characters, but improper typecasting has done it! Therefore it is very essential to know that type-casting will not work! So, for the first line of initialization, you must do:

TCHAR name[] = _T("Saturn");

Which would translate to 7-bytes or 14-bytes, depending on compilation. The call to wcslen should be:

wcslen(L"Saturn");
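The misreading described above, where the bytes of "Saturn" are taken as 16-bit units, can be reproduced without any cast. unit_from_bytes is a hypothetical helper for this sketch; it combines the bytes explicitly in little-endian order, so the result does not depend on the machine running it.

```c
#include <stddef.h>
#include <stdint.h>

/* Combines bytes 2*index and 2*index+1 of `bytes` into one 16-bit unit,
   low byte first (little-endian), which is how a Windows wide-character
   function would see the memory. The caller must make sure both bytes exist. */
uint16_t unit_from_bytes(const char *bytes, size_t index)
{
    uint16_t low  = (unsigned char)bytes[2 * index];
    uint16_t high = (unsigned char)bytes[2 * index + 1];
    return (uint16_t)((high << 8) | low);
}
```

For "Saturn", unit 0 is 24915 and unit 1 is 30068. The 7 bytes hold only three complete units followed by a single zero byte, so there is no guaranteed 16-bit zero terminator; that is why wcslen on such a pointer can run past the end of the string.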

In the sample program code given above, I used strlen, which causes an error when building in Unicode. The non-working solution is a C-style typecast:

nLen = strlen ((const char*)name);

On Unicode build, name would be of 14-bytes (7 Unicode characters, including null). Since string "Saturn" contains only English letters, which can be represented using original ASCII, the Unicode letter 'S' would be represented as [83, 0]. Other ASCII characters would be represented with a zero next to them. Note that 'S' is now represented as 2-byte value 83. The end of string would be represented by two bytes having value 0.

So, when you pass such string to strlen, the first character (i.e. first byte) would be correct ('S' in case of "Saturn"). But the second character/byte would indicate end of string. Therefore, strlen would return incorrect value 1 as the length of string.

Since a Unicode string may also contain non-English characters, the result of strlen would be even less predictable.

In short, typecasting will not work. You either need to represent strings in correct form itself, or use ANSI to Unicode, and vice-versa, routines for conversions.
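The strlen result of 1 mentioned above can be checked on any machine by writing out the 14 bytes by hand. saturn_utf16le and strlen_after_cast are names invented for this sketch; the array spells out what a little-endian UTF-16 build would store for L"Saturn".

```c
#include <string.h>

/* The 14 bytes of L"Saturn" as stored by a little-endian UTF-16 build:
   each ASCII letter is followed by a zero byte, and the terminator is two
   zero bytes. Written out by hand so the sketch does not depend on the
   size of wchar_t on the machine compiling it. */
static const unsigned char saturn_utf16le[14] = {
    'S', 0, 'a', 0, 't', 0, 'u', 0, 'r', 0, 'n', 0, 0, 0
};

/* What strlen((const char*)name) effectively computes in a Unicode build. */
size_t strlen_after_cast(void)
{
    return strlen((const char *)saturn_utf16le);
}
```

strlen stops at the zero byte that follows 'S', so the function returns 1 even though the string holds six characters.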

(There is more to add from this location, stay tuned!)


Now, I hope you understand the following signatures:

BOOL SetCurrentDirectory( LPCTSTR lpPathName );
DWORD GetCurrentDirectory(DWORD nBufferLength,LPTSTR lpBuffer);

Continuing. You must have seen some functions/methods asking you to pass number of characters, or returning the number of characters. Well, like GetCurrentDirectory, you need to pass number of characters, and not number of bytes. For example:

TCHAR sCurrentDir[255];
 
// Pass 255 and not 255*2
GetCurrentDirectory(255, sCurrentDir);

On the other hand, if you need to allocate a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:

LPTSTR pBuffer; // TCHAR* 

pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.

But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc; you must specify the number of bytes!

pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );

Typecasting the return value is required, as you know. The expression in malloc's argument ensures that it allocates desired number of bytes - and makes up room for desired number of characters.
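The count-versus-bytes distinction shows up whenever a wide string is copied to the heap. The sketch below is plain C using wchar_t directly instead of TCHAR, and duplicate_wide is a hypothetical helper; in C, unlike C++, the result of malloc needs no cast.

```c
#include <stdlib.h>
#include <wchar.h>

/* Returns a heap copy of `src`, or NULL if allocation fails.
   wcslen counts characters while malloc wants bytes, so the count
   (plus one for the terminator) is multiplied by sizeof(wchar_t). */
wchar_t *duplicate_wide(const wchar_t *src)
{
    size_t chars = wcslen(src) + 1;
    wchar_t *copy = malloc(chars * sizeof(wchar_t));
    if (copy != NULL)
        wmemcpy(copy, src, chars);  /* wmemcpy takes a character count */
    return copy;
}
```

The caller owns the returned buffer and releases it with free. Note how the two library calls follow the same split as the text: malloc is given bytes, wmemcpy is given characters.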


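Conclusion: English is too hard..

Just kidding. The point is that with TCHAR you can write code that works regardless of Unicode or multi-byte builds. I have read many explanations, but the concepts presented here were the easiest to understand.

I should make a habit of reading more explanations written in English. There is a lot of good material on foreign sites.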



http://blog.naver.com/wondo21c/30043174174
