Hello there.
First things first, watch the following video:
Good. Now, this is my first OpenCL program, so there may be mistakes in terminology and so on, but the goal is to have the simplest example possible, not only because it’s useful but also because that’s all I can do for now 😇.
Let’s go
As you saw in the video, we need to create a context with a device, and to that context we’ll feed our function (kernel).
If you know C you are in luck; I don’t, so keep that in mind.
So, let’s start with the basics: set up the foundation, and when that works go for the calculations too.
Note: some tutorials may give you the sense that you can assign multiple devices to one context, but with the MQL5 functions used here it’s one device per context (CLContextCreate takes a single device).
I believe you can have multiple concurrent contexts, though (one per device) -> that’s my deduction, not tested, I’m not familiar with it.
The first thing we do is create a context; note that if the creation succeeds, the log receives the name of the device that was assigned:
#property copyright "Lorentzos Roussos"
#property link      "https://www.mql5.com/en/users/lorio"
#property version   "1.00"
#include <OpenCL\OpenCL.mqh>

bool busy=false,loaded=false;
int ctx=INVALID_HANDLE;

int OnInit()
  {
   ctx=INVALID_HANDLE;
   busy=false;
   loaded=false;
   EventSetMillisecondTimer(44);
   return(INIT_SUCCEEDED);
  }

void OnTimer(){
  if(!busy){
    busy=true;
    if(!loaded){
      EventKillTimer();
      ctx=CLContextCreate(CL_USE_GPU_DOUBLE_ONLY);
      if(ctx!=INVALID_HANDLE){
        Print("CL.Context Created");
      }
    }
    busy=false;
  }
}

void OnDeinit(const int reason)
  {
   if(ctx!=INVALID_HANDLE){CLContextFree(ctx);}
  }

void OnTick()
  {
  }
This is the structure of the code we’ll run in general; on deinit we free the context if it was valid.
Simple stuff so far.
Also note there are 2 documentation pages for OpenCL, one in the standard library and one for the native functions.
I think the standard library one is the older one, not sure, it isn’t clarified anywhere (not that much is on this website), but let’s work with the native functions, and if something is missing we know where to look.
So, the native functions include CLProgramCreate, which receives a string of source code written in “OpenCL C” (the OpenCL language).
CLKernelCreate, however, receives the handle of a program and the name of a kernel, which means we put all of our program’s source code in the string we pass when creating the program and then declare the kernels.
For this test we’ll only have one “function” (kernel), so it won’t matter for now.
So, a simple function in OpenCL C. Okay, I don’t know C, so how about we send an array and a number and the “kernel” multiplies the values in the array by that number.
But as stressed in the video, the best way to squeeze performance out of parallel computations is to not execute them in a linear fashion. Think of it as a loop with iterator i that doesn’t actually access the index with the iteration variable i but with an index taken from another “pool” of indexes that is available to all “compute units”.
Let’s think about it differently so it makes sense. If you wanted to create parallel operations in MQL5 on your own, you would need a common log of which “tasks” are still available, and once one of your charts finished a task it would pick the next one from that common pool of unfinished tasks.
So, the “function” can find out which index it has in the pool when it is executed by calling get_global_id(0) (1st dimension):
__kernel void biscuit(__global double *array,
                      double by,
                      int total_items){
   int idx=get_global_id(0);
   // array[] holds total_items elements, so idx==total_items is already out of range
   if(idx>=total_items){return;}
   array[idx]*=by;
}
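To make the “pool of indexes” idea concrete, here is a minimal sketch in plain Python (not OpenCL and not MQL5) that emulates what the device does with the kernel above. The names `run_1d` and `gid` are made up for this illustration: `gid` stands in for the value `get_global_id(0)` would return, and the ids are shuffled on purpose to show that the result must not depend on the order in which work items run. The guard is written as `>=` so that an id equal to `total_items` is rejected too.

```python
import random


def biscuit(array, by, total_items, gid):
    """One work item: same body as the kernel, gid plays get_global_id(0)."""
    if gid >= total_items:
        return
    array[gid] *= by


def run_1d(kernel, global_size, *args):
    """Hypothetical stand-in for a 1-D launch: call the kernel once per id."""
    ids = list(range(global_size))
    random.shuffle(ids)  # a real device gives no ordering guarantee either
    for gid in ids:
        kernel(*args, gid)


values = [1.0, 2.0, 3.0, 4.0]
run_1d(biscuit, len(values), values, 2.5, len(values))
print(values)
```

Every element is touched exactly once no matter how the ids are shuffled, which is the property a kernel needs to have before it can be spread over many compute units.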
Okay, so another question that arises: does get_global_id start from 0 or 1? The example on the CLProgramCreate page of the docs suggests it starts from 1. We will see though; one way is to multiply the array by the index, yeah, let’s do that.
So, do I have to typecast? ...hmmm
So, 3 questions that need answers:
- does get_global_id(0) (and get_local_id(0) for that matter) start from 1?
- if the pool of “leftover” tasks is equal to the number of tasks we create, why do I need to exit if the index goes above total tasks? Isn’t running over the limit impossible?
- do I need to typecast the int to multiply the array?
Let’s change the function to this and find out:
__kernel void biscuit(__global double *array){
   int idx=get_global_id(0);
   array[idx]*=idx;
}
And let’s create the program with this, expecting 3 errors from the OpenCL compiler here.
Okay, it says program created! Awesome.
string biscuit_source_code="__kernel void biscuit(__global double *array){\r\n"
                           "int idx=get_global_id(0);\r\n"
                           "array[idx]*=idx;}\r\n";
string build_log="";
program_handle=CLProgramCreate(ctx,biscuit_source_code,build_log);
if(program_handle!=INVALID_HANDLE){
  Print("Program created!");
}else{
  Alert("Errors\n"+build_log);
}
Then I create the memory, I suppose:
buffer_handle=CLBufferCreate(ctx,1000,CL_MEM_READ_WRITE);
I’m using a lot of handles; I think a handle manager (and unloader) could be created here, but this is a test.
Then the kernel: we are sending the program handle here, so the kernel name must be the same as the one in the source code we sent.
The docs state “The name of the kernel that execution starts from”, so any “sub-functions” don’t need to be “kerneled”? That’s question 4, I guess.
Okay, no errors so far; they will probably pop up on execution.
This is what I’ve got so far:
#property copyright "Lorentzos Roussos"
#property link      "https://www.mql5.com/en/users/lorio"
#property version   "1.00"

bool busy=false,loaded=false;
int ctx=INVALID_HANDLE;
int program_handle,kernel_handle,buffer_handle;

int OnInit()
  {
   ctx=INVALID_HANDLE;
   program_handle=INVALID_HANDLE;
   kernel_handle=INVALID_HANDLE;
   buffer_handle=INVALID_HANDLE;
   busy=false;
   loaded=false;
   EventSetMillisecondTimer(44);
   return(INIT_SUCCEEDED);
  }

void OnTimer(){
  if(!busy){
    busy=true;
    if(!loaded){
      EventKillTimer();
      ResetLastError();
      ctx=CLContextCreate(CL_USE_GPU_DOUBLE_ONLY);
      if(ctx!=INVALID_HANDLE){
        ResetLastError();
        Print("CL.Context Created");
        string biscuit_source_code="__kernel void biscuit(__global double *array){\r\n"
                                   "int idx=get_global_id(0);\r\n"
                                   "array[idx]*=idx;}\r\n";
        string build_log="";
        program_handle=CLProgramCreate(ctx,biscuit_source_code,build_log);
        if(program_handle!=INVALID_HANDLE){
          ResetLastError();
          Print("Program created!");
          buffer_handle=CLBufferCreate(ctx,1000,CL_MEM_READ_WRITE);
          if(buffer_handle!=INVALID_HANDLE){
            ResetLastError();
            Print("buffer created");
            kernel_handle=CLKernelCreate(program_handle,"biscuit");
            if(kernel_handle!=INVALID_HANDLE){
              ResetLastError();
              Print("Kernel created");
            }else{Print("Can't create kernel #"+IntegerToString(GetLastError()));}
          }else{Print("Can't create buffer #"+IntegerToString(GetLastError()));}
        }else{Alert("Errors #"+IntegerToString(GetLastError())+"\n"+build_log);}
      }else{Print("Can't create CL.context #"+IntegerToString(GetLastError()));}
    }
    busy=false;
  }
}

void OnDeinit(const int reason)
  {
   if(kernel_handle!=INVALID_HANDLE){CLKernelFree(kernel_handle);}
   if(buffer_handle!=INVALID_HANDLE){CLBufferFree(buffer_handle);}
   if(program_handle!=INVALID_HANDLE){CLProgramFree(program_handle);}
   if(ctx!=INVALID_HANDLE){CLContextFree(ctx);}
  }

void OnTick()
  {
  }
Now I have to declare the arguments for the kernel.
There are 3 variants here:
- CLSetKernelArg
- CLSetKernelArgMem
- CLSetKernelArgMemLocal
So, the first one, I assume, is for passing constants; if we sent a multiplier, it would be with this.
The second is for global memory and the third for local memory. The local memory one receives a size argument rather than a buffer handle, so it allocates memory on the device, locally in the CUs.
Fifth question: where is the constant memory, or is it handled internally? Probably.
So here I have a global array, so I will use CLSetKernelArgMem for the first argument.
Okay
if(CLSetKernelArgMem(kernel_handle,0,buffer_handle)){
  ResetLastError();
  Print("Memory arg assigned to kernel");
}else{Print("Can't assign memory arg#"+IntegerToString(GetLastError()));}
Now what? I need to fill the memory, so I’m sending the array down; that’s handy.
But wait, I have no array, damn it. We are testing the index at the same time (get_global_id(0)), so let’s create a mock array with the value 1.0 for all elements.
All right, I hit the first error here, finally 😂 it says error 5110.
Print("Memory arg assigned to kernel");
double arr[];
ArrayResize(arr,1000,0);
ArrayFill(arr,0,1000,1.0);
uint filled=CLBufferWrite(buffer_handle,arr,0,0,1000);
if(filled==1000){
  Print("Filled "+IntegerToString(filled)+"items in buffer");
}else{Print("Can't fill buffer #"+IntegerToString(GetLastError()));}
What error is that? Let’s see: “ERR_OPENCL_WRONG_BUFFER_SIZE”, wrong buffer size, but why?
Okay, the buffer size when creating the buffer refers to bytes, not items! Good to know. It was in the docs, to be fair. My bad.
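The bytes-versus-items arithmetic is easy to get wrong, so here is a tiny sketch of it in plain Python (an illustration only, not MQL5). The helper name `buffer_bytes` is made up; the element sizes come from `struct.calcsize` with the standard-size format codes, which give 8 bytes for a double and 4 bytes for a 32-bit int.

```python
import struct

DOUBLE_BYTES = struct.calcsize("<d")  # 8, size of a C double
INT_BYTES = struct.calcsize("<i")     # 4, size of a 32-bit int


def buffer_bytes(item_count, item_bytes):
    """Size to request for a device buffer holding item_count elements."""
    return item_count * item_bytes


print(buffer_bytes(1000, DOUBLE_BYTES))  # buffer for 1000 doubles
print(buffer_bytes(1000, INT_BYTES))     # buffer for 1000 ints
```

So a buffer meant to hold 1000 doubles has to be requested with a size of 8000, and one for 1000 ints with a size of 4000; asking for 1000 bytes leaves room for only 125 doubles.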
So, what now? Execute?
Yes, okay, so this is asynchronous by default I assume (that’s question 6), so I call the execution, then set the timer up again and query the status of the kernel execution.
Let’s go with the default execute variant; I’m not seeing a blocking flag (like in the videos), so it must be async by default.
Here it is:
void OnTimer(){
  if(!busy){
    busy=true;
    if(!loaded){
      EventKillTimer();
      ResetLastError();
      ctx=CLContextCreate(CL_USE_GPU_DOUBLE_ONLY);
      if(ctx!=INVALID_HANDLE){
        ResetLastError();
        Print("CL.Context Created");
        string biscuit_source_code="__kernel void biscuit(__global double *array){\r\n"
                                   "int idx=get_global_id(0);\r\n"
                                   "array[idx]*=idx;}\r\n";
        string build_log="";
        program_handle=CLProgramCreate(ctx,biscuit_source_code,build_log);
        if(program_handle!=INVALID_HANDLE){
          ResetLastError();
          Print("Program created!");
          // buffer size is in bytes: 1000 doubles * 8 bytes each
          buffer_handle=CLBufferCreate(ctx,1000*8,CL_MEM_READ_WRITE);
          if(buffer_handle!=INVALID_HANDLE){
            ResetLastError();
            Print("buffer created");
            kernel_handle=CLKernelCreate(program_handle,"biscuit");
            if(kernel_handle!=INVALID_HANDLE){
              ResetLastError();
              Print("Kernel created");
              if(CLSetKernelArgMem(kernel_handle,0,buffer_handle)){
                ResetLastError();
                Print("Memory arg assigned to kernel");
                double arr[];
                ArrayResize(arr,1000,0);
                ArrayFill(arr,0,1000,1.0);
                uint filled=CLBufferWrite(buffer_handle,arr,0,0,1000);
                if(filled==1000){
                  ResetLastError();
                  Print("Filled "+IntegerToString(filled)+"items in buffer");
                  if(CLExecute(kernel_handle)){
                    Print("Executing");
                    // re-arm the timer so the next tick can poll the status
                    EventSetMillisecondTimer(44);
                    loaded=true;
                  }else{Print("Can't execute kernel #"+IntegerToString(GetLastError()));}
                }else{Print("Can't fill buffer #"+IntegerToString(GetLastError()));}
              }else{Print("Can't assign memory arg#"+IntegerToString(GetLastError()));}
            }else{Print("Can't create kernel #"+IntegerToString(GetLastError()));}
          }else{Print("Can't create buffer #"+IntegerToString(GetLastError()));}
        }else{Alert("Errors #"+IntegerToString(GetLastError())+"\n"+build_log);}
      }else{Print("Can't create CL.context #"+IntegerToString(GetLastError()));}
    }
    else if(loaded){
      ENUM_OPENCL_EXECUTION_STATUS status=(ENUM_OPENCL_EXECUTION_STATUS)CLExecutionStatus(kernel_handle);
      Comment("Kernel("+IntegerToString(kernel_handle)+" Status("+EnumToString(status)+")");
    }
    busy=false;
  }
}
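The launch-then-poll flow in the listing above (start the work, return, check the status on a later timer tick) is a general pattern, and it may be easier to see outside of MQL5. The sketch below uses plain Python with a thread pool as a stand-in for the device; `heavy` and `run_and_poll` are made-up names, and whether CLExecute really behaves asynchronously is my assumption in the text, not something this example proves.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def heavy(values):
    """Stand-in for the kernel: pretend the work takes a little while."""
    time.sleep(0.05)
    return [v * 2.0 for v in values]


def run_and_poll(values, poll_seconds=0.01):
    """Submit the work, then poll its status the way a timer handler would."""
    polls = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(heavy, values)  # returns immediately
        while not future.done():             # the "status" query
            polls += 1
            time.sleep(poll_seconds)         # wait for the next "tick"
        return future.result(), polls


result, polls = run_and_poll([1.0, 2.0, 3.0])
print(result, "after", polls, "polls")
```

The caller never blocks on the work itself; it only reads the result once the status says the job is complete, which is the role the status check plays in the timer handler.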
That, obviously, finished very fast, but what we want is to take a peek into the array.
So if it completed: read, print and exit 🤓
if(status==CL_COMPLETE){
  double get[];
  ArrayResize(get,1000,0);
  ArrayFill(get,0,1000,0.0);
  ResetLastError();
  if(CLBufferRead(buffer_handle,get,0,0,1000)){
    string msg="";
    for(int i=0;i<10;i++){
      msg+=DoubleToString(get[i],2)+"\n";
    }
    Alert(msg);
  }else{Print("Can't read buffer #"+IntegerToString(GetLastError()));}
  Print("Exit");
  ExpertRemove();
}
Aaand here’s what we got back: the first element is 0, which means get_global_id(0) starts from 0? But the rest are 1.00.
Now I want to find out if I need to typecast before I multiply, but let’s change the multiplication line real quick to this, I have a hunch:
array[idx]=array[idx]*idx;
Nope. So I’ll create a second buffer, an int one, and we’ll fill it with the index values to figure this out.
So what do we do:
- alter the source code string
- create a buffer
- add the buffer as an argument, write-only this time (I assume these enums are from the device’s side, not ours)
- read the new int buffer, and if we see 0,0,0,0,0,0 we panic
Also, note we didn’t get any indication of an error and we are not checking the array limit, which means either I didn’t understand something, or it’s not immediately evident, or it doesn’t matter given the way the work groups are relayed to the device. So many unknowns.
Anyway. I also hate this structure, and it’s going to look even uglier with the addition of the 2nd buffer, but it’s a test.
Here is the updated code:
// buffer_handle2 is assumed to be declared as a global int next to the other handles
void OnTimer(){
  if(!busy){
    busy=true;
    if(!loaded){
      EventKillTimer();
      ResetLastError();
      ctx=CLContextCreate(CL_USE_GPU_DOUBLE_ONLY);
      if(ctx!=INVALID_HANDLE){
        ResetLastError();
        Print("CL.Context Created");
        string biscuit_source_code="__kernel void biscuit(__global double *array,__global int *idx_array){\r\n"
                                   "int idx=get_global_id(0);\r\n"
                                   "idx_array[idx]=idx;\r\n"
                                   "array[idx]=array[idx]*idx;}\r\n";
        string build_log="";
        program_handle=CLProgramCreate(ctx,biscuit_source_code,build_log);
        if(program_handle!=INVALID_HANDLE){
          ResetLastError();
          Print("Program created!");
          // sizes are in bytes: 8 per double, 4 per int
          buffer_handle=CLBufferCreate(ctx,1000*8,CL_MEM_READ_WRITE);
          buffer_handle2=CLBufferCreate(ctx,1000*4,CL_MEM_WRITE_ONLY);
          if(buffer_handle!=INVALID_HANDLE&&buffer_handle2!=INVALID_HANDLE){
            ResetLastError();
            Print("buffer created");
            kernel_handle=CLKernelCreate(program_handle,"biscuit");
            if(kernel_handle!=INVALID_HANDLE){
              ResetLastError();
              Print("Kernel created");
              if(CLSetKernelArgMem(kernel_handle,0,buffer_handle)&&CLSetKernelArgMem(kernel_handle,1,buffer_handle2)){
                ResetLastError();
                Print("Memory arg assigned to kernel");
                double arr[];
                ArrayResize(arr,1000,0);
                ArrayFill(arr,0,1000,1.0);
                uint filled=CLBufferWrite(buffer_handle,arr,0,0,1000);
                if(filled==1000){
                  ResetLastError();
                  Print("Filled "+IntegerToString(filled)+"items in buffer");
                  if(CLExecute(kernel_handle)){
                    Print("Executing");
                    EventSetMillisecondTimer(44);
                    loaded=true;
                  }else{Print("Can't execute kernel #"+IntegerToString(GetLastError()));}
                }else{Print("Can't fill buffer #"+IntegerToString(GetLastError()));}
              }else{Print("Can't assign memory arg#"+IntegerToString(GetLastError()));}
            }else{Print("Can't create kernel #"+IntegerToString(GetLastError()));}
          }else{Print("Can't create buffer #"+IntegerToString(GetLastError()));}
        }else{Alert("Errors #"+IntegerToString(GetLastError())+"\n"+build_log);}
      }else{Print("Can't create CL.context #"+IntegerToString(GetLastError()));}
    }
    else if(loaded){
      ENUM_OPENCL_EXECUTION_STATUS status=(ENUM_OPENCL_EXECUTION_STATUS)CLExecutionStatus(kernel_handle);
      Comment("Kernel("+IntegerToString(kernel_handle)+" Status("+EnumToString(status)+")");
      if(status==CL_COMPLETE){
        double get[];
        ArrayResize(get,1000,0);
        ArrayFill(get,0,1000,0.0);
        int get_idx[];
        ArrayResize(get_idx,1000,0);
        ArrayFill(get_idx,0,1000,-1);
        ResetLastError();
        if(CLBufferRead(buffer_handle,get,0,0,1000)&&CLBufferRead(buffer_handle2,get_idx,0,0,1000)){
          string msg="";
          for(int i=0;i<10;i++){
            msg+=DoubleToString(get[i],2)+"(idx:"+IntegerToString(get_idx[i])+")\n";
          }
          Alert(msg);
        }else{Print("Can't read buffer #"+IntegerToString(GetLastError()));}
        Print("Exit");
        ExpertRemove();
      }
    }
    busy=false;
  }
}
And I’m getting 0 on all the index values... hmmm. Which raises the question: the first array value is multiplied and the rest are not?
So does it only execute the first one?
Okay, let’s add the limit check real quick.
No, that’s not it.
Okay, what if CLExecute executes only once?
Hmm, okay, so what if I create a counter of my own and keep pumping it until it finishes? That would take 44 seconds though (44ms*1000), so I’ll reduce the items to 100.
But how is that parallel? WTF. This is the problem with the MQL5 docs: either the person who understands the code writes the docs and is bored, or the person who writes the code and the one who writes the docs are different people. The peak moment when you understand something fully is exactly when it should be thoroughly explained to us peasants. Why? Because it grows your ecosystem faster! Imagine 10000 coders reading the docs for this: if a bigger percentage grasps it in less time, they will create more stuff sooner. More stuff will attract more activity, and so on. We can’t be guessing what their thought process was when they were deploying these tools every time... I mean... anyway.
Rant over. Let’s look at what they’ve shared anyway; I’m seeing arrays there instead of integers for the sizes too. Okay.
Yeah, okay, so further studying and crystal ball interaction is needed to relate the global_work_offset[], global_work_size[] and local_work_size[] arrays to the tutorial video above, but I used the CLExecute variant that takes a work dimension count (1 here), set the offset array with one element (one dimension) to 0 and the work size array with one element (dimension) to 1000, and it worked. The variant that takes only the kernel handle appears to launch a single work item, which would explain why only index 0 was touched before.
So get_global_id(0) starts from 0, which means their docs have a small error, unless I’m missing something else that also isn’t documented anywhere.
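To see why the two launch shapes give such different results, here is a small emulation in plain Python (again not MQL5, and `launch_1d` and `run` are names invented for this sketch). A launch is modelled as “call the kernel once for every id from offset to offset + size - 1”. One deliberate difference from the real run: untouched index slots are pre-filled with -1 here so they stand out, whereas the device buffer in my test was never initialised.

```python
def biscuit(array, idx_array, gid):
    """Same body as the two-buffer kernel; gid plays get_global_id(0)."""
    idx_array[gid] = gid
    array[gid] = array[gid] * gid


def launch_1d(kernel, offset, size, *buffers):
    """Hypothetical 1-D launch: ids run from offset to offset + size - 1."""
    for gid in range(offset, offset + size):
        kernel(*buffers, gid)


def run(offset, size, items=5):
    array = [1.0] * items
    idx_array = [-1] * items  # -1 marks "never written by a work item"
    launch_1d(biscuit, offset, size, array, idx_array)
    return array, idx_array


single = run(0, 1)  # one work item, like the single-task launch
full = run(0, 5)    # one work item per element
print(single)
print(full)
```

With a single work item only element 0 is written (and it becomes 0 because it is multiplied by id 0), which matches the “first is 0, the rest are 1.00” output. With one work item per element the ids run 0, 1, 2, ... so they clearly start from the offset, 0 in this case.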
Right here is the code
Your’s and mine first open cl , i am attaching it because it went over 64k

And right here is the second a part of the video above going in additional element on OpenCL C
Questions left:
- if the pool of “leftover” tasks is equal to the number of tasks we create, why do I need to exit if the index goes above total tasks? Isn’t running over the limit impossible?
- do I need to typecast the int to multiply the array?
- where are the constant memory functions?
- do sub-functions need to be created as kernels?
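On the first question, one common reason in OpenCL generally (I have not verified whether it applies to the MQL5 wrapper) is that the global size is often rounded up to a multiple of the work-group size, so more work items get launched than there are elements. The sketch below shows that case in plain Python; `round_up`, `guarded_kernel` and the group size of 64 are assumptions picked for the illustration.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def guarded_kernel(array, total_items, gid):
    """Returns True if this work item did real work, False if it bailed out."""
    if gid >= total_items:  # the bounds check from the first kernel
        return False
    array[gid] *= 2.0
    return True


total_items = 1000
local_size = 64                                   # assumed work-group size
global_size = round_up(total_items, local_size)   # 1024 work items launched

data = [1.0] * total_items
touched = 0
skipped = 0
for gid in range(global_size):
    if guarded_kernel(data, total_items, gid):
        touched += 1
    else:
        skipped += 1

print(global_size, touched, skipped)
```

In this setup 24 work items have an id past the end of the array, and without the guard each of them would read and write outside the buffer. When the global size equals the item count exactly, as in my test with 1000, the guard never fires.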
Second video.
I typed this out as I thought it through, so I hope it’s helpful.